Understanding EPiServer CMS' API is key to any non-trivial EPiServer project. When it comes to performance, understanding how different methods are cached, or not cached, is essential.
One of the things that I think has made EPiServer CMS such a successful product is its rather simple yet powerful caching strategy. Thanks to its caching mechanisms we as developers haven’t had to think too much about performance when using some of the standard API methods for retrieving PageData objects or when using many of the web controls shipped with EPiServer CMS. There are however situations, such as sites with high traffic loads or with a lot of content, where it’s vital to know how and what EPiServer caches.
In this article I intend to explain some of the most important things that we as developers should know about caching with regard to the methods used to retrieve PageData objects in EPiServer’s DataFactory class.
How DataFactory’s methods are cached
The DataFactory.GetPage() method, which requires (amongst other things) a PageReference as a parameter, is cached if the PageReference’s WorkID is zero, that is if no specific version is specified, or its IsAnyVersion() method returns false. In other words, GetPage() is usually cached unless we are working with a specific version, something that we normally aren’t if we are building functionality for the public region of a site.
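To make the rule concrete, here is a minimal sketch written in Python rather than against the real EPiServer API. PageRef, PageStore and the dictionary standing in for the database are all hypothetical names, and the sketch only models the WorkID part of the rule: a request without a specific version goes through the cache, while a request for a specific version always reads from storage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRef:
    """Hypothetical stand-in for a PageReference."""
    page_id: int
    work_id: int = 0  # 0 means "no specific version requested"


class PageStore:
    """Toy model of a page lookup that caches only version-less requests."""

    def __init__(self, database):
        self.database = database  # {(page_id, work_id): page name}
        self.cache = {}           # {page_id: page name}
        self.db_reads = 0

    def get_page(self, ref):
        if ref.work_id != 0:
            # A specific version was requested: bypass the cache entirely.
            self.db_reads += 1
            return self.database[(ref.page_id, ref.work_id)]
        if ref.page_id not in self.cache:
            self.db_reads += 1
            self.cache[ref.page_id] = self.database[(ref.page_id, 0)]
        return self.cache[ref.page_id]


store = PageStore({(5, 0): "News (published)", (5, 12): "News (draft)"})

# Two requests for the published page cost a single database read.
store.get_page(PageRef(5))
store.get_page(PageRef(5))
reads_after_published = store.db_reads

# Two requests for a specific version cost one read each.
store.get_page(PageRef(5, 12))
store.get_page(PageRef(5, 12))
reads_after_draft = store.db_reads
```

In this model the public site, which asks for pages without a version, pays for each page once, while anything that asks for specific versions pays every time.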
When DataFactory.GetChildren() is invoked it will check if a collection of PageReferences, references to the child pages, exists in the cache. If it doesn’t it will retrieve it from the database. It will then proceed to fetch each of the child pages and add them to a PageDataCollection which will finally be returned. It retrieves the PageData objects in pretty much the same way as GetPage(), using the same cache key for each page as GetPage() would. This means that if a page is updated but not moved the result of GetChildren() for its parent page doesn’t have to be cleared from the cache, as it’s only the references that have been cached, not the actual PageData objects.
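The two-level structure described above can be sketched as follows. This is a simplified Python model, not EPiServer code: the class, its dictionaries and the update_page() helper are assumptions made for illustration. The list of child references and the individual pages live in separate caches, so updating a page only evicts that page's own entry.

```python
class PageStore:
    """Toy model: child references and page objects are cached separately."""

    def __init__(self, pages, children):
        self.pages = pages            # "database": {page_id: name}
        self.children = children      # "database": {parent_id: [child ids]}
        self.page_cache = {}
        self.children_ref_cache = {}
        self.queries = []             # log of simulated database queries

    def get_page(self, page_id):
        if page_id not in self.page_cache:
            self.queries.append(("page", page_id))
            self.page_cache[page_id] = self.pages[page_id]
        return self.page_cache[page_id]

    def get_children(self, parent_id):
        # Only the references are cached here, never the pages themselves.
        if parent_id not in self.children_ref_cache:
            self.queries.append(("children", parent_id))
            self.children_ref_cache[parent_id] = list(self.children[parent_id])
        return [self.get_page(cid) for cid in self.children_ref_cache[parent_id]]

    def update_page(self, page_id, name):
        # The page changed but did not move: evict only its own cache entry.
        self.pages[page_id] = name
        self.page_cache.pop(page_id, None)


store = PageStore({2: "A", 3: "B"}, {1: [2, 3]})
first = store.get_children(1)

store.update_page(2, "A2")
store.queries.clear()
second = store.get_children(1)  # reference list still cached, one page reloaded
```

After the update, the second call reuses the cached reference list and reloads only the page that changed.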
GetDescendents() is not cached when used with the standard page provider but uses a pretty optimized SQL query. It isn’t cached for custom page providers either in EPiServer CMS 5, and for them the standard implementation uses recursive calls to GetChildrenReferences(). That is, it may result in several database queries for a custom page provider. This is likely to be fixed in EPiServer CMS 6 so that it uses cached results from GetChildrenReferences() instead.
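To illustrate why the recursive approach can be expensive for a custom page provider, here is a small Python sketch. The function names mirror the API methods, but the implementation is an assumption for illustration only: every call to get_children_references() stands for one uncached query against the provider's storage.

```python
def get_children_references(tree, parent_id, query_log):
    """Hypothetical uncached lookup: each call counts as one storage query."""
    query_log.append(parent_id)
    return tree.get(parent_id, [])


def get_descendents(tree, root_id, query_log):
    """Collect all descendants by recursing through the child references."""
    result = []
    for child_id in get_children_references(tree, root_id, query_log):
        result.append(child_id)
        result.extend(get_descendents(tree, child_id, query_log))
    return result


# A small page tree: page 1 has children 2 and 3, and so on.
tree = {1: [2, 3], 2: [4, 5], 3: [6]}
query_log = []
descendents = get_descendents(tree, 1, query_log)
```

In this model the number of queries grows with the number of pages visited (the root plus every descendant), which is why cached results from GetChildrenReferences() would make a difference.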
FindPagesWithCriteria() and FindAllPagesWithCriteria()
DataFactory.FindPagesWithCriteria() and its sibling FindAllPagesWithCriteria(), which doesn’t filter the result based on access rights, work the same way as GetChildren() in the sense that they first retrieve a collection of PageReferences and then fetch each page using GetPage().
There is however one very important difference between FindPagesWithCriteria() and GetChildren(). While each individual PageData object it returns will be cached (as it uses GetPage()) the result of the search, the references to the pages, will not be cached.
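The difference can be sketched in a few lines of Python. As before this is a toy model with hypothetical names rather than EPiServer's implementation: the search itself runs against the "database" on every call, while the pages it returns are served through the same cache that get_page() uses.

```python
class PageStore:
    """Toy model: page objects are cached, search results are not."""

    def __init__(self, pages):
        self.pages = pages        # "database": {page_id: {"name", "type"}}
        self.page_cache = {}
        self.search_queries = 0   # simulated search queries
        self.page_loads = 0       # simulated single-page loads

    def get_page(self, page_id):
        if page_id not in self.page_cache:
            self.page_loads += 1
            self.page_cache[page_id] = self.pages[page_id]
        return self.page_cache[page_id]

    def find_pages_with_criteria(self, page_type):
        # The reference lookup always hits the database.
        self.search_queries += 1
        refs = [pid for pid, page in self.pages.items()
                if page["type"] == page_type]
        # The pages themselves go through the ordinary page cache.
        return [self.get_page(pid) for pid in refs]


store = PageStore({
    1: {"name": "Start", "type": "start"},
    2: {"name": "Spring sale", "type": "news"},
    3: {"name": "New office", "type": "news"},
})
first = store.find_pages_with_criteria("news")
second = store.find_pages_with_criteria("news")
```

Two identical searches cost two search queries but only two page loads in total, since the second search finds both pages already cached.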
Implementing caching for a method that does filtering based on an arbitrary set of criteria is very tricky, as it is next to impossible to know when the cache should be invalidated. EPiServer has chosen not to implement it, partly because you can do much of what FindPagesWithCriteria() does using the other, cached, API methods and in-memory filtering.
An interesting note is that if you are building a custom page provider you can implement caching of your own implementation of FindAllPagesWithCriteria(). You will however then face the same problem as EPiServer, knowing when to invalidate the cache, but it is possible. Also, note that if you do this you should not cache the resulting PageDataCollection but a collection of references to the PageData objects.
Based on the above overview of the different methods in DataFactory we can draw a few important conclusions.
One such conclusion is that it’s a best practice to use GetChildren() often and use FindPagesWithCriteria() very sparingly. With that said, FindPagesWithCriteria() can be very useful at times. Let’s say that you for instance want to find all pages of a specific page type and they are scattered all over a site with hundreds of thousands of pages. To do that using GetChildren() we’d have to retrieve every single page from the database and filter them in memory, while FindPagesWithCriteria() would do the same thing using a fairly efficient database query. Still, you should not have to do that kind of filtering very often, and if you find yourself using FindPagesWithCriteria() often or in central places, such as the start page or a user control that is used all over the site, you’ve got a potential performance problem.
Another conclusion is that a single PageData object is only cached once. This is similar to how caching works in other great frameworks (such as NHibernate) and it works very well for a CMS where some of the data is likely to be accessed very often in different contexts (when displaying the page, when rendering a menu etc).
Based on the previous conclusion we can also draw a third conclusion. If you implement your own caching of queries, no matter if they are made with FindPagesWithCriteria() or with GetChildren() and in-memory filtering, you should never cache the PageData objects but instead cache their corresponding PageReferences.
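A small Python sketch shows what goes wrong when objects are cached instead of references. The Page class, the page_cache dictionary and the publish() helper are hypothetical, and the sketch assumes that publishing a page replaces its cached object with a new one.

```python
class Page:
    """Hypothetical stand-in for a PageData object."""

    def __init__(self, page_id, name):
        self.page_id = page_id
        self.name = name


# Stand-in for the central page cache that get_page() reads from.
page_cache = {1: Page(1, "News"), 2: Page(2, "Events")}


def get_page(page_id):
    return page_cache[page_id]


def publish(page_id, name):
    # Assumption: a published change puts a new object into the page cache.
    page_cache[page_id] = Page(page_id, name)


# Two ways of caching the result of a custom query.
cached_objects = [get_page(1), get_page(2)]  # holds the objects themselves
cached_refs = [1, 2]                         # holds references only

publish(1, "Latest news")

stale = [page.name for page in cached_objects]
fresh = [get_page(ref).name for ref in cached_refs]
```

The list of objects keeps pointing at the old version of page 1, while the list of references resolves to the current one and holds no second copy of any page.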
A fourth and final conclusion is that pages aren’t cached in edit mode as we are then working with and viewing specific versions of pages.
The inner workings – classes involved in caching
While this article isn’t really about how EPiServer’s caching functionality is implemented internally, it is an interesting topic and I thought I’d at least give you a quick overview. When we invoke API methods whose results are cached, four classes that each play a different role are involved.
- The DataFactory class exposes public methods for retrieving PageData objects and fires events during different stages of their execution.
- The DataFactoryCache class listens to events from DataFactory and invalidates caches when the need arises. It is also used to generate cache keys.
- The PageProviderBase class is the abstract super class for EPiServer CMS’s default PageProvider, LocalPageProvider, as well as any custom page providers we might have created for a site with an enterprise license. It checks whether a requested object exists in the cache. If it does it returns the cached object, otherwise it will fetch it from the database (or whatever storage the concrete implementation of the PageProvider uses). Also, if it has retrieved a non-cached object it will add it to the cache.
- The CacheManager class is basically a wrapper for the HttpRuntime.Cache but with one vital addition: its Remove() method will raise a remote event instructing other web servers in a load balanced setup to remove the object with the specific cache key from their caches.
There are other players involved as well, such as the OptimisticCache class, but that’s a bit too deep into the inner workings of the API to be covered here.
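The role that CacheManager plays in a load balanced setup can be illustrated with a toy Python model. The class below is hypothetical and much simpler than the real thing: a direct method call on each peer stands in for the remote event, and the point is only that removing a key on one server also removes it on the others.

```python
class CacheManagerModel:
    """Toy model of a local cache whose remove() is broadcast to peers."""

    def __init__(self):
        self._local = {}
        self._peers = []

    def connect(self, other):
        # Link two servers so that removals travel in both directions.
        self._peers.append(other)
        other._peers.append(self)

    def add(self, key, value):
        self._local[key] = value  # adds stay local to this server

    def get(self, key):
        return self._local.get(key)

    def _remove_local(self, key):
        self._local.pop(key, None)

    def remove(self, key):
        self._remove_local(key)
        for peer in self._peers:
            # Stand-in for the remote event raised towards other web servers.
            peer._remove_local(key)


server_a = CacheManagerModel()
server_b = CacheManagerModel()
server_a.connect(server_b)

server_a.add("menu", "cached menu on A")
server_b.add("menu", "cached menu on B")

server_a.remove("menu")
```

With a plain local cache only server A would have dropped its entry, and server B would have kept serving the outdated value.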
I hope you found this article useful. If I’ve left any questions unanswered or written something that isn’t correct, don’t hesitate to post a comment!
Finally I would like to emphasize a few points:
- Use FindPagesWithCriteria() very sparingly and be aware that it will always result in at the very least one database query.
- While you should always profile your application you can trust EPiServer to give you good performance thanks to its caching mechanisms as long as you use GetPage() and GetChildren() to retrieve PageData objects and as long as the number of pages isn’t huge under a specific node (parent page) in the site’s structure.
- If the number of pages is huge under a specific node, you should consider splitting them up somehow so that you can use the cached API methods without filling up the cache with very sporadically needed data.
- Don’t cache PageData objects. If you implement custom caching, cache PageReferences.
In the post Cache objects in EPiServer with page dependencies by Ted Nyberg you can learn more about working with cache dependencies and why you should use EPiServer’s CacheManager class instead of the HttpRuntime.Cache.
While this post dealt with object caching you should also be aware of how you can use output caching with EPiServer CMS, a topic I address in the post The EPiServer CMS Output Cache Explained.