One of the things that I think has made EPiServer CMS such a successful product is its rather simple yet powerful caching strategy. Thanks to its caching mechanisms we as developers haven’t had to think too much about performance when using some of the standard API methods for retrieving PageData objects or when using many of the web controls shipped with EPiServer CMS. There are however situations, such as sites with high traffic loads or with a lot of content, where it’s vital to know how and what EPiServer caches.
In this article I intend to explain some of the most important things that we as developers should know about caching with regards to the methods used to retrieve PageData objects in EPiServer’s DataFactory class.
The DataFactory.GetPage() method, which requires (amongst other things) a PageReference as a parameter, is cached if the PageReference’s WorkID is zero, that is if no specific version is specified, or its IsAnyVersion() method returns false. In other words, GetPage() is usually cached unless we are working with a specific version, something that we normally aren’t if we are building functionality for the public region of a site.
By default GetPage() caches PageData objects with a sliding expiration of 12 hours, but you can modify this with the pageCacheSlidingExpiration attribute in EPiServer’s configuration in web.config.
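To make "sliding expiration" concrete, here is a minimal toy model in Python. It is not EPiServer code; the SlidingCache class and the injected clock are assumptions made purely for illustration. The point it tries to show is that every read restarts the expiration window, so a page that is read regularly can stay cached for longer than 12 hours, while a page nobody reads is dropped once the window has passed.

```python
class SlidingCache:
    """Toy cache (hypothetical, not an EPiServer class) with sliding expiration."""

    def __init__(self, window_seconds, clock):
        self._window = window_seconds
        self._clock = clock      # injected callable returning the current time in seconds
        self._entries = {}       # key -> (value, time of last access)

    def put(self, key, value):
        self._entries[key] = (value, self._clock())

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, last_access = entry
        now = self._clock()
        if now - last_access > self._window:
            # Nobody touched the entry within the window, so it has expired.
            del self._entries[key]
            return None
        # A successful read restarts the window (the "sliding" part).
        self._entries[key] = (value, now)
        return value


HOUR = 3600
now = [0]  # fake clock so the example does not have to wait for real time
cache = SlidingCache(12 * HOUR, clock=lambda: now[0])
cache.put("page-42", "start page")

now[0] = 10 * HOUR
still_cached = cache.get("page-42")  # read inside the window, window restarts

now[0] = 21 * HOUR
kept_alive = cache.get("page-42")    # 11 hours since the last read, still cached

now[0] = 34 * HOUR
expired = cache.get("page-42")       # 13 hours since the last read, entry is gone
```

In this sketch the entry survives for 21 hours in total because it was read along the way, and only disappears after a gap longer than the window.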
When DataFactory.GetChildren() is invoked it will check if a collection of PageReferences, references to the child pages, exists in the cache. If it doesn’t, it will retrieve it from the database. It will then proceed to fetch each of the child pages and add them to a PageDataCollection which will finally be returned. It retrieves the PageData objects in pretty much the same way as GetPage(), using the same cache key for each page as GetPage() would. This means that if a page is updated but not moved, the result of GetChildren() for its parent page doesn’t have to be cleared from the cache, as it’s only the references that have been cached, not the actual PageData objects.
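The behaviour described above can be modelled with a small Python sketch. It is a simplified stand-in, not EPiServer’s implementation: ToyPageStore, its method names and the query counter are all hypothetical. It keeps one cache for pages and a separate cache for child references, and shows that editing a page only evicts that page.

```python
class ToyPageStore:
    """Hypothetical model: pages and child references are cached separately."""

    def __init__(self, pages, children):
        self._db_pages = dict(pages)        # page id -> page name (the "database")
        self._db_children = dict(children)  # parent id -> list of child ids
        self._page_cache = {}
        self._children_ref_cache = {}
        self.db_queries = 0                 # counts simulated database round trips

    def get_page(self, page_id):
        if page_id not in self._page_cache:
            self.db_queries += 1
            self._page_cache[page_id] = self._db_pages[page_id]
        return self._page_cache[page_id]

    def get_children(self, parent_id):
        # Only the references are cached per parent; each page goes through get_page().
        if parent_id not in self._children_ref_cache:
            self.db_queries += 1
            self._children_ref_cache[parent_id] = list(self._db_children[parent_id])
        return [self.get_page(ref) for ref in self._children_ref_cache[parent_id]]

    def update_page(self, page_id, new_name):
        self._db_pages[page_id] = new_name
        # Evict only the page itself; the parent's reference list stays cached.
        self._page_cache.pop(page_id, None)


store = ToyPageStore(pages={1: "News", 2: "First", 3: "Second"}, children={1: [2, 3]})

first = store.get_children(1)            # one reference query plus two page loads
queries_cold = store.db_queries

store.get_children(1)                    # served entirely from the two caches
queries_warm = store.db_queries

store.update_page(2, "First (edited)")
after_update = store.get_children(1)     # only the edited page is reloaded
queries_after_update = store.db_queries
```

After the update, the listing costs a single extra query for the edited page, since the cached reference list for the parent is still valid.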
GetDescendents() is not cached when used with the standard page provider but uses a pretty optimized SQL query. It isn’t cached for custom page providers either in EPiServer CMS 5, and for them the standard implementation uses recursive calls to GetChildrenReferences(). That is, it may result in several database queries for a custom page provider. This is likely to be fixed in EPiServer CMS 6 so that it uses cached results from GetChildrenReferences() instead.
DataFactory.FindPagesWithCriteria() and its sibling FindAllPagesWithCriteria(), which doesn’t filter the result based on access rights, work the same way as GetChildren() in the sense that they first retrieve a collection of PageReferences and then fetch each page using GetPage().
There is however one very important difference between FindPagesWithCriteria() and GetChildren(). While each individual PageData object it returns will be cached (as it uses GetPage()) the result of the search, the references to the pages, will not be cached.
Implementing caching for a method that does filtering based on an arbitrary set of criteria is very tricky, as it is next to impossible to know when the cache should be invalidated. EPiServer has chosen not to implement it, partly because you can do many of the same things that FindPagesWithCriteria() can using the other, cached, API methods and in-memory filtering.
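As a rough illustration of what filtering in memory on top of a cached child listing can look like, here is a small Python sketch. The Page type, the sample tree and the helper names are made up for the example, and get_children() here is only a stand-in for a cached listing such as GetChildren().

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    page_id: int
    name: str
    page_type: str


# Hypothetical sample tree: page 1 is the root.
PAGES = {
    2: Page(2, "News", "NewsList"),
    3: Page(3, "About", "Article"),
    4: Page(4, "Story A", "Article"),
    5: Page(5, "Story B", "Article"),
}
CHILDREN = {1: [2, 3], 2: [4, 5]}


def get_children(parent_id):
    """Stand-in for a cached child listing."""
    return [PAGES[i] for i in CHILDREN.get(parent_id, [])]


def children_of_type(parent_id, page_type):
    """Filter the direct children of one page in memory."""
    return [p for p in get_children(parent_id) if p.page_type == page_type]


def descendants_of_type(root_id, page_type):
    """Walk the tree depth-first, filtering each level in memory."""
    matches = []
    for child in get_children(root_id):
        if child.page_type == page_type:
            matches.append(child)
        matches.extend(descendants_of_type(child.page_id, page_type))
    return matches


direct = [p.name for p in children_of_type(1, "Article")]
everywhere = [p.name for p in descendants_of_type(1, "Article")]
```

The recursive variant touches every page under the root, which is fine for a small subtree but is exactly the kind of traversal that gets expensive on a very large site.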
An interesting note is that if you are building a custom page provider you can implement caching of your own implementation of FindAllPagesWithCriteria(). You will however then face the same problems as EPiServer, knowing when to invalidate the cache, but it is possible. Also, note that if you do do this you should not cache the resulting PageDataCollection but a collection of references to the PageData objects.
Based on the above overview of the different methods in DataFactory we can draw a few important conclusions.
One such conclusion is that it’s a best practice to use GetChildren() often and use FindPagesWithCriteria() very sparingly. With that said, FindPagesWithCriteria() can be very useful at times. Let’s say that you for instance want to find all pages of a specific page type and they are scattered all over a site with hundreds of thousands of pages. To do that using GetChildren() we’d have to retrieve every single page from the database and filter them in memory, while FindPagesWithCriteria() would do the same thing using a fairly efficient database query. Still, you should not have to do that kind of filtering very often, and if you find yourself using FindPagesWithCriteria() often or in central places such as the start page or in a user control that is used all over the site, you’ve got a potential performance problem.
Another conclusion is that a single PageData object is only cached once. This is similar to how caching works in other great frameworks (such as NHibernate) and it works very well for a CMS where some of the data is likely to be accessed very often in different contexts (when displaying the page, when rendering a menu etc).
Based on the previous conclusion we can also draw a third conclusion. If you implement your own caching of queries, no matter if they are made with FindPagesWithCriteria() or with GetChildren() and in-memory filtering, you should never cache the PageData objects but instead cache their corresponding PageReferences.
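A minimal Python sketch of that third conclusion follows. ReferenceQueryCache and its parameters are hypothetical names, and get_page stands in for a cached lookup such as GetPage(). The query result is stored as references only, and the pages are resolved on every call, so an edited page shows up without the query being run again.

```python
class ReferenceQueryCache:
    """Hypothetical helper: caches query results as references, never as page objects."""

    def __init__(self, get_page, run_query):
        self._get_page = get_page    # stand-in for a cached page lookup
        self._run_query = run_query  # the expensive query, returns page references
        self._refs_by_key = {}

    def find(self, key):
        if key not in self._refs_by_key:
            self._refs_by_key[key] = tuple(self._run_query(key))
        # Resolve references on every call so callers always see the current page data.
        return [self._get_page(ref) for ref in self._refs_by_key[key]]

    def invalidate(self, key=None):
        """Drop one cached result, or all of them when no key is given."""
        if key is None:
            self._refs_by_key.clear()
        else:
            self._refs_by_key.pop(key, None)


pages = {7: "Press release", 9: "Annual report"}
query_runs = []


def expensive_query(page_type):
    query_runs.append(page_type)  # record each time the "database" is hit
    return [7, 9]


cache = ReferenceQueryCache(get_page=lambda ref: pages[ref], run_query=expensive_query)

first = cache.find("Document")
pages[7] = "Press release (updated)"  # the page is edited after the query was cached
second = cache.find("Document")
```

Knowing when to call invalidate() is still the hard part: the sketch only handles edits to pages already in the result, not pages that start or stop matching the query.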
A fourth and final conclusion is that pages aren’t cached in edit mode as we are then working with and viewing specific versions of pages.
While this article isn’t really about how EPiServer’s caching functionality is implemented internally, it is an interesting topic and I thought I’d at least give you a quick overview. When we invoke API methods whose results are cached, four classes that each play a different role are involved.
There are other players involved as well, such as the OptimisticCache class, but that’s a bit too deep into the inner workings of the API to be covered here.
I hope you found this article useful. If I’ve left any questions unanswered or written something that isn’t correct, don’t hesitate to post a comment!
Finally I would like to emphasize a few points:
Comments
Emil 1 months ago
Good article! Always nice to learn more on the internals of EPiServer CMS. Thanks!
Frederik Vig 1 months ago
Great article! Would be interesting to see how the dynamic data stores uses caching.. :)
Johan Pettersson 1 months ago
Note to self; remember to cache pagereferences!
Stuart Blackburn 1 months ago
Thanks for this Joel. I suppose when you say "If the amount of pages is huge under a specific node, you should consider splitting them up somehow", you could use the archiving feature to move pages to a new archive container page and therefore reduce the number of child pages under the original parent page. However, this can result in 404's from search engine results unless a custom handler is developed. What do you suggest? Multiple container pages?? I'm interested in your thoughts as I'm developing a site currently that potentially will have thousands of pages.
Joel Abrahamsson 1 months ago
Hi Stuart!
I guess it depends on what type of pages it is but multiple container pages, like you suggest, has worked well for me in a few projects. That is, using a sort of archive structure where you group the pages by month and date from the get-go might be a good idea for news and articles.
If on the other hand it's a member registry you might group them by the first letter in the name or the first number in the membership number. In both cases you can easily traverse the tree to find pages and especially in the first case the URL:s will look pretty good as well (/news/2010/01/01/zlatan-scores-again/).
Then of course if you have several hundred thousands, or even millions of pages that might not be enough either. There are of course ways to handle those scenarios as well, sometimes using custom page providers, tricks with URL-rewriting etc.
If you like I'd be happy to discuss your specific scenario in more detail. E-mail me if you don't want to share the details with the world ;)
adeel arshad 16 days ago
Hi Joel,
would you happen to know how often Episerver refreshes its cache? we make changes to pages on our newly launched website and find that the search results still show the old content if a change was made after the page was created. It would be great to know the frequency.
regards
Adeel
Joel Abrahamsson 16 days ago
Hi Adeel!
It depends on the pageCacheSlidingExpiration in the site section in web.config. I think the standard value is 12 hours. However, since it uses sliding expiration it can in reality be way less than that.
It should however clear the cache for you when you save pages. Do you save them in some special way?
Adeel 16 days ago
Hi,
no theyre saved as normal, save and publish, and then previewed in a separate window. Weve had some issues with our firewall so maybe thats working against us. Thanks for the hints on the pageCacheSlidingExpiration .. good to know where I can find it so I can have a look at what else is in there.
regards
Adeel
Rajesh Shelar 12 days ago
Hi Joel,
What you have mentioned is DATA caching and we can also configure our website for OUTPUT caching with the help of httpCacheExpiration attribute.
So can you please give any specific reason why i would be using OUTPUT caching when DATA caching is already in place and the purpose of using them simultaneously
Joel Abrahamsson 12 days ago
Hi Rajesh!
As you say, given that you use only GetPage and GetChildren, the PageData objects are cached using data caching, or as I usually call it, object caching. What's left for the server to do then when processing a request is to render the content using the cached objects and your user controls and ASPX-pages. Normally these in-memory operations are so fast that you have very little to gain from output caching.
There are however a few cases where it can be useful, such as when you have A LOT of content on each page (URL) and the server has to render hundreds or thousands of user controls.
One example of what I mean is the front page for newspapers which often contains a lot of teasers.
For instance if I were to build http://aftonbladet.se/ (a big tabloid paper in Sweden) where I can count at least 150 teasers I would definitely output cache the whole page as well as each individual user control, as rendering that page, even when all of the content objects are cached, will put some strain on the web server.
Actually I would probably use a HTTP accelerator like Varnish on top of that, but you get my point :)
Another example is of course sites that have extreme traffic loads, even if they don't have that much content.