How EPiServer CMS caches PageData objects

One of the things that I think has made EPiServer CMS such a successful product is its rather simple yet powerful caching strategy. Thanks to its caching mechanisms, we as developers haven’t had to think too much about performance when using the standard API methods for retrieving PageData objects or the web controls shipped with EPiServer CMS. There are, however, situations, such as sites with high traffic loads or a lot of content, where it’s vital to know how and what EPiServer caches.

In this article I intend to explain some of the most important things that we as developers should know about caching with regard to the methods used to retrieve PageData objects in EPiServer’s DataFactory class.

How DataFactory’s methods are cached

GetPage()

The DataFactory.GetPage() method, which requires (amongst other things) a PageReference as a parameter, is cached if the PageReference’s WorkID is zero, that is if no specific version is specified, or its IsAnyVersion() method returns false. In other words, GetPage() is usually cached unless we are working with a specific version, which we normally aren’t when building functionality for the public part of a site.

GetPage() caches PageData objects with a sliding expiration of 12 hours by default, but you can change this with the pageCacheSlidingExpiration attribute in EPiServer’s configuration in web.config.
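To make "sliding expiration" concrete, here is a minimal sketch in Python. It is a toy model and not EPiServer or ASP.NET code: the names SlidingCache and FakeClock are made up for the example, and the 12-hour window is simply the default mentioned above. The point it shows is that every cache hit restarts the window, so an entry only expires after a full window without any access.

```python
class SlidingCache:
    """Toy cache whose entries expire after `window_seconds` without access."""

    def __init__(self, window_seconds, clock):
        self.window = window_seconds
        self.clock = clock              # callable returning the current time in seconds
        self._entries = {}              # key -> (value, time of last access)

    def put(self, key, value):
        self._entries[key] = (value, self.clock())

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, last_access = entry
        now = self.clock()
        if now - last_access >= self.window:
            del self._entries[key]      # idle for a full window: evict
            return None
        self._entries[key] = (value, now)   # a hit slides the expiry forward
        return value


class FakeClock:
    """Hypothetical clock that only moves when told to, for the demonstration."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


HOUR = 3600
clock = FakeClock()
cache = SlidingCache(12 * HOUR, clock)
cache.put("page:42", "PageData for page 42")

clock.advance(11 * HOUR)
hit_after_11h = cache.get("page:42")    # still cached; the window restarts here
clock.advance(11 * HOUR)
hit_after_22h = cache.get("page:42")    # 22 hours old, but never idle for 12
clock.advance(12 * HOUR)
hit_after_idle = cache.get("page:42")   # idle for a full window: evicted
```

In this model a frequently requested page can stay cached far longer than 12 hours, while a rarely requested one drops out after 12 idle hours.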

GetChildren()

When DataFactory.GetChildren() is invoked it checks whether a collection of PageReferences, references to the child pages, exists in the cache. If it doesn’t, it retrieves the collection from the database. It then proceeds to fetch each of the child pages and adds them to a PageDataCollection, which is finally returned. It retrieves the PageData objects in pretty much the same way as GetPage(), using the same cache key for each page as GetPage() would. This means that if a page is updated but not moved, the result of GetChildren() for its parent page doesn’t have to be cleared from the cache, as it’s only the references that have been cached, not the actual PageData objects.
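The two-level caching described above can be sketched with a small Python model. This is a simplified illustration under my own assumptions, not EPiServer's implementation: ToyPageStore stands in for the database, ToyFactory for DataFactory, and page_updated() for whatever invalidation happens when a page is saved.

```python
class ToyPageStore:
    """Stands in for the database and counts how often it is queried."""

    def __init__(self, pages, children):
        self.pages = pages          # page id -> page name
        self.children = children    # parent id -> list of child ids
        self.queries = 0

    def load_page(self, page_id):
        self.queries += 1
        return {"id": page_id, "name": self.pages[page_id]}

    def load_child_ids(self, parent_id):
        self.queries += 1
        return list(self.children.get(parent_id, []))


class ToyFactory:
    def __init__(self, store):
        self.store = store
        self.page_cache = {}        # page id -> page object
        self.children_cache = {}    # parent id -> list of child ids (references only)

    def get_page(self, page_id):
        if page_id not in self.page_cache:
            self.page_cache[page_id] = self.store.load_page(page_id)
        return self.page_cache[page_id]

    def get_children(self, parent_id):
        if parent_id not in self.children_cache:
            self.children_cache[parent_id] = self.store.load_child_ids(parent_id)
        # Each child is resolved through the same per-page cache that get_page uses.
        return [self.get_page(cid) for cid in self.children_cache[parent_id]]

    def page_updated(self, page_id):
        # Content changed but the page did not move: only its own entry is dropped.
        self.page_cache.pop(page_id, None)


store = ToyPageStore({1: "Start", 2: "News", 3: "About"}, {1: [2, 3]})
factory = ToyFactory(store)

factory.get_children(1)
queries_first_call = store.queries      # 1 reference query + 2 page loads

factory.get_children(1)
queries_second_call = store.queries     # everything served from the caches

store.pages[3] = "About us"
factory.page_updated(3)
names_after_update = [p["name"] for p in factory.get_children(1)]
queries_after_update = store.queries    # only page 3 was reloaded
```

Because the children cache holds references only, updating page 3 costs a single reload of that page; the cached child list for page 1 stays valid.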

GetDescendents()

GetDescendents() is not cached when used with the standard page provider, but it uses a pretty optimized SQL query. In EPiServer CMS 5 it isn’t cached for custom page providers either, and for them the standard implementation uses recursive calls to GetChildrenReferences(). That is, it may result in several database queries for a custom page provider. This is likely to be fixed in EPiServer CMS 6 so that it uses cached results from GetChildrenReferences() instead.

FindPagesWithCriteria() and FindAllPagesWithCriteria()

DataFactory.FindPagesWithCriteria() and its sibling FindAllPagesWithCriteria(), which doesn’t filter the result based on access rights, work the same way as GetChildren() in the sense that they first retrieve a collection of PageReferences and then fetch each page using GetPage().

There is however one very important difference between FindPagesWithCriteria() and GetChildren(). While each individual PageData object it returns will be cached (as it uses GetPage()) the result of the search, the references to the pages, will not be cached.
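A toy Python model of that difference, with made-up names (ToySearchFactory, find_pages) and a dictionary standing in for the database, might look like this. It is only a sketch of the behaviour described above, not of how EPiServer implements it.

```python
class ToySearchFactory:
    def __init__(self, pages):
        self.pages = pages              # page id -> page type name (the "database")
        self.page_cache = {}            # page id -> page object
        self.search_queries = 0         # reference lookups sent to the database
        self.page_loads = 0             # page objects loaded from the database

    def get_page(self, page_id):
        if page_id not in self.page_cache:
            self.page_loads += 1
            self.page_cache[page_id] = {"id": page_id, "type": self.pages[page_id]}
        return self.page_cache[page_id]

    def find_pages(self, page_type):
        # The reference lookup is never cached, so every call queries the database.
        self.search_queries += 1
        matching_ids = [pid for pid, ptype in sorted(self.pages.items())
                        if ptype == page_type]
        # The pages themselves still come from the shared per-page cache.
        return [self.get_page(pid) for pid in matching_ids]


factory = ToySearchFactory({1: "Start", 2: "Article", 3: "Article", 4: "Contact"})
first = factory.find_pages("Article")
second = factory.find_pages("Article")
```

After two identical searches the model has run two search queries but loaded each matching page only once.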

Implementing caching for a method that filters on an arbitrary set of criteria is very tricky, as it is next to impossible to know when the cache should be invalidated. EPiServer has chosen not to implement it, partly because you can do much the same things that FindPagesWithCriteria() can by using the other, cached, API methods and in-memory filtering.

An interesting note is that if you are building a custom page provider you can cache the results of your own implementation of FindAllPagesWithCriteria(). You will then face the same problem as EPiServer, knowing when to invalidate the cache, but it is possible. Also, note that if you do this you should not cache the resulting PageDataCollection but a collection of references to the PageData objects.

Important conclusions

Based on the above overview of the different methods in DataFactory we can draw a few important conclusions.

One such conclusion is that it’s best practice to use GetChildren() often and FindPagesWithCriteria() very sparingly. That said, FindPagesWithCriteria() can be very useful at times. Say, for instance, that you want to find all pages of a specific page type and they are scattered all over a site with hundreds of thousands of pages. To do that using GetChildren() we’d have to retrieve every single page from the database and filter them in memory, while FindPagesWithCriteria() would do the same thing with a fairly efficient database query. Still, you should not have to do that kind of filtering very often, and if you find yourself using FindPagesWithCriteria() often or in central places, such as the start page or a user control that is used all over the site, you’ve got a potential performance problem.
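For comparison, the in-memory alternative amounts to walking the tree through a GetChildren()-style call and filtering as you go. The Python sketch below uses hypothetical data and helper names (children, page_types, get_children, descendants_of_type) purely for illustration; it shows why this approach touches every page under the root.

```python
# Hypothetical site structure: parent id -> child ids, and page id -> page type.
children = {1: [2, 3], 2: [4], 3: [], 4: []}
page_types = {1: "Start", 2: "Section", 3: "Article", 4: "Article"}
pages_visited = 0


def get_children(parent_id):
    """Stand-in for a cached GetChildren()-style call."""
    return [{"id": cid, "type": page_types[cid]} for cid in children[parent_id]]


def descendants_of_type(root_id, page_type):
    """Walk the whole tree below root_id and keep pages of the given type."""
    global pages_visited
    matches = []
    stack = [root_id]
    while stack:
        parent_id = stack.pop()
        for page in get_children(parent_id):
            pages_visited += 1              # every descendant is inspected once
            if page["type"] == page_type:
                matches.append(page["id"])
            stack.append(page["id"])
    return sorted(matches)


article_ids = descendants_of_type(1, "Article")
```

Here three descendants are inspected to find two articles. On a small or well-cached tree that is cheap; on a site with hundreds of thousands of pages it means pulling all of them into memory.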

Another conclusion is that a single PageData object is only cached once. This is similar to how caching works in other great frameworks (such as NHibernate), and it works very well for a CMS where some of the data is likely to be accessed very often in different contexts (when displaying the page, when rendering a menu etc).

Based on the previous conclusion we can also draw a third conclusion. If you implement your own caching of queries, no matter whether they are made with FindPagesWithCriteria() or with GetChildren() and in-memory filtering, you should never cache the PageData objects but instead cache their corresponding PageReferences.
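One likely consequence of ignoring this advice is stale data: a custom cache that holds page objects keeps serving the old object after the platform has invalidated its own copy. The Python sketch below is a toy model under that assumption; get_page(), publish() and the two custom caches are invented for the example and are not EPiServer APIs.

```python
page_cache = {}                     # stands in for the platform's own page cache
database = {7: "Old title", 8: "Other page"}


def get_page(page_id):
    """Return the page from the shared cache, loading it on a miss."""
    if page_id not in page_cache:
        page_cache[page_id] = {"id": page_id, "title": database[page_id]}
    return page_cache[page_id]


def publish(page_id, title):
    """Save new content; the platform drops its own cached entry for the page."""
    database[page_id] = title
    page_cache.pop(page_id, None)


# Two hypothetical custom caches for the same query result.
cached_objects = [get_page(7), get_page(8)]     # caches the page objects
cached_references = [7, 8]                      # caches only the references

publish(7, "New title")

titles_from_objects = [page["title"] for page in cached_objects]
titles_from_references = [get_page(pid)["title"] for pid in cached_references]
```

The object cache still reports "Old title", while the reference cache resolves each reference through the shared page cache and picks up "New title".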

A fourth and final conclusion is that pages aren’t cached in edit mode as we are then working with and viewing specific versions of pages.

The inner workings – classes involved in caching

While this article isn’t really about how EPiServer’s caching functionality is implemented internally, it is an interesting topic and I thought I’d at least give you a quick overview. When we invoke API methods whose results are cached, four classes that each play a different role are involved.

  • The DataFactory class exposes public methods for retrieving PageData objects and fires events during different stages of their execution.
  • The DataFactoryCache class listens to events from DataFactory and invalidates caches when the need arises. It is also used to generate cache keys.
  • The PageProviderBase class is the abstract super class for EPiServer CMS’s default PageProvider, LocalPageProvider, as well as any custom page providers we might have created for a site with an enterprise license. It checks whether a requested object exists in the cache. If it does it returns the cached object, otherwise it will fetch it from the database (or whatever storage the concrete implementation of the PageProvider uses). Also, if it has retrieved a non-cached object it will add it to the cache.
  • The CacheManager class is basically a wrapper for the HttpRuntime.Cache but with one vital addition: its Remove() method will raise a remote event instructing other web servers in a load balanced setup to remove the object with the specific cache key from their caches.
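The remote removal described in the last bullet can be pictured with a small Python model. This is a sketch of the idea only: ToyCacheManager and its peers list are hypothetical, and direct method calls stand in for whatever remote event mechanism the real class uses.

```python
class ToyCacheManager:
    """Local cache whose remove() also tells the other servers to drop the key."""

    def __init__(self, name):
        self.name = name
        self.local = {}             # this server's in-memory cache
        self.peers = []             # other servers in the load balanced setup

    def add(self, key, value):
        self.local[key] = value

    def get(self, key):
        return self.local.get(key)

    def remove(self, key):
        self.remove_local(key)
        for peer in self.peers:     # the "remote event": peers drop the same key
            peer.remove_local(key)

    def remove_local(self, key):
        # Removal without re-broadcasting, so servers don't notify each other forever.
        self.local.pop(key, None)


web1, web2 = ToyCacheManager("web1"), ToyCacheManager("web2")
web1.peers, web2.peers = [web2], [web1]

web1.add("page:5", "version A")
web2.add("page:5", "version A")

web1.remove("page:5")               # e.g. an editor published page 5 on web1
```

After the removal on web1, neither server holds the old entry, so the next request on either one reloads the page.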

There are other players involved as well, such as the OptimisticCache class, but that’s a bit too deep into the inner workings of the API to be covered here.

Final words

I hope you found this article useful. If I’ve left any questions unanswered or written something that isn’t correct, don’t hesitate to post a comment!

Finally I would like to emphasize a few points:

  • Use FindPagesWithCriteria() very sparingly and be aware that it will always result in at least one database query.
  • While you should always profile your application, you can trust EPiServer to give you good performance thanks to its caching mechanisms as long as you use GetPage() and GetChildren() to retrieve PageData objects and as long as the number of pages under a specific node (parent page) in the site’s structure isn’t huge.
  • If the number of pages under a specific node is huge, you should consider splitting them up somehow so that you can use the cached API methods without filling up the cache with very sporadically needed data.
  • Don’t cache PageData objects. If you implement custom caching, cache PageReferences.

Comments

  1. Emil, 1 month ago

    Good article! Always nice to learn more on the internals of EPiServer CMS. Thanks!

  2. Frederik Vig, 1 month ago

    Great article! Would be interesting to see how the dynamic data stores uses caching.. :)

  3. Johan Pettersson, 1 month ago

    Note to self; remember to cache pagereferences!

  4. Stuart Blackburn, 1 month ago

    Thanks for this Joel. I suppose when you say "If the amount of pages is huge under a specific node, you should consider splitting them up somehow", you could use the archiving feature to move pages to a new archive container page and therefore reduce the number of child pages under the original parent page. However, this can result in 404's from search engine results unless a custom handler is developed. What do you suggest? Multiple container pages?? I'm interested in your thoughts as I'm developing a site currently that potentially will have thousands of pages.

  5. Joel Abrahamsson, 1 month ago

    Hi Stuart!

    I guess it depends on what type of pages it is, but multiple container pages, like you suggest, have worked well for me in a few projects. That is, using a sort of archive structure where you group the pages by month and date from the get-go might be a good idea for news and articles.
    If on the other hand it's a member registry you might group them by the first letter in the name or the first number in the membership number. In both cases you can easily traverse the tree to find pages, and especially in the first case the URLs will look pretty good as well (/news/2010/01/01/zlatan-scores-again/).

    Then of course if you have several hundred thousands, or even millions of pages that might not be enough either. There are of course ways to handle those scenarios as well, sometimes using custom page providers, tricks with URL-rewriting etc.

    If you like I'd be happy to discuss your specific scenario in more detail. E-mail me if you don't want to share the details with the world ;)

  6. adeel arshad, 16 days ago

    Hi Joel,

    would you happen to know how often Episerver refreshes its cache? we make changes to pages on our newly launched website and find that the search results still show the old content if a change was made after the page was created. It would be great to know the frequency.

    regards

    Adeel

  7. Joel Abrahamsson, 16 days ago

    Hi Adeel!

    It depends on the pageCacheSlidingExpiration in the site section in web.config. I think the standard value is 12 hours. However, since it uses sliding expiration it can in reality be way less than that.
    It should however clear the cache for you when you save pages. Do you save them in some special way?

  8. Adeel, 16 days ago

    Hi,

    no, they're saved as normal, save and publish, and then previewed in a separate window. We've had some issues with our firewall so maybe that's working against us. Thanks for the hints on the pageCacheSlidingExpiration .. good to know where I can find it so I can have a look at what else is in there.

    regards

    Adeel

  9. Rajesh Shelar, 12 days ago

    Hi Joel,
    What you have mentioned is DATA caching and we can also configure our website for OUTPUT caching with the help of httpCacheExpiration attribute.
    So can you please give any specific reason why i would be using OUTPUT caching when DATA caching is already in place and the purpose of using them simultaneously

  10. Joel Abrahamsson, 12 days ago

    Hi Rajesh!

    As you say, given that you use only GetPage and GetChildren, the PageData objects are cached using data caching, or as I usually call it, object caching. What's left for the server to do then when processing a request is to render the content using the cached objects and your user controls and ASPX-pages. Normally these in-memory operations are so fast that you have very little to gain from output caching.

    There are however a few cases where it can be useful, such as when you have A LOT of content on each page (URL) and the server has to render hundreds or thousands of user controls.

    One example of what I mean is the front page for newspapers which often contains a lot of teasers.

    For instance, if I were to build http://aftonbladet.se/ (a big tabloid paper in Sweden), where I can count at least 150 teasers, I would definitely output cache the whole page as well as each individual user control, as rendering that page, even when all of the content objects are cached, will put some strain on the web server.
    Actually I would probably use a HTTP accelerator like Varnish on top of that, but you get my point :)

    Another example is of course sites that have extreme traffic loads, even if they don't have that much content.
