In this post I’ll describe a few common patterns for integrating EPiServer CMS sites with external systems, with a focus on fetching and displaying content. These external systems may be anything from a business system residing on a central mainframe to a simple RSS feed. The executive summary of the post is:
- There are two main patterns with various implementation options: on request fetching and interim storage.
- Fetching and storing data in a local database using a scheduled job helps us build more robust systems than if we fetch data on requests from visitors.
- Storing external data as pages in EPiServer has many benefits.
Integrations, integrations, integrations
Beyond the simplest of sites, almost all sites built on EPiServer CMS have some sort of integration with one or more external systems. Sometimes these integrations have to do with processing data submitted by visitors, but more often data from an external system should be displayed on the website. This data can range from RSS feeds to articles published in a separate content management system that serves many display mediums. Or it can range from basic product information in a catalog management system to large volumes of complex product offerings that reside in a central mainframe.
Over the last five-plus years I’ve been involved in many EPiServer projects that have had to build integrations of various kinds. During these I’ve seen, and experimented with, many different solutions. In this post my aim is to provide a short catalog of common ways to build integrations for EPiServer CMS based sites, along with comments about what works well and what doesn’t.
On request fetching
The simplest solution for displaying content from another system when rendering a page on a website is of course to go out and get the data while rendering the page. When a request comes in to a page on our site we make a request to the external system. The request to the external system is executed on the visitor’s HTTP request thread. Alternatively we can make the request to the external system asynchronously using functionality such as the AddOnPreRenderCompleteAsync method.
A benefit of this approach is that the data is fetched in real-time, ensuring that it’s always up to date. In theory it also has the benefit of being quick to implement. In practice however, the simplicity of this approach brings a major drawback: coupling. We tie the rendering of pages on our site to the external system. If the external system is unresponsive, so is our site. Even if the external system is guaranteed to never be down and is fast to respond, we add the request/response time of that request to the overall response time of the visitor’s request to our site.
My general recommendation is to avoid on request fetching of data from external systems at all costs. The strong dependency on the external system means that we actively make that external system a part of our own, making our site unusable both during development and in production if it’s unable to connect to the external system.
There are times when we seemingly have no choice but to use on request fetching of data from external systems. For instance, we may have a requirement to display the number of items in stock for a product in real time. In practice however, if we challenge such a requirement we may find out that it’s acceptable to the business if the data isn’t entirely real-time. Alternatively we may be able to have the visitor’s browser fetch the data using a separate AJAX request, made either directly to the external system or to an HTTP handler that proxies the request on our web servers. Either way the initial rendering of the page won’t result in an error.
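To illustrate the AJAX approach, here’s a minimal sketch of such a proxying HTTP handler. It assumes a hypothetical StockService client class for the external system; the handler name, query string parameter and service API are all made up for the example.

```csharp
// Hypothetical sketch: an HTTP handler that proxies stock level requests
// to the external system, keeping the external call off the page
// rendering path. "StockService" and GetItemsInStock are assumed names.
using System.Web;

public class StockLevelHandler : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        string articleNumber = context.Request.QueryString["article"];
        try
        {
            int inStock = StockService.GetItemsInStock(articleNumber);
            context.Response.ContentType = "text/plain";
            context.Response.Write(inStock.ToString());
        }
        catch (System.Exception)
        {
            // If the external system is down the AJAX call fails,
            // but the page itself has already been rendered.
            context.Response.StatusCode = 503;
        }
    }

    public bool IsReusable
    {
        get { return true; }
    }
}
```

The handler would be registered under httpHandlers in web.config and called from the rendered page with a small piece of JavaScript, so a failing external system degrades the stock figure rather than the whole page.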
If all other options have been exhausted and we’re forced to implement on request fetching, my recommendation is to limit its use to a select few pages on our web site, none of which should be the start page or other key pages. It may also be a good idea to implement functionality that allows web site administrators or editors to temporarily disable the functionality that requires fetching of external data.
On request fetching with caching
To avoid the overhead of requests to the external system during processing of requests to our website we may implement a variation of on request fetching where we cache the response from the external system for a short period. This also smooths over hiccups in the communication with the external system, as it’s then allowed to be unresponsive as long as we have an earlier response from it in our cache.
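A minimal sketch of this pattern, assuming a hypothetical ExchangeRateService and ExchangeRates class, could look like the following. Besides the short-lived cache entry it keeps a longer-lived fallback copy so that a stale response can be served if a later fetch fails.

```csharp
// Sketch of on request fetching with a short-lived cache.
// "ExchangeRateService" and "ExchangeRates" are assumed names.
using System;
using System.Web;
using System.Web.Caching;

public static class CachedExchangeRates
{
    private const string CacheKey = "ExchangeRates";
    private const string FallbackKey = "ExchangeRates_Fallback";

    public static ExchangeRates Get()
    {
        var cached = HttpRuntime.Cache[CacheKey] as ExchangeRates;
        if (cached != null)
        {
            return cached;
        }

        try
        {
            var fresh = ExchangeRateService.FetchCurrentRates();
            // Short-lived entry that drives the normal refresh interval.
            HttpRuntime.Cache.Insert(CacheKey, fresh, null,
                DateTime.UtcNow.AddMinutes(5), Cache.NoSlidingExpiration);
            // Long-lived fallback copy in case future fetches fail.
            HttpRuntime.Cache.Insert(FallbackKey, fresh, null,
                DateTime.UtcNow.AddHours(12), Cache.NoSlidingExpiration);
            return fresh;
        }
        catch (Exception)
        {
            // The external system didn't answer; serve the stale copy
            // if we have one, otherwise null bubbles up to the caller.
            return HttpRuntime.Cache[FallbackKey] as ExchangeRates;
        }
    }
}
```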
While caching requests to external systems mitigates the drawbacks of on request fetching it doesn’t solve the problem completely. We still have a tight coupling to the external system. There’s also a price to pay in terms of complexity as we have to implement the caching.
As with on request fetching without caching, my recommendation is to avoid on request fetching with caching in all but extreme situations where the data needs to be close to real-time and where it’s preferable for the functionality to break rather than present stale data to the user. Over the last few years I’ve implemented on request fetching only once, and then with caching.
Interim storage
To avoid a tight coupling between our web site and external systems we can implement functionality that fetches data from the external system and stores it locally. This eliminates the issues with on request fetching as we have no external dependencies for serving requests from visitors, and I usually advocate going down this route 98 times out of 100. Another benefit of storing external data locally is that we make it possible for editors to modify it.
This increased robustness does come at a price though, as we need to build functionality that repeatedly fetches changed data from the external system and stores it in some sort of local data store. Unless, of course, the external system is able to notify us when we need to update; unfortunately such systems are still rare.
Luckily building functionality that is triggered automatically on a regular basis is easy using scheduled jobs in EPiServer. Deciding how to build, and actually building, the functionality that synchronizes the data is slightly trickier and requires some thought.
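A scheduled job in EPiServer boils down to a class with a static Execute method decorated with the ScheduledPlugIn attribute, which makes it appear in admin mode where it can be set to run at regular intervals. The skeleton below is a sketch; ArticleService and ArticleImporter are assumed names for the external client and the code that writes to our interim storage.

```csharp
// Skeleton of an EPiServer scheduled job that pulls changed data from an
// external system into interim storage. "ArticleService" and
// "ArticleImporter" are assumed names.
using EPiServer.PlugIn;

[ScheduledPlugIn(DisplayName = "Import articles from external system")]
public class ImportArticlesJob
{
    public static string Execute()
    {
        var articles = ArticleService.GetArticlesChangedSince(
            ArticleImporter.GetLastImportDate());

        foreach (var article in articles)
        {
            ArticleImporter.CreateOrUpdate(article);
        }

        // The returned string is shown in the job's history in admin mode.
        return string.Format("Imported {0} articles.", articles.Count);
    }
}
```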
When integrating with an external system using interim storage one obvious question is where to store fetched data. While there are as many variations available as there are data storage mechanisms they all essentially boil down to either storing the data as pages in the CMS or storing it in a custom database. With the introduction of Dynamic Data Store in EPiServer CMS 6 the latter is often implemented using the DDS.
Using pages as interim storage
Although primarily intended for editorial web page content, pages, or PageData objects, in EPiServer can be used to model pretty much anything. This makes the native PageData storage mechanisms a good candidate for interim storage for data fetched from external systems. By using the native functionality in EPiServer we don’t have to bother with creating database schemas and once a chunk of data has been imported and saved as a page EPiServer takes care of caching, including cache event handling in web farm setups, and access rights for us. We also get an administrative interface for the data for free in the form of EPiServer’s edit mode.
There are caveats with storing external data as pages though. Saving a page in EPiServer is a fairly expensive operation, and both the UI and the API are optimized for manually created content, so we need to be aware of their limitations. The most prominent is perhaps that loading the children of a specific page tends to get increasingly slow once it has more than a couple of hundred children. Therefore we may need to group imported items by import date, the first letter of the item’s name, category or similar. Another consideration to be aware of is that EPiServer’s output cache is dropped in its entirety whenever a page is saved, meaning that if we rely on output caching (which we hopefully don’t) and the data from the external system is updated often, we may face performance problems in production.
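As a sketch of the grouping idea, the code below saves each imported item under a container page named after the first letter of the item’s name. The page type names ("ImportedArticle", "ArticleContainer") and property names are assumptions made for the example.

```csharp
// Sketch of saving an imported item as a page, grouped under container
// pages by the first letter of the item's name so that no single parent
// ends up with thousands of children. Page type and property names are
// assumed for illustration.
using EPiServer;
using EPiServer.Core;
using EPiServer.DataAccess;
using EPiServer.Security;

public class ArticlePageWriter
{
    public void Save(PageReference importRoot, string name, string body)
    {
        string letter = name.Substring(0, 1).ToUpperInvariant();
        PageReference container = GetOrCreateContainer(importRoot, letter);

        PageData page = DataFactory.Instance.GetDefaultPageData(
            container, "ImportedArticle");
        page.PageName = name;
        page["MainBody"] = body;

        DataFactory.Instance.Save(
            page, SaveAction.Publish, AccessLevel.NoAccess);
    }

    private PageReference GetOrCreateContainer(
        PageReference root, string letter)
    {
        // GetChildren is cached by EPiServer, so this lookup is cheap.
        foreach (PageData child in DataFactory.Instance.GetChildren(root))
        {
            if (child.PageName == letter)
            {
                return child.PageLink;
            }
        }

        PageData container = DataFactory.Instance.GetDefaultPageData(
            root, "ArticleContainer");
        container.PageName = letter;
        return DataFactory.Instance.Save(
            container, SaveAction.Publish, AccessLevel.NoAccess);
    }
}
```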
In my experience using pages as interim storage often works very well. While it does require more work than on request fetching it brings so many benefits that it’s usually the first option that I consider.
Using Dynamic Data Store or a custom database as interim storage
In situations where we are to fetch a lot of data from the external system, and/or when the data from the external system changes very often, storing it as pages in EPiServer may be too heavyweight. If that’s the case we need to find a different data store, and with the introduction of the DDS in EPiServer CMS 6 we have a fairly lightweight alternative.
While using the DDS or custom database tables may seem attractive and fun to build, we need to be aware that we lose a lot of functionality compared to storing the data as pages. Most prominent is the fact that we don’t get any administrative interface in which to view or edit the imported data. Another important consideration is caching. With custom database tables we get no caching at all, and even if we use an object relational mapper such as NHibernate that brings caching, we need to configure it correctly. If we use the DDS we get some caching for free, as it caches the objects that it stores, but we need to remember that it doesn’t cache queries. That’s partially the case when storing the data as pages as well, as the DataFactory method FindPagesWithCriteria isn’t cached, but on the other hand we can often utilize the cached GetChildren method and do in-memory querying instead.
I usually favor interim storage using the DDS when the external data comes in large volumes (tens of thousands or more), changes often, and there’s no requirement that says that editors should be able to modify or link to the imported content.
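As a sketch of what DDS-based interim storage can look like, the repository below stores stock levels in a typed store. The StockLevel class is an assumption for the example; giving it an Id property of type EPiServer.Data.Identity lets the DDS manage its identity.

```csharp
// Sketch of using the Dynamic Data Store as interim storage for stock
// levels. "StockLevel" and "StockLevelRepository" are assumed names.
using System.Linq;
using EPiServer.Data;
using EPiServer.Data.Dynamic;

public class StockLevel
{
    public Identity Id { get; set; }
    public string ArticleNumber { get; set; }
    public int ItemsInStock { get; set; }
}

public class StockLevelRepository
{
    private static DynamicDataStore Store
    {
        get
        {
            return DynamicDataStoreFactory.Instance
                       .GetStore(typeof(StockLevel))
                ?? DynamicDataStoreFactory.Instance
                       .CreateStore(typeof(StockLevel));
        }
    }

    public void Save(StockLevel level)
    {
        Store.Save(level);
    }

    public StockLevel GetByArticleNumber(string articleNumber)
    {
        // Note: DDS queries aren't cached, so frequent lookups may
        // warrant an additional caching layer on top of this.
        return Store.Items<StockLevel>()
            .FirstOrDefault(s => s.ArticleNumber == articleNumber);
    }
}
```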
Search engines or document databases as interim storage
An alternative to using the DDS or tables in a relational database may be to use a document database or a search engine that indexes data structures (such as Truffler ;)). In doing so we lose some degree of support for transactions but may gain benefits in terms of ability to query the data in a fast way. Using a search engine or document database may therefore be a good fit when the integrity of the data isn’t crucial but we need fast querying capabilities.
What about page providers?
No matter if we use on request fetching or interim storage to integrate with external systems, exposing the external data using a page provider may look like an attractive option. By implementing the interface defined by the abstract base class PageProviderBase and adding a couple of lines to web.config we can expose the external data as pages in EPiServer without having to bother with actually saving, updating or deleting pages. Sounds great, right?
I’ve been down this road a couple of times myself and have also seen it done in a number of projects. While the page provider model does look like an attractive alternative for exposing external data, my experience with doing so has ranged from “Yeah, it works, but we might as well have saved the data as pages instead” to “I’m never, ever building a page provider again!”.
If we look at EPiServer’s native page provider, the LocalPageProvider class, we’ll see that it does a lot. It takes care of different versions of pages, different language versions of pages, access rights etc. It also implements methods that are required to support other functions in the CMS, such as maintaining reference integrity when pages are moved or deleted. Not to mention that it handles a bunch of edge cases. While it’s fairly easy to create a page provider that returns pages, implementing a full-fledged page provider that does everything the LocalPageProvider does is so much work that I’ve never seen it done completely.
But what if we don’t care about access rights or versions and just want the basic functionality? Certainly creating a page provider is a good fit then? Perhaps. But don’t be surprised if you get some obscure bug three months into the project that has to do with weird URLs on the site that you twelve hours later track down to your implementation of the ResolveLocalPage method.
In fairness I should add that there are ways to tackle some of the obscure methods that one has to implement when building a page provider, such as the Mapped Page Provider project. There are also cases where building a page provider can bring a lot of value. One such case is when you have local data, for instance content in EPiServer Community, that you would like to expose as pages so that editors can link to it using the native CMS functionality. In my experience those cases are fairly rare though, and I’d favor storing the external data as “real” EPiServer pages 19 times out of 20. After all, EPiServer has done a great job handling almost all conceivable scenarios in its native page provider, so why not make use of it?
Handling user input
So far we’ve only discussed ways to expose data from external systems on our web sites. But data may of course flow in the other direction, from visitors to the external system, with our web site acting as a proxy with or without some processing of the data. In my experience that’s not as common as exposing external data though, which is why I haven’t discussed user input so far.
The ways to handle user input and passing it along to external systems are quite similar to what we’ve already discussed. We can either pass the data along in real-time on the same thread that is processing the visitor’s HTTP request, or we can use some sort of interim storage. There are two differences though. First of all, functionality that involves handling user input is likely to be limited to a few places on the site, making it less dangerous to process the input right away. Second, using interim storage brings some risk, as we may end up with data that for some reason isn’t passed along correctly while the visitor perceives that it has been submitted. With that said, using interim storage allows us to accept user input even if the external system is temporarily unreachable, and we can hold on to it until the system is available again.
If we do decide to use interim storage, using EPiServer pages as the storage mechanism most likely isn’t a good idea. Instead we need some sort of queuing software, or a simple queue implemented using the DDS.
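A simple DDS-backed queue might be sketched like this: submissions are stored immediately, and a scheduled job later forwards them to the external system, retrying entries that couldn’t be delivered. The PendingSubmission class and the CrmService client are assumed names for the example.

```csharp
// Minimal sketch of a DDS-backed queue for user submissions.
// "PendingSubmission" and "CrmService" are assumed names.
using System;
using System.Linq;
using EPiServer.Data;
using EPiServer.Data.Dynamic;

public class PendingSubmission
{
    public Identity Id { get; set; }
    public DateTime Received { get; set; }
    public string Payload { get; set; }
}

public class SubmissionQueue
{
    private static DynamicDataStore Store
    {
        get
        {
            return DynamicDataStoreFactory.Instance
                       .GetStore(typeof(PendingSubmission))
                ?? DynamicDataStoreFactory.Instance
                       .CreateStore(typeof(PendingSubmission));
        }
    }

    // Called when the visitor submits data; always succeeds locally.
    public void Enqueue(string payload)
    {
        Store.Save(new PendingSubmission
        {
            Received = DateTime.UtcNow,
            Payload = payload
        });
    }

    // Called from a scheduled job; entries that can't be delivered
    // remain in the store and are retried on the next run.
    public int ForwardPending()
    {
        var pending = Store.Items<PendingSubmission>()
            .OrderBy(s => s.Received)
            .ToList();

        int forwarded = 0;
        foreach (var submission in pending)
        {
            if (CrmService.TrySend(submission.Payload))
            {
                Store.Delete(submission.Id);
                forwarded++;
            }
        }
        return forwarded;
    }
}
```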
Summary
While there is a lot more to say about building integrations with external systems in EPiServer CMS based web sites, this post has presented the most commonly used options. I’ve also discussed the pros and cons and given some recommendations based on my own experience. These recommendations boil down to:
- By decoupling fetching of data from external systems from the servicing of requests from users we can build more robust web sites that are able to function even if the external system is unresponsive.
- Using a scheduled job to import and store external data as pages offers a robust way of fetching and displaying data from external sources. Doing so offers many benefits: We get all the good stuff that comes with EPiServer pages such as access rights, versioning etc. for free. We’re able to see the imported data in edit mode. Further, editors are able to easily link to the imported content as well as edit or delete it. Storing external content as pages is typically a good idea for RSS feeds, products, stores/offices, editorial content from other systems etc.
- Using a scheduled job to import and store external data in the Dynamic Data Store or some other data store tends to work well for data that comes in volumes that aren’t suited for storing as pages in EPiServer, or data that is updated very often. Storing external data in the DDS or custom data storage works well for article comments, product packages/offerings, stock data and the like.