Search Engines May 29, 2012

Extending ASP.NET MVC Music Store with elasticsearch

ElasticSearch is a great and powerful open source search engine that can be used to solve a great range of problems. Here we'll take a look at how we can use ElasticSearch in an ASP.NET MVC application.

Last week I held a presentation at DevSum12 titled “elasticsearch For .NET Developers”. In it I talked about how we can use search engines in general and elasticsearch in particular to solve many types of querying problems.

To illustrate this and to demonstrate how elasticsearch can be used from .NET I did a demo where I extended the ASP.NET MVC Music Store project with free text search capabilities. I also rewrote the genre view and the genre menu to use elasticsearch instead of database queries. This post is a write-up of that demo.

If you prefer a language and platform agnostic introduction to elasticsearch check out my getting started with elasticsearch tutorial.

Running elasticsearch

Setting up and running an elasticsearch server couldn’t be easier. Simply download the latest version, unzip it and run bin/elasticsearch.bat from the console.

A .NET client API – NEST

elasticsearch exposes a great JSON based REST API. However, to interact with it easily from .NET we’ll use a client API. There are several such APIs for .NET, not to mention other languages. As I’m one of the co-founders of Truffler, a SaaS solution for search and content retrieval based on elasticsearch, I’d of course love to use the Truffler .NET API. But as one key point of that API is to make interacting with elasticsearch even easier than it already is, it hides much of what we want to look at and isn’t a good fit for this post.

Other .NET APIs include ElasticSearch.NET and PlainElastic.Net. In this post we’ll use a third one, called NEST, which seems to be the open source .NET API for elasticsearch with the most traction. It’s also well suited for this tutorial as it maps closely to elasticsearch’s REST API while utilizing some of the strengths of C#.

To add NEST to the ASP.NET MVC Music Store project we first need to download and compile it. That may seem tedious but it’s actually straightforward. Simply download the source code from GitHub, open it up in Visual Studio and compile. Once compiled, reference the Nest, Fasterflect and Newtonsoft.Json assemblies in the Music Store project.

Indexing albums

Before we build search functionality we need to populate an index. For a real production site we’d probably like to index albums when they are added, updated or removed. However, it’s usually also a good idea to have the ability to index all content. As this post is about building search functionality we’ll settle for building just that, functionality to index all albums.

The ASP.NET MVC Music Store project consists of six controllers. One of these, the StoreManagerController class, is used for admin functionality so it seems reasonable to place an action for indexing, or re-indexing, all albums there. We’ll name it ReIndex.

As a first step we modify the StoreManagerController by adding a using statement for NEST and by adding the ReIndex action. We let the action return a redirect to the index view.

//Other, existing using statements
using Nest;

namespace MvcMusicStore.Controllers
{
    [Authorize(Roles = "Administrator")]
    public class StoreManagerController : Controller
    {
        public ActionResult ReIndex()
        {
            return RedirectToAction("Index");
        }
        
        //Other, existing class members
    }
}

With this in place we’re ready to add code for indexing all albums. First we need an instance of NEST’s ElasticClient class which we’ll use to interact with the search engine. To instantiate it we need to provide a host name and port number. Given that we’re running the server on our local machine and haven’t changed the setting for port number we can use localhost and port number 9200.

var setting = new ConnectionSettings("localhost", 9200);
var client = new ElasticClient(setting);

With our client in place we iterate over a list of all albums which we retrieve from the database and index each album. In order to index an album we use the ElasticClient’s Index method. More specifically we use an overload that allows us to specify index name, type name and ID.

foreach (var album in db.Albums)
{
    client.Index(album, "musicstore", "albums", album.AlbumId);
}

Note that we haven’t created the index named musicstore. While elasticsearch provides API methods for explicitly creating indexes it can also automatically create an index if it doesn’t exist when we try to feed a document to it. The full ReIndex method should now look like this:

public ActionResult ReIndex()
{
    var setting = new ConnectionSettings("localhost", 9200);
    var client = new ElasticClient(setting);

    foreach (var album in db.Albums)
    {
        client.Index(album, "musicstore", "albums", album.AlbumId);
    }

    return RedirectToAction("Index");
}
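For reference, indexing a document through elasticsearch’s REST API boils down to an HTTP PUT to /{index}/{type}/{id} with the document as a JSON body. The sketch below only builds such a request for illustration. The IndexRequestSketch class and its Build method are hypothetical names of my own, and the code makes no claim about how NEST constructs its requests internally.

```csharp
using System.Net.Http;
using System.Text;
using Newtonsoft.Json;

// Hypothetical helper, not part of NEST or the Music Store project.
public static class IndexRequestSketch
{
    // Builds (but does not send) a PUT request for /{index}/{type}/{id},
    // assuming a local server on the default port 9200.
    public static HttpRequestMessage Build(
        string index, string type, int id, object document)
    {
        var url = string.Format(
            "http://localhost:9200/{0}/{1}/{2}", index, type, id);

        // The document travels as the JSON request body.
        var json = JsonConvert.SerializeObject(document);

        return new HttpRequestMessage(HttpMethod.Put, url)
        {
            Content = new StringContent(json, Encoding.UTF8, "application/json")
        };
    }
}
```

Calling IndexRequestSketch.Build("musicstore", "albums", 42, album) would target the same index name, type name and ID combination that we pass to the Index method above.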

As we haven’t created a button in the StoreManager’s Index view we don’t have any user friendly way to invoke it. We can however use it by going to <siteURL>/StoreManager/ReIndex. There’s just one problem. When indexing albums each album will be serialized to JSON and the Album class contains the nemesis of most serializers, a circular reference.

The Album class has a property of type Genre, which in turn has a list of Albums, meaning that the serialization process will end up in an endless loop or crash if we don’t prevent that. Luckily doing so is easy. All we have to do is either exclude the Album class’ Genre property, or the Genre class’ Albums property, from being serialized using a JsonIgnore attribute. As we’d like the Genre, except for its list of albums, to be indexed with each album so that albums can be searched by genre we’ll add the attribute to the Genre class.

using System.Collections.Generic;
using Newtonsoft.Json;

namespace MvcMusicStore.Models
{
    public partial class Genre
    {
        public int GenreId { get; set; }
        public string Name { get; set; }
        public string Description { get; set; }

        //The below attribute excludes the Albums
        //property from JSON serialization, preventing
        //circular references when serializing albums.
        [JsonIgnore]
        public List<Album> Albums { get; set; }
    }
}
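To see the effect of the attribute in isolation, here is a minimal sketch using trimmed-down stand-ins for the Album and Genre classes (an assumption, the real models have more properties). With Json.NET’s default settings, serializing an album without the attribute fails with a JsonSerializationException about a self referencing loop. With it, the genre’s Albums list is simply skipped.

```csharp
using System.Collections.Generic;
using Newtonsoft.Json;

// Trimmed-down stand-ins for the Music Store models (assumption:
// only the properties needed to show the cycle are included).
public class Genre
{
    public int GenreId { get; set; }
    public string Name { get; set; }

    // Without this attribute the serializer would follow
    // Album -> Genre -> Albums -> Album -> ... and fail.
    [JsonIgnore]
    public List<Album> Albums { get; set; }
}

public class Album
{
    public int AlbumId { get; set; }
    public string Title { get; set; }
    public Genre Genre { get; set; }
}

// Hypothetical demo class, not part of the Music Store project.
public static class SerializationDemo
{
    public static string SerializeSampleAlbum()
    {
        var genre = new Genre
        {
            GenreId = 1,
            Name = "Rock",
            Albums = new List<Album>()
        };
        var album = new Album
        {
            AlbumId = 42,
            Title = "Machine Head",
            Genre = genre
        };

        // This creates the circular reference: album -> genre -> album.
        genre.Albums.Add(album);

        // The resulting JSON contains the genre's id and name
        // but no Albums property.
        return JsonConvert.SerializeObject(album);
    }
}
```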

We’ve now added functionality, in the form of the ReIndex action, to index all albums. After invoking it we have an index containing the site’s albums. To verify that this is actually the case we can use a simple query string query to see what’s in the index by directing a browser to http://localhost:9200/_search?q=*. We’re ready to create a search page.

In a real production scenario we’d probably start out by creating a new view, view model, controller action and possibly also a separate controller for the search page. However, in order to simply illustrate how we can use elasticsearch to build a search page for the ASP.NET MVC Music Store project we can do a simple implementation by borrowing an existing view and model class.

We’ll add a new action named Search to the Store controller, which is otherwise used to list albums by genre, display details for a single album and so on. The action will return the same view as the Browse action, which lists all albums in a specific genre, and feed it an instance of the Genre class as its model. As the Browse view displays the model’s name as its headline we can use it to display a custom headline for search results. Let’s do that before getting down to search business.

using System.Linq;
using System.Web.Mvc;
using MvcMusicStore.Models;
using Nest;

namespace MvcMusicStore.Controllers
{
    public class StoreController : Controller
    {
        MusicStoreEntities storeDB = new MusicStoreEntities();

        public ActionResult Search(string q)
        {
            var genre = new Genre()
            {
                Name = "Search results for " + q
            };

            return View("Browse", genre);
        }

        //Other existing actions
    }
}

Our new Search action will return the Browse view displaying information about what the user searched for in the form of input from the query string parameter q. However, if we tried to invoke the action we’d get a null reference exception from the view as it tries to iterate over the albums in the model’s Albums property. Let’s populate that with albums matching the search query.

As with the ReIndex action we’ll need an instance of the ElasticClient class to interact with the elasticsearch server. We’ll create a static helper property to create it, this time specifying a default index with the same name as we used when indexing, “musicstore”.

private static ElasticClient ElasticClient
{
    get
    {
        var setting = new ConnectionSettings("localhost", 9200);
        setting.SetDefaultIndex("musicstore");
        return new ElasticClient(setting);
    }
}

To search for albums matching the query from the query string in the Search action we’ll use the ElasticClient’s Search method and add a QueryString query to the request body. We then add the matching albums to the Genre that we pass as a model to the view.

public ActionResult Search(string q)
{
    var result = ElasticClient.Search<Album>(body =>
        body.Query(query =>
        query.QueryString(qs => qs.Query(q))));

    var genre = new Genre()
    {
        Name = "Search results for " + q,
        Albums = result.Documents.ToList()
    };

    return View("Browse", genre);
}

We now have working free text search in our music store, albeit without a form for our users to enter their query in. We can still try it out by browsing to, for instance, /store/search?q=deep%20purple. So, what did we do to create the search request in the code above?

Well, we used the Search method specifying what type to search for using its type parameter. We also passed it a single argument, a delegate that modifies the search request body. In elasticsearch the search request body can contain a number of things such as a filter, requests for facets, number of hits to retrieve and so on. Perhaps most prominently it can also contain a query, which we added using the Query method. elasticsearch supports many types of queries and in this case we used a query string query (not directly related to query strings in URLs).

A query string query will be parsed by the search engine and converted to various other types of queries, allowing users to use Lucene syntax when searching. This is powerful as it allows users to use keywords such as AND and OR, specify fuzziness and quite a few other options. Beware though that this also means that the search engine will throw an exception if the query isn’t syntactically valid.
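One way to guard against syntactically invalid input, if we’re willing to give up the special syntax, is to escape the characters that Lucene’s query parser treats as operators before passing the text on. The sketch below is a hypothetical helper of my own (LuceneQueryEscaper is not part of NEST or elasticsearch). It prefixes each special character with a backslash and leaves everything else alone. It does not neutralize the keywords AND, OR and NOT.

```csharp
using System.Text;

// Hypothetical helper, not part of NEST or elasticsearch.
public static class LuceneQueryEscaper
{
    // Characters with special meaning in Lucene's query syntax.
    private const string SpecialCharacters = "+-&|!(){}[]^\"~*?:\\/";

    // Returns the input with every special character backslash-escaped,
    // so that the parser treats it as literal text.
    public static string Escape(string userInput)
    {
        if (string.IsNullOrEmpty(userInput))
        {
            return string.Empty;
        }

        var escaped = new StringBuilder(userInput.Length * 2);
        foreach (var c in userInput)
        {
            if (SpecialCharacters.IndexOf(c) >= 0)
            {
                escaped.Append('\\');
            }
            escaped.Append(c);
        }
        return escaped.ToString();
    }
}
```

In the Search action we could then pass LuceneQueryEscaper.Escape(q) instead of q to the query string query. The trade-off is that users lose the ability to use wildcards, fuzziness and grouping, so whether this is desirable depends on the audience.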

Using elasticsearch for querying

While elasticsearch is great for free text search we can also use it for many other tasks that aren’t perceived as searching per se by users. A simple example of this is replacing the database query in the Browse action. While doing so doesn’t make much sense considering the small amount of data in the database that ships with MVC Music Store, it does make for a good example of how we can use filters to query for data. And if the site had contained tens of thousands of albums instead of a couple of hundred, using elasticsearch to list albums in a specific genre might be faster than querying the database. Not to mention that it would take a load off the database, allowing it to focus on handling admin functionality and purchases where its transaction management functionality comes in handy.

In its original form the Browse action looks like this:

public ActionResult Browse(string genre)
{
    // Retrieve Genre and its Associated Albums from database
    var genreModel = storeDB.Genres.Include("Albums")
        .Single(g => g.Name == genre);

    return View(genreModel);
}

As you can see we fetch the genre matching the input parameter to the action from the database, joining it with the Albums table. While we haven’t indexed any genres separately we have all the relevant data in the index as we’ve indexed albums along with their genres. Therefore we can re-implement the Browse action using a simple search request with a filter.

public ActionResult Browse(string genre)
{
    var result = ElasticClient.Search<Album>(body =>
        body.Filter(filter =>
            filter.Term(x => 
                x.Genre.Name, genre.ToLower()))
        .Take(1000));

    var genreModel = new Genre()
        {
            Name = genre,
            Albums = result.Documents.ToList()
        };

    return View(genreModel);
}

The above code yields the same result as the original implementation, only we’re using elasticsearch instead of going to the database through Entity Framework. But what does the code do?

Just as before we use the ElasticClient’s Search method and pass it a delegate that modifies a search request body. This time we don’t add a query to it but instead add a filter. The filter, a TermFilter, specifies that we want to match albums whose genre equals that of the input parameter to the action. As we’re using an automatically created index it will have default mappings, meaning that string values will be indexed in a way suitable for free text search. This means that they will, among other things, be normalized in terms of casing. We could change this by adding a mapping for the Genre.Name field but for this simple example we make do with lowercasing the input value.
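The reason lowercasing matters is that a term filter compares its value verbatim against the terms stored in the index, without analyzing it first. The following sketch is a rough simulation of that behavior, not elasticsearch’s actual analysis chain: it assumes that default analysis amounts to splitting on non-alphanumeric characters and lowercasing, and it ignores things like stop words. AnalysisSketch and its methods are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical simulation, not elasticsearch's real analyzer.
public static class AnalysisSketch
{
    // Rough approximation of default analysis: split the value on
    // anything that isn't a letter or digit and lowercase each token.
    public static List<string> Analyze(string value)
    {
        return Regex.Split(value, @"[^\p{L}\p{Nd}]+")
            .Where(token => token.Length > 0)
            .Select(token => token.ToLowerInvariant())
            .ToList();
    }

    // A term filter matches only if its value is exactly equal
    // to one of the indexed terms.
    public static bool TermFilterMatches(string indexedValue, string filterValue)
    {
        return Analyze(indexedValue).Contains(filterValue);
    }
}
```

Under these assumptions TermFilterMatches("Rock", "Rock") is false while TermFilterMatches("Rock", "rock") is true. Note also that a value made up of several words would be split into several terms, so lowercasing alone only works because the genre names in the Music Store are single words.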

We also modify the search request body to request a thousand albums, which is plenty considering that the site only contains 246 albums. If we hadn’t done so we’d only get the default number of search results back from elasticsearch, ten.

Finally we create a genre populating it with the matching albums and the genre name from the input to the action. Of course setting the name to the action parameter opens up all kinds of security holes, but for this example it’ll do.

Queries and filters

In the above example we added a filter directly to the search request body. That works, but in production we’d typically like to wrap the filter in a query. Doing so yields better performance as the search engine then doesn’t have to bother with relevance scoring. Of course in the above example we don’t actually want to use a query that affects the result. In such cases a constant score query comes in handy. In other words, for optimal performance we should rewrite the search part of the Browse action to something like this:

var result = ElasticClient.Search<Album>(body =>
    body.Query(query => 
        query.ConstantScore(
            csq => csq.Filter(filter =>
                filter.Term(x =>
                    x.Genre.Name, genre.ToLower()))))
    .Take(1000));
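For those curious about what such a request looks like at the REST level, the sketch below builds a JSON search body with a constant score query wrapping a term filter. It’s an illustration using Json.NET and anonymous types, not a dump of what NEST sends. The SearchBodies class is a hypothetical name, and the field name is passed in as a parameter since the exact name of the genre name field in the index (for instance genre.name) depends on how the client serializes property names, which I’m treating as an assumption here.

```csharp
using System.Collections.Generic;
using Newtonsoft.Json;

// Hypothetical helper, not part of NEST.
public static class SearchBodies
{
    // Builds a search request body that filters on a single term
    // without relevance scoring and asks for up to 1000 hits.
    public static string GenreFilterBody(string fieldName, string genre)
    {
        var body = new
        {
            query = new
            {
                constant_score = new
                {
                    filter = new
                    {
                        // A dictionary lets us use the field name as the JSON key.
                        term = new Dictionary<string, string>
                        {
                            { fieldName, genre.ToLower() }
                        }
                    }
                }
            },
            size = 1000
        };

        return JsonConvert.SerializeObject(body);
    }
}
```

A body like the one returned by SearchBodies.GenreFilterBody("genre.name", "Rock") could be posted to the _search endpoint of the musicstore index using any HTTP client.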

 

Rewriting the genre menu using a facet

A second example of how we could take a load off the database by using elasticsearch is by modifying the left hand side menu of the store that lists genres. Again, while we haven’t indexed individual genres we have indexed them as nested objects in the albums. This means that we can feed the GenreMenu action’s view with an aggregated list of genres retrieved from the albums. A perfect job for a terms facet, one of many facets supported by elasticsearch.

In its original form the GenreMenu action is implemented like this:

public ActionResult GenreMenu()
{
    var genres = storeDB.Genres.ToList();

    return PartialView(genres);
}

Using elasticsearch we could re-implement it like this:

public ActionResult GenreMenu()
{
    var result = ElasticClient.Search<Album>(body =>
        body.Take(0)
        .FacetTerm(x => x.OnField(f => f.Genre.Name)));

    var genres = result
        .FacetItems<TermItem>(x => x.Genre.Name)
        .Select(x => new Genre {Name = x.Term});

    return PartialView(genres);
}

In the above example we again create a search request for albums. Only this time we specifically ask for zero hits and instead add a request for a terms facet for the Genre.Name field. We then retrieve the facet items, consisting of the term (the genre name) and a count (number of albums in the genre), from the result and use those to build up a list of genres.
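Conceptually a terms facet is a group-by-and-count over the terms of a field. The sketch below mimics that in plain LINQ over an in-memory list, purely to illustrate what the facet items contain. TermCount and TermsFacetSketch are hypothetical names, and breaking ties alphabetically is my own assumption to keep the output deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a facet item: a term and its document count.
public class TermCount
{
    public string Term { get; set; }
    public int Count { get; set; }
}

// Hypothetical in-memory illustration of what a terms facet computes.
public static class TermsFacetSketch
{
    // Takes one (already lowercased) genre term per album and returns
    // each distinct term with the number of albums it occurs in,
    // most frequent first.
    public static List<TermCount> Compute(IEnumerable<string> genreTermPerAlbum)
    {
        return genreTermPerAlbum
            .GroupBy(term => term)
            .Select(group => new TermCount
            {
                Term = group.Key,
                Count = group.Count()
            })
            .OrderByDescending(item => item.Count)
            .ThenBy(item => item.Term, StringComparer.Ordinal)
            .ToList();
    }
}
```

Feeding it the terms rock, jazz and rock yields two items: rock with a count of 2 followed by jazz with a count of 1. The difference is of course that elasticsearch computes this inside the index rather than by loading every album into memory.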

This implementation consists of a lot more code than the original version and I’m not saying that re-implementing the GenreMenu action like this is necessarily a good idea. But it’s interesting to see that we can.

I should also add that this implementation won’t produce the exact same result as the original. As we haven’t added any custom mappings to our index the genre names will be returned lowercased. Fixing that could fairly easily be done by adding mappings should we want to, but that’s beyond the scope of this post.

Summary

While this post doesn’t discuss the nitty-gritty details of using a search engine like elasticsearch, such as mappings, specifying fields to search in and scalability (which is just awesome), we’ve seen how we can easily add free text search to an ASP.NET site. We’ve also seen that we can use elasticsearch to create functionality beyond free text search. By doing so we can build highly scalable websites and free up any relational database we may be using so that it can focus on what it does best, handling relations and transactions.

PS. For updates about new posts, sites I find useful and the occasional rant you can follow me on Twitter. You are also most welcome to subscribe to the RSS-feed.

Joel Abrahamsson

I'm a passionate web developer and systems architect living in Stockholm, Sweden. I work as CTO for a large media site and enjoy developing with all technologies, especially .NET, Node.js, and ElasticSearch.
