May 07, 2012

Building a search page for an EPiServer site using Truffler – Part 2

NOTE Since writing this post the company behind Truffler, 200OK, has been sold to EPiServer and the product Truffler has been renamed to EPiServer Find. Most of the content of this blog post is however applicable to EPiServer Find as well. For questions regarding Find, be sure to visit the forum on EPiServer World.

The time has come to extend the search page created in the previous article with stemming and highlighting.

In the previous post we covered building a very basic search page using Truffler for an EPiServer CMS based site for a fictitious airline. The search page displayed search results based on user input but lacked some very basic functionality needed by any self-respecting search page. In this post we’ll fix that.

Stemming and specifying fields to search in

In the previous post we managed to find pages that matched a given search term using the Search and For methods, like this:

EPiSearchClient.Instance
    .Search<StandardPage>()
    .For(Query)

With that code the search query will be executed in a language-neutral way over a special field containing all indexed values for each page. To improve search results we could tell the search engine what language we’re searching in, English in our case. By doing so the search engine will use stemming, meaning that not just exact matches of words but also inflected forms of them will match, so that a search for car also matches cars. To do so we pass an instance of the Language class to the Search method. To make that convenient the Language class has a static property returning such an instance for each supported language.

var result = EPiSearchClient.Instance
    .Search<StandardPage>(Language.English)
    .For(Query)
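
To get a feel for what stemming means, here is a deliberately naive sketch that runs entirely in memory. It has nothing to do with how the search engine actually stems words (real stemmers are language specific and far more sophisticated); the ToyStemmer class and its rule of stripping a single trailing "s" are made up purely for illustration.

```csharp
using System;

// Hypothetical toy stemmer, not part of Truffler or any search engine.
public static class ToyStemmer
{
    // Lower-cases the word and strips one trailing "s" from words
    // longer than three characters. Real stemmers do much more.
    public static string Stem(string word)
    {
        var lower = word.ToLowerInvariant();
        return lower.Length > 3 && lower.EndsWith("s", StringComparison.Ordinal)
            ? lower.Substring(0, lower.Length - 1)
            : lower;
    }

    // Two terms match if they reduce to the same stem.
    public static bool Matches(string queryTerm, string indexedTerm)
    {
        return Stem(queryTerm) == Stem(indexedTerm);
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(ToyStemmer.Matches("car", "cars")); // True
        Console.WriteLine(ToyStemmer.Matches("car", "card")); // False
    }
}
```

Note that this toy rule already fails for a pair like beach and beaches, which is exactly why we leave stemming to the search engine.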

In order for stemming to be used we also need to explicitly specify what fields to search in. That can be done using the methods InField and InFields. The InField method requires a lambda expression specifying what field to search in. The InFields method does the same, but accepts multiple expressions. In our case the page’s name, preamble (MainIntro) and body text (MainBody) seem suitable to search in.

var result = EPiSearchClient.Instance
    .Search<StandardPage>(Language.English)
    .For(Query)
    .InField(x => x.PageName)
    .InFields(x => x.MainIntro, x => x.MainBody)

Note that we use a combination of the InField and InFields methods here. That’s by no means necessary and I’ve only done so to illustrate that it’s possible and to set the stage for some tweaking that we may discuss in a future post. Anyway, by specifying a language to use and a number of fields to search in, the search engine will match keywords and phrases in those fields in a language-sensitive way.

However, for some fields we may not care about stemming, or we may for some other reason want to search in the field with all values. We can do that using the InAllField method.

With a language parameter passed to the Search method, and after specifying a few fields to search in, including the field with all values, our own Search method looks like this:

private void Search()
{
    var result = EPiSearchClient.Instance
        .Search<StandardPage>(Language.English)
        .For(Query)
        .InField(x => x.PageName)
        .InFields(x => x.MainIntro, x => x.MainBody)
        .InAllField()
        .Select(x => new SearchHit
            {
                Title = x.PageName,
                Url = x.LinkURL
            })
        .GetResult();

    Results = result;
}

Searching over multiple types

So far we’ve limited the search functionality to searching for pages of the StandardPage type. In many situations it’s desirable not to search all page types, as some may not be of interest to visitors. But as we saw in the previous post, there is at least one other page type we’d like to search for, the DestinationPage type. To do so we have two options.

The IncludeType method

The first option is to explicitly include pages of that type using the IncludeType method. This method has two generic type parameters, the first specifying the result type, SearchHit in our case, and the second specifying the type to include. It also requires a lambda expression with a projection from the included type to the result type. This means that we could include pages of the DestinationPage type by modifying the search method like this:

private void Search()
{
    var result = EPiSearchClient.Instance
        .Search<StandardPage>(Language.English)
        .For(Query)
        .InField(x => x.PageName)
        .InFields(x => x.MainIntro, x => x.MainBody)
        .InAllField()
        .Select(x => new SearchHit
            {
                Title = x.PageName,
                Url = x.LinkURL
            })
        .IncludeType<SearchHit, DestinationPage>(x => new SearchHit
        {
            Title = x.PageName,
            Url = x.LinkURL
        })
        .GetResult();

    Results = result;
}

Inheritance

The IncludeType method can come in handy when searching over very different types, but in our case there’s a far easier solution. We can simply let both the StandardPage type and the DestinationPage type inherit from a common base class and utilize the inheritance support in Truffler’s .NET API. Common to both types are the MainIntro and MainBody properties, so we begin by creating an abstract base class with those properties that we let both types inherit from.

public abstract class EditorialPageBase : TypedPageData
{
    [PageTypeProperty(EditCaption = "Preamble", 
        SortOrder = 10, 
        Type = typeof (PropertyLongString))]
    public virtual string MainIntro { get; set; }

    [PageTypeProperty(SortOrder = 20)]
    public virtual string MainBody { get; set; }
}

With the common base class extracted we can modify the search method to search for pages of that type instead of using the IncludeType method:

private void Search()
{
    var result = EPiSearchClient.Instance
        .Search<EditorialPageBase>(Language.English)
        .For(Query)
        .InField(x => x.PageName)
        .InFields(x => x.MainIntro, x => x.MainBody)
        .InAllField()
        .Select(x => new SearchHit
            {
                Title = x.PageName,
                Url = x.LinkURL
            })
        .GetResult();

    Results = result;
}

Note that after introducing the base class and modifying the Search method we need to run the re-indexing job in order to get any search results as we’re relying on a type hierarchy that wasn’t there when the pages were initially indexed.

Highlighting

We’re now searching, with stemming, over multiple types, projecting search results to a common result item class. So far each search result item only contains a linked title though. Let’s extend them to also include a snippet of text. There are several ways of doing that with Truffler. One would be to fetch a reference to each matching page and get the text, such as the MainIntro property, from the actual page. That works great in scenarios where we want to list pages based on other criteria than free text search, but for the classical search page we’d typically want a more context-sensitive solution: highlighting.

Highlighting means that a part, or all, of the text is returned with keywords from the search query encased in HTML tags, em tags by default, that make them stand out. In order to retrieve highlights we need to specify at least one field, such as the MainIntro property, from which we want to extract them. There are multiple ways of doing that using the .NET API, but by far the easiest is to use a special method named AsHighlighted in our existing projection expression.

private void Search()
{
    var result = EPiSearchClient.Instance
        .Search<EditorialPageBase>(Language.English)
        .For(Query)
        .InField(x => x.PageName)
        .InFields(x => x.MainIntro, x => x.MainBody)
        .InAllField()
        .Select(x => new SearchHit
            {
                Title = x.PageName,
                Url = x.LinkURL,
                Text = x.MainIntro.AsHighlighted()
            })
        .GetResult();

    Results = result;
}

In the above code we’ve modified the Search method to populate a third property in the returned search hits with the value of the MainIntro property with keywords highlighted. It’s important to note that while it may look like we’re doing the highlighting in memory on the web server we’re not. Instead the .NET API intercepts the AsHighlighted call and works some magic to the search query, instructing the search engine to return highlights for the MainIntro field.
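
To make the shape of a highlighted value concrete, here is a small in-memory sketch that wraps whole-word matches of a keyword in em tags. This is only an illustration of the kind of string that comes back; as noted above the real highlighting is done by the search engine, and the ToyHighlighter class below is a made-up helper, not part of the Truffler API.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper that only mimics the *output format* of highlighting.
public static class ToyHighlighter
{
    // Wraps each whole-word, case-insensitive occurrence of keyword in <em> tags.
    public static string Highlight(string text, string keyword)
    {
        var pattern = @"\b" + Regex.Escape(keyword) + @"\b";
        return Regex.Replace(
            text,
            pattern,
            match => "<em>" + match.Value + "</em>",
            RegexOptions.IgnoreCase);
    }
}

public static class Program
{
    public static void Main()
    {
        // Prints: Sunny <em>beaches</em> and quiet <em>Beaches</em>
        Console.WriteLine(
            ToyHighlighter.Highlight("Sunny beaches and quiet Beaches", "beaches"));
    }
}
```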

In fact all of the code prior to the GetResult method call only builds up a search query on the web server side and the rest, with a very limited set of exceptions, happens on the search engine side. An example search results listing now looks like the image below.

[Image: the FlyTruffler search page, version 3, with highlights]

In the above example almost all of the matching pages had the search term (beaches) in their MainIntro properties. But one of them didn’t and therefore no excerpt is displayed. We can fix this in the projection expression by using the original text if no highlight is returned:

Text = !string.IsNullOrWhiteSpace(x.MainIntro.AsHighlighted()) 
    ? x.MainIntro.AsHighlighted() 
    : x.MainIntro

A better solution would be to not only retrieve highlights from the MainIntro property but from the MainBody property as well. This could be achieved using the below code:

Text = !string.IsNullOrWhiteSpace(x.MainIntro.AsHighlighted()) 
    ? x.MainIntro.AsHighlighted()
    : x.MainBody.AsHighlighted(new HighlightSpec
        {
            FragmentSize = 300, 
            NumberOfFragments = 1
        })

With the above code we use the MainIntro property’s value given that it’s returned with highlights; otherwise we use highlights from the MainBody property. By default the entire string is returned when using the AsHighlighted method, which works well for the MainIntro property but produces far too long results from the MainBody property. We therefore give the AsHighlighted call for MainBody an instance of the HighlightSpec class setting the FragmentSize to 300, ensuring that the returned text won’t be much longer than 300 characters. In order for this setting to take effect we also specify that we want a single fragment of text returned.

Indexing a special property for highlighting

Using multiple invocations of the AsHighlighted method works well, but it looks a bit messy. In most situations an easier and better approach would instead be to index a special property from which we fetch our highlights. Let’s therefore add a new property, one that isn’t an EPiServer property, to the base class for page types that we created before:

public virtual string SearchText
{
    get { return MainIntro + " " + MainBody; }
}

After adding this property and reindexing (as we’ve modified what should be in the index without actually making any changes to the pages) we can simplify our projection expression:

Text = x.SearchText.AsHighlighted(new HighlightSpec
    {
        FragmentSize = 300, 
        NumberOfFragments = 1
    })
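
Since SearchText is declared virtual, a page type that inherits from the base class could supply its own version of it. The sketch below shows the idea with plain classes: the base class is a simplified stand-in without the EPiServer and PageTypeBuilder dependencies, and NewsPage with its Byline property is a hypothetical page type invented for this example.

```csharp
using System;

// Simplified stand-in for the article's base class (no TypedPageData here).
public abstract class EditorialPageBase
{
    public virtual string MainIntro { get; set; }
    public virtual string MainBody { get; set; }

    // Default text to highlight: preamble followed by body.
    public virtual string SearchText
    {
        get { return MainIntro + " " + MainBody; }
    }
}

// Hypothetical page type that adds one more property to the highlight text.
public class NewsPage : EditorialPageBase
{
    public virtual string Byline { get; set; }

    public override string SearchText
    {
        get { return base.SearchText + " " + Byline; }
    }
}

public static class Program
{
    public static void Main()
    {
        var page = new NewsPage
        {
            MainIntro = "New route.",
            MainBody = "We now fly to Malta.",
            Byline = "By the press team"
        };
        // Prints: New route. We now fly to Malta. By the press team
        Console.WriteLine(page.SearchText);
    }
}
```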

Using the AsCropped method as fallback

The highlighting part of the projection to SearchHit again looks better, and the approach is quite flexible as any class that inherits from our base class can override the SearchText property to include or exclude properties. There’s just one problem. What if none of the included properties contain a keyword from the search query? Remember that we’re searching in the field with all values, meaning that we could for instance get a hit for a page based on the name of one of its categories. One approach would of course be to only search in the fields we’re highlighting. Another would be to retrieve a part of the text without highlights. That can be achieved using another special method named AsCropped:

Text = FirstNonEmpty(
    x.SearchText.AsHighlighted(new HighlightSpec
        {
            FragmentSize = 300,
            NumberOfFragments = 1
        }),
    x.SearchText.AsCropped(300))


//... Helper method used above
private string FirstNonEmpty(params string[] texts)
{
    foreach (var text in texts)
    {
        if (!string.IsNullOrEmpty(text))
        {
            return text;
        }
    }
    return "";
}
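
To see how the helper behaves on its own, here is a small standalone sketch with plain strings standing in for the highlighted and cropped values. The TextFallback class name is made up for this example; the method body is the same as above, only made static so that it can run outside the page class.

```csharp
using System;

// Hypothetical wrapper class so the helper can be run standalone.
public static class TextFallback
{
    // Returns the first argument that is neither null nor empty, or "" if none is.
    public static string FirstNonEmpty(params string[] texts)
    {
        foreach (var text in texts)
        {
            if (!string.IsNullOrEmpty(text))
            {
                return text;
            }
        }
        return "";
    }
}

public static class Program
{
    public static void Main()
    {
        // A highlight exists, so it wins over the cropped text.
        Console.WriteLine(TextFallback.FirstNonEmpty(
            "Sunny <em>beaches</em>", "Sunny beaches and more"));

        // No highlight was returned, so we fall back to the cropped text.
        Console.WriteLine(TextFallback.FirstNonEmpty("", "Sunny beaches and more"));

        // Nothing usable at all gives an empty string; prints 0.
        Console.WriteLine(TextFallback.FirstNonEmpty(null, "").Length);
    }
}
```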

Just like with the AsHighlighted method, the AsCropped method may look like it operates on the full string value on the web server, while in fact the cropping is done on the search engine side, meaning that the full length of the text doesn’t have to be sent over the wire.

Paging

The search results now look pretty good with excerpts of text from the pages. There’s just one thing left to complete the basic search functionality – paging. Contrary to a database query, but similar to many other search engines, the search engine will default to returning the first ten hits. Changing how many hits should be returned and how many to skip is done using the Take and Skip methods, just like we’re used to from LINQ. With that said I won’t bore you with the details of how to implement paging functionality as that, apart from the fact that we can use Skip and Take, has little to do with Truffler. However, here’s one implementation:

A paging control – markup

<ul>
    <% if (ActivePageNumber > 1) { %>
        <li>
            <a href="<%= GetPageUrl(ActivePageNumber-1) %>">
                Prev
            </a>
        </li>
    <% } %>
    <% for (int page = 1; page <= NumberOfPages; page++) { %>
        <li>
            <a href="<%= GetPageUrl(page) %>" class='<%= page == ActivePageNumber ? "active" : ""%>'>
                <%= page %>
            </a>
        </li>
    <% } %>
    <% if (ActivePageNumber < NumberOfPages) { %>
        <li>
            <a href="<%= GetPageUrl(ActivePageNumber+1) %>">
                Next
            </a>
        </li>
    <% } %>
</ul>

A paging control – code behind

public Paging()
{
    //Set default values, overridable by setting the properties
    QuerystringKey = "p";
    PageSize = 10;
}

public string QuerystringKey { get; set; }

public int PageSize { get; set; }

public int ItemCount { get; set; }

public int ActivePageNumber
{
    get
    {
        var pageNumber = 1;
        if (!int.TryParse(
            Request.QueryString[QuerystringKey], 
            out pageNumber))
        {
            pageNumber = 1;
        }
        return pageNumber;
    }
}

public int NumberOfPages
{
    get { return (ItemCount + PageSize - 1)/PageSize; }
}

protected string GetPageUrl(int pageNumber)
{
    return UriSupport.AddQueryString(
        Request.RawUrl, 
        QuerystringKey, 
        pageNumber.ToString(CultureInfo.InvariantCulture));
}
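
The arithmetic in the control is easy to check with a worked example. The sketch below uses ordinary LINQ over an in-memory list of 23 pretend hits rather than a real search query; PagingMath and its SkipCount method are names invented for this illustration, with NumberOfPages using the same formula as the property above.

```csharp
using System;
using System.Linq;

// Hypothetical helper that isolates the paging arithmetic.
public static class PagingMath
{
    // Integer division rounded up, same formula as the NumberOfPages property.
    public static int NumberOfPages(int itemCount, int pageSize)
    {
        return (itemCount + pageSize - 1) / pageSize;
    }

    // Number of hits to skip to reach the start of a one-based page number.
    public static int SkipCount(int activePageNumber, int pageSize)
    {
        return (activePageNumber - 1) * pageSize;
    }
}

public static class Program
{
    public static void Main()
    {
        const int pageSize = 10;
        var hits = Enumerable.Range(1, 23).ToList(); // 23 pretend hits

        // (23 + 10 - 1) / 10 = 3 pages
        Console.WriteLine(PagingMath.NumberOfPages(hits.Count, pageSize));

        // Page 3 skips 20 hits and holds the remaining three.
        var pageThree = hits
            .Skip(PagingMath.SkipCount(3, pageSize))
            .Take(pageSize)
            .ToList();
        Console.WriteLine(string.Join(", ", pageThree)); // 21, 22, 23
    }
}
```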

Using the paging control

With the above user control added to our search page we can easily implement paging by adding three lines of code to the Search method:

private void Search()
{
    var result = EPiSearchClient.Instance
        .Search<EditorialPageBase>(Language.English)
        .For(Query)
        .InField(x => x.PageName)
        .InFields(x => x.MainIntro, x => x.MainBody)
        .InAllField()
        .Select(x => new SearchHit
            {
                Title = x.PageName,
                Url = x.LinkURL,
                Text = FirstNonEmpty(
                    x.SearchText.AsHighlighted(new HighlightSpec
                        {
                            FragmentSize = 300,
                            NumberOfFragments = 1
                        }),
                    x.SearchText.AsCropped(300))
            })
        .Take(Paging.PageSize)
        .Skip((Paging.ActivePageNumber - 1) * Paging.PageSize)
        .GetResult();

    Results = result;
    Paging.ItemCount = result.TotalMatching;
}

Summary

We now have a full-blown search page. Free text search is performed in a language-sensitive way. A subset of the content is displayed for each search result with keywords highlighted. We also have paging functionality.

We’ve seen that there are a number of ways to work with highlights and projections. In my opinion this flexibility is great, but in most situations a recommended pattern is to index a separate field/property with text suitable for highlighting as that makes the search query easier to write.

While we now have a pretty good search page in terms of free text search and listing of results there’s more we can do to make it better and more powerful for users, especially by the use of facets. We’ll look at that in the next post.

PS. For updates about new posts, sites I find useful and the occasional rant you can follow me on Twitter. You are also most welcome to subscribe to the RSS-feed.

Joel Abrahamsson

I'm a passionate web developer and systems architect living in Stockholm, Sweden. I work as CTO for a large media site and enjoy developing with all technologies, especially .NET, Node.js, and ElasticSearch.
