September 17, 2012

Wildcard queries with EPiServer Find

Match parts of words with EPiServer Find using an extension method for wildcard queries.

A few days ago a question was posted in the Find forum on EPiServer World that basically boiled down to how to do wildcard queries with EPiServer Find, that is, how to match part of a word. Find’s .NET API is made up of several “layers”, with a fluent API for querying at the top and, at the bottom, classes that map more closely to the REST API exposed by the search engine. While the fluent API doesn’t currently support wildcard queries, the REST API certainly does, and the lower-level components for them exist in the .NET API. It’s therefore possible to add a custom extension method that creates wildcard queries.

using System;
using System.Linq.Expressions;
using EPiServer.Find;
using EPiServer.Find.Api.Querying.Queries;

public static class SearchExtensions
{
    public static ITypeSearch<T> WildcardQuery<T>(
        this ITypeSearch<T> search, 
        string query, 
        Expression<Func<T, string>> fieldSelector,
        double? boost = null)
    {
        //Create the Wildcard query object
        var fieldName = search.Client.Conventions
            .FieldNameConvention
            .GetFieldNameForAnalyzed(fieldSelector);
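        //Wildcard patterns are not analyzed, so lowercase the pattern
        //to match the lowercased terms in the analyzed field
        var wildcardQuery = new WildcardQuery(
            fieldName, 
            query.ToLowerInvariant());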
        wildcardQuery.Boost = boost;

        //Add it to the search request body
        return new Search<T, WildcardQuery>(search, context =>
        {
            //Keep any existing query and match documents that
            //satisfy either it or the wildcard query
            if (context.RequestBody.Query != null)
            {
                var boolQuery = new BoolQuery();
                boolQuery.Should.Add(context.RequestBody.Query);
                boolQuery.Should.Add(wildcardQuery);
                boolQuery.MinimumNumberShouldMatch = 1;
                context.RequestBody.Query = boolQuery;
            }
            else
            {
                context.RequestBody.Query = wildcardQuery;
            }
        });
    }
}

Using this method in its simplest form could look something like this:

var result = SearchClient.Instance
  .Search<PageData>()
  .WildcardQuery("*ppl*", x => x.PageName)
  .GetPagesResult();

The above query will match any page whose name contains a word containing the character sequence “ppl”, meaning that pages named “Apple”, “Supply”, “Green apples” and “My application” will be matched while a page named “App” won’t be. In a wildcard pattern “*” matches any number of characters, including none, while “?” matches exactly one character. Replacing the asterisks with question marks (“?ppl?”) would therefore limit the matched pages to the one named “Apple”, while “?ppl*” would also include the pages named “Green apples” and “My application”, but not “Supply”.
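To get a feel for the patterns without a search index at hand, here is a minimal sketch that mimics the matching locally. It is not how the search engine evaluates wildcard queries; the WildcardDemo class and its methods are made up for this illustration, and it assumes that a page name is indexed as lowercased, whitespace-separated words with no stemming applied.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only, not part of EPiServer Find.
public static class WildcardDemo
{
    // Translates a wildcard pattern into an anchored regular expression:
    // '*' matches any number of characters, '?' matches exactly one.
    public static bool TermMatches(string term, string pattern)
    {
        var regex = "^" + Regex.Escape(pattern)
            .Replace("\\*", ".*")
            .Replace("\\?", ".") + "$";
        return Regex.IsMatch(term, regex);
    }

    // Assumption: a name matches if any of its lowercased,
    // whitespace-separated words matches the whole pattern.
    public static bool NameMatches(string pageName, string pattern)
    {
        return pageName.ToLowerInvariant()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Any(word => TermMatches(word, pattern));
    }
}
```

Calling WildcardDemo.NameMatches with each of the five page names and the three patterns gives the same matches as described above under these assumptions. Note that a pattern has to match an entire word, which is why “?ppl?” only fits the five-letter “apple”.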

As you’ve probably noticed, the method requires an expression that specifies which field to search in. There’s nothing stopping us from invoking the method multiple times to search in multiple fields. However, please note that wildcard queries can become slow when applied to many documents and/or to fields with a lot of text. It’s therefore best to limit their use to fields with short text, such as PageName rather than MainBody for EPiServer CMS pages. It’s also a good idea to question requirements that call for wildcard queries on all text, as that tends to produce very broad results, which typically isn’t the expected behavior (you don’t get hits for “Apple” if you search for “ppl” with Google).

You can of course combine the WildcardQuery method with the regular For method. For instance, the query below might work well for search-as-you-type functionality.

var result = SearchClient.Instance
  .Search<PageData>()
  .For("ppl")
  .InFields(x => x.PageName, x => x.MainBody)
  .WildcardQuery("*ppl*", x => x.PageName)
  .GetPagesResult();

The last, optional parameter can be used to specify a boost level for the wildcard query. Setting it to a very low value (like 0.01) means that documents matching the wildcard query are still included in the result, while documents matched by other queries, such as the one added with the For method in the above example, are favored in the ranking.
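For the search-as-you-type scenario, the pattern has to be built from whatever the user has typed so far. The following is one possible sketch of such a helper. The WildcardPatternBuilder class, the choice to use only the last word being typed, and the minimum length of two characters are all assumptions made for this example, not something Find provides or requires.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only, not part of EPiServer Find.
public static class WildcardPatternBuilder
{
    // Returns a "contains" pattern such as "*ppl*", or null when the
    // input is too short to be worth running a wildcard query for.
    public static string BuildContainsPattern(string input, int minLength = 2)
    {
        if (input == null) return null;

        // Assumption: only the last word, the one currently being typed,
        // is used, since a wildcard pattern is matched against single words.
        var lastWord = input
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .LastOrDefault();
        if (lastWord == null) return null;

        // Remove wildcard characters typed by the user so that the
        // input itself cannot widen the pattern.
        var cleaned = new string(lastWord
            .Where(c => c != '*' && c != '?')
            .ToArray());
        if (cleaned.Length < minLength) return null;

        return "*" + cleaned.ToLowerInvariant() + "*";
    }
}
```

When the helper returns a pattern, it could be passed as the first argument to the extension method above, and when it returns null the wildcard query could simply be left out. Skipping very short input is one way to avoid the broad and potentially slow queries mentioned earlier.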

Fuzzy queries

Related to wildcard queries are fuzzy queries. Be sure to check out Henrik Lindström’s post on how to do fuzzy queries with Find!


Joel Abrahamsson
