Arguably one of the best features of ElasticSearch is that it allows us to index and search amongst complex JSON objects. We're not limited to a flat list of fields but can work with object graphs, just as we're used to when programming in object-oriented languages.
However, there's one situation where we need to help ElasticSearch to understand the structure of our data in order to be able to query it fully - when dealing with arrays of complex objects.
As an example, look at the indexing request below, where we index a movie along with its cast. The cast is a list of complex objects, each consisting of an actor's first and last name:
curl -XPOST "http://localhost:9200/index-1/movie/" -d'
{
"title": "The Matrix",
"cast": [
{
"firstName": "Keanu",
"lastName": "Reeves"
},
{
"firstName": "Laurence",
"lastName": "Fishburne"
}
]
}'
Given many such movies in our index we can find all movies with an actor named "Keanu" using a search request such as:
curl -XPOST "http://localhost:9200/index-1/movie/_search" -d'
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"term": {
"cast.firstName": "keanu"
}
}
}
}
}'
Running the above query indeed returns The Matrix. The same is true if we try to find movies that have an actor with the first name "Keanu" and last name "Reeves":
curl -XPOST "http://localhost:9200/index-1/movie/_search" -d'
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"term": {
"cast.firstName": "keanu"
}
},
{
"term": {
"cast.lastName": "reeves"
}
}
]
}
}
}
}
}'
Or at least so it seems. However, let's see what happens if we search for movies with an actor with "Keanu" as first name and "Fishburne" as last name.
curl -XPOST "http://localhost:9200/index-1/movie/_search" -d'
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"term": {
"cast.firstName": "keanu"
}
},
{
"term": {
"cast.lastName": "fishburne"
}
}
]
}
}
}
}
}'
At first glance this clearly shouldn't match The Matrix, as there's no such actor amongst its cast. However, ElasticSearch will return The Matrix for the above query. After all, the movie does contain an actor with "Keanu" as first name and an actor (albeit a different one) with "Fishburne" as last name. Based on the above query it has no way of knowing that we want the two term filters to match the same object in the list of actors. And even if it did, the way the data is indexed means it wouldn't be able to handle that requirement: the fields of all objects in the array are flattened into multi-valued fields on the movie document, so the association between a given first name and last name is lost.
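To make the problem concrete, here is a minimal Python sketch of what happens to the cast array when it is indexed without a nested mapping. It is a simplified model rather than ElasticSearch's actual implementation: the helper names flatten and matches_all_terms are made up for this illustration, and lowercasing plus whitespace splitting stands in for the default analyzer.

```python
from collections import defaultdict

def flatten(doc, prefix=""):
    """Collapse a JSON-like document into field path -> set of lowercased terms."""
    fields = defaultdict(set)
    for key, value in doc.items():
        path = prefix + key
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                # Inner objects contribute their terms to shared, multi-valued fields.
                for sub_path, terms in flatten(item, path + ".").items():
                    fields[sub_path] |= terms
            else:
                # Crude stand-in for analysis: lowercase and split on whitespace.
                fields[path] |= set(str(item).lower().split())
    return fields

def matches_all_terms(fields, required):
    """Mimic a bool filter whose must clauses are all term filters."""
    return all(term in fields.get(field, set()) for field, term in required.items())

movie = {
    "title": "The Matrix",
    "cast": [
        {"firstName": "Keanu", "lastName": "Reeves"},
        {"firstName": "Laurence", "lastName": "Fishburne"},
    ],
}

fields = flatten(movie)
print(sorted(fields["cast.firstName"]))  # ['keanu', 'laurence']
print(sorted(fields["cast.lastName"]))   # ['fishburne', 'reeves']

# Both terms exist somewhere in the document, so the movie matches.
print(matches_all_terms(fields, {"cast.firstName": "keanu",
                                 "cast.lastName": "fishburne"}))  # True
```

Once the names live in two independent sets, nothing records that "keanu" and "reeves" came from the same object, which is why the "Keanu Fishburne" query matches.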
Nested mapping and filter to the rescue
Luckily ElasticSearch provides a way for us to filter on multiple fields within the same object in an array: mapping such fields as nested. To try this out, let's create a new index with the "cast" field mapped as nested.
curl -XPUT "http://localhost:9200/index-2" -d'
{
"mappings": {
"movie": {
"properties": {
"cast": {
"type": "nested"
}
}
}
}
}'
After indexing the same movie document into the new index we can now find movies based on multiple properties of each actor by using a nested filter. Here's how we would search for movies starring an actor named "Keanu Fishburne":
curl -XPOST "http://localhost:9200/index-2/movie/_search" -d'
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"nested": {
"path": "cast",
"filter": {
"bool": {
"must": [
{
"term": {
"firstName": "keanu"
}
},
{
"term": {
"lastName": "fishburne"
}
}
]
}
}
}
}
}
}
}'
As you can see we've wrapped our initial bool filter in a nested filter. The nested filter contains a path property where we specify that the filter applies to the cast property of the searched document. It also contains a filter (or a query) which will be applied to each value within the nested property.
As intended, running the above query doesn't return The Matrix, while modifying it to instead match "Reeves" as last name makes it match The Matrix. However, there's one caveat.
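The difference in behaviour can be sketched in a few lines of Python. This is only a model of the matching semantics under simplifying assumptions (each value is treated as a single lowercased term, and nested_match is a hypothetical helper, not an ElasticSearch API): the criteria are evaluated against one cast object at a time instead of against the movie as a whole.

```python
def nested_match(doc, path, required):
    """Return True if a single object under `path` satisfies every required term."""
    for obj in doc.get(path, []):
        # Each nested object is checked in isolation from its siblings.
        terms = {field: str(value).lower() for field, value in obj.items()}
        if all(terms.get(field) == term for field, term in required.items()):
            return True
    return False

movie = {
    "title": "The Matrix",
    "cast": [
        {"firstName": "Keanu", "lastName": "Reeves"},
        {"firstName": "Laurence", "lastName": "Fishburne"},
    ],
}

print(nested_match(movie, "cast", {"firstName": "keanu", "lastName": "fishburne"}))  # False
print(nested_match(movie, "cast", {"firstName": "keanu", "lastName": "reeves"}))     # True
```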
Including nested values in parent documents
If we go back to our very first query, which filters only on actors' first names without using a nested filter, and run it against the new index as in the request below, we won't get any hits.
curl -XPOST "http://localhost:9200/index-2/movie/_search" -d'
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"term": {
"cast.firstName": "keanu"
}
}
}
}
}'
This happens because movie documents no longer have cast.firstName fields. Instead each element in the cast array is, internally in ElasticSearch, indexed as a separate document.
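As a rough illustration of that internal layout, the sketch below splits a movie into the fields kept on the parent and one hidden entry per cast member. The function index_with_nested and the exact shape of the hidden entries are assumptions made for this example and do not reflect how Lucene actually stores the data.

```python
def index_with_nested(doc, nested_path):
    """Split a document into parent fields and one hidden entry per nested object."""
    # The parent keeps everything except the nested property.
    parent = {key: value for key, value in doc.items() if key != nested_path}
    # Every object in the array becomes its own small document.
    hidden = [
        {f"{nested_path}.{key}": str(value).lower() for key, value in obj.items()}
        for obj in doc.get(nested_path, [])
    ]
    return parent, hidden

movie = {
    "title": "The Matrix",
    "cast": [
        {"firstName": "Keanu", "lastName": "Reeves"},
        {"firstName": "Laurence", "lastName": "Fishburne"},
    ],
}

parent, hidden = index_with_nested(movie, "cast")
print("cast.firstName" in parent)  # False
print(len(hidden))                 # 2
print(hidden[0])                   # {'cast.firstName': 'keanu', 'cast.lastName': 'reeves'}
```

A plain term filter only looks at the parent, which has nothing stored under cast.firstName, so it finds no hits.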
Of course, we can still search for movies based only on first names amongst the cast by using a nested filter, like this:
curl -XPOST "http://localhost:9200/index-2/movie/_search" -d'
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"nested": {
"path": "cast",
"filter": {
"term": {
"firstName": "keanu"
}
}
}
}
}
}
}'
The above request returns The Matrix. However, having to use nested filters or queries when all we want to do is filter on a single property can be a bit tedious. Ideally we'd keep the power of nested filters for complex criteria while still being able to filter on values in arrays the same way as if we hadn't mapped them as nested. To achieve that we can modify our mappings so that the nested values are also included in the parent document. This is done using the include_in_parent property, like this:
curl -XPUT "http://localhost:9200/index-3" -d'
{
"mappings": {
"movie": {
"properties": {
"cast": {
"type": "nested",
"include_in_parent": true
}
}
}
}
}'
In an index such as the one created with the above request we'll be able to filter on combinations of values within the same complex object in the cast array using nested filters, while still being able to filter on single fields without them. However, we now need to carefully consider where to use, and where not to use, nested filters in our queries, as a query for "Keanu Fishburne" will match The Matrix using a regular bool filter while it won't when wrapped in a nested filter. In other words, when using include_in_parent we may get unexpected results, with queries matching documents that they shouldn't, if we forget to use nested filters.
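The trade-off can be summarised with a small Python model. It is a sketch under simplifying assumptions (single-term values, and hypothetical helpers index_movie, plain_bool and nested_bool that merely imitate the two kinds of filters), not a description of ElasticSearch internals.

```python
def index_movie(doc, include_in_parent):
    """Return (parent_fields, nested_docs) for a movie whose cast is mapped as nested."""
    nested = [{key: str(value).lower() for key, value in actor.items()}
              for actor in doc["cast"]]
    parent = {}
    if include_in_parent:
        # Copy every nested value into flattened, multi-valued parent fields.
        for actor in nested:
            for field, term in actor.items():
                parent.setdefault("cast." + field, set()).add(term)
    return parent, nested

def plain_bool(parent, required):
    """Regular bool filter: each term only has to exist somewhere on the parent."""
    return all(term in parent.get(field, set()) for field, term in required.items())

def nested_bool(nested, required):
    """Nested filter: all terms must be found on the same cast object."""
    return any(all(actor.get(field) == term for field, term in required.items())
               for actor in nested)

movie = {
    "title": "The Matrix",
    "cast": [
        {"firstName": "Keanu", "lastName": "Reeves"},
        {"firstName": "Laurence", "lastName": "Fishburne"},
    ],
}

parent, nested = index_movie(movie, include_in_parent=True)

# Single-field filtering works again without a nested filter.
print(plain_bool(parent, {"cast.firstName": "keanu"}))  # True
# But forgetting the nested filter brings back the false positive.
print(plain_bool(parent, {"cast.firstName": "keanu",
                          "cast.lastName": "fishburne"}))  # True
print(nested_bool(nested, {"firstName": "keanu",
                           "lastName": "fishburne"}))      # False
```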