Search Engines June 07, 2014

Dynamic mappings and dates in ElasticSearch

JSON doesn't have a date type. Yet ElasticSearch can automatically map date fields for us. While this "just works" most of the time, it can be a good idea to help ElasticSearch help us by instead using naming conventions for dates. Here's why, and how.

ElasticSearch has a feature called dynamic mapping which is turned on by default. Using this we don't have to explicitly tell ElasticSearch how to index and store specific fields. Instead ElasticSearch figures it out itself by inspecting the content of our JSON properties.

Let's look at an example.

curl -XPOST "http://localhost:9200/myindex/tweet/" -d'
{
    "content": "Hello World!",
    "postDate": "2009-11-15T14:12:12"
}'

Given that there isn't already an index named "myindex", the above request will cause a number of things to happen in our ElasticSearch cluster.

  1. An index named "myindex" will be created.
  2. Mappings for a type named tweet will be created for the index. The mappings will contain two properties, content and postDate.
  3. The JSON object in the request body will be indexed.

After having made the above request we can inspect the automatically created mappings with the request below.

curl -XGET "http://localhost:9200/myindex/_mapping"

The response looks like this:

{
   "myindex": {
      "mappings": {
         "tweet": {
            "properties": {
               "content": {
                  "type": "string"
               },
               "postDate": {
                  "type": "date",
                  "format": "dateOptionalTime"
               }
            }
         }
      }
   }
}

As we can see in the above response, ElasticSearch has mapped the content property as a string and the postDate property as a date. All is well.

However, let's look at what happens if we delete the index and modify our indexing request to instead look like this:

curl -XPOST "http://localhost:9200/myindex/tweet/" -d'
{
    "content": "1985-12-24",
    "postDate": "2009-11-15T14:12:12"
}'

In the above request the content property is still a string, but the only content of the string is a date. Retrieving the mappings now gives us a different result.

{
   "myindex": {
      "mappings": {
         "tweet": {
            "properties": {
               "content": {
                  "type": "date",
                  "format": "dateOptionalTime"
               },
               "postDate": {
                  "type": "date",
                  "format": "dateOptionalTime"
               }
            }
         }
      }
   }
}

ElasticSearch has now inferred that the content property is also a date. If we now try to index our original JSON object we'll get an exception in our faces.

{
   "error": "MapperParsingException[failed to parse [content]]; nested: MapperParsingException[failed to parse date field [Hello World!], tried both date format [dateOptionalTime], and timestamp number with locale []]; nested: IllegalArgumentException[Invalid format: \"Hello World!\"]; ",
   "status": 400
}

We're trying to insert a string value into a field which is mapped as a date. Naturally ElasticSearch won't allow us to do that.
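
To make the failure mode concrete, here is a small Python sketch of the "first document wins" behaviour. It is not how ElasticSearch implements dynamic mapping. The TinyIndex class, the looks_like_date helper and the two accepted formats are assumptions made for illustration, and the real dateOptionalTime format accepts more variants than these.

```python
from datetime import datetime

# Assumption: two formats stand in for the default "dateOptionalTime" format,
# which in reality accepts more variants.
ASSUMED_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d")


def looks_like_date(value):
    """Return True if a string value parses with one of the assumed formats."""
    if not isinstance(value, str):
        return False
    for fmt in ASSUMED_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


class TinyIndex:
    """Hypothetical toy index: a field's type is fixed by the first document."""

    def __init__(self):
        self.mappings = {}

    def index(self, document):
        updated = dict(self.mappings)
        for name, value in document.items():
            detected = "date" if looks_like_date(value) else "string"
            # Keep the existing type if the field is already mapped.
            mapped = updated.setdefault(name, detected)
            if mapped == "date" and detected != "date":
                raise ValueError("failed to parse date field [%s]" % value)
        self.mappings = updated


index = TinyIndex()
index.index({"content": "1985-12-24", "postDate": "2009-11-15T14:12:12"})
print(index.mappings)  # both fields are now mapped as dates

try:
    index.index({"content": "Hello World!", "postDate": "2009-11-15T14:12:12"})
except ValueError as error:
    print(error)
```

Indexing the two documents in the opposite order works fine in this sketch, which is exactly why the problem looks random: the outcome depends on which document happens to arrive first.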

While this scenario isn't very likely to happen, when it does it can be quite annoying and cause problems that can only be fixed by re-indexing everything into a new index. Luckily there are a number of possible solutions.

Disabling date detection

As a first step we can disable date detection for dynamic mapping. Here's how we would do that explicitly for documents of type tweet when creating the index:

curl -XPUT "http://localhost:9200/myindex" -d'
{
   "mappings": {
      "tweet": {
         "date_detection": false
      }
   }
}'

We then index our "problematic" tweet again:

curl -XPOST "http://localhost:9200/myindex/tweet/" -d'
{
    "content": "1985-12-24",
    "postDate": "2009-11-15T14:12:12"
}'

When we now inspect the mappings that have been dynamically created for us we see a different result compared to before:

{
   "myindex": {
      "mappings": {
         "tweet": {
            "date_detection": false,
            "properties": {
               "content": {
                  "type": "string"
               },
               "postDate": {
                  "type": "string"
               }
            }
         }
      }
   }
}

Now both fields have been mapped as strings, which they indeed are, even though they contain values that can be parsed as dates. However, this isn't good either, as we'd like the postDate field to be mapped as a date so that we can use range filters and the like on it.
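
As a reminder of what we'd lose, here is a rough Python sketch that builds the body of a search request with a range filter on postDate. It assumes the "filtered" query form used by ElasticSearch releases from around the time of writing. The build_range_search helper and the date bounds are made up for illustration, and the body would be posted to a search endpoint such as /myindex/tweet/_search.

```python
import json


def build_range_search(field, gte, lt):
    """Build a search body that keeps documents with gte <= field < lt.

    Assumption: the "filtered" query with a "range" filter, as used by
    ElasticSearch versions from around 2014.
    """
    return {
        "query": {
            "filtered": {
                "filter": {
                    "range": {field: {"gte": gte, "lt": lt}}
                }
            }
        }
    }


# Hypothetical bounds: everything posted during 2009.
body = build_range_search("postDate", "2009-01-01", "2010-01-01")
print(json.dumps(body, indent=3))
```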

Explicitly mapping date fields

We can explicitly map the postDate field as a date by re-creating the index and including a property mapping, like this:

curl -XPUT "http://localhost:9200/myindex" -d'
{
   "mappings": {
      "tweet": {
         "date_detection": false,
         "properties": {
             "postDate": {
                 "type": "date"
             }
         }
      }
   }
}'

If we now index our "problematic" tweet with a date in the content field we'll get the desired mappings: the content field mapped as a string and the postDate field mapped as a date. That's nice. However, this approach can be cumbersome when dealing with many types, or with types that we don't know about before documents of those types are indexed.
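
To get a feel for why this becomes cumbersome, consider generating the request body for several types. The Python sketch below is only an illustration: explicit_date_mappings is a hypothetical helper, and the "user" type with its "signupDate" field is invented. Every type and every date field has to be known up front.

```python
import json


def explicit_date_mappings(date_fields_by_type):
    """Build an index-creation body that maps the listed fields as dates.

    date_fields_by_type maps a type name to the names of its date fields.
    """
    mappings = {}
    for type_name, field_names in date_fields_by_type.items():
        mappings[type_name] = {
            "date_detection": False,
            "properties": {name: {"type": "date"} for name in field_names},
        }
    return {"mappings": mappings}


# "tweet" comes from the example above; "user" and "signupDate" are invented.
body = explicit_date_mappings({
    "tweet": ["postDate"],
    "user": ["signupDate"],
})
print(json.dumps(body, indent=3))
```

A type that isn't listed gets no date mappings at all, so with date detection disabled its date fields would end up as strings.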

Mapping date fields using naming conventions

An alternative approach to disabling date detection and explicitly mapping specific fields as dates is to instruct ElasticSearch's dynamic mapping functionality to adhere to naming conventions for dates. Take a look at the below request that (again) creates an index.

curl -XPUT "http://localhost:9200/myindex" -d'
{
   "mappings": {
      "_default_": {
         "date_detection": false,
         "dynamic_templates": [
            {
               "dates": {
                  "match": ".*Date|date",
                  "match_pattern": "regex",
                  "mapping": {
                     "type": "date"
                  }
               }
            }
         ]
      }
   }
}'

Compared to our previous requests used to create an index with mappings this is quite different. First of all we no longer provide mappings for the tweet type. Instead we provide mappings for a type named _default_. This is a special type whose mappings will be used as the default "template" for all other types.

As before we start by disabling date detection in the mappings. However, after that we no longer provide mappings for properties but instead provide a dynamic template named dates.

Within the dates template we provide a pattern and specify that the pattern should be interpreted as a regular expression. Using this the template will be applied to all fields whose names either end with "Date" or whose names are exactly "date". For such fields the template instructs the dynamic mapping functionality to map them as dates.

Using this approach all string fields, no matter whether their values can be parsed as dates or not, will be mapped as strings unless the field name is something like "postDate", "updateDate" or simply "date". Fields with such names will be mapped as dates instead.
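
The naming convention can be tried out outside of ElasticSearch as well. The Python sketch below assumes, in line with the description above, that the pattern has to match the whole field name. The is_date_field helper is hypothetical, and Python's regular expression engine stands in for the Java one that ElasticSearch uses.

```python
import re

# Same pattern as in the "dates" dynamic template.
DATE_FIELD_PATTERN = re.compile(r".*Date|date")


def is_date_field(field_name):
    """Return True if the whole field name matches the date naming convention."""
    return DATE_FIELD_PATTERN.fullmatch(field_name) is not None


for name in ["postDate", "updateDate", "date", "content", "birthdate"]:
    print(name, is_date_field(name))
```

Note that the pattern is case sensitive, so a field named "birthdate" doesn't match the convention and would be mapped as a string.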

While this is nice, there's one caveat. Indexing a JSON object with a property matching the naming convention for date fields but whose value can't be parsed as a date will cause an exception. Still, adhering to naming conventions for dates may be a small price to pay compared to the headaches of seemingly randomly having string fields mapped as dates simply because the first document to be indexed of a specific type happened to contain a string value that could be parsed as a date.
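
One way to soften the caveat is to check documents before sending them to ElasticSearch. The Python sketch below is a rough pre-flight check under a few assumptions: find_bad_date_fields and parses_as_date are hypothetical helpers, only string values are checked, and the two formats are a simplified stand-in for what a date field actually accepts.

```python
import re
from datetime import datetime

# Same naming convention as in the "dates" dynamic template.
DATE_FIELD_PATTERN = re.compile(r".*Date|date")

# Assumption: a simplified stand-in for the formats a date field accepts.
ASSUMED_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d")


def parses_as_date(value):
    """Return True if the string parses with one of the assumed formats."""
    for fmt in ASSUMED_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def find_bad_date_fields(document):
    """Return names of date-convention fields whose string values don't parse."""
    return [
        name
        for name, value in document.items()
        if DATE_FIELD_PATTERN.fullmatch(name)
        and isinstance(value, str)  # non-string values are not checked here
        and not parses_as_date(value)
    ]


print(find_bad_date_fields({"content": "1985-12-24", "postDate": "soon"}))
```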

Joel Abrahamsson

I'm a passionate web developer and systems architect living in Stockholm, Sweden. I work as CTO for a large media site and enjoy developing with all technologies, especially .NET, Node.js, and ElasticSearch.
