ElasticSearch - Nested Mapping and Filters

Because nested objects are indexed as separate hidden documents, we can’t query them directly. Instead, we have to use the nested query to access them:

GET /my_index/blogpost/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "eggs" }},
        {
          "nested": {
            "path": "comments",
            "query": {
              "bool": {
                "must": [
                  { "match": { "comments.name": "john" }},
                  { "match": { "comments.age": 28 }}
                ]
              }
            }
          }
        }
      ]
    }
  }
}

① The title clause operates on the root document.
② The nested clause “steps down” into the nested comments field. It no longer has access to fields in the root document, nor fields in any other nested document.
③ The comments.name and comments.age clauses operate on the same nested document.

A nested field can contain other nested fields. Similarly, a nested query can contain other nested queries. The nesting hierarchy is applied as you would expect.

Of course, a nested query could match several nested documents. Each matching nested document would have its own relevance score, but these multiple scores need to be reduced to a single score that can be applied to the root document.

By default, the nested query averages the scores of the matching nested documents. This can be controlled by setting the score_mode parameter to avg, max, sum, or even none (in which case the root document gets a constant score of 1.0).

GET /my_index/blogpost/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "eggs" }},
        {
          "nested": {
            "path": "comments",
            "score_mode": "max",
            "query": {
              "bool": {
                "must": [
                  { "match": { "comments.name": "john" }},
                  { "match": { "comments.age": 28 }}
                ]
              }
            }
          }
        }
      ]
    }
  }
}

① Give the root document the _score from the best-matching nested document.
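
The reduction step described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Elasticsearch's implementation: the function name reduce_nested_scores and the sample scores are made up for the example, and the constant 1.0 for none simply follows the description given earlier.

```python
from statistics import mean


def reduce_nested_scores(scores, score_mode="avg"):
    """Collapse the scores of matching nested documents into one root score.

    Hypothetical helper for illustration; it mirrors the score_mode values
    named in the text (avg, max, sum, none), not Elasticsearch internals.
    """
    if not scores:
        raise ValueError("at least one nested document must match")
    if score_mode == "avg":
        return mean(scores)
    if score_mode == "max":
        return max(scores)
    if score_mode == "sum":
        return sum(scores)
    if score_mode == "none":
        # Per the text, the root document gets a constant score.
        return 1.0
    raise ValueError(f"unknown score_mode: {score_mode}")


# Made-up scores for three matching comments on one blog post.
comment_scores = [0.5, 1.5, 1.0]
for mode in ("avg", "max", "sum", "none"):
    print(mode, reduce_nested_scores(comment_scores, mode))
```

With max, a single strongly matching comment is enough to rank the post highly. With avg, weak matches pull the root score down.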

If placed inside the filter clause of a Boolean query, a nested query behaves much like a regular nested query, except that it doesn’t accept the score_mode parameter. Because it is being used as a non-scoring query (it includes or excludes, but doesn’t score), a score_mode doesn’t make sense since there is nothing to score.

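To see why this matters, let's start with a movie document whose cast field is a plain array of objects:

curl -XPOST "http://localhost:9200/index-1/movie/" -d'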
{
   "title": "The Matrix",
   "cast": [
      {
         "firstName": "Keanu",
         "lastName": "Reeves"
      },
      {
         "firstName": "Laurence",
         "lastName": "Fishburne"
      }
   ]
}'

Given many such movies in our index we can find all movies with an actor named "Keanu" using a search request such as:

curl -XPOST "http://localhost:9200/index-1/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "term": {
               "cast.firstName": "keanu"
            }
         }
      }
   }
}'

Running the above query indeed returns The Matrix. The same is true if we try to find movies that have an actor with the first name "Keanu" and last name "Reeves":

curl -XPOST "http://localhost:9200/index-1/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "bool": {
               "must": [
                  {
                     "term": {
                        "cast.firstName": "keanu"
                     }
                  },
                  {
                     "term": {
                        "cast.lastName": "reeves"
                     }
                  }
               ]
            }
         }
      }
   }
}'

Or at least so it seems. However, let's see what happens if we search for movies with an actor with "Keanu" as first name and "Fishburne" as last name.

curl -XPOST "http://localhost:9200/index-1/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "bool": {
               "must": [
                  {
                     "term": {
                        "cast.firstName": "keanu"
                     }
                  },
                  {
                     "term": {
                        "cast.lastName": "fishburne"
                     }
                  }
               ]
            }
         }
      }
   }
}'

At first glance, this clearly should not match The Matrix, as there's no such actor amongst its cast. However, ElasticSearch will return The Matrix for the above query. After all, the movie does contain an actor with "Keanu" as first name and (albeit a different) actor with "Fishburne" as last name. Based on the above query it has no way of knowing that we want the two term filters to match the same unique object in the list of actors. And even if it did, the way the data is indexed it wouldn't be able to handle that requirement.
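
A small Python model makes the problem visible. It is a simplification and not how ElasticSearch is implemented: the dictionary flattened_movie and the helpers term_matches and bool_must are made up for illustration, and the lowercase terms assume the default analysis lowercases the names (which is why the queries above use lowercase values).

```python
# Simplified model of how the movie is indexed when "cast" is a plain
# (non-nested) object array: each field becomes one flat set of terms, and
# the link between a first name and its last name is lost.
flattened_movie = {
    "title": {"the", "matrix"},
    "cast.firstName": {"keanu", "laurence"},
    "cast.lastName": {"reeves", "fishburne"},
}


def term_matches(doc, field, value):
    """A term filter matches if the value is among the field's terms."""
    return value in doc.get(field, set())


def bool_must(doc, terms):
    """A bool/must filter matches only if every term filter matches."""
    return all(term_matches(doc, field, value) for field, value in terms)


keanu_fishburne = [("cast.firstName", "keanu"), ("cast.lastName", "fishburne")]
print(bool_must(flattened_movie, keanu_fishburne))  # True: the unwanted hit
```

Each term filter is satisfied on its own, so the combined filter matches even though no single actor has both values.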

Nested mapping and filter to the rescue

Luckily ElasticSearch provides a way for us to filter on multiple fields within the same object in an array: mapping such fields as nested. To try this out, let's create ourselves a new index with the "cast" field mapped as nested.

curl -XPUT "http://localhost:9200/index-2" -d'
{
   "mappings": {
      "movie": {
         "properties": {
            "cast": {
               "type": "nested"
            }
         }
      }
   }
}'

After indexing the same movie document into the new index we can now find movies based on multiple properties of each actor by using a nested filter. Here's how we would search for movies starring an actor named "Keanu Fishburne":

curl -XPOST "http://localhost:9200/index-2/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "nested": {
               "path": "cast",
               "filter": {
                  "bool": {
                     "must": [
                        {
                           "term": {
                              "cast.firstName": "keanu"
                           }
                        },
                        {
                           "term": {
                              "cast.lastName": "fishburne"
                           }
                        }
                     ]
                  }
               }
            }
         }
      }
   }
}'

As you can see we've wrapped our initial bool filter in a nested filter. The nested filter contains a path property where we specify that the filter applies to the cast property of the searched document. It also contains a filter (or a query) which will be applied to each value within the nested property.
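
If you build such requests from code, the wrapping can be captured in a small helper. The sketch below is in Python and only assembles the request body in the shape shown above. The function name nested_filter is hypothetical, and the field names are written with the cast. prefix here as an assumption that keeps the body unambiguous.

```python
import json


def nested_filter(path, terms):
    """Build a filtered query whose filter is a nested bool/must of terms.

    Hypothetical helper: it only assembles the request body in the shape
    used in this article (filtered query with a nested filter).
    """
    must = [{"term": {field: value}} for field, value in terms.items()]
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "nested": {
                        "path": path,
                        "filter": {"bool": {"must": must}},
                    }
                },
            }
        }
    }


body = nested_filter(
    "cast", {"cast.firstName": "keanu", "cast.lastName": "fishburne"}
)
print(json.dumps(body, indent=3))
```

The printed JSON can be sent as the body of the _search request with curl or any HTTP client.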

As intended, running the above query doesn't return The Matrix, while modifying it to instead match "Reeves" as last name will make it match The Matrix. However, there's one caveat.

Including nested values in parent documents

If we go back to our very first query, filtering only on actors' first names without using a nested filter, like the request below, we won't get any hits.

curl -XPOST "http://localhost:9200/index-2/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "term": {
               "cast.firstName": "keanu"
            }
         }
      }
   }
}'

This happens because movie documents no longer have cast.firstName fields. Instead each element in the cast array is, internally in ElasticSearch, indexed as a separate document.

We can of course still search for movies based only on first names amongst the cast by using a nested filter, like this:

curl -XPOST "http://localhost:9200/index-2/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "nested": {
               "path": "cast",
               "filter": {
                  "term": {
                     "cast.firstName": "keanu"
                  }
               }
            }
         }
      }
   }
}'

The above request returns The Matrix. However, having to use nested filters or queries when all we want to do is filter on a single property is a bit tedious. To keep the power of nested filters for complex criteria while still being able to filter on values in arrays as if we hadn't mapped such properties as nested, we can modify our mappings so that the nested values are also included in the parent document. This is done using the include_in_parent property, like this:

curl -XPUT "http://localhost:9200/index-3" -d'
{
   "mappings": {
      "movie": {
         "properties": {
            "cast": {
               "type": "nested",
               "include_in_parent": true
            }
         }
      }
   }
}'

In an index such as the one created with the above request, we can filter on combinations of values within the same complex objects in the cast array using nested filters, while still being able to filter on single fields without using nested filters. However, we now need to carefully consider where to use, and where not to use, nested filters in our queries, as a query for "Keanu Fishburne" will match The Matrix using a regular bool filter while it won't when wrapped in a nested filter. In other words, when using include_in_parent we may get unexpected results from queries matching documents that they shouldn't if we forget to use nested filters.


Array Type


As its name suggests, it can be an array of native types (string, int, …) but also an array of objects (the basis used for “objects” and “nested”).

Here are some valid indexing examples:

{
    "Article" : [
        {
            "id" : 12,
            "title" : "An article title",
            "categories" : [1,3,5,7],
            "tag" : ["elasticsearch", "symfony", "Obtao"],
            "author" : [
                {
                    "firstname" : "Francois",
                    "surname" : "francoisg",
                    "id" : 18
                },
                {
                    "firstname" : "Gregory",
                    "surname" : "gregquat",
                    "id" : 2
                }
            ]
        },
        {
            "id" : 13,
            "title" : "A second article title",
            "categories" : [1,7],
            "tag" : ["elasticsearch", "symfony", "Obtao"],
            "author" : [
                {
                    "firstname" : "Gregory",
                    "surname" : "gregquat",
                    "id" : 2
                }
            ]
        }
    ]
}

This example contains several kinds of arrays:

  • categories : array of integers
  • tag : array of strings
  • author : array of objects (inner objects or nested)

We mention this “simple” type explicitly because it can be easier and more maintainable to store a flattened value than the complete object.
Using a non-relational structure should make you think about a specific model for your search engine:

  • To filter: if you just want to filter, search or aggregate on the textual value of an object, flatten the value into the parent object.
  • To get the list of objects that are linked to a parent (when you do not need to filter on or index these objects): just store the list of ids and hydrate them with Doctrine and Symfony.

Inner objects

Inner objects are simply JSON objects embedded in a parent document, such as the “author” entries in the example above. The mapping for this example could be:

fos_elastica:
    clients:
        default: { host: %elastic_host%, port: %elastic_port% }
    indexes:
        blog :
            types:
                article :
                    mappings:
                        title : ~
                        categories : ~
                        tag : ~
                        author :
                            type : object
                            properties :
                                firstname : ~
                                surname : ~
                                id :
                                    type : integer

You can filter or query on these “inner objects”. For example:

The query author.firstname=Francois will return the post with the id 12 (and not the one with the id 13).


Inner objects are easy to configure. As Elasticsearch documents are “schema less”, you can index them without specifying any mapping.

The limitation of this method lies in how ElasticSearch stores your data. Reusing the above example, here is the internal representation of our objects:

[
    {
        "id" : 12,
        "title" : "An article title",
        "categories" : [1,3,5,7],
        "tag" : ["elasticsearch", "symfony", "Obtao"],
        "author.firstname" : ["Francois","Gregory"],
        "author.surname" : ["francoisg","gregquat"],
        "author.id" : [18,2]
    },
    {
        "id" : 13,
        "title" : "A second article title",
        "categories" : [1,7],
        "tag" : ["elasticsearch", "symfony", "Obtao"],
        "author.firstname" : ["Gregory"],
        "author.surname" : ["gregquat"],
        "author.id" : [2]
    }
]
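
The flattening shown above can be reproduced with a short Python function. This is a simplified, hypothetical model (the name flatten_inner_objects is ours): it handles a single level of inner objects and ignores text analysis. It is meant only to show where the association between fields of the same object gets lost.

```python
from collections import defaultdict


def flatten_inner_objects(doc):
    """Flatten arrays of inner objects into dotted multi-valued fields.

    Simplified, hypothetical model of the representation shown above; it
    handles one level of objects and ignores text analysis.
    """
    flat = {}
    for field, value in doc.items():
        is_object_array = (
            isinstance(value, list)
            and bool(value)
            and all(isinstance(item, dict) for item in value)
        )
        if is_object_array:
            merged = defaultdict(list)
            for item in value:
                for key, inner in item.items():
                    merged[f"{field}.{key}"].append(inner)
            flat.update(merged)
        else:
            flat[field] = value
    return flat


article = {
    "id": 12,
    "title": "An article title",
    "author": [
        {"firstname": "Francois", "surname": "francoisg", "id": 18},
        {"firstname": "Gregory", "surname": "gregquat", "id": 2},
    ],
}
print(flatten_inner_objects(article))
```

After flattening, author.firstname and author.surname are two independent lists, so nothing records that "Francois" belongs with "francoisg" rather than "gregquat".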

The consequence is that a query for author.firstname=francois AND author.surname=gregquat:

{
    "query": {
        "filtered": {
            "query": { "match_all": {} },
            "filter": {
                "bool": {
                    "must": [
                        { "term": { "author.firstname": "francois" } },
                        { "term": { "author.surname": "gregquat" } }
                    ]
                }
            }
        }
    }
}

will return document 12. For inner objects, this query translates to “Which documents have at least one author.surname equal to gregquat and at least one author.firstname equal to francois?”.

To fix this problem, you must use the nested type.

Nested

First important difference: nested must be specified in your mapping.

The mapping looks like the one for an object; only the type changes:

fos_elastica:
    clients:
        default: { host: %elastic_host%, port: %elastic_port% }
    indexes:
        blog :
            types:
                article :
                    mappings:
                        title : ~
                        categories : ~
                        tag : ~
                        author :
                            type : nested
                            properties :
                                firstname : ~
                                surname : ~
                                id :
                                    type : integer

This time, the internal representation will be:

[
    {
        "id" : 12,
        "title" : "An article title",
        "categories" : [1,3,5,7],
        "tag" : ["elasticsearch", "symfony", "Obtao"],
        "author" : [{
            "firstname" : "Francois",
            "surname" : "francoisg",
            "id" : 18
        },
        {
            "firstname" : "Gregory",
            "surname" : "gregquat",
            "id" : 2
        }]
    },
    {
        "id" : 13,
        "title" : "A second article title",
        "categories" : [1,7],
        "tag" : ["elasticsearch", "symfony", "Obtao"],
        "author" : [{
            "firstname" : "Gregory",
            "surname" : "gregquat",
            "id" : 2
        }]
    }
]

This time, we keep the object structure.

Nested objects have their own filters, which let you filter on individual nested objects. Continuing our example (the case that exposed the limitation of inner objects), we can write this query:

{
    "query": {
        "filtered": {
            "query": {
                "match_all": {}
            },
            "filter": {
                "nested" : {
                    "path" : "author",
                    "filter": {
                        "bool": {
                            "must": [
                                {
                                    "term" : {
                                        "author.firstname": "francois"
                                    }
                                },
                                {
                                    "term" : {
                                        "author.surname": "gregquat"
                                    }
                                }
                            ]
                        }
                    }
                }
            }
        }
    }
}

We can translate it as “Who has an author object whose surname is equal to ‘gregquat’ and whose firstname is ‘francois’”. This query will return no result.
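
The per-object semantics can be modelled in a few lines of Python. As before, this is a hypothetical sketch and not ElasticSearch's implementation: nested_match is a made-up helper, and lowercasing the stored values stands in for text analysis.

```python
def nested_match(doc, path, terms):
    """Return True if a single object under `path` satisfies every term.

    Hypothetical model of a nested bool/must filter: the conditions are
    checked against one object at a time, never across objects.
    """
    return any(
        all(
            str(obj.get(field, "")).lower() == value
            for field, value in terms.items()
        )
        for obj in doc.get(path, [])
    )


article_12 = {
    "id": 12,
    "author": [
        {"firstname": "Francois", "surname": "francoisg", "id": 18},
        {"firstname": "Gregory", "surname": "gregquat", "id": 2},
    ],
}

# No single author is both "francois" and "gregquat", so there is no hit.
print(nested_match(article_12, "author",
                   {"firstname": "francois", "surname": "gregquat"}))
# One author object satisfies both conditions here.
print(nested_match(article_12, "author",
                   {"firstname": "gregory", "surname": "gregquat"}))
```

The first call prints False and the second prints True, which is the behaviour the flattened inner-object representation could not provide.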

One problem remains, and it is costly when working with big objects: when you want to change a single value of a nested object, you have to reindex the whole parent document (including all its nested objects).
If the objects are heavy and often updated, the impact on performance can be significant.

To fix this problem, you can use the parent/child associations.

Parent/Child

Parent/child associations are very similar to OneToMany relationships (one parent, several children).
The relationship remains hierarchical: a child type is associated with only one parent type, and it’s impossible to create a ManyToMany relationship.

We are going to link our article to a category:

fos_elastica:
    clients:
        default: { host: %elastic_host%, port: %elastic_port% }
    indexes:
        blog :
            types:
                category :
                    mappings :
                        id : ~
                        name : ~
                        description : ~
                article :
                    mappings:
                        title : ~
                        tag : ~
                        author : ~
                    _routing:
                        required: true
                        path: category
                    _parent:
                        type : "category"
                        identifier: "id" #optional as id is the default value
                        property : "category" #optional as the default value is the type value

When indexing an article, a reference to the Category will also be indexed (category.id).
So, we can index categories and articles separately while keeping the references between them.

As with nested objects, there are filters and queries that let you search on parents or children:

  • Has Parent Filter / Has Parent Query : Filter/query on parent fields, returns children objects. In our case, we could filter articles whose parent category contains “symfony” in its description.
  • Has Child Filter / Has Child Query : Filter/query on child fields, returns the parent object. In our case, we could filter Categories for which “francoisg” has written an article.

For example:
{
    "query": {
        "has_child": {
            "type": "article",
            "query" : {
                "filtered": {
                    "query": { "match_all": {}},
                    "filter" : {
                        "term": {"tag": "symfony"}
                    }
                }
            }
        }
    }
}

This query will return the Categories that have at least one article tagged with “symfony”.
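
To make the "at least one child" semantics concrete, here is a small Python model of has_child. The sample categories and articles, and the helper has_child itself, are made up for illustration. The sketch only mirrors the logic of the query, not how ElasticSearch executes it.

```python
# Made-up sample data: each article points at its parent category by id.
categories = [
    {"id": 1, "name": "Search"},
    {"id": 2, "name": "Cooking"},
]
articles = [
    {"id": 12, "category": 1, "tag": ["elasticsearch", "symfony"]},
    {"id": 13, "category": 1, "tag": ["elasticsearch"]},
    {"id": 14, "category": 2, "tag": ["eggs"]},
]


def has_child(parents, children, parent_key, child_matches):
    """Return the parents that have at least one child accepted by child_matches."""
    matching_parent_ids = {
        child[parent_key] for child in children if child_matches(child)
    }
    return [parent for parent in parents if parent["id"] in matching_parent_ids]


result = has_child(
    categories, articles, "category", lambda a: "symfony" in a["tag"]
)
print([category["name"] for category in result])  # ['Search']
```

Because parents and children are stored as separate documents, an article can be updated without reindexing its category, which is the advantage over nested objects mentioned earlier.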

The queries here are written in JSON, but they can easily be translated into PHP with the Elastica library.
