Elasticsearch Notes

1. Introduction
2. Concepts and Tools
- 2.1 Basic Concepts
- 2.2 Using Kibana
3. Working with Indices and Data
4. Search
5. Aggregations

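1. Introduction

Elasticsearch (ES) is a distributed full-text search engine. A typical use is searching product information in an online store.

This note uses three tools. Their version numbers must match; I used 6.8 for all of them.

Elasticsearch itself

Linux download: https://www.elastic.co/cn/downloads/past-releases/elasticsearch-6-8-0

Kibana

A powerful visualization tool built on Node.js. It can send REST requests, visualize search data, and more.

Download (Windows): https://www.elastic.co/cn/downloads/past-releases/kibana-6-8-0

IK analyzer

A Chinese tokenizer. It can split “我是中国人” into ["我","是","中国人","中国","国人"], that is, break a sentence into individual words, so that a user's search phrase can be matched against those tokens.

Download: https://github.com/medcl/elasticsearch-analysis-ik/releases/tag/v6.8.0

I will skip the installation steps, with one note: by default ES serves HTTP (client) requests on port 9200 and uses port 9300 for node-to-node communication within the cluster. After installing, open those ports in the Linux firewall.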

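2. Concepts and Tools

2.1 Basic Concepts

Ruan Yifeng's blog explains these well: http://www.ruanyifeng.com/blog/2017/08/elasticsearch.html

My own understanding

ES storage structure: an index contains many documents, each document is one record, and types group documents logically.

Searching is the process of finding documents that satisfy the given conditions.

I do not fully understand types yet and will come back to them after some practice. Note that types are deprecated in ES 7 and removed in ES 8.

Besides Node, Cluster, Index, Document, and Type, there are two more concepts: shards and replicas.

Sharding splits the full data set into several pieces so that load is spread out when the data volume is large.

A replica is a copy of a shard. If data is lost unexpectedly, the copy is still available.

The figure below shows 3 shards, each with 1 replica.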
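To make the shard and replica numbers concrete, here is a minimal Python sketch of the arithmetic. The function name total_shard_copies is made up for illustration and is not part of any ES API; it only assumes that every primary shard gets the configured number of replica copies.

```python
def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Primary shards plus their replicas: each primary has number_of_replicas copies."""
    return number_of_shards * (1 + number_of_replicas)


# 3 shards with 1 replica each: 3 primaries + 3 replicas
print(total_shard_copies(3, 1))  # 6
# 3 shards with 2 replicas each: 3 primaries + 6 replicas
print(total_shard_copies(3, 2))  # 9
```

The second call uses the same settings as the test_index created later in these notes.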

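2.2 Using Kibana

ES APIs are RESTful; requests and responses are both JSON.

Kibana provides Dev Tools, which make it easy to send requests and inspect responses, with autocompletion.

All of the following examples are run in Kibana.

3. Working with Indices and Data

2.3 Indices

2.3.1 Creating an index

Use a PUT request to create an index named test_index with 3 shards and 2 replicas.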

PUT test_index

{

  "settings": {

    "number_of_shards": 3,

    "number_of_replicas": 2

  }

}

2.3.2 Viewing index settings

Use GET followed by the index name.

GET test_index

The response shows the creation time, the shard and replica settings, and so on.

2.3.3 Deleting an index

Use DELETE followed by the index name.

DELETE test_index

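2.4 Index Mappings

2.4.1 Creating a mapping

Syntax

PUT /index_name/_mapping/type_name

{

  "properties": {

    "field_name": {

      "type": "field_type",

      "index": true,

      "store": true,

      "analyzer": "analyzer_name"

    }

  }

}

type_name: the type, which groups documents logically.
field_name: the name of a field in the document, such as title or price.
type: the field's data type, such as text, long, integer, or object.
index: whether the field is indexed; defaults to true.
store: whether the field is stored separately; defaults to false.
analyzer: the analyzer to use.

Example: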

PUT test_index/_mapping/goods

{

  "properties": {

    "title": {

      "type": "text",

      "analyzer": "ik_max_word"

    },

    "images": {

      "type": "keyword",

      "index": false

    },

    "price": {

      "type": "float"

    }

  }

}

2.4.2 Viewing the mapping

Request

GET /test_index/_mapping

Response

{

  "test_index" : {

    "mappings" : {

      "goods" : {

        "properties" : {

          "images" : {

            "type" : "keyword",

            "index" : false

          },

          "price" : {

            "type" : "float"

          },

          "title" : {

            "type" : "text",

            "analyzer" : "ik_max_word"

          }

        }

      }

    }

  }

}

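2.4.3 Field attributes in detail

2.4.3.1 type

String types, of which there are two:
- text: analyzed (tokenized); cannot be used in aggregations by default
- keyword: not analyzed; the value is matched as a whole and can be used in aggregations
Numeric types, in two groups:
- Basic types: long, integer, short, byte, double, float, half_float
- High-precision floating-point type: scaled_float
  - Requires a scaling factor such as 10 or 100. ES multiplies the real value by this factor before storing it and restores it when reading.
Date type

ES can store dates as formatted strings, but storing them as millisecond timestamps (long) is recommended to save space.

Objects

For example, {girl:{name:"rose", age:21}} is flattened into two fields, girl.name and girl.age.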
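The object flattening described above can be sketched in a few lines of Python. This only illustrates the dotted-path naming; the helper flatten is hypothetical and does not reproduce how ES indexes objects internally.

```python
def flatten(obj: dict, prefix: str = "") -> dict:
    """Turn nested dicts into a flat dict keyed by dotted paths."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into inner objects, carrying the path so far
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


doc = {"girl": {"name": "rose", "age": 21}}
print(flatten(doc))  # {'girl.name': 'rose', 'girl.age': 21}
```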

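2.4.3.2 index

index controls whether a field is indexed.

true: the field is indexed and can be searched. This is the default.
false: the field is not indexed and cannot be searched.

Because index defaults to true, every field is indexed unless you configure otherwise.

Some fields should not be indexed, such as product image URLs; for those, set index to false explicitly.

One question worth asking: what is the point of keeping a non-searchable field in ES?

The answer is that ES is also being used like a database here: the documents returned by a search should be directly usable.

2.4.3.3 store

When Elasticsearch indexes a document, it keeps a copy of the original document in a field called _source, and we can filter _source to choose which fields to return.

Setting store to true keeps an additional copy of the field outside _source, which is usually redundant. So store is normally left as false, which is also its default.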

2.5 Adding documents

The format is as follows. If no ID is given, ES generates a random one.

POST /index_name/type_name/id

{

    document body

}

Example

POST /test_index/goods/1

{

    "title":"小米手机",

    "images":"http://image.leyou.com/12479122.jpg",

    "price":2699.00

}

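2.5.1 Dynamic field mapping

If a document contains fields that are not defined in the mapping, ES adds them to the mapping automatically.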

POST /test_index/goods/100

{

    "title":"超米手机",

    "images":"http://image.leyou.com/12479122.jpg",

    "price":2899.00,

    "stock": 200,

    "saleable":true

}

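Querying after the insert shows the following:

The new fields do not affect the _source of documents that already exist,

but the index mapping does change.

_source contains exactly the fields that were sent when the document was stored.

2.6 Updating documents

Send the request with PUT and specify the ID to update a document:

If a document with that ID exists, it is replaced.
If it does not exist, it is created.

Example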

PUT /test_index/goods/100

{

    "title":"超级大米手机",

    "images":"http://image.leyou.com/12479122.jpg",

    "price":2899.00,

    "stock": 200

}

2.7 Deleting documents

Syntax

DELETE /index_name/type_name/id

Example

DELETE /test_index/goods/3

4. Search

First, index some data:

POST /test_index/goods/1

{

    "title":"小米手机",

    "images":"http://image.leyou.com/12479122.jpg",

    "price":2699.00

}

POST /test_index/goods/2

{

    "title":"大米手机",

    "images":"http://image.leyou.com/12479122.jpg",

    "price":2899.00

}

PUT /test_index/goods/3

{

    "title":"小米电视4A",

    "images":"http://image.leyou.com/12479122.jpg",

    "price":3899.00

}

Now for the main part: the different kinds of queries.

4.1 Queries

4.1.1 match_all (match everything)

GET /test_index/_search

{

    "query":{

        "match_all": {}

    }

}

Result

{

  "took" : 0,

  "timed_out" : false,

  "_shards" : {

    "total" : 3,

    "successful" : 3,

    "skipped" : 0,

    "failed" : 0

  },

  "hits" : {

    "total" : 3,

    "max_score" : 1.0,

    "hits" : [

      {

        "_index" : "test_index",

        "_type" : "goods",

        "_id" : "2",

        "_score" : 1.0,

        "_source" : {

          "title" : "大米手机",

          "images" : "http://image.leyou.com/12479122.jpg",

          "price" : 2899.0

        }

      },

      {

        "_index" : "test_index",

        "_type" : "goods",

        "_id" : "1",

        "_score" : 1.0,

        "_source" : {

          "title" : "小米手机",

          "images" : "http://image.leyou.com/12479122.jpg",

          "price" : 2699.0

        }

      },

      {

        "_index" : "test_index",

        "_type" : "goods",

        "_id" : "3",

        "_score" : 1.0,

        "_source" : {

          "title" : "小米电视4A",

          "images" : "http://image.leyou.com/12479122.jpg",

          "price" : 3899.0

        }

      }

    ]

  }

}

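Note the _score in the results: it is the relevance score of the document. The higher the score, the better the document matches the search.

4.1.2 match

Single field: OR

The text 小米电视 is analyzed into two terms, 小米 and 电视, which are searched separately; multiple terms are combined with OR.

This is roughly equivalent to title like '%小米%' or title like '%电视%'.

The more terms a document matches, the higher its score and the higher it ranks.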

GET /test_index/_search

{

    "query":{

        "match":{

            "title":"小米电视"

        }

    }

}

Result

{

  "took" : 0,

  "timed_out" : false,

  "_shards" : {

    "total" : 3,

    "successful" : 3,

    "skipped" : 0,

    "failed" : 0

  },

  "hits" : {

    "total" : 2,

    "max_score" : 0.77041245,

    "hits" : [

      {

        "_index" : "test_index",

        "_type" : "goods",

        "_id" : "3",

        "_score" : 0.77041245,

        "_source" : {

          "title" : "小米电视4A",

          "images" : "http://image.leyou.com/12479122.jpg",

          "price" : 3899.0

        }

      },

      {

        "_index" : "test_index",

        "_type" : "goods",

        "_id" : "1",

        "_score" : 0.21110918,

        "_source" : {

          "title" : "小米手机",

          "images" : "http://image.leyou.com/12479122.jpg",

          "price" : 2699.0

        }

      }

    ]

  }

}

Single field: AND

The two terms 小米 and 电视 are combined with AND, roughly equivalent to title like '%小米%' and title like '%电视%'.

GET /test_index/_search

{

    "query":{

        "match": {

          "title": {

            "query": "小米电视",

            "operator": "and"

          }

        }

    }

}

Result

{

  "took" : 1,

  "timed_out" : false,

  "_shards" : {

    "total" : 3,

    "successful" : 3,

    "skipped" : 0,

    "failed" : 0

  },

  "hits" : {

    "total" : 1,

    "max_score" : 0.77041245,

    "hits" : [

      {

        "_index" : "test_index",

        "_type" : "goods",

        "_id" : "3",

        "_score" : 0.77041245,

        "_source" : {

          "title" : "小米电视4A",

          "images" : "http://image.leyou.com/12479122.jpg",

          "price" : 3899.0

        }

      }

    ]

  }

}

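Single field: minimum match

With ik_max_word, “小米曲面电视” is split into three terms: 小米, 曲面, and 电视.

To return documents that match at least two of them, set the minimum match ratio to at least 2/3.

In my tests, 67% returns only “小米电视4A”, while 66% returns both “小米电视4A” and “小米手机”.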
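The difference between 66% and 67% comes from how the percentage is turned into a term count: according to the Elasticsearch documentation for minimum_should_match, the number computed from a percentage is rounded down. A minimal Python sketch of that arithmetic follows; required_matches is a hypothetical helper name, not an ES API, and it only covers positive percentages.

```python
def required_matches(term_count: int, percent: int) -> int:
    """Terms that must match for a positive percentage, rounded down."""
    return (term_count * percent) // 100


# Three terms: 小米, 曲面, 电视
for pct in (66, 67):
    print(pct, required_matches(3, pct))
# 66 1
# 67 2
```

With 66% only one term has to match, so “小米手机” (which matches 小米) is returned as well; with 67% two terms are needed, which only “小米电视4A” satisfies.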

GET /test_index/_search

{

    "query":{

        "match":{

            "title":{

            	"query":"小米曲面电视",

            	"minimum_should_match": "67%"

            }

        }

    }

}

Result

{

  "took" : 1,

  "timed_out" : false,

  "_shards" : {

    "total" : 3,

    "successful" : 3,

    "skipped" : 0,

    "failed" : 0

  },

  "hits" : {

    "total" : 1,

    "max_score" : 0.77041245,

    "hits" : [

      {

        "_index" : "test_index",

        "_type" : "goods",

        "_id" : "3",

        "_score" : 0.77041245,

        "_source" : {

          "title" : "小米电视4A",

          "images" : "http://image.leyou.com/12479122.jpg",

          "price" : 3899.0

        }

      }

    ]

  }

}

4.1.3 multi_match (multi-field query)

Search the title and subTitle fields; a document matches if either field matches.

GET /test_index/_search

{

    "query":{

        "multi_match": {

            "query":    "小米",

            "fields":   [ "title", "subTitle" ]

        }

	}

}

4.1.4 term (exact match)

Find documents with price = 2699.00

GET /test_index/_search

{

    "query":{

        "term":{

            "price":2699.00

        }

    }

}

4.1.5 terms (exact match on multiple values)

Find documents whose price equals any value in the array

GET /test_index/_search

{

    "query":{

        "terms":{

            "price":[2699.00,2899.00,3899.00]

        }

    }

}

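4.1.6 bool (boolean query)

must: AND
must_not: NOT
should: OR

The query below finds documents whose title must contain “大米”, must not contain “电视”, and may contain “手机” (matching it raises the score).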

GET /test_index/_search

{

    "query":{

        "bool":{

        	"must":     { "match": { "title": "大米" }},

        	"must_not": { "match": { "title":  "电视" }},

        	"should":   { "match": { "title": "手机" }}

        }

    }

}

Result

{

  "took" : 0,

  "timed_out" : false,

  "_shards" : {

    "total" : 3,

    "successful" : 3,

    "skipped" : 0,

    "failed" : 0

  },

  "hits" : {

    "total" : 1,

    "max_score" : 0.5753642,

    "hits" : [

      {

        "_index" : "test_index",

        "_type" : "goods",

        "_id" : "2",

        "_score" : 0.5753642,

        "_source" : {

          "title" : "大米手机",

          "images" : "http://image.leyou.com/12479122.jpg",

          "price" : 2899.0

        }

      }

    ]

  }

}

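4.1.7 range (range query)

Typically used for numeric and date ranges.

Operator	Meaning
gt	greater than
gte	greater than or equal to
lt	less than
lte	less than or equal to

Example: find documents with price >= 1000 and price < 2800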

GET /test_index/_search

{

    "query":{

        "range": {

            "price": {

                "gte":  1000.0,

                "lt":   2800.00

            }

    	}

    }

}

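4.1.8 fuzzy (fuzzy query)

Tolerates small errors in the input while still returning the right result: for example, searching for appla still finds apple.

Example:

Add a product “apple手机”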

POST /test_index/goods/4

{

    "title":"apple手机",

    "images":"http://image.leyou.com/12479122.jpg",

    "price":6899.00

}

Fuzzy query with fuzziness set to 2, that is, an edit distance of at most 2

GET /test_index/_search

{

  "query": {

    "fuzzy": {

        "title": {

            "value":"appla",

            "fuzziness":2

        }

    }

  }

}

The result includes apple手机.
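The fuzziness value is an edit distance. As a rough illustration, the Python sketch below computes the plain Levenshtein distance between two strings. It is an assumption-laden simplification: as far as I know, Elasticsearch by default also counts swapping two adjacent characters as a single edit, which this sketch does not model.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


print(edit_distance("appla", "apple"))  # 1
```

Since the distance between appla and apple is 1, it is within the allowed fuzziness of 2, which is why the query finds the document.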

4.1.9 Choosing which fields to return

Return only the title and price fields

GET /test_index/_search

{

  "_source": ["title","price"],

  "query": {

    "term": {

      "price": 2699

    }

  }

}

Response

{

  "took" : 0,

  "timed_out" : false,

  "_shards" : {

    "total" : 3,

    "successful" : 3,

    "skipped" : 0,

    "failed" : 0

  },

  "hits" : {

    "total" : 1,

    "max_score" : 1.0,

    "hits" : [

      {

        "_index" : "test_index",

        "_type" : "goods",

        "_id" : "1",

        "_score" : 1.0,

        "_source" : {

          "price" : 2699.0,

          "title" : "小米手机"

        }

      }

    ]

  }

}

Using includes and excludes

includes lists the fields to return; excludes lists the fields to leave out.

For example

GET /test_index/_search

{

  "_source": {

    "includes":["title","price"]

  },

  "query": {

    "term": {

      "price": 2699

    }

  }

}

GET /test_index/_search

{

  "_source": {

    "excludes": ["images"]

  },

  "query": {

    "term": {

      "price": 2699

    }

  }

}
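The effect of includes and excludes on a single document can be imitated with a small Python function. This is only a sketch: filter_source is a hypothetical helper that handles exact top-level field names, whereas the real _source filtering in ES also accepts wildcard patterns and nested paths.

```python
def filter_source(source: dict, includes=None, excludes=None) -> dict:
    """Keep fields listed in includes (if given) and drop those in excludes."""
    result = {}
    for field, value in source.items():
        if includes and field not in includes:
            continue
        if excludes and field in excludes:
            continue
        result[field] = value
    return result


doc = {
    "title": "小米手机",
    "images": "http://image.leyou.com/12479122.jpg",
    "price": 2699.0,
}
print(filter_source(doc, includes=["title", "price"]))
print(filter_source(doc, excludes=["images"]))
```

For this document both calls return the same two fields, title and price, which matches the two requests above.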

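4.2 Filters

How do queries and filters differ?

See this blog post: https://blog.csdn.net/en_joker/article/details/78017306

Every query clause affects document scoring and ranking. If we need to filter the results and do not want the filter condition to affect the score, we should not write it as a query condition. Use filter instead: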

GET /test_index/_search

{

    "query":{

        "bool":{

        	"must":{ "match": { "title": "小米手机" }},

        	"filter":{

                "range":{"price":{"gt":2000.00,"lt":3800.00}}

        	}

        }

    }

}

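Note: a filter can itself contain another bool to combine filter conditions.

If a search has only filters and no query conditions, and no scoring is wanted, constant_score can replace a bool query that contains only a filter clause. Performance is identical, but the query is simpler and clearer.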

GET /test_index/_search

{

    "query":{

        "constant_score":   {

            "filter": {

            	 "range":{"price":{"gt":2000.00,"lt":3000.00}}

            }

        }

    }

}

Compared with filters, the defining feature of queries is that they care about relevance.

4.3 Sorting

4.3.1 Sorting by one field

GET /test_index/_search

{

  "query": {

    "match": {

      "title": "小米手机"

    }

  },

  "sort": [

    {

      "price": {

        "order": "desc"

      }

    }

  ]

}

Result

{

  "took" : 3,

  "timed_out" : false,

  "_shards" : {

    "total" : 3,

    "successful" : 3,

    "skipped" : 0,

    "failed" : 0

  },

  "hits" : {

    "total" : 4,

    "max_score" : null,

    "hits" : [

      {

        "_index" : "test_index",

        "_type" : "goods",

        "_id" : "4",

        "_score" : null,

        "_source" : {

          "title" : "apple手机",

          "images" : "http://image.leyou.com/12479122.jpg",

          "price" : 6899.0

        },

        "sort" : [

          6899.0

        ]

      },

      {

        "_index" : "test_index",

        "_type" : "goods",

        "_id" : "3",

        "_score" : null,

        "_source" : {

          "title" : "小米电视4A",

          "images" : "http://image.leyou.com/12479122.jpg",

          "price" : 3899.0

        },

        "sort" : [

          3899.0

        ]

      },

      {

        "_index" : "test_index",

        "_type" : "goods",

        "_id" : "2",

        "_score" : null,

        "_source" : {

          "title" : "大米手机",

          "images" : "http://image.leyou.com/12479122.jpg",

          "price" : 2899.0

        },

        "sort" : [

          2899.0

        ]

      },

      {

        "_index" : "test_index",

        "_type" : "goods",

        "_id" : "1",

        "_score" : null,

        "_source" : {

          "title" : "小米手机",

          "images" : "http://image.leyou.com/12479122.jpg",

          "price" : 2699.0

        },

        "sort" : [

          2699.0

        ]

      }

    ]

  }

}

4.3.2 Sorting by multiple fields

Sort the results by price first, then by relevance score.

GET /test_index/_search

{

    "query":{

        "bool":{

        	"must":{ "match": { "title": "小米手机" }},

        	"filter":{

                "range":{"price":{"gt":2,"lt":300000}}

        	}

        }

    },

    "sort": [

      { "price": { "order": "desc" }},

      { "_score": { "order": "desc" }}

    ]

}

The result is omitted here.

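5. Aggregations

Aggregations make it very easy to compute statistics and analyze data. For example:

Which phone brand is the most popular?
What are the average, highest, and lowest prices of these phones?
How do these phones sell month by month?

These statistics are much easier to compute than with SQL in a database, and they are fast enough to give near-real-time results.

5.1 Basic concepts

Bucket

A bucket groups data in some way; each group is called a bucket in ES. For example, grouping people by nationality yields a China bucket, a UK bucket, a Japan bucket, and so on; grouping them by age range yields 0-10, 10-20, 20-30, 30-40, and so on.

Elasticsearch offers many ways to form buckets:

Date Histogram Aggregation: groups by date interval; for example, with an interval of one week, each week becomes one group
Histogram Aggregation: groups by numeric interval, similar to the date version
Terms Aggregation: groups by term; documents with exactly the same term value fall into one group
Range Aggregation: groups numeric or date values into ranges given by explicit start and end points
...

Metrics

After grouping, we usually run a computation over the data in each group, such as average, maximum, minimum, or sum. ES calls these metrics.

Commonly used metric aggregations:

Avg Aggregation: average
Max Aggregation: maximum
Min Aggregation: minimum
Percentiles Aggregation: percentiles
Stats Aggregation: returns avg, max, min, sum, and count together
Sum Aggregation: sum
Top hits Aggregation: the top matching documents
Value Count Aggregation: number of values
...

Note: fields used for aggregation, sorting, or filtering are handled specially in ES and must not be analyzed. For strings, that means the type must be keyword rather than text, because text is tokenized.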

5.2 Importing data

Import some car sales data.

First create the index

PUT /cars

{

  "settings": {

    "number_of_shards": 1,

    "number_of_replicas": 0

  },

  "mappings": {

    "transactions": {

      "properties": {

        "color": {

          "type": "keyword"

        },

        "make": {

          "type": "keyword"

        }

      }

    }

  }

}

Bulk-import the data

POST /cars/transactions/_bulk

{ "index": {}}

{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }

{ "index": {}}

{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }

{ "index": {}}

{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }

{ "index": {}}

{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }

{ "index": {}}

{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }

{ "index": {}}

{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }

{ "index": {}}

{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }

{ "index": {}}

{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }

5.3 Bucket aggregation

The example below counts car sales by color.

GET /cars/_search

{

    "size" : 0,

    "aggs" : {

        "popular_colors" : {

            "terms" : {

              "field" : "color"

            }

        }

    }

}

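size: the number of hits to return. It is 0 here because we only care about the aggregation result, not the matched documents; this also makes the request cheaper.
aggs: declares an aggregation; short for aggregations
- popular_colors: a name for this aggregation; any name works
  - terms: how buckets are formed, here by term
    - field: the field to bucket on

Result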

{

  "took" : 0,

  "timed_out" : false,

  "_shards" : {

    "total" : 1,

    "successful" : 1,

    "skipped" : 0,

    "failed" : 0

  },

  "hits" : {

    "total" : 8,

    "max_score" : 0.0,

    "hits" : [ ]

  },

  "aggregations" : {

    "popular_colors" : {

      "doc_count_error_upper_bound" : 0,

      "sum_other_doc_count" : 0,

      "buckets" : [

        {

          "key" : "red",

          "doc_count" : 4

        },

        {

          "key" : "blue",

          "doc_count" : 2

        },

        {

          "key" : "green",

          "doc_count" : 2

        }

      ]

    }

  }

}

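hits: empty, because size is 0
aggregations: the aggregation results
popular_colors: the aggregation name we defined
buckets: the buckets found; each distinct value of the color field forms one bucket
- key: the color value for this bucket
- doc_count: the number of documents in this bucket

The results show that red cars sell best.

5.4 Metrics within buckets

Section 5.3 only grouped the data. Usually a metric is computed after grouping, for example the average price of cars of each color.

Request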

GET /cars/_search

{

    "size" : 0,

    "aggs" : {

        "popular_colors" : {

            "terms" : {

              "field" : "color"

            },

            "aggs":{

                "avg_price": {

                   "avg": {

                      "field": "price"

                   }

                }

            }

        }

    }

}

Response

{

  "took" : 0,

  "timed_out" : false,

  "_shards" : {

    "total" : 1,

    "successful" : 1,

    "skipped" : 0,

    "failed" : 0

  },

  "hits" : {

    "total" : 8,

    "max_score" : 0.0,

    "hits" : [ ]

  },

  "aggregations" : {

    "popular_colors" : {

      "doc_count_error_upper_bound" : 0,

      "sum_other_doc_count" : 0,

      "buckets" : [

        {

          "key" : "red",

          "doc_count" : 4,

          "avg_price" : {

            "value" : 32500.0

          }

        },

        {

          "key" : "blue",

          "doc_count" : 2,

          "avg_price" : {

            "value" : 20000.0

          }

        },

        {

          "key" : "green",

          "doc_count" : 2,

          "avg_price" : {

            "value" : 21000.0

          }

        }

      ]

    }

  }

}
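To check the numbers in this response by hand, the same bucket-then-metric computation can be reproduced in plain Python over the eight records imported in 5.2. This is only a sketch of the idea (group by color, then average the price); avg_price_by_color is a made-up name and says nothing about how ES executes aggregations.

```python
from collections import defaultdict

# price and color of the eight cars imported in section 5.2
cars = [
    {"price": 10000, "color": "red"},
    {"price": 20000, "color": "red"},
    {"price": 30000, "color": "green"},
    {"price": 15000, "color": "blue"},
    {"price": 12000, "color": "green"},
    {"price": 20000, "color": "red"},
    {"price": 80000, "color": "red"},
    {"price": 25000, "color": "blue"},
]


def avg_price_by_color(docs):
    """Bucket documents by color, then compute the average price per bucket."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc["color"]].append(doc["price"])
    return {color: sum(prices) / len(prices) for color, prices in buckets.items()}


print(avg_price_by_color(cars))
# {'red': 32500.0, 'green': 21000.0, 'blue': 20000.0}
```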

5.5 Buckets within buckets

Building on the request in 5.4, add another aggregation: find which makes each color of car comes from.

GET /cars/_search

{

    "size" : 0,

    "aggs" : {

        "popular_colors" : {

            "terms" : {

              "field" : "color"

            },

            "aggs":{

                "avg_price": {

                   "avg": {

                      "field": "price"

                   }

                },

                "maker":{

                    "terms":{

                        "field":"make"

                    }

                }

            }

        }

    }

}

Result

{

  "took" : 1,

  "timed_out" : false,

  "_shards" : {

    "total" : 1,

    "successful" : 1,

    "skipped" : 0,

    "failed" : 0

  },

  "hits" : {

    "total" : 8,

    "max_score" : 0.0,

    "hits" : [ ]

  },

  "aggregations" : {

    "popular_colors" : {

      "doc_count_error_upper_bound" : 0,

      "sum_other_doc_count" : 0,

      "buckets" : [

        {

          "key" : "red",

          "doc_count" : 4,

          "maker" : {

            "doc_count_error_upper_bound" : 0,

            "sum_other_doc_count" : 0,

            "buckets" : [

              {

                "key" : "honda",

                "doc_count" : 3

              },

              {

                "key" : "bmw",

                "doc_count" : 1

              }

            ]

          },

          "avg_price" : {

            "value" : 32500.0

          }

        },

        {

          "key" : "blue",

          "doc_count" : 2,

          "maker" : {

            "doc_count_error_upper_bound" : 0,

            "sum_other_doc_count" : 0,

            "buckets" : [

              {

                "key" : "ford",

                "doc_count" : 1

              },

              {

                "key" : "toyota",

                "doc_count" : 1

              }

            ]

          },

          "avg_price" : {

            "value" : 20000.0

          }

        },

        {

          "key" : "green",

          "doc_count" : 2,

          "maker" : {

            "doc_count_error_upper_bound" : 0,

            "sum_other_doc_count" : 0,

            "buckets" : [

              {

                "key" : "ford",

                "doc_count" : 1

              },

              {

                "key" : "toyota",

                "doc_count" : 1

              }

            ]

          },

          "avg_price" : {

            "value" : 21000.0

          }

        }

      ]

    }

  }

}

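As the result shows, Honda accounts for the most cars among the red ones.

5.6 Other ways to form buckets

The examples above bucket by term. There are many other bucketing methods, for example:

Histogram buckets

The X axis of a histogram is divided at fixed intervals, so we need an interval value to specify that spacing.

Example

Group cars by price with an interval of 5000, and hide buckets whose count is 0.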

GET /cars/_search

{

  "size":0,

  "aggs":{

    "price":{

      "histogram": {

        "field": "price",

        "interval": 5000,

        "min_doc_count": 1

      }

    }

  }

}

Result

{

  "took" : 0,

  "timed_out" : false,

  "_shards" : {

    "total" : 1,

    "successful" : 1,

    "skipped" : 0,

    "failed" : 0

  },

  "hits" : {

    "total" : 8,

    "max_score" : 0.0,

    "hits" : [ ]

  },

  "aggregations" : {

    "price" : {

      "buckets" : [

        {

          "key" : 10000.0,

          "doc_count" : 2

        },

        {

          "key" : 15000.0,

          "doc_count" : 1

        },

        {

          "key" : 20000.0,

          "doc_count" : 2

        },

        {

          "key" : 25000.0,

          "doc_count" : 1

        },

        {

          "key" : 30000.0,

          "doc_count" : 1

        },

        {

          "key" : 80000.0,

          "doc_count" : 1

        }

      ]

    }

  }

}

Price distribution (each bucket covers one interval of 5000, starting at its key):

[10000, 15000): 2
[15000, 20000): 1
[20000, 25000): 2
[25000, 30000): 1
[30000, 35000): 1
[80000, 85000): 1
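The bucket keys in the histogram result can be reproduced with a short Python sketch: each price is rounded down to a multiple of the interval, and that multiple is the bucket key. The function histogram below is a hypothetical helper over the prices from 5.2. Because it never creates empty buckets, it only models min_doc_count of 1 or more.

```python
from collections import Counter

# prices of the eight cars imported in section 5.2
prices = [10000, 20000, 30000, 15000, 12000, 20000, 80000, 25000]


def histogram(values, interval, min_doc_count=1):
    """Count values per bucket; the key is the value rounded down to the interval."""
    counts = Counter((v // interval) * interval for v in values)
    return {key: n for key, n in sorted(counts.items()) if n >= min_doc_count}


print(histogram(prices, 5000))
# {10000: 2, 15000: 1, 20000: 2, 25000: 1, 30000: 1, 80000: 1}
```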