(转)通过HTTP RESTful API 操作elasticsearch搜索数据
样例数据集
这是编造的JSON格式银行客户账号信息文档,文档schema如下:
{
“account_number”: 0,
“balance”: 16623,
“firstname”: “Bradshaw”,
“lastname”: “Mckenzie”,
“age”: 29,
“gender”: “F”,
“address”: “244 Columbus Place”,
“employer”: “Euron”,
“email”: “bradshawmckenzie@euron.com”,
“city”: “Hobucken”,
“state”: “CO”
}
这些数据可以通过www.json-generator.com网站生成
加载样例数据集
下载样例数据集链接
解压数据到指定目录,然后加载到elasticsearch集群
绝对路径:
curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary "@/home/cluster/apps/elasticsearch/elasticsearch-1.7.2/test/accounts.json"
相对路径:
curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary "@test/accounts.json"
- 1
- 2
- 3
- 4
- 5
curl 'localhost:9200/_cat/indices?v'
结果:
health status index pri rep docs.count docs.deleted store.size pri.store.size
yellow open bank 5 1 1000 0 417.1kb 417.1kb
- 1
- 2
- 3
- 4
上面结果,说明我们成功bulk 1000个文档到bank索引中了
搜索数据API
有两种方式:一种方式是通过 REST 请求 URI ,发送搜索参数;另一种是通过REST 请求体,发送搜索参数。而请求体允许你包含更容易表达和可阅读的JSON格式。
- 通过 REST 请求 URI
curl 'localhost:9200/bank/_search?q=*&pretty'
结果:
{
"took" : 63,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1000,
"max_score" : 1.0,
"hits" : [ {
"_index" : "bank",
"_type" : "account",
"_id" : "1",
"_score" : 1.0, "_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
}, {
"_index" : "bank",
"_type" : "account",
"_id" : "6",
"_score" : 1.0, "_source" : {"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"}
}, {
"_index" : "bank",
"_type" : "account",
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20
- 21
- 22
- 23
- 24
- 25
- 26
- 27
q=*,参数告诉elasticsearch,在bank索引中匹配所有的文档
pretty,参数告诉elasticsearch,返回形式打印JSON结果
- 通过REST 请求体:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "match_all": {} }
}'
结果:
{
"took" : 26,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1000,
"max_score" : 1.0,
"hits" : [ {
"_index" : "bank",
"_type" : "account",
"_id" : "1",
"_score" : 1.0, "_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
}, {
"_index" : "bank",
"_type" : "account",
"_id" : "6",
"_score" : 1.0, "_source" : {"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"}
}, {
"_index" : "bank",
"_type" : "account",
"_id" : "13",
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20
- 21
- 22
- 23
- 24
- 25
- 26
- 27
- 28
- 29
- 30
- 31
与第一种方式不同是在URI中替代传递q=*,使用POST方式提交,请求体包含JSON格式搜索
介绍查询语言
elasticsearch提供JSON格式领域特定语言执行查询。可参考Query DSL。
{
"query": { "match_all": {} }
}
- 1
- 2
- 3
query:告诉我们定义查询
match_all:运行简单类型查询指定索引中的所有文档
除了指定查询参数,还可以指定其他参数来影响最终的结果。
- match_all & 只返回第一个文档:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "match_all": {} },
"size": 1
}'
结果:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1000,
"max_score" : 1.0,
"hits" : [ {
"_index" : "bank",
"_type" : "account",
"_id" : "4",
"_score" : 1.0,
"_source":{"account_number":4,"balance":27658,"firstname":"Rodriquez","lastname":"Flores","age":31,"gender":"F","address":"986 Wyckoff Avenue","employer":"Tourmania","email":"rodriquezflores@tourmania.com","city":"Eastvale","state":"HI"}
} ]
}
}
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20
- 21
- 22
- 23
- 24
- 25
- 26
如果不指定size,默认是返回10条文档信息
- match_all & 返回11到20个文档信息
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "match_all": {} },
"from": 10,
"size": 10
}'
- 1
- 2
- 3
- 4
- 5
- 6
from:指定文档索引从哪里开始,默认从0开始
size:从from开始,返回多个文档
这feature在实现分页查询很有用
- match_all and 根据account balance 降序排序 & 返回10个文档(默认10个)
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "match_all": {} },
"sort": { "balance": { "order": "desc" } }
}'
- 1
- 2
- 3
- 4
- 5
执行搜索
默认的,我们搜索返回完整的JSON文档。而source(_source字段搜索点击量)。如果我们不想返回完整的JSON文档,我们可以使用source返回指定字段。
- 返回 account_number and balance:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "match_all": {} },
"_source": ["account_number", "balance"]
}'
- 1
- 2
- 3
- 4
- 5
这样操作有点类似于SQL SELECT FROM field lis
match 查询,可作为基本字段搜索查询
- 返回 account_number=20:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "match": { "account_number": 20 } }
}'
- 1
- 2
- 3
- 4
- 返回 address=mill:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "match": { "address": "mill" } }
}'
- 1
- 2
- 3
- 4
- 返回 address=mill or address=lane:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "match": { "address": "mill lane" } }
}'
- 1
- 2
- 3
- 4
- 返回 短语匹配 address=mill lane:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "match_phrase": { "address": "mill lane" } }
}'
- 1
- 2
- 3
- 4
布尔值(bool)查询
- 返回 匹配address=mill & address=lane:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": {
"bool": {
"must": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}'
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
must:要求所有条件都要满足(类似于&&)
- 返回 匹配address=mill or address=lane:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": {
"bool": {
"should": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}'
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
should:任何一个满足就可以(类似于||)
- 返回 不匹配address=mill & address=lane:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": {
"bool": {
"must_not": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}'
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
must_not:所有条件都不能满足(类似于! (&&))
- 返回 age=40 & state!=ID
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": {
"bool": {
"must": [
{ "match": { "age": "40" } }
],
"must_not": [
{ "match": { "state": "ID" } }
]
}
}
}'
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
执行过滤器
文档中score(_score字段是搜索结果)。score是一个数字型的,是一种相对方法匹配查询文档结果。分数越高,搜索关键字与该文档相关性越高;越低,搜索关键字与该文档相关性越低。
在elasticsearch中所有的搜索都会触发相关性分数计算。如果我们不使用相关性分数计算,那要使用另一种查询能力,构建过滤器。
过滤器是类似于查询的概念,除了得以优化,更快的执行速度的两个主要原因:
1. 过滤器不计算得分,所以他们比执行查询的速度
2. 过滤器可缓存在内存中,允许重复搜索
为了便于理解过滤器,先介绍过滤器搜索(like match_all, match, bool, etc.),可以与其他的普通查询搜索组合一个过滤器。
range filter,允许我们通过一个范围值来过滤文档,一般用于数字或日期过滤
使用过滤器搜索返回 balances[ 20000,30000]。换句话说,balance>=20000 && balance<=30000
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": {
"filtered": {
"query": { "match_all": {} },
"filter": {
"range": {
"balance": {
"gte": 20000,
"lte": 30000
}
}
}
}
}
}'
结果:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 217,
"max_score" : 1.0,
"hits" : [ {
"_index" : "bank",
"_type" : "account",
"_id" : "4",
"_score" : 1.0,
"_source":{"account_number":4,"balance":27658,"firstname":"Rodriquez","lastname":"Flores","age":31,"gender":"F","address":"986 Wyckoff Avenue","employer":"Tourmania","email":"rodriquezflores@tourmania.com","city":"Eastvale","state":"HI"}
}, {
"_index" : "bank",
"_type" : "account",
"_id" : "9",
"_score" : 1.0,
"_source":{"account_number":9,"balance":24776,"firstname":"Opal","lastname":"Meadows","age":39,"gender":"M","address":"963 Neptune Avenue","employer":"Cedward","email":"opalmeadows@cedward.com","city":"Olney","state":"OH"}
}, {
"_index" : "bank",
"_type" : "account",
"_id" : "11",
"_score" : 1.0,
"_source":{"account_number":11,"balance":20203,"firstname":"Jenkins","lastname":"Haney","age":20,"gender":"M","address":"740 Ferry Place","employer":"Qimonk","email":"jenkinshaney@qimonk.com","city":"Steinhatchee","state":"GA"}
}, {
"_index" : "bank",
"_type" : "account",
"_id" : "42",
"_score" : 1.0,
"_source":{"account_number":42,"balance":21137,"firstname":"Harding","lastname":"Hobbs","age":26,"gender":"F","address":"474 Ridgewood Place","employer":"Xth","email":"hardinghobbs@xth.com","city":"Heil","state":"ND"}
}, {
"_index" : "bank",
"_type" : "account",
"_id" : "54",
"_score" : 1.0,
"_source":{"account_number":54,"balance":23406,"firstname":"Angel","lastname":"Mann","age":22,"gender":"F","address":"229 Ferris Street","employer":"Amtas","email":"angelmann@amtas.com","city":"Calverton","state":"WA"}
}, {
"_index" : "bank",
"_type" : "account",
"_id" : "66",
"_score" : 1.0,
"_source":{"account_number":66,"balance":25939,"firstname":"Franks","lastname":"Salinas","age":28,"gender":"M","address":"437 Hamilton Walk","employer":"Cowtown","email":"frankssalinas@cowtown.com","city":"Chase","state":"VT"}
}, {
"_index" : "bank",
"_type" : "account",
"_id" : "92",
"_score" : 1.0,
"_source":{"account_number":92,"balance":26753,"firstname":"Gay","lastname":"Brewer","age":34,"gender":"M","address":"369 Ditmars Street","employer":"Savvy","email":"gaybrewer@savvy.com","city":"Moquino","state":"HI"}
}, {
"_index" : "bank",
"_type" : "account",
"_id" : "100",
"_score" : 1.0,
"_source":{"account_number":100,"balance":29869,"firstname":"Madden","lastname":"Woods","age":32,"gender":"F","address":"696 Ryder Avenue","employer":"Slumberia","email":"maddenwoods@slumberia.com","city":"Deercroft","state":"ME"}
}, {
"_index" : "bank",
"_type" : "account",
"_id" : "105",
"_score" : 1.0,
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20
- 21
- 22
- 23
- 24
- 25
- 26
- 27
- 28
- 29
- 30
- 31
- 32
- 33
- 34
- 35
- 36
- 37
- 38
- 39
- 40
- 41
- 42
- 43
- 44
- 45
- 46
- 47
- 48
- 49
- 50
- 51
- 52
- 53
- 54
- 55
- 56
- 57
- 58
- 59
- 60
- 61
- 62
- 63
- 64
- 65
- 66
- 67
- 68
- 69
- 70
- 71
- 72
- 73
- 74
- 75
- 76
- 77
- 78
- 79
- 80
- 81
- 82
- 83
过滤查询包含match_all查询(查询部分)和一系列过滤(过滤部分)。可以代替任何其他查询到查询部分以及其他过滤器过滤部分。在上述情况下,过滤器范围智能,因为文档落入range所有匹配“平等”,即。比另一个更相关,没有文档。
一般情况,最明智的方式决定是否使用filter or query,就看你是否关心相关性分数。如果相关性不重要,那就使用filter,否则就使用query。
queries and filters很类似于关系型数据库中的 “SELECT WHERE clause”
执行聚合
聚合提供从你的数据中分组和提取统计能力。
类似于关系型数据中的SQL GROUP BY和SQL 聚合函数。
在Elasticsearch,你有能力执行搜索返回命中结果,同时拆分命中结果,然后统一返回结果。当你使用简单的API运行搜索和多个聚合,然后返回所有结果避免网络带宽过大的情况是高效的。
- 根据state分组,降序统计top 10 state
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state"
}
}
}
}'
结果:
"hits" : {
"total" : 1000,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"group_by_state" : {
"buckets" : [ {
"key" : "al",
"doc_count" : 21
}, {
"key" : "tx",
"doc_count" : 17
}, {
"key" : "id",
"doc_count" : 15
}, {
"key" : "ma",
"doc_count" : 15
}, {
"key" : "md",
"doc_count" : 15
}, {
"key" : "pa",
"doc_count" : 15
}, {
"key" : "dc",
"doc_count" : 14
}, {
"key" : "me",
"doc_count" : 14
}, {
"key" : "mo",
"doc_count" : 14
}, {
"key" : "nd",
"doc_count" : 14
} ]
}
}
}
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20
- 21
- 22
- 23
- 24
- 25
- 26
- 27
- 28
- 29
- 30
- 31
- 32
- 33
- 34
- 35
- 36
- 37
- 38
- 39
- 40
- 41
- 42
- 43
- 44
- 45
- 46
- 47
- 48
- 49
- 50
- 51
- 52
- 53
- 54
类似于关系型数据库
SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC
- 1
size=0 不是展示搜索结果命中数,因为我只是想要看聚合结果
- 根据state计算账户平均balance,降序统计top 10 state
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state"
},
"aggs": {
"average_balance": {
"avg": {
"field": "balance"
}
}
}
}
}
}'
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
注意嵌套average_balance聚合group_by_state内聚合。这是一个常见的模式,所有的聚合。您可以嵌套内聚合聚合任意提取旋转汇总时,你需要从你的数据。
- 降序排序平均 balance:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state",
"order": {
"average_balance": "desc"
}
},
"aggs": {
"average_balance": {
"avg": {
"field": "balance"
}
}
}
}
}
}'
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20
- 21
- 聚合年龄分区间(ages 20-29, 30-39, and 40-49),聚合性别,最后平均balance 展示最终结果
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"size": 0,
"aggs": {
"group_by_age": {
"range": {
"field": "age",
"ranges": [
{
"from": 20,
"to": 30
},
{
"from": 30,
"to": 40
},
{
"from": 40,
"to": 50
}
]
},
"aggs": {
"group_by_gender": {
"terms": {
"field": "gender"
},
"aggs": {
"average_balance": {
"avg": {
"field": "balance"
}
}
}
}
}
}
}
}'
(转)通过HTTP RESTful API 操作elasticsearch搜索数据的更多相关文章
- Elasticsearch 搜索数据
章节 Elasticsearch 基本概念 Elasticsearch 安装 Elasticsearch 使用集群 Elasticsearch 健康检查 Elasticsearch 列出索引 Elas ...
- 基于jeesite的cms系统(三):使用RESTful API在前端渲染数据
使用RESTful API可以更好的开发前后分离的应用,后面一节会介绍使用模版引擎Beetl开发后端渲染的应用. 一.配置Swagger(Api 接口文档) 1.使用系统自带 拷贝jeesite-mo ...
- ElasticSearch搜索数据到底有几种方式?
Elasticsearch允许三种方式执行搜索请求: GET请求正文: curl -XGET "http://localhost:9200/app/users/_search" - ...
- 可以执行全文搜索的原因 Elasticsearch full-text search Kibana RESTful API with JSON over HTTP elasticsearch_action es 模糊查询
https://www.elastic.co/guide/en/elasticsearch/guide/current/getting-started.html Elasticsearch is a ...
- 使用Java操作Elasticsearch(Elasticsearch的java api使用)
1.Elasticsearch是基于Lucene开发的一个分布式全文检索框架,向Elasticsearch中存储和从Elasticsearch中查询,格式是json. 索引index,相当于数据库中的 ...
- Elasticsearch入坑指南之RESTful API
Elasticsearch入坑指南之RESTful API Tags:Elasticsearch ES为开发者提供了非常丰富的基于Http协议的Rest API,通过简单的Rest请求,就可以实现非常 ...
- Elasticsearch 搜索API
章节 Elasticsearch 基本概念 Elasticsearch 安装 Elasticsearch 使用集群 Elasticsearch 健康检查 Elasticsearch 列出索引 Elas ...
- elasticsearch(3) 数据操作-更新
一 更新整个文档 更新整个文档的方法和存放数据的方式是相同的,通过PUT 127.0.0.1/test/test/1 我们可以把test/test/1下的文档更新为新的文档 例: PUT 127.0 ...
- httpclient连接池在ES Restful API请求中的应用
package com.wm.utils; import org.apache.http.impl.client.CloseableHttpClient; import org.apache.http ...
随机推荐
- 浅析Linux内核同步机制
非常早之前就接触过同步这个概念了,可是一直都非常模糊.没有深入地学习了解过,最近有时间了,就花时间研习了一下<linux内核标准教程>和<深入linux设备驱动程序内核机制>这 ...
- hdu 5038 水题 可是题意坑
http://acm.hdu.edu.cn/showproblem.php?pid=5038 就是求个众数 这个范围小 所以一个数组存是否存在的状态即可了 可是这句话真恶心 If not all ...
- X-code 描述文件的位置
不管是真机测试还是打包的过程中,都需要描述文件.在桌面上,按快捷键“commd+Shift+G”,就会显示一个要填的文件路径,如下图: 找到描述文件的路径: ~/Library/MobileDevic ...
- CentOS-6.3安装配置Nginx--【测试已OK】
安装说明 系统环境:CentOS-6.3软件:nginx-1.2.6.tar.gz安装方式:源码编译安装 安装位置:/usr/local/nginx 下载地址:http://nginx.org/en/ ...
- ReactiveCocoa - iOS开发的新框架
本文转载至 http://www.infoq.com/cn/articles/reactivecocoa-ios-new-develop-framework ReactiveCocoa(其简称为RAC ...
- 《C++ Primer Plus》第14章 C++中的代码重用 学习笔记
C++提供了集中重用代码的手段.第13章介绍的共有继承能够建立is-a关系,这样派生类可以重用基类的代码.私有继承和保护继承也使得能够重用基类的代码,单建立的是has-a关系.使用私有继承时,积累的公 ...
- c++11——基于范围的for循环
c++11中有基于范围的for循环,基于范围的for循环可以不再关心迭代器的概念,只需要关系容器中的元素类型即可,同时也不必显式的给出容器的开头和结尾. int arr[] = {1, 2, 3, 4 ...
- MQTT协议笔记之mqtt.io项目Websocket协议支持
前言 MQTT协议专注于网络.资源受限环境,建立之初不曾考虑WEB环境,倒也正常.虽然如此,但不代表它不适合HTML5环境. HTML5 Websocket是建立在TCP基础上的双通道通信,和TCP通 ...
- poj3734 Blocks[矩阵优化dp or 组合数学]
Blocks Time Limit: 1000MS Memory Limit: 65536K Total Submissions: 6578 Accepted: 3171 Descriptio ...
- Nmap的活跃主机探测常见方法
最近由于工作需求,开始对Nmap进行一点研究,主要是Nmap对于主机活跃性的探测,也就是存活主机检测的领域. Nmap主机探测方法一:同网段优先使用arp探测: 当启动Namp主机活跃扫描时候,Nma ...