Elasticsearch 查询语言(Query DSL)认识(一)

一、基本认识

查询子句的行为取决于

  • query context
  • filter context

也就是执行的是查询(query)还是过滤(filter)

  • query context 描述的是:被搜索的文档和查询子句的匹配程度

  • filter context 描述的是: 被搜索的文档和查询子句是否匹配

一个是匹配程度问题,一个是是否匹配的问题

二、实例

  1. 导入数据 bank account data download
  2. 将数据导入到elasticsearch
curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary "@accounts.json"
curl 'localhost:9200/_cat/indices?v'

这里有两个地方需要注意,1.host要改成符合自己的。2.早期版本中下载的数据可以能是'accounts.json?raw=true'

大概如下 curl -XPOST 'wbelk:9200/bank/account/_bulk?pretty' --data-binary "@accounts.json?raw=true"

  1. 参数认识

为了便捷操作,可以安装一个kiabna sense

$./bin/kibana plugin --install elastic/sense

$./bin/kibana
sudo -i service restart kibana(或者用这个启动kibana)

match_all 搜索,直接返回所有文档

GET /bank/_search
{
"query": {
"match_all": {}
}
}

返回大致如下:

{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1000,
"max_score": 1,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "25",
"_score": 1,
"_source": {
"account_number": 25,
"balance": 40540,
"firstname": "Virginia",
"lastname": "Ayala",
"age": 39,
"gender": "F",
"address": "171 Putnam Avenue",
"employer": "Filodyne",
"email": "virginiaayala@filodyne.com",
"city": "Nicholson",
"state": "PA"
}
},

参数大致解释:

  • took: 执行搜索耗时,毫秒为单位,也就是本文我1ms
  • time_out: 搜索是否超时
  • _shards: 多少分片被搜索,成功多少,失败多少
  • hits: 搜索结果展示
  • hits.total: 匹配条件的文档总数
  • hits.hits: 返回结果展示,默认返回十个
  • hits.max_score:最大匹配得分
  • hits._score: 返回文档的匹配得分(得分越高,匹配程度越高,越靠前)
  • _index _type _id 作为剥层定位到特定的文档
  • _source 文档源
  1. 查询语言之 执行查询
  • 只显示account_number 和 balance
POST /bank/_search
{
"query": { "match_all": {} },
"_source": ["account_number", "balance"]
}
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1000,
"max_score": 1,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "25",
"_score": 1,
"_source": {
"account_number": 25,
"balance": 40540
}
},
{
"_index": "bank",
"_type": "account",
"_id": "44",
"_score": 1,
"_source": {
"account_number": 44,
"balance": 34487
}
},
{
"_index": "bank",
"_type": "account",
"_id": "99",
"_score": 1,
"_source": {
"account_number": 99,
"balance": 47159
}
},
  • 返回accountu_number 为20的document
POST /bank/_search
{
"query": { "match": { "account_number": 20 } }
}
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 5.6587105,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "20",
"_score": 5.6587105,
"_source": {
"account_number": 20,
"balance": 16418,
"firstname": "Elinor",
"lastname": "Ratliff",
"age": 36,
"gender": "M",
"address": "282 Kings Place",
"employer": "Scentric",
"email": "elinorratliff@scentric.com",
"city": "Ribera",
"state": "WA"
}
}
]
}
}
  • 返回地址中包含(term)mill的所有账户
POST /bank/_search
{
"query": { "match": { "address": "mill" } }
}
  • 返回地址中包含term 'mill'或者 'lane'的所有账户
POST /bank/_search
{
"query": { "match": { "address": "mill lane" } }
}
  • 匹配phrase 'mill lane'
POST /bank/_search
{
"query": { "match_phrase": { "address": "mill lane" } }
}
  • 返回address包含'mill'和'lane'的所有账户 (AND)
POST /bank/_search
{
"query": {
"bool": {
"must": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}
  • 返回address包含'mill'或'lane'的所有账户 (OR)
POST /bank/_search
{
"query": {
"bool": {
"should": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}
  • 返回address既不包含'mill'也不包含'lane'的所有账户 (NO)
POST /bank/_search
{
"query": {
"bool": {
"must_not": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}
  • 返回age为40,并且state不是ID的所有账户 (组合)
POST /bank/_search
{
"query": {
"bool": {
"must": [
{ "match": { "age": "40" } }
],
"must_not": [
{ "match": { "state": "ID" } }
]
}
}
}
  1. 查询语言之 执行过滤

过滤不会进行相关度得分的计算

  • 在所有账户中寻找balance 在29900到30000之间(闭区间)的所有账户

    (先查询到所有的账户,然后进行过滤)
POST /bank/_search
{
"query": {
"filtered": {
"query": { "match_all": {} },
"filter": {
"range": {
"balance": {
"gte": 29900,
"lte": 30000
}
}
}
}
}
}
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 5,
"max_score": 1,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "243",
"_score": 1,
"_source": {
"account_number": 243,
"balance": 29902,
"firstname": "Evangelina",
"lastname": "Perez",
"age": 20,
"gender": "M",
"address": "787 Joval Court",
"employer": "Keengen",
"email": "evangelinaperez@keengen.com",
"city": "Mulberry",
"state": "SD"
}
},
{
"_index": "bank",
"_type": "account",
"_id": "781",
"_score": 1,
"_source": {
"account_number": 781,
"balance": 29961,
"firstname": "Sanford",
"lastname": "Mullen",
"age": 26,
"gender": "F",
"address": "879 Dover Street",
"employer": "Zanity",
"email": "sanfordmullen@zanity.com",
"city": "Martinez",
"state": "TX"
}
},
...

根据返回结果我们可以看到filter得到的_score为1.不存在程度上的问题。是0和1的问题

三、query和filter效率

一般认为filter的速度快于query的速度

  • filter不会计算相关度得分,效率高
  • filter的结果可以缓存到内存中,方便再用

以bank account 数据为例,认识elasticsearch query 和 filter的更多相关文章

  1. ElasticSearch - query vs filter

    query vs filter 来自stackoverflow Stackoverflow - queries-vs-filters Question 题主希望知道Query和Filter的区别 An ...

  2. elasticsearch query 和 filter 的区别

    Query查询器 与 Filter 过滤器 尽管我们之前已经涉及了查询DSL,然而实际上存在两种DSL:查询DSL(query DSL)和过滤DSL(filter DSL).过滤器(filter)通常 ...

  3. Elasticsearch query和filter的区别

    1.关于Query context和filter context 查询语句的表现行为取决于使用了查询上下文方式还是过滤上下文方式. Query context:查询上下文,回答了“文档是如何被查询语句 ...

  4. 数据从文件导入Elasticsearch

    1.资源准备 1.数据文件:accounts.json 2.索引名称:bank 3.数据类型:account 4.批量操作API:bulk 2.导入数据 curl -XPOST 'localhost: ...

  5. [Codeforces Round #186 (Div. 2)] A. Ilya and Bank Account

    A. Ilya and Bank Account time limit per test 2 seconds memory limit per test 256 megabytes input sta ...

  6. php curl模拟post请求提交数据样例总结

    在php中要模拟post请求数据提交我们会使用到curl函数,以下我来给大家举几个curl模拟post请求提交数据样例有须要的朋友可參考參考.注意:curl函数在php中默认是不被支持的,假设须要使用 ...

  7. How To Change the Supplier Bank Account Masking in UI (Doc ID 877074.1)

      Give Feedback...           How To Change the Supplier Bank Account Masking in UI (Doc ID 877074.1) ...

  8. Pandas之:Pandas高级教程以铁达尼号真实数据为例

    Pandas之:Pandas高级教程以铁达尼号真实数据为例 目录 简介 读写文件 DF的选择 选择列数据 选择行数据 同时选择行和列 使用plots作图 使用现有的列创建新的列 进行统计 DF重组 简 ...

  9. Query DSL for elasticsearch Query

    Query DSL Query DSL (资料来自: http://www.elasticsearch.cn/guide/reference/query-dsl/) http://elasticsea ...

随机推荐

  1. 火焰图分析openresty性能瓶颈

    注:本文操作基于CentOS 系统 准备工作 用wget从https://sourceware.org/systemtap/ftp/releases/下载最新版的systemtap.tar.gz压缩包 ...

  2. jquery.uploadify文件上传组件

    1.jquery.uploadify简介 在ASP.NET中上传的控件有很多,比如.NET自带的FileUpload,以及SWFUpload,Uploadify等等,尤其后面两个控件的用户体验比较好, ...

  3. Java 线程

    线程:线程是进程的组成部分,一个进程可以拥有多个线程,而一个线程必须拥有一个父进程.线程可以拥有自己的堆栈,自己的程序计数器和自己的局部变量,但不能拥有系统资源.它与父进程的其他线程共享该进程的所有资 ...

  4. 如何利用pt-online-schema-change进行MySQL表的主键变更

    业务运行一段时间,发现原来的主键设置并不合理,这个时候,想变更主键.这种需求在实际生产中还是蛮多的. 下面,看看pt-online-schema-change解决这类问题的处理方式. 首先,创建一张测 ...

  5. windows下 安装 rabbitMQ 及操作常用命令

    rabbitMQ是一个在AMQP协议标准基础上完整的,可服用的企业消息系统.它遵循Mozilla Public License开源协议,采用 Erlang 实现的工业级的消息队列(MQ)服务器,Rab ...

  6. [译]处理文本数据(scikit-learn 教程3)

    原文网址:http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html 翻译:Tacey Won ...

  7. 关于SMARTFORMS文本编辑器出错

    最近在做ISH的一个打印功能,SMARTFORM的需求本身很简单,但做起来则一波三折. 使用环境是这样的:Windows 7 64bit + SAP GUI 740 Patch 5 + MS Offi ...

  8. iOS 10 跳转系统设置

    苦心人天不负, 为了项目终于把 iOS 10 跳转系统设置的方法给搞定了, 很欣慰. http://www.cnblogs.com/lurenq/p/6189580.html iOS 10 跳转系统设 ...

  9. MongoDB学习笔记五—查询上

    数据准备 { , "goods_name" : "KD876", "createTime" : ISODate("2016-12- ...

  10. BZOJ 3083: 遥远的国度 [树链剖分 DFS序 LCA]

    3083: 遥远的国度 Time Limit: 10 Sec  Memory Limit: 1280 MBSubmit: 3127  Solved: 795[Submit][Status][Discu ...