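Section 1 Elasticsearch Overview

Elasticsearch is a search server built on Lucene. It provides a distributed, multi-tenant full-text search engine with a RESTful web interface. Elasticsearch is written in Java, is designed for use in cloud computing environments, and provides real-time search.

In short: Elasticsearch is a highly scalable, highly available full-text search tool for real-time data analysis, built on RESTful web standards.

1.1 Basic Concepts of Elasticsearch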

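Index

Similar to a database in MySQL.
Type

Similar to a table in MySQL. Within an index you can create types (tables), whose fields are described by a mapping. (Types are deprecated in 7.x and removed in 8.x; see the note in section 2.7.)
Document

Data in ES is document-oriented: one record is one document, which corresponds to a row in a MySQL table. A document can have many fields, just as a MySQL row can have many columns.
Field

The individual fields of a document correspond to the columns of a MySQL row.
Mapping

Roughly the equivalent of a schema in MySQL or Solr. ES mappings additionally support dynamic detection of field types, which looks very convenient, but it is not recommended in production; it is better to define the mapping explicitly up front.
indexed

Means that the field is indexed. In MySQL you usually add indexes to frequently queried columns to speed up queries; in ES every field is indexed by default, unless you explicitly disable indexing and only store the field for display. Which to choose depends on your specific requirements.
Query DSL

Similar to SQL statements in MySQL, except that ES queries are written in JSON; the official term is Query DSL.
GET/PUT/POST/DELETE

Roughly correspond to MySQL's SELECT, INSERT/UPDATE and DELETE (GET reads, PUT and POST write, DELETE deletes).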

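1.2 RESTful API

An architectural style, that is, a design style rather than a standard, used mainly for software in which clients interact with servers. Software designed in this style can be simpler and more layered, and makes mechanisms such as caching easier to implement.

It uses the standard HTTP methods GET, POST, PUT and DELETE to retrieve, create, update and delete resources; in other words, HTTP verbs drive the state transitions of resources.

GET: retrieves a resource
POST: creates a resource (can also be used to update one)
PUT: updates a resource
DELETE: deletes a resource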

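1.3 The curl command

Sends HTTP requests (GET/POST/PUT/DELETE) from the command line.

Examples:

Fetch a web page

curl www.baidu.com

Save the page content to a file

curl -o tt.html www.baidu.com

Include the response headers in the output

curl -i www.baidu.com

Show the full communication of an HTTP request

curl -v www.baidu.com

Send a GET, POST, PUT or DELETE request (replace GET with the method you need)

curl -X GET www.baidu.com

curl help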

[root@localhost tmp]# curl --help

Usage: curl [options...] <url>

Options: (H) means HTTP/HTTPS only, (F) means FTP only

     --anyauth       Pick "any" authentication method (H)

 -a, --append        Append to target file when uploading (F/SFTP)

     --basic         Use HTTP Basic Authentication (H)

     --cacert FILE   CA certificate to verify peer against (SSL)

     --capath DIR    CA directory to verify peer against (SSL)

 -E, --cert CERT[:PASSWD] Client certificate file and password (SSL)

     --cert-type TYPE Certificate file type (DER/PEM/ENG) (SSL)

     --ciphers LIST  SSL ciphers to use (SSL)

     --compressed    Request compressed response (using deflate or gzip)

 -K, --config FILE   Specify which config file to read

     --connect-timeout SECONDS  Maximum time allowed for connection

 -C, --continue-at OFFSET  Resumed transfer offset

 -b, --cookie STRING/FILE  String or file to read cookies from (H)

 -c, --cookie-jar FILE  Write cookies to this file after operation (H)

     --create-dirs   Create necessary local directory hierarchy

     --crlf          Convert LF to CRLF in upload

     --crlfile FILE  Get a CRL list in PEM format from the given file

 -d, --data DATA     HTTP POST data (H)

     --data-ascii DATA  HTTP POST ASCII data (H)

     --data-binary DATA  HTTP POST binary data (H)

     --data-urlencode DATA  HTTP POST data url encoded (H)

     --delegation STRING GSS-API delegation permission

     --digest        Use HTTP Digest Authentication (H)

     --disable-eprt  Inhibit using EPRT or LPRT (F)

     --disable-epsv  Inhibit using EPSV (F)

 -D, --dump-header FILE  Write the headers to this file

     --egd-file FILE  EGD socket path for random data (SSL)

     --engine ENGINGE  Crypto engine (SSL). "--engine list" for list

 -f, --fail          Fail silently (no output at all) on HTTP errors (H)

 -F, --form CONTENT  Specify HTTP multipart POST data (H)

     --form-string STRING  Specify HTTP multipart POST data (H)

     --ftp-account DATA  Account data string (F)

     --ftp-alternative-to-user COMMAND  String to replace "USER [name]" (F)

     --ftp-create-dirs  Create the remote dirs if not present (F)

     --ftp-method [MULTICWD/NOCWD/SINGLECWD] Control CWD usage (F)

     --ftp-pasv      Use PASV/EPSV instead of PORT (F)

 -P, --ftp-port ADR  Use PORT with given address instead of PASV (F)

     --ftp-skip-pasv-ip Skip the IP address for PASV (F)

     --ftp-pret      Send PRET before PASV (for drftpd) (F)

     --ftp-ssl-ccc   Send CCC after authenticating (F)

     --ftp-ssl-ccc-mode ACTIVE/PASSIVE  Set CCC mode (F)

     --ftp-ssl-control Require SSL/TLS for ftp login, clear for transfer (F)

 -G, --get           Send the -d data with a HTTP GET (H)

 -g, --globoff       Disable URL sequences and ranges using {} and []

 -H, --header LINE   Custom header to pass to server (H)

 -I, --head          Show document info only

 -h, --help          This help text

     --hostpubmd5 MD5  Hex encoded MD5 string of the host public key. (SSH)

 -0, --http1.0       Use HTTP 1.0 (H)

     --ignore-content-length  Ignore the HTTP Content-Length header

 -i, --include       Include protocol headers in the output (H/F)

 -k, --insecure      Allow connections to SSL sites without certs (H)

     --interface INTERFACE  Specify network interface/address to use

 -4, --ipv4          Resolve name to IPv4 address

 -6, --ipv6          Resolve name to IPv6 address

 -j, --junk-session-cookies Ignore session cookies read from file (H)

     --keepalive-time SECONDS  Interval between keepalive probes

     --key KEY       Private key file name (SSL/SSH)

     --key-type TYPE Private key file type (DER/PEM/ENG) (SSL)

     --krb LEVEL     Enable Kerberos with specified security level (F)

     --libcurl FILE  Dump libcurl equivalent code of this command line

     --limit-rate RATE  Limit transfer speed to this rate

 -l, --list-only     List only names of an FTP directory (F)

     --local-port RANGE  Force use of these local port numbers

 -L, --location      Follow redirects (H)

     --location-trusted like --location and send auth to other hosts (H)

 -M, --manual        Display the full manual

     --mail-from FROM  Mail from this address

     --mail-rcpt TO  Mail to this receiver(s)

     --mail-auth AUTH  Originator address of the original email

     --max-filesize BYTES  Maximum file size to download (H/F)

     --max-redirs NUM  Maximum number of redirects allowed (H)

 -m, --max-time SECONDS  Maximum time allowed for the transfer

     --metalink      Process given URLs as metalink XML file

     --negotiate     Use HTTP Negotiate Authentication (H)

 -n, --netrc         Must read .netrc for user name and password

     --netrc-optional Use either .netrc or URL; overrides -n

     --netrc-file FILE  Set up the netrc filename to use

 -N, --no-buffer     Disable buffering of the output stream

     --no-keepalive  Disable keepalive use on the connection

     --no-sessionid  Disable SSL session-ID reusing (SSL)

     --noproxy       List of hosts which do not use proxy

     --ntlm          Use HTTP NTLM authentication (H)

 -o, --output FILE   Write output to <file> instead of stdout

     --pass PASS     Pass phrase for the private key (SSL/SSH)

     --post301       Do not switch to GET after following a 301 redirect (H)

     --post302       Do not switch to GET after following a 302 redirect (H)

     --post303       Do not switch to GET after following a 303 redirect (H)

 -#, --progress-bar  Display transfer progress as a progress bar

     --proto PROTOCOLS  Enable/disable specified protocols

     --proto-redir PROTOCOLS  Enable/disable specified protocols on redirect

 -x, --proxy [PROTOCOL://]HOST[:PORT] Use proxy on given port

     --proxy-anyauth Pick "any" proxy authentication method (H)

     --proxy-basic   Use Basic authentication on the proxy (H)

     --proxy-digest  Use Digest authentication on the proxy (H)

     --proxy-negotiate Use Negotiate authentication on the proxy (H)

     --proxy-ntlm    Use NTLM authentication on the proxy (H)

 -U, --proxy-user USER[:PASSWORD]  Proxy user and password

     --proxy1.0 HOST[:PORT]  Use HTTP/1.0 proxy on given port

 -p, --proxytunnel   Operate through a HTTP proxy tunnel (using CONNECT)

     --pubkey KEY    Public key file name (SSH)

 -Q, --quote CMD     Send command(s) to server before transfer (F/SFTP)

     --random-file FILE  File for reading random data from (SSL)

 -r, --range RANGE   Retrieve only the bytes within a range

     --raw           Do HTTP "raw", without any transfer decoding (H)

 -e, --referer       Referer URL (H)

 -J, --remote-header-name Use the header-provided filename (H)

 -O, --remote-name   Write output to a file named as the remote file

     --remote-name-all Use the remote file name for all URLs

 -R, --remote-time   Set the remote file's time on the local output

 -X, --request COMMAND  Specify request command to use

     --resolve HOST:PORT:ADDRESS  Force resolve of HOST:PORT to ADDRESS

     --retry NUM   Retry request NUM times if transient problems occur

     --retry-delay SECONDS When retrying, wait this many seconds between each

     --retry-max-time SECONDS  Retry only within this period

 -S, --show-error    Show error. With -s, make curl show errors when they occur

 -s, --silent        Silent mode. Don't output anything

     --socks4 HOST[:PORT]  SOCKS4 proxy on given host + port

     --socks4a HOST[:PORT]  SOCKS4a proxy on given host + port

     --socks5 HOST[:PORT]  SOCKS5 proxy on given host + port

     --socks5-basic  Enable username/password auth for SOCKS5 proxies

     --socks5-gssapi Enable GSS-API auth for SOCKS5 proxies

     --socks5-hostname HOST[:PORT] SOCKS5 proxy, pass host name to proxy

     --socks5-gssapi-service NAME  SOCKS5 proxy service name for gssapi

     --socks5-gssapi-nec  Compatibility with NEC SOCKS5 server

 -Y, --speed-limit RATE  Stop transfers below speed-limit for 'speed-time' secs

 -y, --speed-time SECONDS  Time for trig speed-limit abort. Defaults to 30

     --ssl           Try SSL/TLS (FTP, IMAP, POP3, SMTP)

     --ssl-reqd      Require SSL/TLS (FTP, IMAP, POP3, SMTP)

 -2, --sslv2         Use SSLv2 (SSL)

 -3, --sslv3         Use SSLv3 (SSL)

     --ssl-allow-beast Allow security flaw to improve interop (SSL)

     --stderr FILE   Where to redirect stderr. - means stdout

     --tcp-nodelay   Use the TCP_NODELAY option

 -t, --telnet-option OPT=VAL  Set telnet option

     --tftp-blksize VALUE  Set TFTP BLKSIZE option (must be >512)

 -z, --time-cond TIME  Transfer based on a time condition

 -1, --tlsv1         Use => TLSv1 (SSL)

     --tlsv1.0       Use TLSv1.0 (SSL)

     --tlsv1.1       Use TLSv1.1 (SSL)

     --tlsv1.2       Use TLSv1.2 (SSL)

     --trace FILE    Write a debug trace to the given file

     --trace-ascii FILE  Like --trace but without the hex output

     --trace-time    Add time stamps to trace/verbose output

     --tr-encoding   Request compressed transfer encoding (H)

 -T, --upload-file FILE  Transfer FILE to destination

     --url URL       URL to work with

 -B, --use-ascii     Use ASCII/text transfer

 -u, --user USER[:PASSWORD]  Server user and password

     --tlsuser USER  TLS username

     --tlspassword STRING TLS password

     --tlsauthtype STRING  TLS authentication type (default SRP)

     --unix-socket FILE    Connect through this UNIX domain socket

 -A, --user-agent STRING  User-Agent to send to server (H)

 -v, --verbose       Make the operation more talkative

 -V, --version       Show version number and quit

 -w, --write-out FORMAT  What to output after completion

     --xattr        Store metadata in extended file attributes

 -q                 If used as the first parameter disables .curlrc

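Section 2 Basic Elasticsearch Operations

2.1 Inverted index

Elasticsearch uses a structure called an inverted index, which is designed for fast full-text search. An inverted index consists of a list of all the unique words that appear in the documents and, for each word, a list of the documents that contain it.

Example:

(1) Suppose the document collection contains five documents, whose contents are shown in the figure below; the leftmost column of the figure is each document's ID. Our task is to build an inverted index over this collection.

(2) Unlike English and similar languages, Chinese has no explicit delimiters between words, so a word-segmentation system must first split each document into a sequence of words, turning each document into a stream of words. For convenience in later processing, each distinct word is assigned a unique word ID, and we record which documents contain it. After this step we obtain the simplest form of an inverted index.

The "Word ID" column records each word's ID, the second column is the word itself, and the third column is the word's posting list (inverted list).

(3) The index can record more information than this. The figure below additionally records term frequency (TF), i.e. how many times the word appears in a given document. TF is recorded because it is an important factor when computing query-document similarity for ranking search results, so storing it in the posting list makes later score computation convenient.

(4) A posting list can also record the positions at which a word occurs in a document, for example (1,<11>,1), (2,<7>,1), (3,<3,9>,2), i.e. document ID, positions and TF. With such an index the search engine can answer queries easily. For example, when a user enters the query word "Facebook", the search system looks it up in the inverted index and reads off the documents that contain it; these documents are the search results returned to the user. Using term frequency and document frequency, the system can then rank these candidates by their similarity to the query and output them from highest to lowest score. This is part of the internal workflow of a search system.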
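To make the posting-list format concrete, here is a minimal Python sketch (not how Lucene actually stores postings). It builds a term -> [(doc ID, positions, TF)] map over three made-up documents and ranks the hits for a single word by TF alone. The function names and sample documents are hypothetical; positions are 1-based here, and real Elasticsearch scoring (BM25) also takes document frequency and field length into account.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase and split on non-alphanumeric characters (a stand-in for a real analyzer).
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each term to a posting list of (doc_id, positions, tf) tuples."""
    index = defaultdict(list)
    for doc_id, text in sorted(docs.items()):
        positions = defaultdict(list)
        for pos, term in enumerate(tokenize(text), start=1):
            positions[term].append(pos)
        for term, pos_list in positions.items():
            index[term].append((doc_id, pos_list, len(pos_list)))
    return dict(index)

def search(index, word):
    """Return IDs of documents containing `word`, highest term frequency first."""
    postings = index.get(word.lower(), [])
    ranked = sorted(postings, key=lambda p: (-p[2], p[0]))
    return [doc_id for doc_id, _, _ in ranked]

docs = {
    1: "Google and Facebook",
    2: "Facebook acquires a startup",
    3: "Facebook news: Facebook grows",
}
idx = build_inverted_index(docs)
print(idx["facebook"])          # [(1, [3], 1), (2, [1], 1), (3, [1, 3], 2)]
print(search(idx, "Facebook"))  # [3, 1, 2]
```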

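Normalization: when building the inverted index, each extracted word is processed further (for example lowercased) to increase the chance that later searches find related documents.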

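2.2 Analyzers

Analyzer: splits a string of text into individual terms (tokens) and normalizes each term.

An analyzer consists of three parts:

character filter: preprocessing before tokenization, such as stripping HTML tags or converting special characters
tokenizer: splits the text into tokens
token filter: normalizes the tokens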
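The three stages can be chained as plain functions. This is a rough Python sketch, assuming a regex tokenizer and a tiny hand-picked stop-word list; the function names are hypothetical, and Elasticsearch's real html_strip character filter and standard tokenizer are considerably more sophisticated (the latter uses Unicode text segmentation).

```python
import html
import re

def char_filter(text):
    # Character filter: replace HTML tags with spaces, then decode entities such as &amp;.
    return html.unescape(re.sub(r"<[^>]+>", " ", text))

def tokenizer(text):
    # Tokenizer: split on anything that is not an ASCII letter or digit.
    return re.findall(r"[A-Za-z0-9]+", text)

def token_filters(tokens, stopwords=frozenset({"the", "a", "an"})):
    # Token filters: lowercase every token, then drop stop words.
    lowered = [t.lower() for t in tokens]
    return [t for t in lowered if t not in stopwords]

def analyze(text):
    return token_filters(tokenizer(char_filter(text)))

print(analyze("<p>The Quick &amp; Brown Fox</p>"))  # ['quick', 'brown', 'fox']
```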

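Built-in analyzers:

standard analyzer: the default. It lowercases tokens and removes most punctuation (stop-word removal is supported but disabled by default in recent versions); Chinese text is split into single characters
simple analyzer: first splits the text on any non-letter character, then lowercases the tokens; digits are discarded
whitespace analyzer: only splits on whitespace; it does not lowercase, does not support Chinese, and applies no other normalization to the tokens
language analyzers: analyzers for specific languages (e.g. english); Chinese is not supported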
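The difference between the whitespace and simple analyzers is easiest to see side by side. This Python sketch only approximates their behavior with regular expressions (the function names are hypothetical); the real analyzers are Lucene components, and you can compare them on a live cluster with the _analyze API.

```python
import re

def whitespace_analyzer(text):
    # Split on whitespace only: no lowercasing, punctuation and digits are kept.
    return text.split()

def simple_analyzer(text):
    # Split on every non-letter character (digits included), then lowercase.
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]

sample = "Elasticsearch 7.3 is FAST!"
print(whitespace_analyzer(sample))  # ['Elasticsearch', '7.3', 'is', 'FAST!']
print(simple_analyzer(sample))      # ['elasticsearch', 'is', 'fast']
```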

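Configure the Chinese analyzer

Download elasticsearch-analysis-ik-master.zip

wget -O elasticsearch-analysis-ik-master.zip https://github.com/medcl/elasticsearch-analysis-ik/archive/master.zip
Unzip elasticsearch-analysis-ik-master.zip

unzip elasticsearch-analysis-ik-master.zip
Enter the extracted directory and build from source

cd elasticsearch-analysis-ik-master

mvn clean install -Dmaven.test.skip=true (Maven must be installed and configured beforehand)
Copy the generated zip file to the ES plugins directory, unzip it there and rename it (the plugin version must match your Elasticsearch version)

cd target/release/

cp elasticsearch-analysis-ik.zip /usr/local/elasticsearch/plugins

cd /usr/local/elasticsearch/plugins

unzip elasticsearch-analysis-ik.zip

mv elasticsearch-analysis-ik ik

# Restart Elasticsearch and check the loaded plugins (for example with GET /_cat/plugins)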

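2.3 CRUD with the Elasticsearch API

Open Kibana in a browser at http://ip/kibana (Kibana listens on port 5601 by default) and click Dev Tools in the left navigation, where you can also view the help.

Enter requests on the left; results appear on the right.

Create an index

Equivalent to creating a database.

Request with custom settings: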

PUT /lib/

{

  "settings": {

    "index":{

      "number_of_shards":3,

      "number_of_replicas":0

    }

  }

}

Output on the right:

{

  "acknowledged" : true,

  "shards_acknowledged" : true,

  "index" : "lib"

}

Create an index with the default settings:

PUT /lib2/

Output:

{

  "acknowledged" : true,

  "shards_acknowledged" : true,

  "index" : "lib2"

}

View index settings

GET /lib/_settings

{

  "lib" : {

    "settings" : {

      "index" : {

        "creation_date" : "1566617147934",

        "number_of_shards" : "3",

        "number_of_replicas" : "0",

        "uuid" : "6D92TpNWSk-j-gD-nDoxdw",

        "version" : {

          "created" : "7030099"

        },

        "provided_name" : "lib"

      }

    }

  }

}

GET /lib2/_settings

{

  "lib2" : {

    "settings" : {

      "index" : {

        "creation_date" : "1566617313082",

        "number_of_shards" : "1",

        "number_of_replicas" : "1",

        "uuid" : "jw8Lh0n7QM-1lhT6xCqhKw",

        "version" : {

          "created" : "7030099"

        },

        "provided_name" : "lib2"

      }

    }

  }

}

View the settings of all indices

GET /_all/_settings

Add a document

Equivalent to creating a table (type) and inserting a row.

Use PUT when you specify the document ID; here the ID is 1.

PUT /lib/user/1

{

  "first_name":"Jane",

  "last_name":"Simth",

  "age":32,

  "about":"I like to collect rock albums",

  "interests":["music","video"]

}

#! Deprecation: [types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).

{

  "_index" : "lib",

  "_type" : "user",

  "_id" : "1",

  "_version" : 1,

  "result" : "created",

  "_shards" : {

    "total" : 1,

    "successful" : 1,

    "failed" : 0

  },

  "_seq_no" : 0,

  "_primary_term" : 1

}

Use POST without an ID, and the ID is generated automatically:

POST /lib/user/

{

  "first_name":"Jane",

  "last_name":"Simth",

  "age":32,

  "about":"I like to collect rock albums",

  "interests":["music","video"]

}

#! Deprecation: [types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).

{

  "_index" : "lib",

  "_type" : "user",

  "_id" : "PRy5wWwBfIGT97PTaZOi",

  "_version" : 1,

  "result" : "created",

  "_shards" : {

    "total" : 1,

    "successful" : 1,

    "failed" : 0

  },

  "_seq_no" : 0,

  "_primary_term" : 1

}

View a document

GET /lib/user/1

#! Deprecation: [types removal] Specifying types in document get requests is deprecated, use the /{index}/_doc/{id} endpoint instead.

{

  "_index" : "lib",

  "_type" : "user",

  "_id" : "1",

  "_version" : 1,

  "_seq_no" : 0,

  "_primary_term" : 1,

  "found" : true,

  "_source" : {

    "first_name" : "Jane",

    "last_name" : "Simth",

    "age" : 32,

    "about" : "I like to collect rock albums",

    "interests" : [

      "music",

      "video"

    ]

  }

}

View selected fields of a document

GET /lib/user/1?_source=age,about

#! Deprecation: [types removal] Specifying types in document get requests is deprecated, use the /{index}/_doc/{id} endpoint instead.

{

  "_index" : "lib",

  "_type" : "user",

  "_id" : "1",

  "_version" : 1,

  "_seq_no" : 0,

  "_primary_term" : 1,

  "found" : true,

  "_source" : {

    "about" : "I like to collect rock albums",

    "age" : 32

  }

}

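Update a document

Updating with PUT replaces the whole document: send every field, because any field left out of the request body is removed.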

PUT /lib/user/1

{

  "first_name":"Jane",

  "last_name":"Simth",

  "age":36,

  "about":"I like to collect rock albums",

  "interests":["music"]

}

Updating with POST _update performs a partial update: existing fields are updated and missing fields are added.

POST /lib/user/1/_update

{

  "doc":{

    "age":1111,

    "aa":2222

  }

}
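The difference between the two update styles can be modeled with dictionaries. This Python sketch is a toy model of the semantics, not Elasticsearch code: put_document, deep_merge and partial_update are hypothetical names, and it assumes (as Elasticsearch does for partial updates) that objects are merged recursively while arrays and scalar values are replaced.

```python
import copy

def put_document(store, doc_id, body):
    """PUT semantics: the stored _source is replaced wholesale by `body`."""
    store[doc_id] = copy.deepcopy(body)
    return store[doc_id]

def deep_merge(base, partial):
    """Merge `partial` into a copy of `base`; objects merge recursively,
    other values (including arrays) are replaced."""
    result = copy.deepcopy(base)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result

def partial_update(store, doc_id, doc):
    """_update semantics with {"doc": ...}: merge the partial document into the stored one."""
    store[doc_id] = deep_merge(store[doc_id], doc)
    return store[doc_id]

store = {}
put_document(store, "1", {"first_name": "Jane", "age": 32, "interests": ["music", "video"]})
print(partial_update(store, "1", {"age": 1111, "aa": 2222}))
# {'first_name': 'Jane', 'age': 1111, 'interests': ['music', 'video'], 'aa': 2222}
print(put_document(store, "1", {"first_name": "Jane", "age": 36}))
# {'first_name': 'Jane', 'age': 36}
```

The second call shows why PUT needs the complete document: interests and aa are gone after it.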

Delete

Delete a document

DELETE /lib/user/1

Delete an index

DELETE /lib

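2.4 Retrieving documents in bulk

The Multi Get API retrieves a set of documents in one request by index name, type name and document ID; the documents can come from the same index or from different indices.

With curl: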

curl -XGET "http://192.168.10.102:9200/_mget" -H 'Content-Type: application/json' -d'{  "docs": [    {      "_index": "lib",      "_type": "user",      "_id": 1    },    {      "_index": "lib",      "_type": "user",      "_id": 2    },    {      "_index": "lib",      "_type": "user",      "_id": 3    }  ]}'

With Kibana's client tool, Dev Tools:

First add three documents (IDs 1, 2 and 3), then run:

GET /_mget

{

  "docs": [

    {

      "_index": "lib",

      "_type": "user",

      "_id": 1

    },

    {

      "_index": "lib",

      "_type": "user",

      "_id": 2

    },

    {

      "_index": "lib",

      "_type": "user",

      "_id": 3

    }

  ]

}

You can specify particular fields:

GET /_mget

{

  "docs": [

    {

      "_index": "lib",

      "_type": "user",

      "_id": 1,

      "_source":["age","about"]

    },

    {

      "_index": "lib",

      "_type": "user",

      "_id": 2

    },

    {

      "_index": "lib",

      "_type": "user",

      "_id": 3,

      "_source":"interests"

    }

  ]

}

To fetch different documents from the same index and type, use the shorter form:

GET /lib/user/_mget

{

  "docs": [

    {

      "_id": 1

    },

    {

      "_type": "user",

      "_id": 2

    }

  ]

}

GET /lib/user/_mget

{

  "ids":["1","2"]

}

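2.5 Batch operations with bulk

bulk format (newline-delimited JSON; every line, including the last one, must end with a newline):

{action:{metadata}}\n

{requestbody}\n

action:

create: creates the document only if it does not already exist
update: updates a document (partial update)
index: creates a new document or replaces an existing one
delete: deletes a document

metadata: _index, _type, _id

Difference between create and index: if the document already exists, create fails with an error saying the document already exists, while index succeeds.

Example:

{"delete":{"_index":"lib","_type":"user","_id":"1"}}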

Bulk insert:

"errors" : false in the output pane means the documents were added successfully

POST /lib2/books/_bulk

{"index":{"_id":1}}

{"title":"java","price":55}

{"index":{"_id":2}}

{"title":"HTML5","price":35}

{"index":{"_id":3}}

{"title":"python","price":100}
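Because the body is newline-delimited JSON rather than one JSON document, it is easy to get wrong by hand. This Python sketch serializes (action, metadata, source) triples into that format; to_bulk_body is a hypothetical helper, and the resulting string could be POSTed to /lib2/_bulk with a Content-Type of application/x-ndjson.

```python
import json

def to_bulk_body(actions):
    """Serialize (action, metadata, source) triples into a _bulk request body.

    `source` is ignored for "delete", which has no request-body line; for
    "update" it should be the {"doc": {...}} wrapper.
    """
    lines = []
    for action, metadata, source in actions:
        lines.append(json.dumps({action: metadata}))
        if action != "delete":
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # every line, including the last, ends with \n

body = to_bulk_body([
    ("index", {"_id": 1}, {"title": "java", "price": 55}),
    ("index", {"_id": 2}, {"title": "HTML5", "price": 35}),
    ("delete", {"_index": "lib2", "_id": 3}, None),
])
print(body, end="")
```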

Bulk get:

GET /lib2/books/_mget

{

    "ids":["1","2","3"]

}

Delete (a delete action has no request-body line):

POST /lib2/books/_bulk

{"delete":{"_index":"lib2","_type":"books","_id":3}}

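Maximum amount of data per bulk request:

bulk loads the data to be processed into memory, so the amount of data per request is limited. The optimal size is not a fixed number; it depends on your hardware, the size and complexity of your documents, and your indexing and search load.

A common recommendation is 1,000 to 5,000 documents, or 5 to 15 MB, per request. By default a request may not exceed 100 MB; this limit can be changed in the Elasticsearch configuration file (the http.max_content_length setting in elasticsearch.yml).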
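One way to respect these limits on the client side is to split a large document stream into several bulk bodies. This is a Python sketch under stated assumptions: chunk_bulk is a hypothetical helper, its defaults simply echo the 1,000-document / 5 MB guideline above, and it only emits index actions.

```python
import json

def chunk_bulk(docs, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Group (doc_id, source) pairs into bulk bodies bounded by document count and byte size."""
    chunks, lines, size, count = [], [], 0, 0
    for doc_id, source in docs:
        pair = json.dumps({"index": {"_id": doc_id}}) + "\n" + json.dumps(source) + "\n"
        pair_size = len(pair.encode("utf-8"))
        # Start a new chunk if adding this pair would break either limit.
        if lines and (count >= max_docs or size + pair_size > max_bytes):
            chunks.append("".join(lines))
            lines, size, count = [], 0, 0
        lines.append(pair)
        size += pair_size
        count += 1
    if lines:
        chunks.append("".join(lines))
    return chunks

chunks = chunk_bulk([(i, {"n": i}) for i in range(2500)], max_docs=1000)
print([c.count("\n") // 2 for c in chunks])  # [1000, 1000, 500]
```

A single document larger than max_bytes still gets a chunk of its own, so the caller should check for oversized documents separately.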

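2.6 Version control

Elasticsearch uses optimistic locking to keep data consistent: when a user operates on a document, there is no need to lock and unlock it; the user only specifies the version to operate on. If the version numbers match, Elasticsearch lets the operation proceed; if they conflict, Elasticsearch reports the conflict and throws an exception.

Elasticsearch version numbers range from 1 to 2^63-1.

Internal versioning uses _version (in 7.x, optimistic concurrency control for writes uses the if_seq_no and if_primary_term parameters instead of an internal version).

External versioning: Elasticsearch handles external version numbers somewhat differently from internal ones. Instead of checking that _version equals the value in the request, it checks whether the current _version is less than the specified value. If the request succeeds, the external version number is stored as the document's _version.

To keep _version consistent with an external versioning system, use version_type=external.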

GET /lib/user/1

# Change the value after version= and observe the result

PUT /lib/user/1?version=1&version_type=external

{

  "age" : 44

}
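The two checks described above can be written down as a small function. This Python sketch models only the classic _version rules from the text (the names VersionConflict and next_version are hypothetical, and the 7.x if_seq_no/if_primary_term mechanism is not modeled): internal versioning requires an exact match and then increments, external versioning requires a strictly greater value and stores it as-is.

```python
class VersionConflict(Exception):
    """Raised when the requested version does not satisfy the version check."""

def next_version(current, requested, version_type="internal"):
    """Return the document's new _version after a successful write, or raise VersionConflict."""
    if version_type == "internal":
        # Internal: the request must name exactly the stored version; it is then incremented.
        if requested != current:
            raise VersionConflict(f"current [{current}] != requested [{requested}]")
        return current + 1
    if version_type == "external":
        # External: the request version must be greater than the stored one and is stored as-is.
        if requested <= current:
            raise VersionConflict(f"current [{current}] >= requested [{requested}]")
        return requested
    raise ValueError(f"unsupported version_type: {version_type}")

print(next_version(1, 1))               # 2
print(next_version(3, 10, "external"))  # 10
```

Under these rules, if the stored _version is already 1, the request with version=1&version_type=external is rejected, while version=2 or higher succeeds.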

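2.7 mapping

When the documents above were added, ES automatically created the index, the type and the type's mapping (dynamic mapping).

A mapping defines the data type of each field in a type, as well as related properties such as how the field is analyzed.

When creating an index you can define field types and properties in advance, so that date fields are handled as dates, numeric fields as numbers, string fields as strings, and so on.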

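Supported data types:

(1) Core datatypes

String: string, split into text and keyword (since 5.x)

The text type is used to index long text. Before indexing, the text is analyzed into terms, which are then indexed so that ES can search for them. text fields cannot be used for sorting or aggregations (unless fielddata is enabled).

The keyword type is not analyzed. It can be used for filtering, sorting and aggregations, and a keyword field can only be matched by its exact value.
Numeric: long, integer, short, byte, double, float (not analyzed)
Date: date (not analyzed)
Boolean: boolean
Binary: binary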

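(2) Complex datatypes

Array: arrays do not need a dedicated type; the type of their elements is used. For example:

String array: ["one","two"]

Integer array: [1,2]

Array of arrays: [1,[2,3]], equivalent to [1,2,3]

Array of objects: [{"name":"Mary","age":12},{"name":"Tom","age":20}]
Object: object, for a single JSON object
Nested: nested, for arrays of JSON objects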

(3) Geo datatypes

Geo-point: geo_point, for latitude/longitude coordinates
Geo-shape: geo_shape, for complex shapes such as polygons

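(4) Specialized datatypes

IP: ip, for IPv4 (and IPv6) addresses
Completion: completion, provides auto-complete suggestions
Token count: token_count, stores the number of tokens produced by analyzing a string
mapper-murmur3: with the plugin installed, murmur3 computes a hash of the field values at index time
Attachment: the mapper-attachments plugin supports indexing attachments, e.g. Microsoft Office, OpenDocument, ePub and HTML formats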

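Supported mapping parameters (several entries use pre-5.x syntax, noted where relevant):

"store":false // whether to store the field separately from _source; default false. The field is still searchable and its value can still be read from _source
"index":true // whether the field is indexed; false means the field is not indexed and cannot be searched
"analyzer":"ik" // the analyzer to use; the default is the standard analyzer
"boost":1.23 // field-level score boost; default 1.0
"doc_values":false // enabled by default for non-analyzed fields, not available for analyzed fields; greatly improves sorting and aggregation performance and saves memory
"fielddata":{"format":"disabled"} // for analyzed fields that take part in sorting or aggregations; for non-analyzed fields use doc_values instead
"fields":{"raw":{"type":"string","index":"not_analyzed"}} // multi-fields: index the same value in several ways, e.g. once analyzed and once not (pre-5.x syntax; newer versions use {"raw":{"type":"keyword"}})
"ignore_above":100 // strings longer than 100 characters are ignored and not indexed
"include_in_all":true // whether the field is included in the _all field; default true unless index is set to no (removed in 6.x together with _all)
"index_options":"docs" // one of docs (doc IDs), freqs (doc IDs + term frequencies), positions (doc IDs + frequencies + positions, used for proximity queries), offsets (doc IDs + frequencies + positions + offsets, used for highlighting); analyzed fields default to positions, others to docs
"norms":{"enabled":true,"loading":"lazy"} // default for analyzed fields; non-analyzed fields default to {"enabled":false}. Stores the length normalization factor and index-time boost; enable it only for fields that take part in scoring, since it adds memory overhead (newer versions accept a plain boolean, e.g. "norms":false)
"null_value":NULL // value to index in place of explicit nulls; for analyzed fields the replacement value is analyzed too
"position_increment_gap":0 // affects phrase and proximity queries on multi-valued or analyzed fields; the query can specify a slop; default 100
"search_analyzer":"ik" // the analyzer used at search time; defaults to analyzer. For example, index with standard+ngram and search with standard to implement autocomplete
"similarity":"BM25" // the scoring algorithm for the field, only relevant for string/analyzed fields; the default is BM25 since 5.0 (classic TF/IDF before that)
"term_vector":"no" // term vectors are not stored by default; options are yes (terms), with_positions (terms + positions), with_offsets (terms + offsets), with_positions_offsets (terms + positions + offsets). Speeds up the fast vector highlighter but enlarges the index, so avoid it for large data volumes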

Add three documents

PUT /lib/user/1

{

  "first_name":"Jane",

  "last_name":"Simth",

  "age":36,

  "about":"I like to collect rock albums",

  "interests":["music"]

}

PUT /lib/user/2

{

  "first_name":"Jane",

  "last_name":"Simth",

  "age":36,

  "about":"I like to collect rock albums",

  "interests":["music"]

}

PUT /lib/user/3

{

  "first_name":"Jane",

  "last_name":"Simth",

  "age":36,

  "about":"I like to collect rock albums",

  "interests":["music"],

  "data":"2019-08-24"

}

View one of the documents

GET /lib/user/1

View the index mapping

GET /lib/_mapping

Output:

{

  "lib" : {

    "mappings" : {

      "properties" : {

        "about" : {

          "type" : "text",

          "fields" : {

            "keyword" : {

              "type" : "keyword",

              "ignore_above" : 256

            }

          }

        },

        "age" : {

          "type" : "long"

        },

        "data" : {

          "type" : "date"

        },

        "first_name" : {

          "type" : "text",

          "fields" : {

            "keyword" : {

              "type" : "keyword",

              "ignore_above" : 256

            }

          }

        },

        "interests" : {

          "type" : "text",

          "fields" : {

            "keyword" : {

              "type" : "keyword",

              "ignore_above" : 256

            }

          }

        },

        "last_name" : {

          "type" : "text",

          "fields" : {

            "keyword" : {

              "type" : "keyword",

              "ignore_above" : 256

            }

          }

        }

      }

    }

  }

}
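The mapping above follows a few dynamic-mapping rules: JSON integers become long, date-like strings become date, and other strings become text with a keyword sub-field limited by ignore_above 256. This Python sketch reproduces those rules for simple documents; it is an approximation with hypothetical function names, and its date detection (a plain yyyy-MM-dd regex) is much narrower than Elasticsearch's default dynamic date formats.

```python
import re

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")  # simplified date detection (assumption)

def infer_field(value):
    """Guess the dynamic mapping for one JSON value; None means no mapping is added."""
    if value is None:
        return None
    if isinstance(value, list):
        # Arrays take the type of their first non-null element.
        for item in value:
            mapping = infer_field(item)
            if mapping is not None:
                return mapping
        return None
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        if DATE_RE.fullmatch(value):
            return {"type": "date"}
        return {"type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
    if isinstance(value, dict):
        return {"properties": infer_properties(value)}
    raise TypeError(f"unsupported value: {value!r}")

def infer_properties(doc):
    props = {}
    for name, value in sorted(doc.items()):
        mapping = infer_field(value)
        if mapping is not None:  # null and empty arrays add no mapping
            props[name] = mapping
    return props

mapping = infer_properties({"first_name": "Jane", "age": 36,
                            "interests": ["music"], "data": "2019-08-24"})
print(mapping["age"], mapping["data"])  # {'type': 'long'} {'type': 'date'}
```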

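Query documents

# Found (age is numeric and matched exactly)

GET /lib/_search?q=age:36

# Found: text fields are analyzed by default, so a single word matches without the full value

GET /lib/_search?q=about:like

# Not found: date fields are not analyzed, so the query must be exact

GET /lib/_search?q=data:2019

# Found

GET /lib/_search?q=data:2019-08-24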

Object datatype and manually creating a mapping

# Add a document

PUT /lib5/person/1

{

  "name":"tom",

  "age":30,

  "birthday":"1985-12-12",

  "address":{

    "country":"china",

    "province":"guangdong",

    "city":"shenzhen"

  }

}

# View the document

GET /lib5/person/1

# View the index mapping

GET /lib5/_mapping

Output:

{

  "lib5" : {

    "mappings" : {

      "properties" : {

        "address" : {

          "properties" : {

            "city" : {

              "type" : "text",

              "fields" : {

                "keyword" : {

                  "type" : "keyword",

                  "ignore_above" : 256

                }

              }

            },

            "country" : {

              "type" : "text",

              "fields" : {

                "keyword" : {

                  "type" : "keyword",

                  "ignore_above" : 256

                }

              }

            },

            "province" : {

              "type" : "text",

              "fields" : {

                "keyword" : {

                  "type" : "keyword",

                  "ignore_above" : 256

                }

              }

            }

          }

        },

        "age" : {

          "type" : "long"

        },

        "birthday" : {

          "type" : "date"

        },

        "name" : {

          "type" : "text",

          "fields" : {

            "keyword" : {

              "type" : "keyword",

              "ignore_above" : 256

            }

          }

        }

      }

    }

  }

}

Underlying storage format (the object is flattened into dotted field names):

{

  "name":["tom"],

  "age":[30],

  "birthday":["1985-12-12"],

  "address.country":["china"],

  "address.province":["guangdong"],

  "address.city":["shenzhen"]

}

A more complex example:

{

    "person":[

        {"name":"lisi","age":25},

        {"name":"wangwu","age":26},

        {"name":"zhangsan","age":30}

    ]

}

# Underlying storage format

{

    "person.name":["lisi","wangwu","zhangsan"],

    "person.age":[25,26,30]

}
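The flattening shown above can be reproduced in a few lines, which also makes its main drawback visible. This is a Python sketch with a hypothetical flatten function; it mirrors the dotted-field representation in the text, not Lucene's actual on-disk format.

```python
def flatten(doc):
    """Flatten nested objects and arrays of objects into {dotted.path: [values]}."""
    flat = {}

    def walk(path, value):
        if isinstance(value, dict):
            for key, sub in value.items():
                walk(f"{path}.{key}" if path else key, sub)
        elif isinstance(value, list):
            for item in value:  # array elements share the same path
                walk(path, item)
        else:
            flat.setdefault(path, []).append(value)

    walk("", doc)
    return flat

people = {"person": [{"name": "lisi", "age": 25},
                     {"name": "wangwu", "age": 26},
                     {"name": "zhangsan", "age": 30}]}
flat = flatten(people)
print(flat)
# {'person.name': ['lisi', 'wangwu', 'zhangsan'], 'person.age': [25, 26, 30]}

# The link between "lisi" and 25 is lost, so "name lisi AND age 30" appears to match:
print("lisi" in flat["person.name"] and 30 in flat["person.age"])  # True
```

This loss of association within each array element is what the nested type listed among the complex datatypes is meant to address.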

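Note: Elasticsearch 7.x no longer supports specifying a mapping type by default; the default type is _doc. To keep using a custom type, add the include_type_name=true parameter to the request.

(I have not tested this; it comes from the official documentation. Either way it is not recommended, because Elasticsearch 8 no longer provides this parameter.)

The following manual mapping runs fine in 6.x, but in 7.x it fails with: Root mapping definition has unsupported parameters

Create a mapping manually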

PUT /lib6

{

  "settings": {

    "number_of_shards": 3,

    "number_of_replicas": 0

  },

  "mappings": {

    "books":{

      "properties":{

        "title":{"type":"text"},

        "name":{"type":"text","analyzer":"standard"},

        "publish_date":{"type":"date","index":false},

        "price":{"type":"double"},

        "number":{"type":"integer"}

      }

    }

  }

}

So in Elasticsearch 7 the index should be created like this:

Compared with 6.x, there is one less level of nesting (the type name is gone):

PUT /lib6

{

  "settings": {

    "number_of_shards": 3,

    "number_of_replicas": 0

  },

  "mappings": {

    "properties":{

      "title":{"type":"text"},

      "name":{"type":"text"},

      "publish_date":{"type":"date"},

      "price":{"type":"double"},

      "number":{"type":"integer"}

    }

  }

}

2.8 Basic queries (Query)

Data preparation

PUT /lib3/user/1

{

  "name":"zhaoliu",

  "address":"hei long jiang sheng tie ling shi",

  "age":50,

  "birthday":"1970-12-12",

  "interests":"xi huan he jiu,duan lian,lvyou"

}

PUT /lib3/user/2

{

  "name":"lisi",

  "address":"bei jing hai dian qu qing he zhen",

  "age":20,

  "birthday":"1998-12-12",

  "interests":"xi huan he jiu,duan lian,changge"

}

PUT /lib3/user/3

{

  "name":"zhaoming",

  "address":"bei jing hai dian qu qing he zhen",

  "age":23,

  "birthday":"1970-12-12",

  "interests":"xi huan he jiu,duan lian,lvyou,youyong"

}

# _score: the relevance score of a document for the current search

# Simple queries

GET /lib3/_search?q=name:zhaoming

GET /lib3/_search?q=interests:he jiu&sort=age:desc

query_string query

Analyzes the query string first, then searches with the resulting terms

GET /lib3/_search

{

    "query":{

        "query_string":{

            "default_field":"name",

            "query":"zhangsan"

        }

    }

}

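term and terms queries

A term query looks for the exact term in the inverted index; it is unaware of analyzers, so it suits keyword, numeric and date fields

term: finds documents whose field contains the given term

terms: finds documents whose field contains any of several terms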
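The practical consequence of "term does not analyze, match does" shows up as soon as the query text has different case or several words. This Python sketch simulates the lookups against the tokens of one analyzed text field; the helper names are hypothetical, the lowercase-word analyzer is a stand-in for the standard analyzer, and match_matches assumes match's default "or" operator.

```python
import re

def analyze(text):
    # Stand-in for the standard analyzer: lowercase word tokens.
    return re.findall(r"\w+", text.lower())

def term_matches(field_text, term):
    # term query: the query is NOT analyzed; it must equal one indexed token.
    return term in set(analyze(field_text))

def terms_matches(field_text, terms):
    # terms query: matches if any of the exact terms is an indexed token.
    return any(term_matches(field_text, t) for t in terms)

def match_matches(field_text, query_text):
    # match query (default "or" operator): analyze the query, then match any token.
    indexed = set(analyze(field_text))
    return any(tok in indexed for tok in analyze(query_text))

address = "bei jing hai dian qu qing he zhen"
print(term_matches("zhaoming", "Zhaoming"))        # False: indexed token is lowercase
print(match_matches("zhaoming", "Zhaoming"))       # True: query is analyzed too
print(term_matches(address, "bei jing"))           # False: not a single token
print(terms_matches(address, ["jing", "shanghai"]))  # True
```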

Limit the number of results

from: offset of the first document to return

size: number of documents to return
Return version numbers

"version":true
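match queries

A match query is aware of analyzers: it analyzes the query text for the field first, then searches

match_all: matches all documents

multi_match: searches several fields at once

match_phrase: phrase query. ES first analyzes the query string and builds a phrase query from the resulting terms, which means all terms of the phrase must match and their relative positions must stay the same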

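Specify the returned fields

_source
Include or exclude certain fields

includes and excludes inside _source
Sorting

Use sort: desc for descending, asc for ascending
Prefix match

match_phrase_prefix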
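Range queries

range: queries values within a range

Parameters: from, to, include_lower, include_upper, boost (newer versions use gte, gt, lte and lt instead)

include_lower: whether the lower bound is included; default true
include_upper: whether the upper bound is included; default true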
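The bound flags map directly onto comparisons, and onto the newer gte/gt/lte/lt parameters. This Python sketch uses hypothetical helpers (in_range, to_modern) to show that mapping, filtering the ages from the lib3 sample data with an exclusive upper bound.

```python
def in_range(value, lower=None, upper=None, include_lower=True, include_upper=True):
    """Mimic from/to/include_lower/include_upper; a None bound means unbounded."""
    if lower is not None:
        if value < lower or (value == lower and not include_lower):
            return False
    if upper is not None:
        if value > upper or (value == upper and not include_upper):
            return False
    return True

def to_modern(lower=None, upper=None, include_lower=True, include_upper=True):
    """Translate the old-style bounds into gte/gt/lte/lt range parameters."""
    params = {}
    if lower is not None:
        params["gte" if include_lower else "gt"] = lower
    if upper is not None:
        params["lte" if include_upper else "lt"] = upper
    return params

ages = [50, 20, 23]  # ages from the lib3 sample documents
print([a for a in ages if in_range(a, 20, 50, include_upper=False)])  # [20, 23]
print(to_modern(20, 50, include_upper=False))  # {'gte': 20, 'lt': 50}
```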

wildcard query

Allows the wildcards * and ? in the query

*: zero or more characters
?: exactly one character
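A wildcard pattern is applied to whole indexed terms, which can be simulated by translating it into a regular expression. This Python sketch uses hypothetical helper names; Elasticsearch evaluates wildcards with Lucene automata instead, and its documentation warns that patterns starting with * are slow because many terms must be scanned.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into a compiled regex: * -> .*, ? -> ., the rest literal."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def wildcard_match(term, pattern):
    # The whole term must match, as in a wildcard query against indexed terms.
    return wildcard_to_regex(pattern).fullmatch(term) is not None

names = ["zhaoliu", "lisi", "zhaoming"]  # names from the lib3 sample documents
print([n for n in names if wildcard_match(n, "zhao*")])  # ['zhaoliu', 'zhaoming']
print([n for n in names if wildcard_match(n, "li?i")])   # ['lisi']
```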

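fuzzy: fuzzy queries

value: the query keyword
boost: query weight; default 1.0
min_similarity: minimum similarity for a match; default 0.5. For strings the value lies between 0 and 1 (inclusive); for numbers it can be greater than 1; for dates use values such as 1d (one day), 1m, etc. (removed in newer versions, which use fuzziness: an edit distance of 0, 1, 2 or AUTO)
prefix_length: number of leading characters that must match exactly and are not fuzzed; default 0
max_expansions: maximum number of terms the fuzzy query may expand to; unbounded by default in older versions (50 in newer versions)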
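In edit-distance terms, a fuzzy query matches indexed terms that can be turned into the query word with at most fuzziness single-character edits, after an exact common prefix of prefix_length characters. This Python sketch computes an optimal-string-alignment distance, which counts an adjacent transposition as one edit (Elasticsearch also does this by default via fuzzy_transpositions); fuzzy_match and osa_distance are hypothetical names, and Elasticsearch caps fuzziness at 2.

```python
def osa_distance(a, b):
    """Edit distance where insert, delete, substitute and adjacent transposition each cost 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

def fuzzy_match(term, query, fuzziness=2, prefix_length=0):
    """True if `term` shares the first prefix_length characters and is within `fuzziness` edits."""
    if term[:prefix_length] != query[:prefix_length]:
        return False
    return osa_distance(term, query) <= fuzziness

print(osa_distance("zhaoming", "zhaomnig"))                    # 1 (one transposition)
print(fuzzy_match("zhaoming", "zhaomin", fuzziness=1))         # True
print(fuzzy_match("chaoming", "zhaoming", 1, prefix_length=1))  # False: prefix differs
```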

Highlighting search results

"highlight"
filter queries

Filters do not compute relevance scores and can be cached, so filters are faster than queries

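2.9 Chinese queries

The IK Chinese analyzer installed earlier provides two analysis algorithms: ik_smart and ik_max_word.

ik_smart produces the coarsest segmentation (fewest tokens), while ik_max_word produces the finest-grained segmentation.

Use Postman to test how Chinese text is analyzed (the sample text 测试中文分词器 means "test the Chinese analyzer"):

GET http://127.0.0.1:9200/_analyze?analyzer=ik_smart&pretty=true&text=测试中文分词器

GET http://127.0.0.1:9200/_analyze?analyzer=ik_max_word&pretty=true&text=测试中文分词器

(Recent versions, 6.x and later, expect these parameters in the request body instead, e.g. GET /_analyze with the body {"analyzer":"ik_smart","text":"测试中文分词器"}.)

To use a Chinese analyzer, set it on the corresponding fields in the mapping when creating the index (delete the lib6 index created earlier first, or use a new index name):

PUT /lib6

{

    "mappings": {

        "properties":{

            "title":{

                "type":"text",

                "analyzer":"ik_max_word"

            },

            "content":{

                "type":"text",

                "analyzer":"ik_smart"

            }

        }

    }

}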
