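1. What is Elasticsearch

1.1 Introduction to Elasticsearch

Elasticsearch is an open-source search engine built on Apache Lucene(TM). It can quickly search billions of documents and petabytes of data, whether structured or unstructured. With most databases, scaling horizontally means making substantial changes to your application so that it can take advantage of the newly added machines. Elasticsearch, by contrast, is distributed by nature: it knows how to manage nodes to provide scalability and high availability, so your application does not need to care about any of this.

Elasticsearch is written in Java and uses Lucene at its core for all indexing and search functionality, but its goal is to hide Lucene's complexity behind a simple RESTful API and so make full-text search easy.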

　　

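1.2 Basic concepts of Elasticsearch

Near real time (NRT)

Elasticsearch is a near-real-time search platform. This means there is a slight delay (normally 1 second) between the time a document is indexed and the time it becomes searchable.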

　　
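Cluster

A cluster is a collection of one or more nodes that together hold all of your data and jointly provide indexing and search capabilities. A cluster is identified by a unique name, which defaults to "elasticsearch". This name matters because a node can only join a cluster by specifying that cluster's name. Setting the name explicitly is a good habit in production, while the default is fine for testing and development.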

　　
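Node

A node is a single server in your cluster. As part of the cluster it stores your data and takes part in the cluster's indexing and search. Like a cluster, a node is identified by a name; by default this is a random Marvel Comics character name assigned when the node starts. The name is important for administration, where you need to work out which servers on your network correspond to which nodes in the Elasticsearch cluster.

A node joins a specific cluster by being configured with that cluster's name. By default every node is set to join a cluster named "elasticsearch", so if you start several nodes on your network and they can discover each other, they will automatically form and join a single cluster named "elasticsearch".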

　　
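Index

An index is a collection of documents that have somewhat similar characteristics. For example, you can have one index for customer data, another for a product catalog, and another for order data. An index is identified by a name (which must be all lowercase), and that name is used whenever you index, search, update or delete the documents in it.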
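As a minimal sketch of the lowercase rule above, the following shell function checks a candidate index name before it is used. The function name `is_lowercase_name` is made up for illustration, and it only tests the one rule stated here; Elasticsearch may reject names for other reasons that this check does not cover.

```shell
# is_lowercase_name NAME (hypothetical helper)
# Succeeds (exit status 0) when NAME contains no uppercase letters.
is_lowercase_name() {
  [ "$1" = "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" ]
}

# Try one acceptable and one unacceptable name.
for name in dept Dept; do
  if is_lowercase_name "$name"; then
    echo "$name: ok"
  else
    echo "$name: rejected, contains uppercase letters"
  fi
done
```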

　　
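Type

Within an index you can define one or more types. A type is a logical category/partition of your index whose semantics are entirely up to you. In general, a type is defined for documents that share a common set of fields. For example, suppose you run a blogging platform and store all of your data in a single index. In that index you could define one type for user data, another for blog posts, and another for comments.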

　　
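Document

A document is the basic unit of information that can be indexed. For example, you can have a document for a single customer, another for a single product, and another for a single order. Documents are expressed in JSON (JavaScript Object Notation), a ubiquitous data interchange format on the Internet.

Within an index/type you can store as many documents as you like. Note that although a document physically resides in an index, it must also be indexed into (assigned to) a type inside that index.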

　　
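Shards & replicas

An index can hold more data than the hardware of a single node can handle. For example, an index of a billion documents taking up 1TB of disk space may not fit on the disk of any single node, or a single node may be too slow to serve search requests against it on its own.

To solve this problem, Elasticsearch lets you split an index into multiple pieces called shards. When you create an index you can specify the number of shards you want. Each shard is itself a fully functional and independent "index" that can be hosted on any node in the cluster.

By default, each index in Elasticsearch gets 5 primary shards and 1 replica. This means that if your cluster has at least two nodes, your index will have 5 primary shards and another 5 replica shards (one complete copy), for a total of 10 shards per index.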
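The arithmetic behind that total can be sketched in a few lines of shell. The helper name `total_shards` is hypothetical; it simply multiplies the number of primaries by one plus the number of replicas per primary.

```shell
# total_shards PRIMARIES REPLICAS (hypothetical helper)
# Prints the total number of shard copies for one index:
# every primary plus REPLICAS copies of each primary.
total_shards() {
  echo $(( $1 * (1 + $2) ))
}

total_shards 5 1   # the defaults described above
total_shards 5 2   # the same index with two replicas per primary
```

With the defaults this gives 10, matching the figure in the text; with two replicas per primary it gives 15.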

　　

With these concepts clear, we can get Elasticsearch up and running.

2. Environment required to run Elasticsearch

2.1 Installing the Java runtime

Use the root account for the installation:

root@ubuntu1:~# sudo apt-get install python-software-properties

root@ubuntu1:~# sudo apt-get install software-properties-common

root@ubuntu1:~# sudo add-apt-repository ppa:webupd8team/java

root@ubuntu1:~# sudo apt-get update && sudo apt-get install oracle-java8-installer

　　

Check the installed Java version:

root@ubuntu1:~# java -version

java version "1.8.0_91"

Java(TM) SE Runtime Environment (build 1.8.0_91-b14)

Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)

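3. Installing Elasticsearch and its plugins

3.1 Test machines used in this article

There are three machines, and Elasticsearch will later be deployed on all three. Their IP addresses are: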

　　ubuntu1 192.168.0.25

　　ubuntu2 192.168.0.26

　　ubuntu3 192.168.0.27

　　

3.2 Downloading Elasticsearch 2.3.3 on ubuntu1

Switch to a regular account at this point; the test account in this article is lion:

lion@ubuntu1:~# mkdir tar

lion@ubuntu1:~# cd tar

lion@ubuntu1:~/tar# wget https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/zip/elasticsearch/2.3.3/elasticsearch-2.3.3.zip

3.3 Installing Elasticsearch 2.3.3 on ubuntu1

Elasticsearch needs no separate installation step; once unzipped it is ready to use:

lion@ubuntu1:~/tar# unzip elasticsearch-2.3.3.zip

3.4 Installing the elasticsearch-head plugin on ubuntu1

elasticsearch-head is a cluster management tool for Elasticsearch. It is a standalone web application written entirely in HTML5.

lion@ubuntu1:~/tar/elasticsearch-2.3.3$ pwd

/home/lion/tar/elasticsearch-2.3.3

lion@ubuntu1:~/tar/elasticsearch-2.3.3$ bin/plugin install mobz/elasticsearch-head

3.5 Installing the elasticsearch-sql plugin on ubuntu1

elasticsearch-sql lets you query Elasticsearch with SQL statements.

The elasticsearch-sql project page is here: https://github.com/NLPchina/elasticsearch-sql/

　　

Online installation of elasticsearch-sql for Elasticsearch 2.3.3:

lion@ubuntu1:~/tar/elasticsearch-2.3.3$ ./bin/plugin install https://github.com/NLPchina/elasticsearch-sql/releases/download/2.3.3.0/elasticsearch-sql-2.3.3.0.zip

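Offline installation of elasticsearch-sql for Elasticsearch 2.3.3:

First download the package: https://github.com/NLPchina/elasticsearch-sql/releases/download/2.3.3.0/elasticsearch-sql-2.3.3.0.zip

PS: I needed a proxy to download it. After downloading, copy it to the root directory of elasticsearch-2.3.3.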

lion@ubuntu1:~/tar/elasticsearch-2.3.3$ ll

total 3868

drwxr-xr-x 7 lion lion    4096 Jun 23 15:42 ./

drwxrwxr-x 3 lion lion    4096 Jun 23 15:00 ../

drwxr-xr-x 2 lion lion    4096 May 17 15:48 bin/

drwxr-xr-x 2 lion lion    4096 May 17 15:48 config/

-rw-r--r-- 1 lion lion 3901121 Jun 23 15:42 elasticsearch-sql-2.3.3.0.zip

drwxrwxr-x 2 lion lion    4096 Jun 23 15:00 lib/

-rw-rw-r-- 1 lion lion   11358 Jan 27 12:53 LICENSE.txt

drwxrwxr-x 5 lion lion    4096 May 17 15:48 modules/

-rw-rw-r-- 1 lion lion     150 May 12 13:24 NOTICE.txt

drwxrwxr-x 3 lion lion    4096 Jun 23 15:03 plugins/

-rw-rw-r-- 1 lion lion    8700 May 12 13:24 README.textile

　　

Run the following command to install offline:

lion@ubuntu1:~/tar/elasticsearch-2.3.3$ bin/plugin install file:/home/lion/tar/elasticsearch-2.3.3/elasticsearch-sql-2.3.3.0.zip

　　

On success, the following is printed:

-> Installing from file:/home/lion/tar/elasticsearch-2.3.3/elasticsearch-sql-2.3.3.0.zip...

Trying file:/home/lion/tar/elasticsearch-2.3.3/elasticsearch-sql-2.3.3.0.zip ...

Downloading .......................................DONE

Verifying file:/home/lion/tar/elasticsearch-2.3.3/elasticsearch-sql-2.3.3.0.zip checksums if available ...

NOTE: Unable to verify checksum for downloaded plugin (unable to find .sha1 or .md5 file to verify)

Installed sql into /home/lion/tar/elasticsearch-2.3.3/plugins/sql

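3.6 Editing the Elasticsearch configuration file on ubuntu1

Edit the Elasticsearch 2.3.3 configuration file config/elasticsearch.yml: find the network.host line and change the IP address after it to 192.168.0.25 so that the node is reachable from the internal network. After saving and exiting, the modified configuration file looks like this: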

# ======================== Elasticsearch Configuration =========================

#

# NOTE: Elasticsearch comes with reasonable defaults for most settings.

#       Before you set out to tweak and tune the configuration, make sure you

#       understand what are you trying to accomplish and the consequences.

#

# The primary way of configuring a node is via this file. This template lists

# the most important settings you may want to configure for a production cluster.

#

# Please see the documentation for further information on configuration options:

# <http://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration.html>

#

# ---------------------------------- Cluster -----------------------------------

#

# Use a descriptive name for your cluster:

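# Cluster name for ES; the default is elasticsearch. ES automatically discovers other ES nodes on the same network segment, so if several clusters share a segment, use this setting to tell them apart.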

 cluster.name: idoall_org

#

# ------------------------------------ Node ------------------------------------

#

# Use a descriptive name for the node:

# Node name

 node.name: node-1

#

# Add custom attributes to the node:

#

# node.rack: r1

#

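# Whether this node is eligible to be elected master; the default is true. By default the first machine in the cluster becomes master, and if it goes down a new master is elected.

# node.master: true

#

# Whether this node stores index data; the default is true.

# node.data: true

#

# Combining master and data gives the node different roles:

#         1) master false, data true: the node never becomes master and only holds data, so it carries the heavy load (the "workhorse");

#         2) master true, data false: the node holds no data and acts as a coordinator;

#         3) master false, data false: the node acts as a load balancer.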

# ----------------------------------- Paths ------------------------------------

#

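# Path where index data is stored; the default is the data folder under the ES root directory. Multiple paths can be given, separated by commas.

# path.data: /path/to/data

#

# Path where log files are stored; the default is the logs folder under the ES root directory.

# path.logs: /path/to/logs

#

# Path where configuration files are stored; the default is the config folder under the ES root directory.

# path.conf: /path/to/conf

#

# Path where plugins are stored; the default is the plugins folder under the ES root directory.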

# path.plugins: /path/to/plugins

# ----------------------------------- Memory -----------------------------------

#

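# Set to true to lock memory. ES performance drops when the JVM starts swapping, so make sure it never swaps: set the ES_MIN_MEM and ES_MAX_MEM environment variables to the same value and make sure the machine has enough memory for ES. The elasticsearch process must also be allowed to lock memory; on Linux this can be done with `ulimit -l unlimited`.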

# bootstrap.mlockall: true

#

# Make sure that the `ES_HEAP_SIZE` environment variable is set to about half the memory

# available on the system and that the owner of the process is allowed to use this limit.

#

# Elasticsearch performs poorly when the system is swapping the memory.

#

# ---------------------------------- Network -----------------------------------

#

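# Bind host; 0.0.0.0 means all IPs. For security, binding to an internal IP is recommended.

 network.host: 192.168.0.25

#

# Port for the external HTTP service; for security, consider changing it from the default 9200.

# http.port: 9200

#

# Nodes talk to each other over TCP; this sets the port used, default 9300-9400.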

# transport.tcp.port: 9300

#

# For more information, see the documentation at:

# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-network.html>

#

# --------------------------------- Discovery ----------------------------------

#

# Pass an initial list of hosts to perform discovery when new node is started:

# The default list of hosts is ["127.0.0.1", "[::1]"]

# Initial list of master nodes in the cluster, through which nodes newly joining the cluster are discovered.

# discovery.zen.ping.unicast.hosts: ["host1", "host2"]

#

# Prevent the "split brain" by configuring the majority of nodes (total number of nodes / 2 + 1):

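# Set this so that a node must be able to see N other master-eligible nodes in the cluster.

# discovery.zen.minimum_master_nodes: 3

#

# Ping timeout when discovering other nodes in the cluster; the default is 3 seconds. On poor networks a higher value helps avoid discovery errors.

# discovery.zen.ping.timeout: 3s

#

# Whether to enable multicast node discovery (default true in Elasticsearch 1.x; from 2.0 multicast discovery is a separate plugin).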

# discovery.zen.ping.multicast.enabled: false

#

# For more information, see the documentation at:

# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery.html>

#

# ---------------------------------- Gateway -----------------------------------

#

# Block initial recovery after a full cluster restart until N nodes are started:

#

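# Number of nodes expected in the cluster, default 2; as soon as these N nodes have started, data recovery begins immediately.

# gateway.expected_nodes: 2

#

# Timeout for the initial data recovery process, default 5 minutes.

# gateway.recover_after_time: 5m

#

# Start data recovery once N nodes in the cluster have started, default 1.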

# gateway.recover_after_nodes: 3

#

# For more information, see the documentation at:

# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-gateway.html>

#

# ---------------------------------- Various -----------------------------------

#

# Disable starting multiple nodes on a single system:

#

# node.max_local_storage_nodes: 1

#

# Require explicit names when deleting indices:

#

# action.destructive_requires_name: true
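The Discovery section of the file gives the split-brain guard as "total number of nodes / 2 + 1". A minimal sketch of that formula in shell follows; the function name `quorum` is hypothetical and integer division is assumed.

```shell
# quorum N (hypothetical helper)
# Majority of N master-eligible nodes: N / 2 + 1 with integer division.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

quorum 3   # the three-machine cluster used in this article
```

For three master-eligible nodes the formula gives 2. The commented-out example value in the file is 3, so if you enable discovery.zen.minimum_master_nodes for this cluster, 2 is probably the value you want; with 3, losing any single node would presumably leave the cluster unable to elect a master.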

3.7 Starting Elasticsearch 2.3.3 on ubuntu1

Starting Elasticsearch takes a single command:

lion@ubuntu1:~/tar/elasticsearch-2.3.3$ bin/elasticsearch

To run it as a system service, see the article "Managing system processes with Supervisor 3.2.1 on Mac 10.10.3".

　　

Open the installed elasticsearch-head: http://192.168.0.25:9200/_plugin/head/

　　

　　

Open the installed elasticsearch-sql: http://192.168.0.25:9200/_plugin/sql/

　　

　　

4. Elasticsearch cluster configuration

4.1 Editing the hosts file on the three machines

On each of ubuntu1, ubuntu2 and ubuntu3 mentioned above, edit /etc/hosts as follows:

lion@ubuntu1:~$ cat /etc/hosts

127.0.0.1	localhost

# 127.0.1.1	ubuntu1

192.168.0.25	ubuntu1

192.168.0.26	ubuntu2

192.168.0.27	ubuntu3

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback

fe00::0 ip6-localnet

ff00::0 ip6-mcastprefix

ff02::1 ip6-allnodes

ff02::2 ip6-allrouters

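4.2 Editing the Elasticsearch configuration on the three machines

Copy the Elasticsearch folder from ubuntu1 to ubuntu2 and ubuntu3.

Then edit config/elasticsearch.yml on each of the three machines.

　　

The Elasticsearch configuration file on ubuntu1 is as follows: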

# ======================== Elasticsearch Configuration =========================

#

# NOTE: Elasticsearch comes with reasonable defaults for most settings.

#       Before you set out to tweak and tune the configuration, make sure you

#       understand what are you trying to accomplish and the consequences.

#

# The primary way of configuring a node is via this file. This template lists

# the most important settings you may want to configure for a production cluster.

#

# Please see the documentation for further information on configuration options:

# <http://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration.html>

#

# ---------------------------------- Cluster -----------------------------------

#

# Use a descriptive name for your cluster:

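# Cluster name for ES; the default is elasticsearch. ES automatically discovers other ES nodes on the same network segment, so if several clusters share a segment, use this setting to tell them apart.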

 cluster.name: idoall_org

#

# ------------------------------------ Node ------------------------------------

#

# Use a descriptive name for the node:

# Node name

 node.name: node-1

#

# Add custom attributes to the node:

#

# node.rack: r1

#

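# Whether this node is eligible to be elected master; the default is true. By default the first machine in the cluster becomes master, and if it goes down a new master is elected.

# node.master: true

#

# Whether this node stores index data; the default is true.

# node.data: true

#

# Combining master and data gives the node different roles:

#         1) master false, data true: the node never becomes master and only holds data, so it carries the heavy load (the "workhorse");

#         2) master true, data false: the node holds no data and acts as a coordinator;

#         3) master false, data false: the node acts as a load balancer.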

# ----------------------------------- Paths ------------------------------------

#

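# Path where index data is stored; the default is the data folder under the ES root directory. Multiple paths can be given, separated by commas.

# path.data: /path/to/data

#

# Path where log files are stored; the default is the logs folder under the ES root directory.

# path.logs: /path/to/logs

#

# Path where configuration files are stored; the default is the config folder under the ES root directory.

# path.conf: /path/to/conf

#

# Path where plugins are stored; the default is the plugins folder under the ES root directory.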

# path.plugins: /path/to/plugins

# ----------------------------------- Memory -----------------------------------

#

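# Set to true to lock memory. ES performance drops when the JVM starts swapping, so make sure it never swaps: set the ES_MIN_MEM and ES_MAX_MEM environment variables to the same value and make sure the machine has enough memory for ES. The elasticsearch process must also be allowed to lock memory; on Linux this can be done with `ulimit -l unlimited`.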

# bootstrap.mlockall: true

#

# Make sure that the `ES_HEAP_SIZE` environment variable is set to about half the memory

# available on the system and that the owner of the process is allowed to use this limit.

#

# Elasticsearch performs poorly when the system is swapping the memory.

#

# ---------------------------------- Network -----------------------------------

#

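# Bind host; 0.0.0.0 means all IPs. For security, binding to an internal IP is recommended.

 network.host: 192.168.0.25

#

# Port for the external HTTP service; for security, consider changing it from the default 9200.

# http.port: 9200

#

# Nodes talk to each other over TCP; this sets the port used, default 9300-9400.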

# transport.tcp.port: 9300

#

# For more information, see the documentation at:

# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-network.html>

#

# --------------------------------- Discovery ----------------------------------

#

# Pass an initial list of hosts to perform discovery when new node is started:

# The default list of hosts is ["127.0.0.1", "[::1]"]

# Initial list of master nodes in the cluster, through which nodes newly joining the cluster are discovered.

 discovery.zen.ping.unicast.hosts: ["192.168.0.26", "192.168.0.27"]

#

# Prevent the "split brain" by configuring the majority of nodes (total number of nodes / 2 + 1):

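# Set this so that a node must be able to see N other master-eligible nodes in the cluster.

# discovery.zen.minimum_master_nodes: 3

#

# Ping timeout when discovering other nodes in the cluster; the default is 3 seconds. On poor networks a higher value helps avoid discovery errors.

# discovery.zen.ping.timeout: 3s

#

# Whether to enable multicast node discovery (default true in Elasticsearch 1.x; from 2.0 multicast discovery is a separate plugin).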

# discovery.zen.ping.multicast.enabled: false

#

# For more information, see the documentation at:

# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery.html>

#

# ---------------------------------- Gateway -----------------------------------

#

# Block initial recovery after a full cluster restart until N nodes are started:

#

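# Number of nodes expected in the cluster, default 2; as soon as these N nodes have started, data recovery begins immediately.

# gateway.expected_nodes: 2

#

# Timeout for the initial data recovery process, default 5 minutes.

# gateway.recover_after_time: 5m

#

# Start data recovery once N nodes in the cluster have started, default 1.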

# gateway.recover_after_nodes: 3

#

# For more information, see the documentation at:

# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-gateway.html>

#

# ---------------------------------- Various -----------------------------------

#

# Disable starting multiple nodes on a single system:

#

# node.max_local_storage_nodes: 1

#

# Require explicit names when deleting indices:

#

# action.destructive_requires_name: true

　　

The Elasticsearch configuration file on ubuntu2 is as follows:

# ======================== Elasticsearch Configuration =========================

#

# NOTE: Elasticsearch comes with reasonable defaults for most settings.

#       Before you set out to tweak and tune the configuration, make sure you

#       understand what are you trying to accomplish and the consequences.

#

# The primary way of configuring a node is via this file. This template lists

# the most important settings you may want to configure for a production cluster.

#

# Please see the documentation for further information on configuration options:

# <http://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration.html>

#

# ---------------------------------- Cluster -----------------------------------

#

# Use a descriptive name for your cluster:

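# Cluster name for ES; the default is elasticsearch. ES automatically discovers other ES nodes on the same network segment, so if several clusters share a segment, use this setting to tell them apart.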

 cluster.name: idoall_org

#

# ------------------------------------ Node ------------------------------------

#

# Use a descriptive name for the node:

# Node name

 node.name: node-2

#

# Add custom attributes to the node:

#

# node.rack: r1

#

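# Whether this node is eligible to be elected master; the default is true. By default the first machine in the cluster becomes master, and if it goes down a new master is elected.

# node.master: true

#

# Whether this node stores index data; the default is true.

# node.data: true

#

# Combining master and data gives the node different roles:

#         1) master false, data true: the node never becomes master and only holds data, so it carries the heavy load (the "workhorse");

#         2) master true, data false: the node holds no data and acts as a coordinator;

#         3) master false, data false: the node acts as a load balancer.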

# ----------------------------------- Paths ------------------------------------

#

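# Path where index data is stored; the default is the data folder under the ES root directory. Multiple paths can be given, separated by commas.

# path.data: /path/to/data

#

# Path where log files are stored; the default is the logs folder under the ES root directory.

# path.logs: /path/to/logs

#

# Path where configuration files are stored; the default is the config folder under the ES root directory.

# path.conf: /path/to/conf

#

# Path where plugins are stored; the default is the plugins folder under the ES root directory.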

# path.plugins: /path/to/plugins

# ----------------------------------- Memory -----------------------------------

#

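# Set to true to lock memory. ES performance drops when the JVM starts swapping, so make sure it never swaps: set the ES_MIN_MEM and ES_MAX_MEM environment variables to the same value and make sure the machine has enough memory for ES. The elasticsearch process must also be allowed to lock memory; on Linux this can be done with `ulimit -l unlimited`.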

# bootstrap.mlockall: true

#

# Make sure that the `ES_HEAP_SIZE` environment variable is set to about half the memory

# available on the system and that the owner of the process is allowed to use this limit.

#

# Elasticsearch performs poorly when the system is swapping the memory.

#

# ---------------------------------- Network -----------------------------------

#

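# Bind host; 0.0.0.0 means all IPs. For security, binding to an internal IP is recommended.

 network.host: 192.168.0.26

#

# Port for the external HTTP service; for security, consider changing it from the default 9200.

# http.port: 9200

#

# Nodes talk to each other over TCP; this sets the port used, default 9300-9400.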

# transport.tcp.port: 9300

#

# For more information, see the documentation at:

# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-network.html>

#

# --------------------------------- Discovery ----------------------------------

#

# Pass an initial list of hosts to perform discovery when new node is started:

# The default list of hosts is ["127.0.0.1", "[::1]"]

# Initial list of master nodes in the cluster, through which nodes newly joining the cluster are discovered.

 discovery.zen.ping.unicast.hosts: ["192.168.0.25", "192.168.0.27"]

#

# Prevent the "split brain" by configuring the majority of nodes (total number of nodes / 2 + 1):

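# Set this so that a node must be able to see N other master-eligible nodes in the cluster.

# discovery.zen.minimum_master_nodes: 3

#

# Ping timeout when discovering other nodes in the cluster; the default is 3 seconds. On poor networks a higher value helps avoid discovery errors.

# discovery.zen.ping.timeout: 3s

#

# Whether to enable multicast node discovery (default true in Elasticsearch 1.x; from 2.0 multicast discovery is a separate plugin).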

# discovery.zen.ping.multicast.enabled: false

#

# For more information, see the documentation at:

# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery.html>

#

# ---------------------------------- Gateway -----------------------------------

#

# Block initial recovery after a full cluster restart until N nodes are started:

#

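# Number of nodes expected in the cluster, default 2; as soon as these N nodes have started, data recovery begins immediately.

# gateway.expected_nodes: 2

#

# Timeout for the initial data recovery process, default 5 minutes.

# gateway.recover_after_time: 5m

#

# Start data recovery once N nodes in the cluster have started, default 1.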

# gateway.recover_after_nodes: 3

#

# For more information, see the documentation at:

# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-gateway.html>

#

# ---------------------------------- Various -----------------------------------

#

# Disable starting multiple nodes on a single system:

#

# node.max_local_storage_nodes: 1

#

# Require explicit names when deleting indices:

#

# action.destructive_requires_name: true

　　

The Elasticsearch configuration file on ubuntu3 is as follows:

# ======================== Elasticsearch Configuration =========================

#

# NOTE: Elasticsearch comes with reasonable defaults for most settings.

#       Before you set out to tweak and tune the configuration, make sure you

#       understand what are you trying to accomplish and the consequences.

#

# The primary way of configuring a node is via this file. This template lists

# the most important settings you may want to configure for a production cluster.

#

# Please see the documentation for further information on configuration options:

# <http://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration.html>

#

# ---------------------------------- Cluster -----------------------------------

#

# Use a descriptive name for your cluster:

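# Cluster name for ES; the default is elasticsearch. ES automatically discovers other ES nodes on the same network segment, so if several clusters share a segment, use this setting to tell them apart.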

 cluster.name: idoall_org

#

# ------------------------------------ Node ------------------------------------

#

# Use a descriptive name for the node:

# Node name

 node.name: node-3

#

# Add custom attributes to the node:

#

# node.rack: r1

#

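# Whether this node is eligible to be elected master; the default is true. By default the first machine in the cluster becomes master, and if it goes down a new master is elected.

# node.master: true

#

# Whether this node stores index data; the default is true.

# node.data: true

#

# Combining master and data gives the node different roles:

#         1) master false, data true: the node never becomes master and only holds data, so it carries the heavy load (the "workhorse");

#         2) master true, data false: the node holds no data and acts as a coordinator;

#         3) master false, data false: the node acts as a load balancer.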

# ----------------------------------- Paths ------------------------------------

#

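# Path where index data is stored; the default is the data folder under the ES root directory. Multiple paths can be given, separated by commas.

# path.data: /path/to/data

#

# Path where log files are stored; the default is the logs folder under the ES root directory.

# path.logs: /path/to/logs

#

# Path where configuration files are stored; the default is the config folder under the ES root directory.

# path.conf: /path/to/conf

#

# Path where plugins are stored; the default is the plugins folder under the ES root directory.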

# path.plugins: /path/to/plugins

# ----------------------------------- Memory -----------------------------------

#

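# Set to true to lock memory. ES performance drops when the JVM starts swapping, so make sure it never swaps: set the ES_MIN_MEM and ES_MAX_MEM environment variables to the same value and make sure the machine has enough memory for ES. The elasticsearch process must also be allowed to lock memory; on Linux this can be done with `ulimit -l unlimited`.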

# bootstrap.mlockall: true

#

# Make sure that the `ES_HEAP_SIZE` environment variable is set to about half the memory

# available on the system and that the owner of the process is allowed to use this limit.

#

# Elasticsearch performs poorly when the system is swapping the memory.

#

# ---------------------------------- Network -----------------------------------

#

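# Bind host; 0.0.0.0 means all IPs. For security, binding to an internal IP is recommended.

 network.host: 192.168.0.27

#

# Port for the external HTTP service; for security, consider changing it from the default 9200.

# http.port: 9200

#

# Nodes talk to each other over TCP; this sets the port used, default 9300-9400.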

# transport.tcp.port: 9300

#

# For more information, see the documentation at:

# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-network.html>

#

# --------------------------------- Discovery ----------------------------------

#

# Pass an initial list of hosts to perform discovery when new node is started:

# The default list of hosts is ["127.0.0.1", "[::1]"]

# Initial list of master nodes in the cluster, through which nodes newly joining the cluster are discovered.

 discovery.zen.ping.unicast.hosts: ["192.168.0.25", "192.168.0.26"]

#

# Prevent the "split brain" by configuring the majority of nodes (total number of nodes / 2 + 1):

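# Set this so that a node must be able to see N other master-eligible nodes in the cluster.

# discovery.zen.minimum_master_nodes: 3

#

# Ping timeout when discovering other nodes in the cluster; the default is 3 seconds. On poor networks a higher value helps avoid discovery errors.

# discovery.zen.ping.timeout: 3s

#

# Whether to enable multicast node discovery (default true in Elasticsearch 1.x; from 2.0 multicast discovery is a separate plugin).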

# discovery.zen.ping.multicast.enabled: false

#

# For more information, see the documentation at:

# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery.html>

#

# ---------------------------------- Gateway -----------------------------------

#

# Block initial recovery after a full cluster restart until N nodes are started:

#

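# Number of nodes expected in the cluster, default 2; as soon as these N nodes have started, data recovery begins immediately.

# gateway.expected_nodes: 2

#

# Timeout for the initial data recovery process, default 5 minutes.

# gateway.recover_after_time: 5m

#

# Start data recovery once N nodes in the cluster have started, default 1.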

# gateway.recover_after_nodes: 3

#

# For more information, see the documentation at:

# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-gateway.html>

#

# ---------------------------------- Various -----------------------------------

#

# Disable starting multiple nodes on a single system:

#

# node.max_local_storage_nodes: 1

#

# Require explicit names when deleting indices:

#

# action.destructive_requires_name: true
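The three files differ only in node.name, network.host and discovery.zen.ping.unicast.hosts (each node lists the other two machines). As a rough sketch, the per-node lines could be generated rather than edited by hand. The helper name `node_settings` and the `HOSTS` variable are made up for illustration; the IP addresses are the ones from section 3.1.

```shell
# IP addresses of ubuntu1, ubuntu2 and ubuntu3, in node order.
HOSTS="192.168.0.25 192.168.0.26 192.168.0.27"

# node_settings INDEX (hypothetical helper)
# Prints the three per-node settings for node number INDEX (1-based):
# its name, its own bind address, and the other nodes as unicast hosts.
node_settings() {
  idx=$1
  self=""
  peers=""
  i=0
  for h in $HOSTS; do
    i=$(( i + 1 ))
    if [ "$i" -eq "$idx" ]; then
      self=$h
    else
      # Append the peer, comma-separated and double-quoted.
      peers="${peers:+$peers, }\"$h\""
    fi
  done
  printf 'node.name: node-%s\n' "$idx"
  printf 'network.host: %s\n' "$self"
  printf 'discovery.zen.ping.unicast.hosts: [%s]\n' "$peers"
}

node_settings 2
```

Running it for node 2 prints the same three values that appear in the ubuntu2 file above; the output would still need to be merged into config/elasticsearch.yml by hand or by a script of your own.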

4.3 Starting the Elasticsearch cluster

Start Elasticsearch on each of the three machines with the following commands, using ubuntu1 as the example:

lion@ubuntu1:~/tar/elasticsearch-2.3.3$ pwd

/home/lion/tar/elasticsearch-2.3.3

lion@ubuntu1:~/tar/elasticsearch-2.3.3$ bin/elasticsearch

　　

Once started, open elasticsearch-head on any of the machines to see the status of the cluster and its nodes, as shown in the figure below:

　　

　　

4.4 Testing writes to the cluster

Write a document through ubuntu1:

lion@ubuntu1:~$ curl -XPUT 'http://ubuntu1:9200/dept/employee/32' -d '{ "empname": "emp32"}'

Open elasticsearch-head again to see how the index has been allocated: ubuntu1, ubuntu2 and ubuntu3 each hold different primary shards:

　　

　　

Run a query against any machine to see the document we just inserted. The following command fetches it from ubuntu3 as an example:

lion@ubuntu1:~$ curl -XGET 'http://ubuntu3:9200/dept/employee/32'
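Both the write and the read above address the document at the same path, /index/type/id, and only the host changes. A small sketch of how that URL is put together follows; the helper name `doc_url` is hypothetical and the port is assumed to be the default 9200 used throughout this article.

```shell
# doc_url HOST INDEX TYPE ID (hypothetical helper)
# Prints the REST URL of a single document on the given host.
doc_url() {
  printf 'http://%s:9200/%s/%s/%s\n' "$1" "$2" "$3" "$4"
}

doc_url ubuntu3 dept employee 32
```

The printed URL can be passed straight to curl, for example `curl -XGET "$(doc_url ubuntu2 dept employee 32)"` to read the same document through ubuntu2.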

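4.5 Simulating a node failure: the cluster reassigns primary and replica shards

The number of replica shards can be adjusted dynamically on a running cluster. First query the cluster health: there are 10 shards in total, 5 of them primaries, across three nodes.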

lion@ubuntu1:~$ curl -XGET 'http://ubuntu2:9200/_cluster/health?pretty=true'

{

  "cluster_name" : "idoall_org",

  "status" : "green",

  "timed_out" : false,

  "number_of_nodes" : 3,

  "number_of_data_nodes" : 3,

  "active_primary_shards" : 5,

  "active_shards" : 10,

  "relocating_shards" : 0,

  "initializing_shards" : 0,

  "unassigned_shards" : 0,

  "delayed_unassigned_shards" : 0,

  "number_of_pending_tasks" : 0,

  "number_of_in_flight_fetch" : 0,

  "task_max_waiting_in_queue_millis" : 0,

  "active_shards_percent_as_number" : 100.0

}
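When checking health from a script, usually only the "status" field matters. The sketch below pulls it out of the response with sed; the function name `health_status` is hypothetical, and the text-based match assumes the field looks like it does in the output above (a real JSON parser would be more robust).

```shell
# health_status (hypothetical helper)
# Reads a _cluster/health JSON response on stdin and prints the value
# of its "status" field (green, yellow or red).
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Feed it a fragment of the response shown above.
printf '%s\n' '{' '  "cluster_name" : "idoall_org",' '  "status" : "green",' '}' | health_status
```

Against a live cluster it would be used as `curl -s 'http://ubuntu2:9200/_cluster/health?pretty=true' | health_status`.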

　　

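Next, raise the number of replicas for the dept index to 2:

lion@ubuntu1:~$ curl -XPUT 'http://ubuntu1:9200/dept/_settings' -d '{"number_of_replicas" : 2}'

The response is:

{"acknowledged":true}

Look at the cluster state again: the shards have been reallocated, with the primary shards now on ubuntu1 and ubuntu3: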

　　

　　

Now shut down the Elasticsearch node on ubuntu3. The cluster shows the following state, with 5 replica shards left unassigned:
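The figure of 5 follows from the fact that Elasticsearch does not place two copies of the same shard on one node, so each shard can have at most as many assigned copies as there are data nodes. A sketch of that arithmetic follows; the helper name `unassigned_replicas` is hypothetical, and it assumes at least one data node and no other allocation constraints such as disk space or allocation filters.

```shell
# unassigned_replicas PRIMARIES REPLICAS NODES (hypothetical helper)
# Each shard wants 1 + REPLICAS copies, but at most NODES of them can be
# assigned, because copies of the same shard never share a node.
unassigned_replicas() {
  missing=$(( 1 + $2 - $3 ))
  if [ "$missing" -lt 0 ]; then
    missing=0
  fi
  echo $(( $1 * missing ))
}

unassigned_replicas 5 2 2   # 5 primaries, 2 replicas, ubuntu3 stopped
```

With 5 primaries, 2 replicas and only 2 nodes left, one copy of every shard has nowhere to go, which gives the 5 unassigned replica shards seen here; with all 3 nodes running the result is 0.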

　　

　　

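The cluster status has turned yellow. An Elasticsearch cluster has three status colors: green, yellow and red.

green: all primary and replica shards are available

yellow: all primary shards are available, but not all replica shards are

red: not all primary shards are available

In practice we can also reduce the number of replicas to match the state of the cluster, so that the cluster stays healthy and highly available.

Run the command again to set the number of replicas back to 1: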
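The three definitions above amount to a simple decision rule, sketched here in shell. The function name `cluster_color` and its two arguments (counts of unavailable primary and replica shards) are assumptions made for illustration; Elasticsearch computes the status itself and reports it through the health API.

```shell
# cluster_color UNAVAILABLE_PRIMARIES UNAVAILABLE_REPLICAS (hypothetical helper)
# Applies the definitions above: any missing primary means red, otherwise
# any missing replica means yellow, otherwise green.
cluster_color() {
  if [ "$1" -gt 0 ]; then
    echo red
  elif [ "$2" -gt 0 ]; then
    echo yellow
  else
    echo green
  fi
}

cluster_color 0 5   # the state after ubuntu3 is stopped
```

With no primaries missing and 5 replicas unassigned, the rule yields yellow, which is the status the cluster reports in this scenario.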

lion@ubuntu1:~$ curl -XPUT 'http://ubuntu1:9200/dept/_settings' -d '{"number_of_replicas" : 1}'