Learning Elasticsearch templates

Elasticsearch template

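A key issue in Elasticsearch is configuring index settings and specifying field properties. The most common problem is a field that we do not want ES to analyze (tokenize): if an index is created through automatic (dynamic) mapping, every string field is analyzed by default, so such a field must be explicitly marked not_analyzed. Other settings, such as the number of shards and replicas of an index, can also be applied through a template. A template matches index names with wildcards, so every newly created index whose name fits the pattern gets the same template, and there is no need to create a template for each index. There is one limitation to keep in mind: once a template takes effect, it applies only to indices created afterwards in the ES cluster. It does not apply to existing indices, even if their names match the template's pattern. Therefore, to change the mapping of an existing index, you still have to create a new index with the correct mapping, copy the data from the original index into it, and then give the new index an alias with the original index's name (after deleting the original index, because an alias cannot share its name with an existing index). This approach applies to the current ES versions (1.7+ to 2.4+); perhaps a way to migrate an index directly will appear in the future.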
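The copy-and-alias migration can be written out as the sequence of REST calls it involves. The Python sketch below only builds (method, path, body) tuples and sends nothing; the helper name `migration_plan` and the index names are hypothetical, and step 2 assumes ES 2.3 or later, where the `_reindex` API exists (on 1.7 to 2.2 the copy would have to be done with scroll plus bulk requests or an external tool instead).

```python
import json


def migration_plan(old_index, new_index, mappings):
    """Return the REST calls (method, path, body) for the copy-then-alias
    migration. Hypothetical helper; assumes ES 2.3+ for the _reindex API."""
    return [
        # 1. Create the new index with the corrected mapping.
        ("PUT", "/" + new_index, {"mappings": mappings}),
        # 2. Copy every document from the old index into the new one.
        ("POST", "/_reindex", {"source": {"index": old_index},
                               "dest": {"index": new_index}}),
        # 3. Delete the old index: an alias may not share a name with an index.
        ("DELETE", "/" + old_index, None),
        # 4. Add an alias carrying the old name that points at the new index.
        ("POST", "/_aliases", {"actions": [
            {"add": {"index": new_index, "alias": old_index}}]}),
    ]


if __name__ == "__main__":
    fixed = {"_default_": {"properties": {
        "url": {"type": "string", "index": "not_analyzed"}}}}
    for method, path, body in migration_plan("app-trace-v1", "app-trace-v2", fixed):
        print(method, path, json.dumps(body) if body is not None else "")
```

The old index is deleted before the alias is added because ES rejects an alias whose name is still used by an index. Between those two calls the old name resolves to nothing, so writes to it should be paused during the switch.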
References:
- http://www.cnblogs.com/huangfox/p/3544883.html
- https://www.elastic.co/guide/en/elasticsearch/reference/2.3/indices-templates.htm

Some important fields

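_source: The _source field is generated automatically and stores the original JSON body of the indexed document. _source itself is not indexed, so it cannot be searched. By default, get and search requests return the _source field. Keeping _source costs disk space and I/O, so it can be disabled; with enabled set to false, a search by default returns only metadata such as the document ID. For example: "_source": {"enabled": false}. Note that without _source a document can no longer be partially updated or copied into a new index, which the migration approach above relies on.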
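_all: This is the "All Field", into which the values of one or more fields are copied, so that a search that does not name a field can still search across several fields at once. It is controlled by "_all": {"enabled": true} (enabled by default in ES 1.x/2.x). The benefit is that you can search _all for something without caring which field it appears in. The cost is extra CPU at index time and a larger index. So if you do not use this feature, turn it off; even if you do, consider explicitly limiting which fields are included in _all.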
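A minimal sketch of where these two switches sit in a mapping, written in Python that only builds the JSON body. The helper `type_mapping` is hypothetical; the `_default_` type and the per-field `include_in_all` parameter are as in ES 1.x/2.x (later versions deprecate and then remove `_all` altogether).

```python
import json


def type_mapping(properties, keep_source=True, enable_all=True, exclude_from_all=()):
    """Wrap field definitions in a _default_ type mapping that sets _source and _all.
    Fields named in exclude_from_all get include_in_all: false (ES 1.x/2.x)."""
    # Copy each field spec so the caller's dicts are not modified.
    props = {name: dict(spec) for name, spec in properties.items()}
    for name in exclude_from_all:
        props.setdefault(name, {"type": "string"})["include_in_all"] = False
    return {"_default_": {
        "_source": {"enabled": keep_source},  # false: hits carry no original JSON
        "_all": {"enabled": enable_all},      # false: no catch-all field is built
        "properties": props,
    }}


fields = {"url": {"type": "string", "index": "not_analyzed"},
          "request_body": {"type": "string"}}

# Option 1: no _all at all, as in the nginx template in this note.
print(json.dumps(type_mapping(fields, enable_all=False), indent=2))
# Option 2: keep _all, but leave the bulky request_body out of it.
print(json.dumps(type_mapping(fields, exclude_from_all=["request_body"]), indent=2))
```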
"index":"analyzed":

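1) analyzed -- runs the field value through an analyzer that splits it into a stream of individual tokens, each of which can be searched. Suitable for ordinary text fields such as body, title, and summary; usually paired with an analyzer setting ("index_analyzer" in ES 1.x, "analyzer" in 2.x).
2) not_analyzed -- indexes the field without analyzing the string value: the whole value is treated as a single token and can be searched only as such. Suitable for values that must not be split, such as URLs, file paths, dates, and phone numbers.
3) no -- the field is not indexed, so its value cannot be searched.
"null_value": "none": the value indexed in place of a null. It applies only to fields explicitly set to null; a field that is missing from the document is not filled in.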
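store: the field storage option, which determines whether the field's original value is stored so that it can be retrieved later.
1. yes -- store the field value. The original string is saved in the index in full and can be recovered through Lucene's IndexReader. This is useful for fields shown in search results (such as URL or title). If index size is a concern for your search application, do not store very large values, because they consume index storage space.
2. no -- do not store the field value (the default in ES, since the value can still be read from _source). This is usually combined with analyzed indexing (Index.ANALYZED in Lucene) for large text values that never need to be returned in their original form, such as a document body.
omit_norms: norms record index-time boost information (along with field-length normalization factors) and can consume a fair amount of memory at search time. Setting omit_norms to true drops this per-field weighting information, so index-time boosts are not taken into account during search.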
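To make the index, null_value, store and omit_norms options concrete, the Python sketch below does two things. `index_terms` is a toy model of what a single string value contributes to the inverted index under each index option; its analyzed branch (lowercase, then split on non-word characters) is only a rough stand-in for ES's standard analyzer, which tokenizes differently in places. `string_field` builds a field mapping from the same options. Both helper names are hypothetical, and the `omit_norms` key follows the text above; check the mapping reference for your ES version, since newer versions configure norms differently.

```python
import json
import re


def index_terms(value, index="analyzed", null_value=None):
    """Toy model of the terms one string value contributes to the inverted index.
    The 'analyzed' branch is only a rough stand-in for ES's standard analyzer."""
    if value is None:
        if null_value is None:
            return []              # explicit null and no null_value: nothing indexed
        value = null_value         # null_value is indexed in place of the null
    if index == "no":
        return []                  # not indexed, so not searchable
    if index == "not_analyzed":
        return [value]             # the whole value is one exact term
    if index == "analyzed":
        return re.findall(r"\w+", value.lower())  # lowercase, split on non-word chars
    raise ValueError("unknown index option: " + index)


def string_field(index="analyzed", store=False, omit_norms=False, null_value=None):
    """Build an ES 1.x/2.x-style string field mapping from the options above."""
    if index not in ("analyzed", "not_analyzed", "no"):
        raise ValueError("unknown index option: " + index)
    field = {"type": "string", "index": index, "store": store}
    if omit_norms:
        field["omit_norms"] = True  # key name as in the text; newer versions differ
    if null_value is not None:
        field["null_value"] = null_value
    return field


url = "/api/v1/Users?id=42"
print(index_terms(url, "analyzed"))      # ['api', 'v1', 'users', 'id', '42']
print(index_terms(url, "not_analyzed"))  # ['/api/v1/Users?id=42']
print(index_terms(None, "not_analyzed", null_value="none"))  # ['none']

properties = {
    "title": string_field(store=True),  # shown in result lists, stored on its own
    "body": string_field(),             # large text: searchable, not stored separately
    "url": string_field(index="not_analyzed", omit_norms=True, null_value="none"),
}
print(json.dumps({"properties": properties}, indent=2))
```

The output shows why a URL is usually mapped as not_analyzed: a term query for the full URL can only match the single term produced by not_analyzed, while the analyzed form yields fragments such as users or 42 instead.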

A template written for parsed nginx access logs:

{
  "order": 1,
  "template": "logstash-app-trace-*",
  "settings": {
    "index": {
      "number_of_shards": "3",
      "number_of_replicas": "1",
      "refresh_interval": "5s"
    }
  },
  "mappings": {
    "_default_": {
      "properties": {
        "request": { "index": "not_analyzed", "type": "string" },
        "span_name": { "index": "not_analyzed", "type": "string" },
        "body_bytes_sent": { "type": "integer" },
        "type": { "index": "not_analyzed", "type": "string" },
        "http_user_agent": { "index": "not_analyzed", "type": "string" },
        "uid": { "index": "not_analyzed", "type": "string" },
        "protocol": { "index": "not_analyzed", "type": "string" },
        "request_time": { "type": "long" },
        "node_type": { "index": "not_analyzed", "type": "string" },
        "pspan_id": { "index": "not_analyzed", "type": "string" },
        "remote_addr": { "index": "not_analyzed", "type": "string" },
        "trace_id": { "index": "not_analyzed", "type": "string" },
        "device_id": { "index": "not_analyzed", "type": "string" },
        "span_id": { "index": "not_analyzed", "type": "string" },
        "time_local": { "index": "not_analyzed", "type": "string" },
        "params": { "index": "not_analyzed", "type": "string" },
        "server_addr": { "index": "not_analyzed", "type": "string" },
        "url": { "index": "not_analyzed", "type": "string" },
        "request_body": { "index": "not_analyzed", "type": "string" },
        "http_referer": { "index": "not_analyzed", "type": "string" },
        "http_x_forwarded_for": { "index": "not_analyzed", "type": "string" },
        "upstream_response_time": { "type": "integer" },
        "response_time": { "type": "long" },
        "http_status": { "type": "integer" },
        "result_code": { "index": "not_analyzed", "type": "string" },
        "status": { "type": "integer" },
        "node_id": { "index": "not_analyzed", "type": "string" }
      },
      "_all": {
        "enabled": false
      }
    }
  },
  "aliases": {
    "app-trace-template": {}
  }
}
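To check which new indices this template would apply to, and to see what registering it looks like, here is a minimal Python sketch using only the standard library. It assumes a cluster at http://localhost:9200 and the template name app-trace (both hypothetical), uses an abbreviated copy of the template with most properties omitted, and approximates ES's wildcard matching with `fnmatch` (ES treats only `*` as a wildcard, so patterns containing `?` or `[ ]` would behave differently). The request is built but not sent.

```python
import fnmatch
import json
import urllib.request

# Abbreviated copy of the template above (most properties omitted).
TEMPLATE = {
    "order": 1,
    "template": "logstash-app-trace-*",
    "settings": {"index": {"number_of_shards": "3",
                           "number_of_replicas": "1",
                           "refresh_interval": "5s"}},
    "mappings": {"_default_": {
        "properties": {"trace_id": {"index": "not_analyzed", "type": "string"}},
        "_all": {"enabled": False}}},
    "aliases": {"app-trace-template": {}},
}


def applies_to(template, index_name):
    """Approximate ES's wildcard check of the template pattern against a new index name."""
    return fnmatch.fnmatchcase(index_name, template["template"])


def put_template_request(name, template, host="http://localhost:9200"):
    """Build (but do not send) the PUT /_template/<name> request."""
    return urllib.request.Request(
        url=host + "/_template/" + name,
        data=json.dumps(template).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


for index in ["logstash-app-trace-2017.03.01", "logstash-nginx-2017.03.01"]:
    print(index, applies_to(TEMPLATE, index))
# logstash-app-trace-2017.03.01 True
# logstash-nginx-2017.03.01 False

req = put_template_request("app-trace", TEMPLATE)
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/_template/app-trace
```

Passing the request to `urllib.request.urlopen` would actually send it. Because of the aliases section, every index created from this template also joins the alias app-trace-template, so all matching indices can be queried through that one name.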