IK is one of the most popular Chinese analyzers for ElasticSearch. This post briefly walks through installing it and testing it.
 

1. ElasticSearch built-in analyzer

The built-in analyzer handles Chinese poorly. You can try it yourself:

[zsz@VS-zsz ~]$ curl -XGET 'http://192.168.31.77:9200/_analyze?analyzer=standard' -d '岁月如梭'

{
    "tokens": [
        {
            "token": "岁",
            "start_offset": 0,
            "end_offset": 1,
            "type": "<IDEOGRAPHIC>",
            "position": 0
        },
        {
            "token": "月",
            "start_offset": 1,
            "end_offset": 2,
            "type": "<IDEOGRAPHIC>",
            "position": 1
        },
        {
            "token": "如",
            "start_offset": 2,
            "end_offset": 3,
            "type": "<IDEOGRAPHIC>",
            "position": 2
        },
        {
            "token": "梭",
            "start_offset": 3,
            "end_offset": 4,
            "type": "<IDEOGRAPHIC>",
            "position": 3
        }
    ]
}
[zsz@VS-zsz ~]$ curl -XGET 'http://192.168.31.77:9200/_analyze?analyzer=standard' -d 'i am an enginner'

{
    "tokens": [
        {
            "token": "i",
            "start_offset": 0,
            "end_offset": 1,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "am",
            "start_offset": 2,
            "end_offset": 4,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "an",
            "start_offset": 5,
            "end_offset": 7,
            "type": "<ALPHANUM>",
            "position": 2
        },
        {
            "token": "enginner",
            "start_offset": 8,
            "end_offset": 16,
            "type": "<ALPHANUM>",
            "position": 3
        }
    ]
}
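As this shows, the standard analyzer splits English on whitespace but breaks Chinese text into single characters, so ES's built-in Chinese segmentation is poor.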
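When comparing analyzers, it can help to reduce the verbose _analyze response to just the list of tokens. The following is a minimal sketch that uses only grep, sed and paste. `extract_tokens` is a hypothetical helper name (not part of ElasticSearch), and the simple pattern assumes token text contains no escaped double quotes.

```shell
# Read an _analyze JSON response on stdin and print the token texts
# on one line, separated by spaces.
# Assumption: tokens contain no escaped double quotes.
extract_tokens() {
  grep -o '"token": *"[^"]*"' \
    | sed 's/^"token": *"\(.*\)"$/\1/' \
    | paste -sd' ' -
}

# Usage (needs a running node; host taken from the examples above):
# curl -s -XGET 'http://192.168.31.77:9200/_analyze?analyzer=standard' -d '岁月如梭' | extract_tokens
```

For the standard-analyzer response shown above, this should print the four single characters on one line, which makes the per-character splitting easy to spot.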
 
2. The IK Chinese analyzer
 
2.1. If you downloaded the source release of IK, you need to build the plugin yourself. First install Maven on Linux:

tar zxvf apache-maven-3.0.5-bin.tar.gz
mv apache-maven-3.0.5 /usr/local/apache-maven-3.0.5
vi /etc/profile
Add the following lines:
export MAVEN_HOME=/usr/local/apache-maven-3.0.5
export PATH=$PATH:$MAVEN_HOME/bin
 
source /etc/profile 
mvn -v
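If you would rather not edit the profile by hand, the same two lines can be appended from a script. This is a sketch only: `add_maven_to_profile` is a hypothetical helper, and it skips the append when a MAVEN_HOME export is already present so that running it twice does not duplicate the entries.

```shell
# Append the Maven environment variables to a profile file, once.
# $1: profile file to edit (e.g. /etc/profile, which needs root)
# $2: Maven install directory
add_maven_to_profile() {
  profile="$1"
  maven_home="$2"
  # Do nothing if MAVEN_HOME is already exported in this file.
  if grep -q '^export MAVEN_HOME=' "$profile" 2>/dev/null; then
    return 0
  fi
  # Single quotes keep $PATH and $MAVEN_HOME literal in the file.
  printf 'export MAVEN_HOME=%s\nexport PATH=$PATH:$MAVEN_HOME/bin\n' \
    "$maven_home" >> "$profile"
}

# Usage:
# add_maven_to_profile /etc/profile /usr/local/apache-maven-3.0.5
```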
2.2. Build the source to produce the contents of the target/ directory
 
mvn clean package 
 
Deploy the packaged IK plugin to ES:
[zsz@VS-zsz ~]$ cd /home/zsz/elasticsearch-analysis-ik-1.10.0/target/releases/
[zsz@VS-zsz releases]$ mkdir /usr/local/elasticsearch-2.4.0/plugins/ik/
[zsz@VS-zsz releases]$ cp elasticsearch-analysis-ik-1.10.0.zip /usr/local/elasticsearch-2.4.0/plugins/ik/elasticsearch-analysis-ik-1.10.0.zip
[zsz@VS-zsz releases]$ unzip /usr/local/elasticsearch-2.4.0/plugins/ik/elasticsearch-analysis-ik-1.10.0.zip -d /usr/local/elasticsearch-2.4.0/plugins/ik/
[zsz@VS-zsz releases]$ cd /usr/local/elasticsearch-2.4.0/plugins/ik/
[zsz@VS-zsz ik]$ rm elasticsearch-analysis-ik-1.10.0.zip
[zsz@VS-zsz ik]$ mkdir /usr/local/elasticsearch-2.4.0/config/ik
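Before starting ES, it is worth confirming that the archive really was extracted into the plugin directory and not into the directory you ran unzip from. The sketch below checks for the two things an ES 2.x plugin directory is expected to contain: a plugin-descriptor.properties file and at least one jar. `check_plugin_dir` is a hypothetical helper name.

```shell
# Sanity-check an ES 2.x plugin directory.
# Prints "ok" and returns 0 when the descriptor and at least one jar exist.
check_plugin_dir() {
  dir="$1"
  if [ ! -f "$dir/plugin-descriptor.properties" ]; then
    echo "missing plugin-descriptor.properties in $dir"
    return 1
  fi
  # ls fails when the glob matches nothing.
  if ! ls "$dir"/*.jar >/dev/null 2>&1; then
    echo "no jar files in $dir"
    return 1
  fi
  echo "ok"
}

# Usage:
# check_plugin_dir /usr/local/elasticsearch-2.4.0/plugins/ik
```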
 
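Copy the IK configuration into the ElasticSearch config directory. The source is a directory, so cp needs -r. Afterwards, check that IKAnalyzer.cfg.xml ended up where your IK version expects it:
[zsz@VS-zsz ik]$ cp -r /home/zsz/elasticsearch-analysis-ik-1.10.0/config /usr/local/elasticsearch-2.4.0/config/ik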
 
Edit the ElasticSearch configuration:
[zsz@VS-zsz ik]$ vi /usr/local/elasticsearch-2.4.0/config/elasticsearch.yml
Append the analyzer configuration at the end of the file:
index.analysis.analyzer.ik.type : "ik"
 
Start ElasticSearch:
[zsz@VS-zsz ik]$ cd  /usr/local/elasticsearch-2.4.0/
[zsz@VS-zsz elasticsearch-2.4.0]$ ./bin/elasticsearch -d
 
Test the IK analyzer:
[zsz@VS-zsz elasticsearch-2.4.0]$ curl -XGET 'http://192.168.31.77:9200/_analyze?analyzer=ik' -d '岁月如梭'
{
    "tokens": [
        {
            "token": "岁月如梭",
            "start_offset": 0,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "岁月",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 1
        },
        {
            "token": "如梭",
            "start_offset": 2,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 2
        },
        {
            "token": "梭",
            "start_offset": 3,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 3
        }
    ]
}
[zsz@VS-zsz config]$ curl -XGET 'http://192.168.31.77:9200/_analyze?analyzer=ik' -d 'elasticsearch很受欢迎的的一款拥有活跃社区开源的搜索解决方案'
{
    "tokens": [
        {
            "token": "elasticsearch",
            "start_offset": 0,
            "end_offset": 13,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "elastic",
            "start_offset": 0,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 1
        },
        {
            "token": "很受",
            "start_offset": 13,
            "end_offset": 15,
            "type": "CN_WORD",
            "position": 2
        },
        {
            "token": "受欢迎",
            "start_offset": 14,
            "end_offset": 17,
            "type": "CN_WORD",
            "position": 3
        },
        {
            "token": "欢迎",
            "start_offset": 15,
            "end_offset": 17,
            "type": "CN_WORD",
            "position": 4
        },
        {
            "token": "一款",
            "start_offset": 19,
            "end_offset": 21,
            "type": "CN_WORD",
            "position": 5
        },
        {
            "token": "一",
            "start_offset": 19,
            "end_offset": 20,
            "type": "TYPE_CNUM",
            "position": 6
        },
        {
            "token": "款",
            "start_offset": 20,
            "end_offset": 21,
            "type": "COUNT",
            "position": 7
        },
        {
            "token": "拥有",
            "start_offset": 21,
            "end_offset": 23,
            "type": "CN_WORD",
            "position": 8
        },
        {
            "token": "拥",
            "start_offset": 21,
            "end_offset": 22,
            "type": "CN_WORD",
            "position": 9
        },
        {
            "token": "有",
            "start_offset": 22,
            "end_offset": 23,
            "type": "CN_CHAR",
            "position": 10
        },
        {
            "token": "活跃",
            "start_offset": 23,
            "end_offset": 25,
            "type": "CN_WORD",
            "position": 11
        },
        {
            "token": "跃",
            "start_offset": 24,
            "end_offset": 25,
            "type": "CN_WORD",
            "position": 12
        },
        {
            "token": "社区",
            "start_offset": 25,
            "end_offset": 27,
            "type": "CN_WORD",
            "position": 13
        },
        {
            "token": "开源",
            "start_offset": 27,
            "end_offset": 29,
            "type": "CN_WORD",
            "position": 14
        },
        {
            "token": "搜索",
            "start_offset": 30,
            "end_offset": 32,
            "type": "CN_WORD",
            "position": 15
        },
        {
            "token": "索解",
            "start_offset": 31,
            "end_offset": 33,
            "type": "CN_WORD",
            "position": 16
        },
        {
            "token": "索",
            "start_offset": 31,
            "end_offset": 32,
            "type": "CN_WORD",
            "position": 17
        },
        {
            "token": "解决方案",
            "start_offset": 32,
            "end_offset": 36,
            "type": "CN_WORD",
            "position": 18
        },
        {
            "token": "解决",
            "start_offset": 32,
            "end_offset": 34,
            "type": "CN_WORD",
            "position": 19
        },
        {
            "token": "方案",
            "start_offset": 34,
            "end_offset": 36,
            "type": "CN_WORD",
            "position": 20
        }
    ]
}
 
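As you can see, IK produces dictionary words such as 岁月 and 解决方案 instead of single characters, so the Chinese segmentation is much more reasonable.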
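To use the analyzer when indexing documents, rather than only through _analyze, a field has to reference it in the index mapping. The following is a sketch under assumptions, not a tested setup: the index name blog, the type article and the field content are hypothetical, and the body uses the ES 2.x mapping syntax (type string) together with the analyzer name ik configured above. `ik_mapping_body` is a hypothetical helper that only prints the JSON.

```shell
# Print an index-creation body whose string field is analyzed with IK.
# $1: field name. The type "article" is a hypothetical example.
ik_mapping_body() {
  field="$1"
  cat <<EOF
{
  "mappings": {
    "article": {
      "properties": {
        "$field": { "type": "string", "analyzer": "ik" }
      }
    }
  }
}
EOF
}

# Usage (needs a running node; "blog" is a hypothetical index name):
# curl -XPUT 'http://192.168.31.77:9200/blog' -d "$(ik_mapping_body content)"
```

Generating the body in a function keeps the JSON easy to inspect before it is sent to the node.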
Original post: http://www.cnblogs.com/zhongshengzhen/p/elasticsearch_ik.html
 
