[solr] - spell check
solr提供了一个spell check,又叫suggestions,可以用于查询输入的自动完成功能auto-complete。
参考文献:
https://cwiki.apache.org/confluence/display/solr/Spell+Checking
http://www.cnblogs.com/ibook360/archive/2011/11/30/2269077.html
方法:
修改core的solrconfig.xml
加入这段到<config />内
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<lst name="spellchecker">
<str name="name">wordbreak</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
<str name="field">content</str>
<str name="combineWords">true</str>
<str name="breakWords">true</str>
<int name="maxChanges">10</int>
</lst>
</searchComponent>
<requestHandler name="/spellcheck" class="org.apache.solr.handler.component.SearchHandler">
<lst name="defaults">
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">wordbreak</str>
<str name="spellcheck.count">20</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
schema.xml配置:
<?xml version="1.0" ?>
<schema name="my core" version="1.1"> <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
<fieldtype name="binary" class="solr.BinaryField"/>
<fieldType name="text_cn" class="solr.TextField">
<analyzer type="index" class="org.wltea.analyzer.lucene.IKAnalyzer" />
<analyzer type="query" class="org.wltea.analyzer.lucene.IKAnalyzer" />
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType> <!-- general -->
<field name="id" type="long" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="subject" type="text_cn" indexed="true" stored="true" />
<field name="content" type="text_cn" indexed="true" stored="true" />
<field name="category_id" type="long" indexed="true" stored="true" />
<field name="category_name" type="text_cn" indexed="true" stored="true" />
<field name="last_update_time" type="tdate" indexed="true" stored="true" />
<field name="_version_" type="long" indexed="true" stored="true"/> <!-- field to use to determine and enforce document uniqueness. -->
<uniqueKey>id</uniqueKey> <!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>subject</defaultSearchField> <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="OR"/>
</schema>
关键在于这句:
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
意思是词组搜索
设置完xml,重启tomcat,在浏览器中运行:
http://localhost:8899/solr/mycore/spellcheck?spellcheck.build=true
运行结果:

然后在浏览器中运行:
http://localhost:8899/solr/mycore/spellcheck?q=中央&rows=0
运行结果:

Java代码:
Java bean:
package com.my.entity;
import java.util.Date;
import org.apache.solr.client.solrj.beans.Field;
public class Item {
@Field
private long id;
@Field
private String subject;
@Field
private String content;
@Field("category_id")
private long categoryId;
@Field("category_name")
private String categoryName;
@Field("last_update_time")
private Date lastUpdateTime;
public long getId() {
return id;
}
public void setId(long id) {
this.id = id;
}
public String getSubject() {
return subject;
}
public void setSubject(String subject) {
this.subject = subject;
}
public String getContent() {
return content;
}
public void setContent(String content) {
this.content = content;
}
public long getCategoryId() {
return categoryId;
}
public void setCategoryId(long categoryId) {
this.categoryId = categoryId;
}
public String getCategoryName() {
return categoryName;
}
public void setCategoryName(String categoryName) {
this.categoryName = categoryName;
}
public Date getLastUpdateTime() {
return lastUpdateTime;
}
public void setLastUpdateTime(Date lastUpdateTime) {
this.lastUpdateTime = lastUpdateTime;
}
}
测试代码:
package com.my.solr; import java.io.IOException;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Map; import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.SpellCheckResponse;
import org.apache.solr.client.solrj.response.SpellCheckResponse.Collation;
import org.apache.solr.client.solrj.response.SpellCheckResponse.Correction;
import org.apache.solr.client.solrj.response.SpellCheckResponse.Suggestion; import com.my.entity.Item; public class TestSolr { public static void main(String[] args) throws IOException, SolrServerException {
String url = "http://localhost:8899/solr/mycore";
HttpSolrServer core = new HttpSolrServer(url);
core.setMaxRetries(1);
core.setConnectionTimeout(5000);
core.setParser(new XMLResponseParser()); // binary parser is used by default
core.setSoTimeout(1000); // socket read timeout
core.setDefaultMaxConnectionsPerHost(100);
core.setMaxTotalConnections(100);
core.setFollowRedirects(false); // defaults to false
core.setAllowCompression(true); // ------------------------------------------------------
// remove all data
// ------------------------------------------------------
core.deleteByQuery("*:*");
List<Item> items = new ArrayList<Item>();
items.add(makeItem(1, "cpu", "this is intel cpu", 1, "cpu-intel"));
items.add(makeItem(2, "cpu AMD", "this is AMD cpu", 2, "cpu-AMD"));
items.add(makeItem(3, "cpu intel", "this is intel-I7 cpu", 1, "cpu-intel"));
items.add(makeItem(4, "cpu AMD", "this is AMD 5000x cpu", 2, "cpu-AMD"));
items.add(makeItem(5, "cpu intel I6", "this is intel-I6 cpu", 1, "cpu-intel-I6"));
items.add(makeItem(6, "处理器", "中央处理器英特儿", 1, "cpu-intel"));
items.add(makeItem(7, "处理器AMD", "中央处理器AMD", 2, "cpu-AMD"));
items.add(makeItem(8, "中央处理器", "中央处理器Intel", 1, "cpu-intel"));
items.add(makeItem(9, "中央空调格力", "格力中央空调", 3, "air"));
items.add(makeItem(10, "中央空调海尔", "海尔中央空调", 3, "air"));
items.add(makeItem(11, "中央空调美的", "美的中央空调", 3, "air"));
core.addBeans(items);
// commit
core.commit(); // ------------------------------------------------------
// search
// ------------------------------------------------------
SolrQuery query = new SolrQuery();
String token = "中央";
query.set("qt", "/spellcheck");
query.set("q", token);
query.set("spellcheck", "on");
query.set("spellcheck.build", "true");
query.set("spellcheck.onlyMorePopular", "true"); query.set("spellcheck.count", "100");
query.set("spellcheck.alternativeTermCount", "4");
query.set("spellcheck.onlyMorePopular", "true"); query.set("spellcheck.extendedResults", "true");
query.set("spellcheck.maxResultsForSuggest", "5"); query.set("spellcheck.collate", "true");
query.set("spellcheck.collateExtendedResults", "true");
query.set("spellcheck.maxCollationTries", "5");
query.set("spellcheck.maxCollations", "3"); QueryResponse response = null; try {
response = core.query(query);
System.out.println("查询耗时:" + response.getQTime());
} catch (SolrServerException e) {
System.err.println(e.getMessage());
e.printStackTrace();
} catch (Exception e) {
System.err.println(e.getMessage());
e.printStackTrace();
} finally {
core.shutdown();
} SpellCheckResponse spellCheckResponse = response.getSpellCheckResponse();
if (spellCheckResponse != null) {
List<Suggestion> suggestionList = spellCheckResponse.getSuggestions();
for (Suggestion suggestion : suggestionList) {
System.out.println("Suggestions NumFound: " + suggestion.getNumFound());
System.out.println("Token: " + suggestion.getToken());
System.out.print("Suggested: ");
List<String> suggestedWordList = suggestion.getAlternatives();
for (String word : suggestedWordList) {
System.out.println(word + ", ");
}
System.out.println();
}
System.out.println();
Map<String, Suggestion> suggestedMap = spellCheckResponse.getSuggestionMap();
for (Map.Entry<String, Suggestion> entry : suggestedMap.entrySet()) {
System.out.println("suggestionName: " + entry.getKey());
Suggestion suggestion = entry.getValue();
System.out.println("NumFound: " + suggestion.getNumFound());
System.out.println("Token: " + suggestion.getToken());
System.out.print("suggested: "); List<String> suggestedList = suggestion.getAlternatives();
for (String suggestedWord : suggestedList) {
System.out.print(suggestedWord + ", ");
}
System.out.println("\n\n");
} Suggestion suggestion = spellCheckResponse.getSuggestion(token);
System.out.println("NumFound: " + suggestion.getNumFound());
System.out.println("Token: " + suggestion.getToken());
System.out.print("suggested: ");
List<String> suggestedList = suggestion.getAlternatives();
for (String suggestedWord : suggestedList) {
System.out.print(suggestedWord + ", ");
}
System.out.println("\n\n"); System.out.println("The First suggested word for solr is : " + spellCheckResponse.getFirstSuggestion(token));
System.out.println("\n\n"); List<Collation> collatedList = spellCheckResponse.getCollatedResults();
if (collatedList != null) {
for (Collation collation : collatedList) {
System.out.println("collated query String: " + collation.getCollationQueryString());
System.out.println("collation Num: " + collation.getNumberOfHits());
List<Correction> correctionList = collation.getMisspellingsAndCorrections();
for (Correction correction : correctionList) {
System.out.println("original: " + correction.getOriginal());
System.out.println("correction: " + correction.getCorrection());
}
System.out.println();
}
}
System.out.println();
System.out.println("The Collated word: " + spellCheckResponse.getCollatedResult());
System.out.println();
} System.out.println("查询耗时:" + response.getQTime());
} private static Item makeItem(long id, String subject, String content, long categoryId, String categoryName) {
Item item = new Item();
item.setId(id);
item.setSubject(subject);
item.setContent(content);
item.setLastUpdateTime(new Date());
item.setCategoryId(categoryId);
item.setCategoryName(categoryName);
return item;
}
}
测试结果:

这种方式可以使用于对现在数据内容的查询拼写检查。
[solr] - spell check的更多相关文章
- VIM 拼写/spell check
VIM 拼写检查/spell check 一.Hunspell科普 Hunspell 作为一个拼写检查的工具,已经用在了许多开源的以及商业软件中.包括Google Chrome, Libreoffic ...
- Solr 6.7学习笔记(06)-- spell check
拼写检查也是搜索引擎必备的功能.Solr中提供了SpellCheckComponent 来实现此功能.我看过<Solr In Action>,是基于Solr4.X版本的,那时Suggest ...
- Word: How to Temporarily Disable Spell Check in Word
link: http://johnlamansky.com/tech/disable-word-spell-check/ 引用: Word 2010 Click the “File” button C ...
- 1.7.7 Spell Checking -拼写检查
1. SpellCheck SpellCheck组件设计的目的是基于其他,相似,terms来提供内联查询建议.这些建议的依据可以是solr字段中的terms,外部可以创建文本文件, 或者其实lucen ...
- solr拼写检查配置
拼写检查功能,能在搜索时,提供一个较好用户体验,所以,主流的搜索引擎都有这个功能. 那么什么是拼写检查,其实很好理解,就是你输入的搜索词,可能是你输错了,也有可能在它的检索库里面根本不存在这个词,但是 ...
- Importing/Indexing database (MySQL or SQL Server) in Solr using Data Import Handler--转载
原文地址:https://gist.github.com/maxivak/3e3ee1fca32f3949f052 Install Solr download and install Solr fro ...
- Solr 6.7学习笔记(03)-- 样例配置文件 solrconfig.xml
位于:${solr.home}\example\techproducts\solr\techproducts\conf\solrconfig.xml <?xml version="1. ...
- Solr基础知识二(导入数据)
上一篇讲述了solr的安装启动过程,这一篇讲述如何导入数据到solr里. 一.准备数据 1.1 学生相关表 创建学生表.学生专业关联表.专业表.学生行业关联表.行业表.基础信息表,并创建一条小白的信息 ...
- 【Nutch2.3基础教程】集成Nutch/Hadoop/Hbase/Solr构建搜索引擎:安装及运行【集群环境】
1.下载相关软件,并解压 版本号如下: (1)apache-nutch-2.3 (2) hadoop-1.2.1 (3)hbase-0.92.1 (4)solr-4.9.0 并解压至/opt/jedi ...
随机推荐
- MVC4中的Display Mode简介
本文地址:http://www.cnblogs.com/egger/p/3400076.html 欢迎转载 ,请保留此链接๑•́ ₃•̀๑! 今天学习MVC4时,看到一个不错的特性"vie ...
- 高版本正方教务系统上传后缀过滤不严导致能直接上传Webshell
在旧版本中有一个利用插件上传文件的漏洞,但是在新版本中已经没有了这个插件.这个漏洞是由于过滤不严造成的,可以直接上传Webshell进行提权,由于代码在DLL中,全国大部分高校均有此漏洞,影响范围很大 ...
- bootstrap之消息提示
<!DOCTYPE html><html> <head> <title>Bootstrap</title> < ...
- 原生态AJAX详解和jquery对AJAX的封装
AJAX: A :Asynchronous [eI`sinkrenes] 异步 J :JavaScript JavaScript脚本语言 A: And X :XML 可扩展标记语言 AJAX现在 ...
- MYSQL 5.6中禁用INNODB引擎
并不是所有人都需要INNODB引擎,虽然它弥补了MYSQL缺乏事务支持的毛病,但是它的磁盘性能一直是让人比较担忧的.另外比较老的PHP系统,大多是采用MYISAM引擎在MYSQL建表,似乎INNODB ...
- Linux准确获取IP
有时搞一些跨网段的工程和应用,需要尽量准确的知道电信.网通.铁通等电信运营商的IP地址段分配情况,可网上的资料不但很少,而且经常都是N个月前的过期资料…… APNIC是管理亚太地区IP地址分配的机构, ...
- enmo_day_08
性能监视 管理内存组件 自动内存管理(AMM) : 指定分配给实例的总内存(SGA, PGA) 自动共享内存管理(ASMM) : 指定SGA, 管理分配给共享池, java池, 动态性能视图 :v$( ...
- 基于NodeJS的全栈式开发
前言 为了解决传统Web开发模式带来的各种问题,我们进行了许多尝试,但由于前/后端的物理鸿沟,尝试的方案都大同小异.痛定思痛,今天我们重新思考了“前后端”的定义,引入前端同学都熟悉的 NodeJS,试 ...
- Linq二 LinqToSql
虽然微软已经停止更新了LinqToSql,但是目前的已完全满足目前的需求. 第一步:添加LinqToSql 第二步:将其关联的Sqlserver数据库 第三步:数据库已变成实体类 第四步:可以对数据库 ...
- 在VS下使用 GitFlow管理项目开发
在VS下使用 GitFlow管理项目开发 1.右键将你的解决方案添加到源代码管理,如果你的VS没有安装git,会提示安装,安装完成之后,在团队资源管理可以看到如下界面 (图一) 2.安装gitflow ...