[solr] - spell check

solr提供了一个spell check，又叫suggestions，可以用于查询输入的自动完成功能auto-complete。

参考文献：

https://cwiki.apache.org/confluence/display/solr/Spell+Checking

http://www.cnblogs.com/ibook360/archive/2011/11/30/2269077.html

方法：

修改core的solrconfig.xml

加入这段到<config />内

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

      <lst name="spellchecker">

        <str name="name">wordbreak</str>

        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>

        <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>

        <str name="field">content</str>

        <str name="combineWords">true</str>

        <str name="breakWords">true</str>

        <int name="maxChanges">10</int>

      </lst>

    </searchComponent>

    <requestHandler name="/spellcheck" class="org.apache.solr.handler.component.SearchHandler">

      <lst name="defaults">

        <str name="spellcheck">true</str>

        <str name="spellcheck.dictionary">wordbreak</str>

        <str name="spellcheck.count">20</str>

      </lst>

      <arr name="last-components">

        <str>spellcheck</str>

      </arr>

    </requestHandler>

schema.xml配置：

<?xml version="1.0" ?>

<schema name="my core" version="1.1">

    <fieldtype name="string"  class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>

    <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>

    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>

    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>

    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>

    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>

    <fieldtype name="binary" class="solr.BinaryField"/>

    <fieldType name="text_cn" class="solr.TextField">

        <analyzer type="index" class="org.wltea.analyzer.lucene.IKAnalyzer" />

        <analyzer type="query" class="org.wltea.analyzer.lucene.IKAnalyzer" />

        <analyzer>

            <tokenizer class="solr.KeywordTokenizerFactory"/>

            <filter class="solr.LowerCaseFilterFactory"/>

        </analyzer>

    </fieldType>

    <!-- general -->

    <field name="id" type="long" indexed="true" stored="true" multiValued="false" required="true"/>

    <field name="subject" type="text_cn" indexed="true" stored="true" />

    <field name="content" type="text_cn" indexed="true" stored="true" />

    <field name="category_id" type="long" indexed="true" stored="true" />

    <field name="category_name" type="text_cn" indexed="true" stored="true" />

    <field name="last_update_time" type="tdate" indexed="true" stored="true" />

    <field name="_version_" type="long" indexed="true" stored="true"/>

     <!-- field to use to determine and enforce document uniqueness. -->

     <uniqueKey>id</uniqueKey>

     <!-- field for the QueryParser to use when an explicit fieldname is absent -->

     <defaultSearchField>subject</defaultSearchField>

     <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->

     <solrQueryParser defaultOperator="OR"/>

</schema>

关键在于这句：

        <analyzer>

            <tokenizer class="solr.KeywordTokenizerFactory"/>

            <filter class="solr.LowerCaseFilterFactory"/>

        </analyzer>

意思是词组搜索

设置完xml，重启tomcat，在浏览器中运行：

http://localhost:8899/solr/mycore/spellcheck?spellcheck.build=true

运行结果：

然后在浏览器中运行：

http://localhost:8899/solr/mycore/spellcheck?q=中央&rows=0

运行结果：

Java代码：

Java bean:

package com.my.entity;

import java.util.Date;

import org.apache.solr.client.solrj.beans.Field;

public class Item {

    @Field

    private long id;

    @Field

    private String subject;

    @Field

    private String content;

    @Field("category_id")

    private long categoryId;

    @Field("category_name")

    private String categoryName;

    @Field("last_update_time")

    private Date lastUpdateTime;

    public long getId() {

        return id;

    }

    public void setId(long id) {

        this.id = id;

    }

    public String getSubject() {

        return subject;

    }

    public void setSubject(String subject) {

        this.subject = subject;

    }

    public String getContent() {

        return content;

    }

    public void setContent(String content) {

        this.content = content;

    }

    public long getCategoryId() {

        return categoryId;

    }

    public void setCategoryId(long categoryId) {

        this.categoryId = categoryId;

    }

    public String getCategoryName() {

        return categoryName;

    }

    public void setCategoryName(String categoryName) {

        this.categoryName = categoryName;

    }

    public Date getLastUpdateTime() {

        return lastUpdateTime;

    }

    public void setLastUpdateTime(Date lastUpdateTime) {

        this.lastUpdateTime = lastUpdateTime;

    }

}

测试代码：

package com.my.solr;

import java.io.IOException;

import java.util.ArrayList;

import java.util.Date;

import java.util.List;

import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;

import org.apache.solr.client.solrj.SolrServerException;

import org.apache.solr.client.solrj.impl.HttpSolrServer;

import org.apache.solr.client.solrj.impl.XMLResponseParser;

import org.apache.solr.client.solrj.response.QueryResponse;

import org.apache.solr.client.solrj.response.SpellCheckResponse;

import org.apache.solr.client.solrj.response.SpellCheckResponse.Collation;

import org.apache.solr.client.solrj.response.SpellCheckResponse.Correction;

import org.apache.solr.client.solrj.response.SpellCheckResponse.Suggestion;

import com.my.entity.Item;

public class TestSolr {

    public static void main(String[] args) throws IOException, SolrServerException {

        String url = "http://localhost:8899/solr/mycore";

        HttpSolrServer core = new HttpSolrServer(url);

        core.setMaxRetries(1);

        core.setConnectionTimeout(5000);

        core.setParser(new XMLResponseParser()); // binary parser is used by default

        core.setSoTimeout(1000); // socket read timeout

        core.setDefaultMaxConnectionsPerHost(100);

        core.setMaxTotalConnections(100);

        core.setFollowRedirects(false); // defaults to false

        core.setAllowCompression(true);

        // ------------------------------------------------------

        // remove all data

        // ------------------------------------------------------

        core.deleteByQuery("*:*");

        List<Item> items = new ArrayList<Item>();

        items.add(makeItem(1, "cpu", "this is intel cpu", 1, "cpu-intel"));

        items.add(makeItem(2, "cpu AMD", "this is AMD cpu", 2, "cpu-AMD"));

        items.add(makeItem(3, "cpu intel", "this is intel-I7 cpu", 1, "cpu-intel"));

        items.add(makeItem(4, "cpu AMD", "this is AMD 5000x cpu", 2, "cpu-AMD"));

        items.add(makeItem(5, "cpu intel I6", "this is intel-I6 cpu", 1, "cpu-intel-I6"));

        items.add(makeItem(6, "处理器", "中央处理器英特儿", 1, "cpu-intel"));

        items.add(makeItem(7, "处理器AMD", "中央处理器AMD", 2, "cpu-AMD"));

        items.add(makeItem(8, "中央处理器", "中央处理器Intel", 1, "cpu-intel"));

        items.add(makeItem(9, "中央空调格力", "格力中央空调", 3, "air"));

        items.add(makeItem(10, "中央空调海尔", "海尔中央空调", 3, "air"));

        items.add(makeItem(11, "中央空调美的", "美的中央空调", 3, "air"));

        core.addBeans(items);

        // commit

        core.commit();

        // ------------------------------------------------------

        // search

        // ------------------------------------------------------

        SolrQuery query = new SolrQuery();

        String token = "中央";

        query.set("qt", "/spellcheck");

        query.set("q", token);

        query.set("spellcheck", "on");

        query.set("spellcheck.build", "true");

        query.set("spellcheck.onlyMorePopular", "true");

        query.set("spellcheck.count", "100");

        query.set("spellcheck.alternativeTermCount", "4");

        query.set("spellcheck.onlyMorePopular", "true");

        query.set("spellcheck.extendedResults", "true");

        query.set("spellcheck.maxResultsForSuggest", "5");

        query.set("spellcheck.collate", "true");

        query.set("spellcheck.collateExtendedResults", "true");

        query.set("spellcheck.maxCollationTries", "5");

        query.set("spellcheck.maxCollations", "3");

        QueryResponse response = null;

        try {

            response = core.query(query);

            System.out.println("查询耗时：" + response.getQTime());

        } catch (SolrServerException e) {

            System.err.println(e.getMessage());

            e.printStackTrace();

        } catch (Exception e) {

            System.err.println(e.getMessage());

            e.printStackTrace();

        } finally {

            core.shutdown();

        }

        SpellCheckResponse spellCheckResponse = response.getSpellCheckResponse();

        if (spellCheckResponse != null) {

            List<Suggestion> suggestionList = spellCheckResponse.getSuggestions();

            for (Suggestion suggestion : suggestionList) {

                System.out.println("Suggestions NumFound: " + suggestion.getNumFound());

                System.out.println("Token: " + suggestion.getToken());

                System.out.print("Suggested: ");

                List<String> suggestedWordList = suggestion.getAlternatives();

                for (String word : suggestedWordList) {

                    System.out.println(word + ", ");

                }

                System.out.println();

            }

            System.out.println();

            Map<String, Suggestion> suggestedMap = spellCheckResponse.getSuggestionMap();

            for (Map.Entry<String, Suggestion> entry : suggestedMap.entrySet()) {

                System.out.println("suggestionName: " + entry.getKey());

                Suggestion suggestion = entry.getValue();

                System.out.println("NumFound: " + suggestion.getNumFound());

                System.out.println("Token: " + suggestion.getToken());

                System.out.print("suggested: ");

                List<String> suggestedList = suggestion.getAlternatives();

                for (String suggestedWord : suggestedList) {

                    System.out.print(suggestedWord + ", ");

                }

                System.out.println("\n\n");

            }

            Suggestion suggestion = spellCheckResponse.getSuggestion(token);

            System.out.println("NumFound: " + suggestion.getNumFound());

            System.out.println("Token: " + suggestion.getToken());

            System.out.print("suggested: ");

            List<String> suggestedList = suggestion.getAlternatives();

            for (String suggestedWord : suggestedList) {

                System.out.print(suggestedWord + ", ");

            }

            System.out.println("\n\n");

            System.out.println("The First suggested word for solr is : " + spellCheckResponse.getFirstSuggestion(token));

            System.out.println("\n\n");

            List<Collation> collatedList = spellCheckResponse.getCollatedResults();

            if (collatedList != null) {

                for (Collation collation : collatedList) {

                    System.out.println("collated query String: " + collation.getCollationQueryString());

                    System.out.println("collation Num: " + collation.getNumberOfHits());

                    List<Correction> correctionList = collation.getMisspellingsAndCorrections();

                    for (Correction correction : correctionList) {

                        System.out.println("original: " + correction.getOriginal());

                        System.out.println("correction: " + correction.getCorrection());

                    }

                    System.out.println();

                }

            }

            System.out.println();

            System.out.println("The Collated word: " + spellCheckResponse.getCollatedResult());

            System.out.println();

        }

        System.out.println("查询耗时：" + response.getQTime());

    }

    private static Item makeItem(long id, String subject, String content, long categoryId, String categoryName) {

        Item item = new Item();

        item.setId(id);

        item.setSubject(subject);

        item.setContent(content);

        item.setLastUpdateTime(new Date());

        item.setCategoryId(categoryId);

        item.setCategoryName(categoryName);

        return item;

    }

}

测试结果：

这种方式可以使用于对现在数据内容的查询拼写检查。

[solr] - spell check的更多相关文章

VIM 拼写/spell check
VIM 拼写检查/spell check 一.Hunspell科普 Hunspell 作为一个拼写检查的工具,已经用在了许多开源的以及商业软件中.包括Google Chrome, Libreoffic ...
Solr 6.7学习笔记（06）-- spell check
拼写检查也是搜索引擎必备的功能.Solr中提供了SpellCheckComponent 来实现此功能.我看过<Solr In Action>,是基于Solr4.X版本的,那时Suggest ...
Word: How to Temporarily Disable Spell Check in Word
link: http://johnlamansky.com/tech/disable-word-spell-check/ 引用: Word 2010 Click the “File” button C ...
1.7.7 Spell Checking -拼写检查
1. SpellCheck SpellCheck组件设计的目的是基于其他,相似,terms来提供内联查询建议.这些建议的依据可以是solr字段中的terms,外部可以创建文本文件, 或者其实lucen ...
solr拼写检查配置
拼写检查功能,能在搜索时,提供一个较好用户体验,所以,主流的搜索引擎都有这个功能. 那么什么是拼写检查,其实很好理解,就是你输入的搜索词,可能是你输错了,也有可能在它的检索库里面根本不存在这个词,但是 ...
Importing/Indexing database (MySQL or SQL Server) in Solr using Data Import Handler--转载
原文地址:https://gist.github.com/maxivak/3e3ee1fca32f3949f052 Install Solr download and install Solr fro ...
Solr 6.7学习笔记（03）-- 样例配置文件 solrconfig.xml
位于:${solr.home}\example\techproducts\solr\techproducts\conf\solrconfig.xml <?xml version="1. ...
Solr基础知识二（导入数据）
上一篇讲述了solr的安装启动过程,这一篇讲述如何导入数据到solr里. 一.准备数据 1.1 学生相关表创建学生表.学生专业关联表.专业表.学生行业关联表.行业表.基础信息表,并创建一条小白的信息 ...
【Nutch2.3基础教程】集成Nutch/Hadoop/Hbase/Solr构建搜索引擎：安装及运行【集群环境】
1.下载相关软件,并解压版本号如下: (1)apache-nutch-2.3 (2) hadoop-1.2.1 (3)hbase-0.92.1 (4)solr-4.9.0 并解压至/opt/jedi ...

随机推荐

Samba Linux 和windows 共享
1.安装Samba (yum install Samba) 2.配置Samba (Samba的配置文件为/etc/samba/smb.conf) 1)打开smb.conf vim /etc/s ...
guava学习--Ordering
转载:http://www.cnblogs.com/peida/p/Guava_Ordering.html Ordering是Guava类库提供的一个犀利强大的比较器工具,Guava的Ordering ...
PDF 补丁丁 0.4.2.1013 测试版发布：修复替换PDF文档字体等问题
新的测试版修复了若干问题,建议下载了旧测试版的用户更新.修复的问题列表: 替换PDF文档字体可能将字体宽度变成0. 书签编辑器在加载XML书签时不显示对应文档的预览. 自动生成书签功能复制文本筛选器后 ...
WCF初探-13：WCF客户端为双工服务创建回调对象
前言: 在WCF初探-5:WCF消息交换模式之双工通讯(Duplex)博文中,我讲解了双工通信服务的一个应用场景,即订阅和发布模式,这一篇,我将通过一个消息发送的例子讲解一下WCF客户端如何为双工服务 ...
iOS开发拓展篇—音频处理（音乐播放器1）
iOS开发拓展篇—音频处理(音乐播放器1) 说明:该系列文章通过实现一个简单的音乐播放器来介绍音频处理的相关知识点,需要重点注意很多细节的处理. 一.调整项目的结构,导入必要的素材调整后的项目结构如 ...
微信、QQ浏览器X5内核问题汇总
一. 资料汇总 1.前端H5调起QQ浏览器的总结:http://km.oa.com/group/22486/articles/show/210189?kmref=search 2.Android We ...
二模（15）day1
第一题: 题目大意: 有两个长度为N的序列A和B,在A和B中各任取一个数相加可以得到N2个和,求这N2个和中最小的N个. 解题过程: 1.这题是刘汝佳<<训练指南>>上的一道经 ...
Linux驱动之HelloWorld
最近看android的一些源码,里面有一些功能是用驱动实现的.于是就兴起看了一些驱动相关的东西,准备日后深入.这没有技术含量的水文,仅作为日后的备忘吧. 系统使用的是ubuntu 12.0.04,内核 ...
Python开发入门与实战6-表单
6. 表单从简朴的单个搜索框,到常见的Blog评论提交表单,再到复杂的自定义数据输入接口,HTML表单一直是交互性网站的重要交互手段.本章介绍如何用Django如何对用户通过表单提交的数据进行访问. ...
springmvc学习第三天
利用spring mvc 实现crud 1.导入jar包 commons-logging-1.2.jarjstl.jarspring-aop-4.1.6.RELEASE.jarspring-beans ...

[solr] - spell check

[solr] - spell check的更多相关文章

随机推荐

热门专题