Lucene's Search Types, Part 2: SpanQuery

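A SpanQuery matches terms by their positions in the text: how far apart they are, or whether several terms occur next to each other.

SpanQuery comes in the following forms:

SpanTermQuery: the building block of span queries. It returns the same results as TermQuery, but also records the position of each matched term.

SpanFirstQuery: matches a term only if it occurs within a given distance from the start of the field.

SpanNearQuery: matches several sub-queries whose spans lie within a given distance of each other.

SpanOrQuery: combines the results of several span queries.

SpanNotQuery: takes the results of one span query and removes those that overlap with the matches of another span query.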
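
To make the position bookkeeping concrete, here is a minimal plain-Java sketch of what a term span and a "first N positions" check look like. It does not use Lucene at all: the class `SpanPositionSketch` and its methods `termSpans` and `firstSpans` are hypothetical names made up for illustration, and the whitespace tokenization is an assumption that happens to match the WhitespaceAnalyzer used in the example.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; hypothetical helper, not part of Lucene.
final class SpanPositionSketch {

    // Returns every [start, end) span where `term` occurs among the
    // whitespace-separated tokens of `text` (positions are 0-based).
    static List<int[]> termSpans(String text, String term) {
        String[] tokens = text.trim().split("\\s+");
        List<int[]> spans = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term)) {
                spans.add(new int[] { i, i + 1 });
            }
        }
        return spans;
    }

    // Mimics the idea behind SpanFirstQuery: keep only the spans
    // that end at or before position `end`.
    static List<int[]> firstSpans(List<int[]> spans, int end) {
        List<int[]> kept = new ArrayList<>();
        for (int[] span : spans) {
            if (span[1] <= end) {
                kept.add(span);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog";
        List<int[]> brown = termSpans(text, "brown");
        // brown is the third token, so its span is start=2, end=3
        System.out.println("brown: start=" + brown.get(0)[0] + ", end=" + brown.get(0)[1]);
        System.out.println("matches within 2 positions: " + firstSpans(brown, 2).size());
        System.out.println("matches within 3 positions: " + firstSpans(brown, 3).size());
    }
}
```

With the sentence above, `brown` is found only once the window reaches 3 positions, which is why a SpanFirstQuery on `brown` needs an end value of at least 3.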

A simple example follows.

package com;
// SpanQuery: span (positional) queries. SpanQuery itself is an abstract class.
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.spans.SpanFirstQuery;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanNotQuery;
import org.apache.lucene.search.spans.SpanOrQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;
import org.apache.lucene.search.spans.Spans;
import org.apache.lucene.store.RAMDirectory;
public class SpanQueryTest {
    private RAMDirectory directory;
    private IndexSearcher indexSearcher;
    private IndexReader reader;
    private SpanTermQuery quick;
    private SpanTermQuery brown;
    private SpanTermQuery red;
    private SpanTermQuery fox;
    private SpanTermQuery lazy;
    private SpanTermQuery sleepy;
    private SpanTermQuery dog;
    private SpanTermQuery cat;
    private Analyzer analyzer;

    // Build the index and initialize the queries
    public void index() throws IOException {
        directory = new RAMDirectory();
        analyzer = new WhitespaceAnalyzer();
        IndexWriter writer = new IndexWriter(directory, analyzer, true);
        Document doc1 = new Document();
        doc1.add(new Field("field",
                "the quick brown fox jumps over the lazy dog", Store.YES,
                Index.TOKENIZED));
        Document doc2 = new Document();
        doc2.add(new Field("field",
                "the quick red fox jumps over the sleepy cat", Store.YES,
                Index.TOKENIZED));
        writer.addDocument(doc1);
        writer.addDocument(doc2);
        writer.optimize();
        writer.close();
        quick = new SpanTermQuery(new Term("field", "quick"));
        brown = new SpanTermQuery(new Term("field", "brown"));
        red = new SpanTermQuery(new Term("field", "red"));
        fox = new SpanTermQuery(new Term("field", "fox"));
        lazy = new SpanTermQuery(new Term("field", "lazy"));
        sleepy = new SpanTermQuery(new Term("field", "sleepy"));
        dog = new SpanTermQuery(new Term("field", "dog"));
        cat = new SpanTermQuery(new Term("field", "cat"));
        indexSearcher = new IndexSearcher(directory);
        reader = IndexReader.open(directory);
    }

    private void dumpSpans(SpanQuery query) throws IOException {
        // As far as hits go, the query behaves like a TermQuery and can be treated as one
        Hits hits = indexSearcher.search(query);
        for (int i = 0; i < hits.length(); i++) {
            // System.out.println(hits.doc(i).get("field"));
        }
        // Internally, though, it records position information that the other
        // SpanQuery classes build on.
        Spans spans = query.getSpans(reader);
        int numSpans = 0;
        // One score slot per document id (sized from the reader so no hit id is out of range)
        float[] scores = new float[reader.maxDoc()];
        for (int i = 0; i < hits.length(); i++) {
            scores[hits.id(i)] = hits.score(i);
        }
        while (spans.next()) {
            numSpans++;
            int id = spans.doc();
            Document doc = reader.document(id);
            Token[] tokens = AnalyzerUtils.tokensFromAnalysis(analyzer, doc
                    .get("field"));
            StringBuffer buffer = new StringBuffer();
            for (int i = 0; i < tokens.length; i++) {
                // the quick brown fox jumps over the lazy dog
                // The span holds position information. When searching for brown, which is
                // the third token of this sentence, spans.start()=2 and spans.end()=3.
                // "<" is written before the token at start() and ">" after the token at
                // end() - 1, which yields <brown>.
                if (i == spans.start()) {
                    buffer.append("<");
                }
                buffer.append(tokens[i].termText());
                if (i + 1 == spans.end()) {
                    buffer.append(">");
                }
                buffer.append(" ");
            }
            buffer.append("(" + scores[id] + ") ");
            System.out.println(buffer);
        }
        // indexSearcher.close();
    }

    // SpanTermQuery: returns exactly the same hits as TermQuery, but also records
    // position information that the other SpanQuery classes build on.
    public void spanTermQueryTest() throws IOException {
        dumpSpans(brown);
        // Search result:
        // the quick <brown> fox jumps over the lazy dog (0.22097087)
    }

    // SpanFirstQuery: searches for the given term within a fixed width measured
    // from the start of the field's content.
    public void spanFirstQueryTest() throws IOException {
        // the quick brown fox jumps over the lazy dog
        // The search is limited to the given range; the first two tokens are "the quick".
        // brown is the third token of doc1, so when searching from the start
        // the width must be >= 3 for it to be found.
        SpanFirstQuery firstQuery = new SpanFirstQuery(brown, 3);
        dumpSpans(firstQuery);
        // Search result:
        // the quick <brown> fox jumps over the lazy dog (0.22097087)
    }

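    // SpanNearQuery: similar to PhraseQuery. What it matches is not necessarily a
    // phrase; the result of another SpanQuery can be treated as a unit, so queries can be nested.
    public void spanNearQueryTest() throws IOException {
        // the quick brown fox jumps over the lazy dog
        // The second argument (slop) is the maximum number of extra positions
        // allowed between the matched terms.
        // quick, brown and fox are adjacent and already in order, so any slop >= 0
        // matches; 5 is used here to match the second query below.
        SpanNearQuery nearQuery = new SpanNearQuery(new SpanQuery[] { quick,
                brown, fox }, 5, true);
        dumpSpans(nearQuery);
        // Similar to a PhraseQuery.
        // Here we search for quick, dog, brown. The span has to stretch from quick to dog
        // (8 positions) while only 3 terms are matched, so slop must be >= 5 to get a result.
        // The third argument: true requires the terms to appear in the given order;
        // false allows any order.
        nearQuery = new SpanNearQuery(new SpanQuery[] { quick, dog, brown }, 5,
                false);
        dumpSpans(nearQuery);
        // Search results:
        // first dumpSpans:  the <quick brown fox> jumps over the lazy dog (0.34204215)
        // second dumpSpans: the <quick brown fox jumps over the lazy dog> (0.27026406)
    }
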
    // SpanNotQuery: takes the results of the first SpanQuery and removes those that
    // overlap with the second SpanQuery.
    public void spanNotQueryTest() throws IOException {
        // the quick brown fox jumps over the lazy dog
        // One token sits between quick and fox, so slop must be at least 1.
        SpanNearQuery quick_fox = new SpanNearQuery(new SpanQuery[] { quick,
                fox }, 1, true);
        // Matches "quick brown fox" and "quick red fox"
        dumpSpans(quick_fox);
        // SpanNotQuery quick_fox_dog = new SpanNotQuery(quick_fox, dog);
        //
        // dumpSpans(quick_fox_dog);
        // Remove the spans containing red from the quick_fox results; "quick brown fox" remains
        SpanNotQuery no_quick_red_fox = new SpanNotQuery(quick_fox, red);
        dumpSpans(no_quick_red_fox);
        // Search results: the first dumpSpans prints the first two lines,
        // the second dumpSpans prints the third line.
        // the <quick brown fox> jumps over the lazy dog (0.18579213)
        // the <quick red fox> jumps over the sleepy cat (0.18579213)
        // the <quick brown fox> jumps over the lazy dog (0.18579213)
    }

    // SpanOrQuery: merges the results of all the given SpanQuery objects into one result.
    public void spanOrQueryTest() throws IOException {
        // One token sits between quick and fox, so slop must be at least 1.
        SpanNearQuery quick_fox = new SpanNearQuery(new SpanQuery[] { quick,
                fox }, 1, true);
        // lazy dog and sleepy cat are adjacent, so slop 0 is enough.
        SpanNearQuery lazy_dog = new SpanNearQuery(
                new SpanQuery[] { lazy, dog }, 0, true);
        SpanNearQuery sleepy_cat = new SpanNearQuery(new SpanQuery[] { sleepy,
                cat }, 0, true);
        // Three tokens ("jumps over the") separate the two nested spans,
        // so slop must be at least 3.
        SpanNearQuery qf_near_ld = new SpanNearQuery(new SpanQuery[] {
                quick_fox, lazy_dog }, 3, true);
        dumpSpans(qf_near_ld);
        SpanNearQuery qf_near_sc = new SpanNearQuery(new SpanQuery[] {
                quick_fox, sleepy_cat }, 3, true);
        dumpSpans(qf_near_sc);
        SpanOrQuery or = new SpanOrQuery(new SpanQuery[] { qf_near_ld,
                qf_near_sc });
        dumpSpans(or);
        // Search results: the first dumpSpans prints line 1, the second prints line 2,
        // the third prints lines 3 and 4.
        // the <quick brown fox jumps over the lazy dog> (0.3321948)
        // the <quick red fox jumps over the sleepy cat> (0.3321948)
        // the <quick brown fox jumps over the lazy dog> (0.5405281)
        // the <quick red fox jumps over the sleepy cat> (0.5405281)
    }

    public static void main(String[] args) throws IOException {
        SpanQueryTest test = new SpanQueryTest();
        test.index();
        test.spanOrQueryTest();
    }
}

class AnalyzerUtils {
    // Runs the analyzer over the text and returns all tokens it produces.
    public static Token[] tokensFromAnalysis(Analyzer analyzer, String text)
            throws IOException {
        TokenStream stream = analyzer.tokenStream("contents", new StringReader(
                text));
        List<Token> list = new ArrayList<Token>();
        // stream.next() returns null once the stream is exhausted
        Token token;
        while ((token = stream.next()) != null) {
            list.add(token);
        }
        return list.toArray(new Token[0]);
    }
}
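
The slop values used in the listing can be checked by hand. The following plain-Java sketch computes the smallest slop that lets a group of single terms match: the width of the span covering all terms, minus the number of terms. It is only an illustration under simplifying assumptions (each queried term occurs once, every term is a single token, whitespace tokenization); `SlopSketch` and `minSlop` are hypothetical names, not Lucene classes, and nested span queries are not covered.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only; hypothetical helper, not part of Lucene.
final class SlopSketch {

    // Returns the smallest slop needed for all `terms` to match in `text`,
    // or -1 if a term is missing or (when inOrder is true) the terms do not
    // appear in the given order. Uses the first occurrence of each term.
    static int minSlop(String text, List<String> terms, boolean inOrder) {
        List<String> tokens = Arrays.asList(text.trim().split("\\s+"));
        int min = Integer.MAX_VALUE;
        int max = -1;
        int previous = -1;
        for (String term : terms) {
            int pos = tokens.indexOf(term);
            if (pos < 0) {
                return -1; // term not present
            }
            if (inOrder && pos <= previous) {
                return -1; // order requirement violated
            }
            previous = pos;
            min = Math.min(min, pos);
            max = Math.max(max, pos);
        }
        // covering span is [min, max + 1); subtract one position per term
        return (max + 1 - min) - terms.size();
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog";
        System.out.println(minSlop(text, Arrays.asList("quick", "brown", "fox"), true));
        System.out.println(minSlop(text, Arrays.asList("quick", "dog", "brown"), false));
        System.out.println(minSlop(text, Arrays.asList("quick", "dog", "brown"), true));
        System.out.println(minSlop(text, Arrays.asList("quick", "fox"), true));
    }
}
```

For the sample sentence this gives 0 for quick, brown, fox in order, 5 for quick, dog, brown in any order, no match (-1) for quick, dog, brown in order, and 1 for quick, fox.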
