lucene IndexOptions可以设置DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS DOCS,ES里也可以设置
org.apache.lucene.index
Enum Constants Enum Constant and Description DOCS_AND_FREQS
Only documents and term frequencies are indexed: positions are omitted.DOCS_AND_FREQS_AND_POSITIONS
Indexes documents, frequencies and positions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
Indexes documents, frequencies, positions and offsets.DOCS_ONLY
Only documents are indexed: term frequencies and positions are omitted.
Java Code Examples for org.apache.lucene.index.IndexOptions
Project: languagetool File: EmptyLuceneIndexCreator.java View source code | 6 votes |
public static void main(String[] args) throws IOException {
if (args.length != 1) {
System.out.println("Usage: " + EmptyLuceneIndexCreator.class.getSimpleName() + " <indexPath>");
System.exit(1);
}
Analyzer analyzer = new StandardAnalyzer();
IndexWriterConfig config = new IndexWriterConfig(analyzer);
Directory directory = FSDirectory.open(new File(args[0]).toPath());
IndexWriter writer = new IndexWriter(directory, config); FieldType fieldType = new FieldType();
fieldType.setIndexOptions(IndexOptions.DOCS);
fieldType.setStored(true);
Field countField = new Field("totalTokenCount", String.valueOf(0), fieldType);
Document doc = new Document();
doc.add(countField);
writer.addDocument(doc); writer.close();
}
index_options are "options" for the index you are searching on, a
datastructure that holds "terms" to document lists (posting lists).
TermVectors are a datastructure that gives you the "terms" for a given
document and in addition their position in the document as well as their
start and end character offsets. Now the index (each field has such an
index) holds a sorted list of terms and each term points to a posting list.
these posting lists are a list of documents that contain the term. On the
posting list you can also store information like frequencies (how often did
term Y occur in document X -> useful for scoring) as well as "positions"
(at which position did term Y occur in document X -> this is required fo
phrase & span queries).
if you have for instance a field that you only use for filtering you don't
need freqs and postions so documents only will do the job. In an index the
position information is the biggest piece of data usually aside stored
fields. If you don't do phrase queries or spans you don't need them at all
so safe the disk space and improve perf by only use docs and freqs. In
previous version it wasn't possible to have only freqs but no positions
(index_options supersede omit_term_frequencies_and_positions) so this is an
improvement overall since the most common usecase might only need freqs but
no positions.
1:term_vector
TermVector.YES: Only store number of occurrences.
TermVector.WITH_POSITIONS: Store number of occurrence and positions of terms, but no offset.
TermVector.WITH_OFFSETS: Store number of occurrence and offsets of terms, but no positions.
TermVector.WITH_POSITIONS_OFFSETS:number of occurrence and positions , offsets of terms.
TermVector.NO:Don't store any term vector information.
2: index_options
Allows to set the indexing options, possible values are docs (only doc numbers are indexed), freqs (doc numbers and term frequencies), and positions (doc numbers, term frequencies and positions). Defaults to positions for analyzed fields, and to docs for not_analyzed fields. It is also possible to set it to offsets (doc numbers, term frequencies, positions and offsets).
lucene IndexOptions可以设置DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS DOCS,ES里也可以设置的更多相关文章
- ES里设置索引中倒排列表仅仅存文档ID——采用docs存储后可以降低pos文件和cfs文件大小
index_options The index_options parameter controls what information is added to the inverted index, ...
- 在package.json里面的script设置环境变量,区分开发及生产环境。注意mac与windows的设置方式不一样
在package.json里面的script设置环境变量,区分开发及生产环境. 注意mac与windows的设置方式不一样. "scripts": { "publish- ...
- 导航栏和里面的View设置的是同一颜色值,实际运行又不一样.
导航栏和里面的View设置的是同一颜色值,实际运行又不一样.如何保证两者的颜色一致呢? 答案就是:( navigationBar.translucent = NO; ) 去除 导航条的分割线(黑 ...
- 14.3.3 Locks Set by Different SQL Statements in InnoDB 不同的SQL语句在InnoDB里的锁设置
14.3.3 Locks Set by Different SQL Statements in InnoDB 不同的SQL语句在InnoDB里的锁设置 locking read, 一个UPDATE,或 ...
- Ubuntu里字符编码设置
Ubuntu里字符编码设置 Ubuntu系统在默认的状况下只支持中文UTF-8编码,但是我们写的一些文档,还有java代码编译时采用gbk编码.所以需要修改.步骤如下: www.2cto.com ...
- spring里的事物设置
有的人说事物在spring里设置有两种,其实事物设置在spring配置文件中共有五种方式:第一种方式:每个Bean都有一个代理第二种方式:所有Bean共享一个代理基类第三种方式:使用拦截器第四种方式: ...
- FL studio里的项目设置介绍
FL studio作为具有众多音乐功能,能够制作多轨音频录制,排序和混音的一款专业软件,我们可以借助VST主机,灵活的调音台,高级MIDI和ReWire支持,来创建专业品质的各种音乐曲目. 而今天我们 ...
- Android ViewPager里的所有图片设置监听打开同一活动显示不同图片
Android ViewPager里的所有图片设置监听请看前一文章 为了省时所以2层菜单只做一个点击任意图片后显示相应图片的活动 关键点是每个点击对应的图片如何传参给显示的活动 因为只启动一个活动,所 ...
- 在tomcat启动时解析xml文件,获取特定标签的属性值,并将属性值设置到静态变量里
这里以解析hibernate.cfg.xml数据库配置信息为例,运用dom4j的解析方式来解析xml文件. 1.在javaWeb工程里新建一个java类,命名为GetXmlValue.java,为xm ...
随机推荐
- Vue的学习笔记
以下文章皆为观看慕课网https://www.imooc.com/learn/796中“河畔一角”老师的讲解做的笔记,仅供参考. 一.Vue特点 Vue是MVVM的框架,也就是模型视图->视图模 ...
- 前端编译原理 笔记 -- BISON
前面总结的差不多了,这边记录下,零碎的相关阅读可以备忘的一些知识点 Bsion文档,下面是中文的地址 https://blog.csdn.net/chinamming/article/details ...
- ssh免密登陆(简单快捷)
介绍免密登陆配合下边这张图可以了解下过程: 假设现在A要通过免密登陆B 在A上的操作: 1.终端输入ssh-keygen (后边可以指定加密算法:-t 算法,如果不指定就是默认的rsa) 原理: 首先 ...
- NLP使用pytorch框架,pytorch安装
pytorch的安装方法及出现问题的解决方案: 安装pytorch,使用pip 安装,在运行代码的时候会报错,但是导包的时候不会报错,因此要采用conda的方式安装 1.找到miniconda的网 ...
- mysql(函数,存储过程,事务,索引)
函数 MySQL中提供了许多内置函数: 内置函数 一.数学函数 ROUND(x,y) 返回参数x的四舍五入的有y位小数的值 RAND() 返回0到1内的随机值,可以通过提供一个参数(种子)使RAND( ...
- CQRS项目
CQRS+ES项目解析-Diary.CQRS 在<当我们在讨论CQRS时,我们在讨论些神马>中,我们讨论了当使用CQRS的过程中,需要关心的一些问题.其中与CQRS关联最为紧密的模式莫 ...
- 新一代纳秒级高带宽仿真工具平台——HAC Express
HAC Express是基于FPGA的模型仿真开发环境,专注于高精度建模和超高速实时仿真,弥补了传统仿真工具平台无法进行纳秒级仿真的短板. HAC系列自推出以来,经历了从v ...
- Dymola — 多学科系统仿真平台
Dymola 是法国Dassault Systems公司的多学科系统仿真平台,广泛应用于国内外汽车.工业.交通.能源等行业的系统总体架构设计.指标分解以及系统功能验证及优化等.Dymo ...
- python实现查找算法
搜索是在一个项目集合中找到一个特定项目的算法过程.搜索通常的答案是真的或假的,因为该项目是否存在. 搜索的几种常见方法:顺序查找.二分法查找.二叉树查找.哈希查找 线性查找线性查找就是从头找到尾,直到 ...
- MySQL进阶15--TCL事务控制语言--建立结束事务/设置断点--默认隔离级别--脏读/幻读/不可重复读
#TCL事物控制语言 : /* Transaction control language : 事物控制语言 事务: 一个或者一组sql语句组成一个执行单元,这个执行单元要么全部执行,要么全部不执行; ...