public final class Lucene54DocValuesFormat
extends DocValuesFormat
Lucene 5.4 DocValues format.

Encodes the five per-document value types (Numeric, Binary, Sorted, SortedSet, SortedNumeric) with these strategies:

NUMERIC:

  • Delta-compressed: per-document integers written as deltas from the minimum value, compressed with bitpacking. For more information, see DirectWriter.
  • Table-compressed: when the number of unique values is very small (< 256), and when there are unused "gaps" in the range of values used (such as SmallFloat), a lookup table is written instead. Each per-document entry is instead the ordinal to this table, and those ordinals are compressed with bitpacking (DirectWriter).
  • GCD-compressed: when all numbers share a common divisor, such as dates, the greatest common divisor (GCD) is computed, and the quotients are stored using Delta-compressed Numerics.
  • Monotonic-compressed: when all numbers are monotonically increasing offsets, they are written as blocks of bitpacked integers, encoding the deviation from the expected delta.
  • Const-compressed: when there is only one possible non-missing value, only the missing bitset is encoded.
  • Sparse-compressed: only documents with a value are stored, and lookups are performed using binary search.
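To make the NUMERIC choices concrete, here is a minimal Java sketch under a simplified rule that is an assumption, not Lucene's exact logic: CONST when there is only one value; TABLE when there are fewer than 256 unique values and a table ordinal needs fewer bits than the (GCD-reduced) deltas; otherwise GCD when all deltas from the minimum share a divisor greater than 1; and plain DELTA as the fallback. The names NumericStrategySketch, choose and bitsRequired are hypothetical. Lucene's real writer also handles missing values, the sparse and monotonic cases, and compares actual space usage.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.LongStream;

// Hypothetical, simplified chooser for the NUMERIC strategies listed above.
// Not Lucene's real writer: missing values, the sparse and monotonic
// encodings, and exact space accounting are all ignored.
final class NumericStrategySketch {

    enum Strategy { CONST, TABLE, GCD, DELTA }

    // Bits needed to bit-pack x as an unsigned value (0 when x == 0).
    static int bitsRequired(long x) {
        return 64 - Long.numberOfLeadingZeros(x);
    }

    // Euclid's algorithm on non-negative inputs; gcd(0, b) == b.
    static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    static Strategy choose(long[] values) {
        long min = Arrays.stream(values).min().orElse(0L);
        long max = Arrays.stream(values).max().orElse(0L);
        Set<Long> unique = new HashSet<>();
        long g = 0;
        for (long v : values) {
            if (unique.size() < 256) {
                unique.add(v);              // stop counting once a table is ruled out
            }
            long delta = v - min;
            g = delta < 0 ? 1 : gcd(g, delta); // delta < 0 signals overflow: no GCD
        }
        if (unique.size() <= 1) {
            return Strategy.CONST;          // a single value (or no documents)
        }
        int deltaBits = bitsRequired(max - min);
        int gcdBits = g > 1 ? bitsRequired((max - min) / g) : deltaBits;
        if (unique.size() < 256 && bitsRequired(unique.size() - 1) < gcdBits) {
            return Strategy.TABLE;          // per-doc ordinals into a small lookup table
        }
        return g > 1 ? Strategy.GCD : Strategy.DELTA;
    }

    public static void main(String[] args) {
        // Three second-granularity timestamps in milliseconds.
        long[] few = {1_500_000_000_000L, 1_500_000_001_000L, 1_500_000_005_000L};
        System.out.println(choose(few)); // TABLE
        // 300 distinct multiples of 1000.
        System.out.println(choose(LongStream.range(0, 300).map(i -> i * 1000).toArray())); // GCD
    }
}
```

The first call picks TABLE because three unique values need only 2-bit ordinals, while the GCD-reduced deltas (0, 1, 5) need 3 bits. In the second call, 300 unique values rule out the table, but every delta is divisible by 1000, so GCD wins.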

BINARY:

  • Fixed-width Binary: one large concatenated byte[] is written, along with the fixed length. Each document's value can be addressed directly with multiplication (docID * length).
  • Variable-width Binary: one large concatenated byte[] is written, along with end addresses for each document. The addresses are written as Monotonic-compressed numerics.
  • Prefix-compressed Binary: values are written in chunks of 16, with the first value written completely and the other values sharing prefixes. Chunk addresses are written as Monotonic-compressed numerics. A reverse lookup index is written from a portion of every 1024th term.
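The three BINARY layouts differ mainly in how a document's bytes are located. The following Java sketch, with hypothetical names (BinaryLayoutSketch and its methods), shows fixed-width addressing by multiplication, variable-width lookup through end addresses, and the shared-prefix lengths that prefix compression would record inside one chunk. It assumes addresses are kept as a plain long[] rather than Monotonic-compressed blocks, and that each term's prefix is measured against the previous term in the chunk.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the BINARY layouts described above; not Lucene classes.
// Real Lucene bit-packs the address arrays instead of using plain long[]s.
final class BinaryLayoutSketch {

    // Fixed-width: the value of docID starts at docID * length.
    static byte[] lookupFixed(byte[] data, int length, int docID) {
        int start = docID * length;
        return Arrays.copyOfRange(data, start, start + length);
    }

    // Variable-width: ends[i] is the offset just past document i's bytes.
    static long[] endAddresses(List<byte[]> values) {
        long[] ends = new long[values.size()];
        long pos = 0;
        for (int i = 0; i < ends.length; i++) {
            pos += values.get(i).length;
            ends[i] = pos;
        }
        return ends;
    }

    // Document docID occupies data[previous end, own end).
    static byte[] lookupVariable(byte[] data, long[] ends, int docID) {
        int start = docID == 0 ? 0 : (int) ends[docID - 1];
        return Arrays.copyOfRange(data, start, (int) ends[docID]);
    }

    // Prefix-compressed: within a chunk of sorted terms the first term is kept
    // whole; each later term records how many leading bytes it shares with the
    // previous term, so only the remaining suffix would need to be stored.
    static int[] sharedPrefixLengths(List<byte[]> chunk) {
        int[] shared = new int[chunk.size()];
        for (int i = 1; i < chunk.size(); i++) {
            byte[] prev = chunk.get(i - 1);
            byte[] cur = chunk.get(i);
            int n = Math.min(prev.length, cur.length);
            int k = 0;
            while (k < n && prev[k] == cur[k]) {
                k++;
            }
            shared[i] = k;
        }
        return shared;
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = utf8("abxyz"); // "ab", "", "xyz" concatenated
        long[] ends = endAddresses(List.of(utf8("ab"), utf8(""), utf8("xyz")));
        System.out.println(Arrays.toString(ends)); // [2, 2, 5]
        System.out.println(new String(lookupVariable(data, ends, 2), StandardCharsets.UTF_8)); // xyz
        List<byte[]> chunk = List.of(utf8("app"), utf8("apple"), utf8("apply"), utf8("banana"));
        System.out.println(Arrays.toString(sharedPrefixLengths(chunk))); // [0, 3, 4, 0]
    }
}
```

Because the end addresses never decrease, they are a natural fit for the Monotonic-compressed encoding mentioned in the NUMERIC section.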

SORTED:

  • Sorted: a mapping of ordinals to deduplicated terms is written as Binary, along with the per-document ordinals written using one of the numeric strategies above.
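A minimal Java sketch of the SORTED idea, with hypothetical names (SortedLayoutSketch, lookup): the deduplicated, sorted terms define the ordinals, and each document keeps only its ordinal. In Lucene the term list would be written with a BINARY strategy and the ordinals bit-packed with a NUMERIC one; this sketch also assumes Java String order, whereas Lucene orders terms by their UTF-8 bytes, and it ignores documents without a value.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical sketch of the SORTED layout: ordinal -> term table plus one
// ordinal per document. Terms are ordered with String.compareTo here.
final class SortedLayoutSketch {
    final String[] terms;  // ordinal -> term, sorted and deduplicated
    final int[] docOrds;   // docID -> ordinal

    SortedLayoutSketch(String[] docValues) {
        terms = new TreeSet<>(Arrays.asList(docValues)).toArray(new String[0]);
        docOrds = new int[docValues.length];
        for (int doc = 0; doc < docValues.length; doc++) {
            // terms is sorted, so binary search yields the ordinal directly.
            docOrds[doc] = Arrays.binarySearch(terms, docValues[doc]);
        }
    }

    String lookup(int docID) {
        return terms[docOrds[docID]];
    }

    public static void main(String[] args) {
        SortedLayoutSketch s = new SortedLayoutSketch(new String[] {"red", "blue", "red", "green"});
        System.out.println(Arrays.toString(s.terms));   // [blue, green, red]
        System.out.println(Arrays.toString(s.docOrds)); // [2, 0, 2, 1]
        System.out.println(s.lookup(3));                // green
    }
}
```

Each repeated string ("red") is stored once in the term list; the per-document cost is a small integer.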

SORTED_SET:

  • Single: if all documents have 0 or 1 value, then data are written like SORTED.
  • SortedSet table: when there are few unique sets of values (< 256) then each set is assigned an id, a lookup table is written and the mapping from document to set id is written using the numeric strategies above.
  • SortedSet: a mapping of ordinals to deduplicated terms is written as Binary, an ordinal list and per-document index into this list are written using the numeric strategies above.
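The "SortedSet table" strategy can be sketched in Java as follows, assuming each document's value is already a set of term ordinals. Each distinct set gets an id, the sets are stored once in a lookup table, and each document stores only its set id (which Lucene would then encode with a NUMERIC strategy). The names SetTableSketch, ordsOf and tableApplies are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "SortedSet table" strategy: distinct ordinal
// sets are numbered once, and every document stores only a set id.
final class SetTableSketch {
    final List<int[]> table = new ArrayList<>(); // set id -> sorted ordinals
    final int[] docSetIds;                        // docID -> set id

    SetTableSketch(int[][] docOrds) {
        Map<String, Integer> idBySet = new HashMap<>();
        docSetIds = new int[docOrds.length];
        for (int doc = 0; doc < docOrds.length; doc++) {
            int[] ords = docOrds[doc].clone();
            Arrays.sort(ords);                    // a set's identity ignores input order
            String key = Arrays.toString(ords);
            Integer id = idBySet.get(key);
            if (id == null) {
                id = table.size();                // first time this set is seen
                idBySet.put(key, id);
                table.add(ords);
            }
            docSetIds[doc] = id;
        }
    }

    // The condition quoted in the text: fewer than 256 distinct sets.
    boolean tableApplies() {
        return table.size() < 256;
    }

    int[] ordsOf(int docID) {
        return table.get(docSetIds[docID]).clone();
    }

    public static void main(String[] args) {
        SetTableSketch t = new SetTableSketch(new int[][] {{1, 0}, {2}, {0, 1}, {}});
        System.out.println(Arrays.toString(t.docSetIds)); // [0, 1, 0, 2]
        System.out.println(Arrays.toString(t.ordsOf(2))); // [0, 1]
        System.out.println(t.tableApplies());             // true
    }
}
```

Documents 0 and 2 hold the same set {0, 1}, so they share set id 0; with few distinct sets the per-document ids stay very small.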

SORTED_NUMERIC:

  • Single: if all documents have 0 or 1 value, then data are written like NUMERIC.
  • SortedSet table: when there are few unique sets of values (< 256) then each set is assigned an id, a lookup table is written and the mapping from document to set id is written using the numeric strategies above.
  • SortedNumeric: a value list and per-document index into this list are written using the numeric strategies above.
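The general SortedNumeric layout, a single value list plus a per-document index into it, can be sketched in Java as below, together with the check behind the "Single" case. The names SortedNumericSketch, valuesOf and isSingleValued are hypothetical, and the arrays are left uncompressed; the per-document start offsets only ever increase, which is the kind of data the monotonic encoding targets.

```java
import java.util.Arrays;

// Hypothetical sketch of the SORTED_NUMERIC layout: each document's values are
// sorted and appended to one list; starts[] says where each document's run begins.
final class SortedNumericSketch {
    final long[] values; // all documents' values, back to back
    final int[] starts;  // docID -> first position; starts[numDocs] == values.length

    SortedNumericSketch(long[][] docValues) {
        starts = new int[docValues.length + 1];
        for (int doc = 0; doc < docValues.length; doc++) {
            starts[doc + 1] = starts[doc] + docValues[doc].length;
        }
        values = new long[starts[docValues.length]];
        for (int doc = 0; doc < docValues.length; doc++) {
            long[] sorted = docValues[doc].clone();
            Arrays.sort(sorted);
            System.arraycopy(sorted, 0, values, starts[doc], sorted.length);
        }
    }

    long[] valuesOf(int docID) {
        return Arrays.copyOfRange(values, starts[docID], starts[docID + 1]);
    }

    // The "Single" case: no document has more than one value, so the field
    // could be written like NUMERIC instead.
    static boolean isSingleValued(long[][] docValues) {
        for (long[] v : docValues) {
            if (v.length > 1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SortedNumericSketch s = new SortedNumericSketch(new long[][] {{5, 3}, {}, {7}});
        System.out.println(Arrays.toString(s.values));      // [3, 5, 7]
        System.out.println(Arrays.toString(s.starts));      // [0, 2, 2, 3]
        System.out.println(Arrays.toString(s.valuesOf(0))); // [3, 5]
    }
}
```

A document with no values simply has starts[doc] == starts[doc + 1], so no extra marker is needed.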

Files:

  1. .dvd: DocValues data
  2. .dvm: DocValues metadata

Source: http://lucene.apache.org/core/6_4_2/core/org/apache/lucene/codecs/lucene54/Lucene54DocValuesFormat.html

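As the du listing below shows, the doc values files take up very little space: _e_Lucene54_0.dvd and _e_Lucene54_0.dvm are each reported as 1 MB (du -sm rounds sizes up to whole megabytes), compared with 299 MB for the stored-fields file _e.fdt.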

du -sm elasticsearch/nodes/0/indices/hec_test2/0/index/*
299 elasticsearch/nodes/0/indices/hec_test2/0/index/_e.fdt
1 elasticsearch/nodes/0/indices/hec_test2/0/index/_e.fdx
1 elasticsearch/nodes/0/indices/hec_test2/0/index/_e.fnm
148 elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene50_0.doc
130 elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene50_0.tim
5 elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene50_0.tip
1 elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene54_0.dvd
1 elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene54_0.dvm
1 elasticsearch/nodes/0/indices/hec_test2/0/index/_e.si
1 elasticsearch/nodes/0/indices/hec_test2/0/index/segments_7
0 elasticsearch/nodes/0/indices/hec_test2/0/index/write.lock

In short, Lucene's .dvd and .dvm files are the doc values files: column-oriented storage of field values.