Elasticsearch 2.2+: enabling compression with index.codec: best_compression

The official description, from https://www.elastic.co/guide/en/elasticsearch/reference/2.2/index-modules.html#_static_index_settings:

index.codec: The default value compresses stored data with LZ4 compression, but this can be set to best_compression, which uses DEFLATE for a higher compression ratio, at the expense of slower stored fields performance.

Note: in 2.1 and earlier this is an experimental feature! It is stable only from 2.2 onward!

Now you can also enable better compression on the cold nodes by setting index.codec: best_compression in their config/elasticsearch.yml file in order to be able to archive more data with the same amount of disk space.

Source: https://www.elastic.co/blog/store-compression-in-lucene-and-elasticsearch
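The excerpt above sets the codec node-wide in config/elasticsearch.yml. As an alternative, here is a minimal sketch that builds the request body for setting it on a single index at creation time. It assumes that, because the reference lists index.codec under the static index settings, the value has to be supplied when the index is created (or while it is closed). The helper name codec_settings and the index name logs-cold are hypothetical, and the sketch only prints the JSON body rather than contacting a cluster.

```python
import json

# Codec names assumed valid for index.codec in Elasticsearch 2.x.
KNOWN_CODECS = ("default", "best_compression")


def codec_settings(codec="best_compression"):
    """Build an index-creation body that sets index.codec (hypothetical helper)."""
    if codec not in KNOWN_CODECS:
        raise ValueError("unknown codec: %r" % (codec,))
    return {"settings": {"index": {"codec": codec}}}


body = codec_settings()

# This body would be sent with something like: PUT /logs-cold
# ("logs-cold" is a made-up index name used only for illustration).
print(json.dumps(body, indent=2))
```

Because the setting is assumed to be static, existing indices would not pick it up from a live settings update; only newly written segments are stored with the new codec.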

The data below comes from: https://www.elastic.co/blog/elasticsearch-storage-the-true-story-2.0

The test methodology hasn’t changed so you can check out the old blog post or the README in the Github repo for the details.

Test	String fields	_all	Index size w/ LZ4 (bytes)	Index size w/ DEFLATE (bytes)	Expansion ratio w/ LZ4	Expansion ratio w/ DEFLATE	Impact of DEFLATE
Structured data file. Original file size: 67644119 bytes
1	analyzed and not_analyzed	enabled	63047579	53131592	0.932	0.785	-0.157
2	analyzed and not_analyzed	disabled	48271433	38327106	0.713	0.566	-0.206
3	not_analyzed	disabled	38920800	29014796	0.575	0.428	-0.254
3b	not_analyzed, except for 'message' field which is retained and analyzed	disabled	65382872	49532858	0.966	0.732	-0.242
4	not_analyzed, except for 'agent' field which is analyzed	disabled	43083702	32063602	0.636	0.474	-0.255
Semi-structured data file. Original file size: 75037027 bytes
1	analyzed and not_analyzed	enabled	100478376	82132782	1.339	1.094	-0.182
2	analyzed and not_analyzed	disabled	75238480	56911638	1.002	0.758	-0.243
3	not_analyzed	disabled	71866672	53553561	0.957	0.713	-0.254
3b	not_analyzed, except for 'message' field which is retained and analyzed	disabled	104638750	83824398	1.394	1.117	-0.198
4	not_analyzed, except for 'agent' field which is analyzed	disabled	72925624	54603882	0.971	0.727	-0.251

With the standard LZ4-based compression, the indexed data size to raw data size ratio ranged from 0.575 to 1.394. After enabling DEFLATE-based compression using the best_compression index.codec option, the indexed data size to raw data size ratio range came down to 0.429 to 1.117. Enabling the best_compression option resulted in a 15.7% to 25.6% reduction in indexed data size depending on the test parameters.

As you can see, the ratio of index size to raw data size can vary greatly based on your mapping configuration, what fields you decide to create/retain, and the characteristics of the data set itself. We encourage you to run similar tests yourself to determine what the data compression/expansion factor is for your data set and application requirements.
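For anyone repeating the measurement on their own data, the two derived columns in the table can be reproduced with a few lines of Python. This is a minimal sketch using the sizes from test 1 of the structured data file; the function names are made up for illustration, and it assumes the expansion ratio is index size divided by raw file size and that the impact of DEFLATE is the relative size reduction versus LZ4.

```python
def expansion_ratio(index_bytes, raw_bytes):
    """Indexed size relative to the raw input size (below 1.0 means smaller than raw)."""
    return index_bytes / raw_bytes


def deflate_reduction(lz4_bytes, deflate_bytes):
    """Fraction of index size saved by switching from LZ4 to DEFLATE."""
    return 1 - deflate_bytes / lz4_bytes


# Figures from test 1 of the structured data file in the table above.
RAW_BYTES = 67644119
LZ4_BYTES = 63047579
DEFLATE_BYTES = 53131592

print("LZ4 expansion ratio:    ", round(expansion_ratio(LZ4_BYTES, RAW_BYTES), 3))
print("DEFLATE expansion ratio:", round(expansion_ratio(DEFLATE_BYTES, RAW_BYTES), 3))
print("Reduction from DEFLATE: ", round(deflate_reduction(LZ4_BYTES, DEFLATE_BYTES), 3))
```

Substituting the on-disk size of your own index and the size of the raw input gives the same ratios for your data set.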

Conclusion

There were many amazing features added to Elasticsearch 2.0 worth considering. As we’ve discussed, two of these new features in particular can reduce the hardware footprint required for an Elasticsearch cluster by 15-25% or more: 1) the addition of a best_compression option and 2) enabling doc_values by default. This allows us to get to compression ratios between 0.429 and 1.117.
