Official description, from https://www.elastic.co/guide/en/elasticsearch/reference/2.2/index-modules.html#_static_index_settings:

index.codec: The default value compresses stored data with LZ4 compression, but this can be set to best_compression which uses DEFLATE for a higher compression ratio, at the expense of slower stored fields performance.

Note: in 2.1 and earlier this is an experimental feature! It is only stable from 2.2 onward.
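The linked reference lists index.codec among the static index settings, so it is normally supplied when the index is created (or while the index is closed). A minimal sketch of such a create-index request, using only the Python standard library; the host http://localhost:9200 and the index name logs-archive are hypothetical placeholders, and the request is only built and printed here, not sent.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with your own cluster URL and index name.
ES_URL = "http://localhost:9200"
INDEX = "logs-archive"


def build_create_index_request(es_url: str, index: str) -> urllib.request.Request:
    """Build (but do not send) a PUT /<index> request that creates the index
    with the DEFLATE-based best_compression codec instead of the default LZ4."""
    body = {"settings": {"index": {"codec": "best_compression"}}}
    return urllib.request.Request(
        url=f"{es_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = build_create_index_request(ES_URL, INDEX)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it to a running cluster, uncomment:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

The body nests the setting as settings.index.codec; the flat key "index.codec" inside "settings" expresses the same setting.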

Now you can also enable better compression on the cold nodes by setting index.codec: best_compression in their config/elasticsearch.yml file in order to be able to archive more data with the same amount of disk space.

Source: https://www.elastic.co/blog/store-compression-in-lucene-and-elasticsearch

The data below is taken from: https://www.elastic.co/blog/elasticsearch-storage-the-true-story-2.0

The test methodology hasn’t changed so you can check out the old blog post or the README in the Github repo for the details.

Test | String fields | _all | Index size w/ LZ4 | Index size w/ DEFLATE | Expansion ratio w/ LZ4 | Expansion ratio w/ DEFLATE | Impact of DEFLATE

Structured data file (original file size: 67644119)
1 | analyzed and not_analyzed | enabled | 63047579 | 53131592 | 0.932 | 0.785 | -0.157
2 | analyzed and not_analyzed | disabled | 48271433 | 38327106 | 0.713 | 0.566 | -0.206
3 | not_analyzed | disabled | 38920800 | 29014796 | 0.575 | 0.428 | -0.254
3b | not_analyzed, except for 'message' field which is retained and analyzed | disabled | 65382872 | 49532858 | 0.966 | 0.732 | -0.242
4 | not_analyzed, except for 'agent' field which is analyzed | disabled | 43083702 | 32063602 | 0.636 | 0.474 | -0.255

Semi-structured data file (original file size: 75037027)
1 | analyzed and not_analyzed | enabled | 100478376 | 82132782 | 1.339 | 1.094 | -0.182
2 | analyzed and not_analyzed | disabled | 75238480 | 56911638 | 1.002 | 0.758 | -0.243
3 | not_analyzed | disabled | 71866672 | 53553561 | 0.957 | 0.713 | -0.254
3b | not_analyzed, except for 'message' field which is retained and analyzed | disabled | 104638750 | 83824398 | 1.394 | 1.117 | -0.198
4 | not_analyzed, except for 'agent' field which is analyzed | disabled | 72925624 | 54603882 | 0.971 | 0.727 | -0.251

With the standard LZ4-based compression, the indexed data size to raw data size ratio ranged from 0.575 to 1.394. After enabling DEFLATE-based compression using the best_compression index.codec option, the indexed data size to raw data size ratio range came down to 0.429 to 1.117. Enabling the best_compression option resulted in a 15.7% to 25.6% reduction in indexed data size depending on the test parameters.
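The ranges quoted in the paragraph above can be recomputed directly from the raw sizes in the table. In this sketch, "expansion ratio" is taken as index size divided by original file size, and "impact of DEFLATE" as the relative size change from the LZ4 index to the DEFLATE index; both definitions are inferred from the table's numbers rather than stated in the source.

```python
# Sizes copied from the table: (test, original file size, LZ4 index size, DEFLATE index size).
ROWS = [
    ("structured 1", 67644119, 63047579, 53131592),
    ("structured 2", 67644119, 48271433, 38327106),
    ("structured 3", 67644119, 38920800, 29014796),
    ("structured 3b", 67644119, 65382872, 49532858),
    ("structured 4", 67644119, 43083702, 32063602),
    ("semi-structured 1", 75037027, 100478376, 82132782),
    ("semi-structured 2", 75037027, 75238480, 56911638),
    ("semi-structured 3", 75037027, 71866672, 53553561),
    ("semi-structured 3b", 75037027, 104638750, 83824398),
    ("semi-structured 4", 75037027, 72925624, 54603882),
]


def expansion_ratio(index_size: int, original_size: int) -> float:
    """Index size relative to raw data size (below 1.0 means smaller on disk)."""
    return index_size / original_size


def deflate_impact(lz4_size: int, deflate_size: int) -> float:
    """Relative size change when switching LZ4 -> DEFLATE (negative means smaller)."""
    return (deflate_size - lz4_size) / lz4_size


lz4 = [expansion_ratio(l, o) for _, o, l, _ in ROWS]
deflate = [expansion_ratio(d, o) for _, o, _, d in ROWS]
impact = [deflate_impact(l, d) for _, _, l, d in ROWS]

if __name__ == "__main__":
    print(f"LZ4 ratio range:     {min(lz4):.3f} .. {max(lz4):.3f}")
    print(f"DEFLATE ratio range: {min(deflate):.3f} .. {max(deflate):.3f}")
    # impact values are negative, so negate them to report reductions
    print(f"DEFLATE reduction:   {-max(impact):.1%} .. {-min(impact):.1%}")
```

This prints 0.575 .. 1.394, 0.429 .. 1.117 and 15.7% .. 25.6%, matching the paragraph. The smallest DEFLATE ratio is about 0.4289. The table lists it as 0.428 and the text as 0.429, which suggests the table truncates where the text rounds.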

As you can see, the ratio of index size to raw data size can vary greatly based on your mapping configuration, what fields you decide to create/retain, and the characteristics of the data set itself. We encourage you to run similar tests yourself to determine what the data compression/expansion factor is for your data set and application requirements.
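One way to run such a test on your own data is to index a sample and compare the primary-shard store size reported by the indices stats API (GET /<index>/_stats/store) with the size of the raw input. The sketch below assumes that response layout (indices.<name>.primaries.store.size_in_bytes). The cluster URL, the index name and the sample numbers in the usage comments are hypothetical.

```python
import json
import urllib.request


def fetch_store_stats(es_url: str, index: str) -> dict:
    """Fetch store statistics for one index from a running cluster."""
    with urllib.request.urlopen(f"{es_url}/{index}/_stats/store") as resp:
        return json.load(resp)


def primary_store_size(stats: dict, index: str) -> int:
    """On-disk size of the index's primary shards only (replicas excluded),
    taken from a parsed _stats/store response."""
    return stats["indices"][index]["primaries"]["store"]["size_in_bytes"]


def expansion_ratio(index_size: int, raw_size: int) -> float:
    """Index size relative to raw input size; below 1.0 means the index is smaller."""
    return index_size / raw_size


# Hypothetical usage against a running cluster:
#   stats = fetch_store_stats("http://localhost:9200", "logs-test")
#   size = primary_store_size(stats, "logs-test")
#   print(expansion_ratio(size, raw_size_of_your_input_file))
```

Using the primaries figure rather than the total keeps replica copies out of the ratio. Segment merging changes on-disk size while indexing is in progress, so the number is more meaningful once indexing has finished.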

Conclusion

There were many amazing features added to Elasticsearch 2.0 worth considering. As we’ve discussed, two of these new features in particular can reduce the hardware footprint required for an Elasticsearch cluster by 15-25% or more: 1) the addition of a best_compression option and 2) enabling doc_values by default. This allows us to get to compression ratios between 0.429 and 1.117.
