Compression

Compression maximizes the storage capacity of Cassandra nodes by reducing the volume of data on disk and disk I/O, particularly for read-dominated workloads. Cassandra quickly finds the location of rows in the SSTable index and decompresses the relevant row chunks.

Write performance is not negatively impacted by compression in Cassandra as it is in traditional databases. In traditional relational databases, writes require overwrites to existing data files on disk. The database has to locate the relevant pages on disk, decompress them, overwrite the relevant data, and finally recompress. In a relational database, compression is an expensive operation in terms of CPU cycles and disk I/O. Because Cassandra SSTable data files are immutable (they are not written to again after they have been flushed to disk), there is no recompression cycle necessary in order to process writes. SSTables are compressed only once when they are written to disk. Writes on compressed tables can show up to a 10 percent performance improvement.

When to compress data
Compression is best suited for tables that have many rows and each row has the same columns, or at least as many columns, as other rows.
Configuring compression
Steps for configuring compression.

When to compress data

Compression is best suited for tables that have many rows and each row has the same columns, or at least as many columns, as other rows. For example, a table containing user data such as username, email, and state, is a good candidate for compression. The greater the similarity of the data across rows, the greater the compression ratio and gain in read performance.

A table that has rows of different sets of columns is not well-suited for compression.

Don't confuse table compression with compact storage of columns, which is used for backward compatibility of old applications with CQL.

Depending on the data characteristics of the table, compressing its data can result in:

2x-4x reduction in data size
25-35% performance improvement on reads
5-10% performance improvement on writes

After configuring compression on an existing table, subsequently created SSTables are compressed. Existing SSTables on disk are not compressed immediately. Cassandra compresses existing SSTables when the normal Cassandra compaction process occurs. Force existing SSTables to be rewritten and compressed by using nodetool upgradesstables (Cassandra 1.0.4 or later) or nodetool scrub.

Compression

Cassandra offers operators the ability to configure compression on a per-table basis. Compression reduces the size of data on disk by compressing the SSTable in user-configurable compression chunk_length_in_kb. Because Cassandra SSTables are immutable, the CPU cost of compressing is only necessary when the SSTable is written - subsequent updates to data will land in different SSTables, so Cassandra will not need to decompress, overwrite, and recompress data when UPDATE commands are issued. On reads, Cassandra will locate the relevant compressed chunks on disk, decompress the full chunk, and then proceed with the remainder of the read path (merging data from disks and memtables, read repair, and so on).

Configuring Compression

Compression is configured on a per-table basis as an optional argument to CREATE TABLE or ALTER TABLE. By default, three options are relevant:

class specifies the compression class - Cassandra provides three classes (LZ4Compressor, SnappyCompressor, and DeflateCompressor ). The default isSnappyCompressor.
chunk_length_in_kb specifies the number of kilobytes of data per compression chunk. The default is 64KB.
crc_check_chance determines how likely Cassandra is to verify the checksum on each compression chunk during reads. The default is 1.0.

Users can set compression using the following syntax:

CREATE TABLE keyspace.table (id int PRIMARY KEY) WITH compression = {'class': 'LZ4Compressor'};

ALTER TABLE keyspace.table WITH compression = {'class': 'SnappyCompressor', 'chunk_length_in_kb': 128, 'crc_check_chance': 0.5};

Once enabled, compression can be disabled with ALTER TABLE setting enabled to false:

ALTER TABLE keyspace.table WITH compression = {'enabled':'false'};

Operators should be aware, however, that changing compression is not immediate. The data is compressed when the SSTable is written, and as SSTables are immutable, the compression will not be modified until the table is compacted. Upon issuing a change to the compression options via ALTER TABLE, the existing SSTables will not be modified until they are compacted - if an operator needs compression changes to take effect immediately, the operator can trigger an SSTable rewrite using nodetool scrub or nodetoolupgradesstables -a, both of which will rebuild the SSTables on disk, re-compressing the data in the process.

Benefits and Uses

Compression’s primary benefit is that it reduces the amount of data written to disk. Not only does the reduced size save in storage requirements, it often increases read and write throughput, as the CPU overhead of compressing data is faster than the time it would take to read or write the larger volume of uncompressed data from disk.

Compression is most useful in tables comprised of many rows, where the rows are similar in nature. Tables containing similar text columns (such as repeated JSON blobs) often compress very well.

Operational Impact

Compression metadata is stored off-heap and scales with data on disk. This often requires 1-3GB of off-heap RAM per terabyte of data on disk, though the exact usage varies with chunk_length_in_kb and compression ratios.
Streaming operations involve compressing and decompressing data on compressed tables - in some code paths (such as non-vnode bootstrap), the CPU overhead of compression can be a limiting factor.
The compression path checksums data to ensure correctness - while the traditional Cassandra read path does not have a way to ensure correctness of data on disk, compressed tables allow the user to set crc_check_chance (a float from 0.0 to 1.0) to allow Cassandra to probabilistically validate chunks on read to verify bits on disk are not corrupt.

Advanced Use

Advanced users can provide their own compression class by implementing the interface at org.apache.cassandra.io.compress.ICompressor.

参考：http://cassandra.apache.org/doc/latest/operating/compression.html

cassandra压缩——从文档看，本质上也应该是在做块压缩的更多相关文章

自动把动态的jsp页面（或静态html）生成PDF文档，并且上传至服务器
置顶2017年11月06日 14:41:04 阅读数:2311 这几天,任务中有一个难点是把一个打印页面自动给生成PDF文档,并且上传至服务器,然而公司框架只有手动上传文档,打印时可以保存为PDF在本 ...
[sharepoint]rest api文档库文件上传，下载，拷贝，剪切，删除文件，创建文件夹，修改文件夹属性，删除文件夹，获取文档列表
写在前面最近对文档库的知识点进行了整理,也就有了这篇文章,当时查找这些接口,并用在实践中,确实废了一些功夫,也为了让更多的人走更少的弯路. 系列文章 sharepoint环境安装过程中几点需要注意的 ...
跟我学SharePoint 2013视频培训课程——怎样创建文档库并上传文档(8)
课程简介第8天,怎样在SharePoint 2013怎样创建文档库并上传文档. 视频 SharePoint 2013 交流群 41032413
【.net 深呼吸】使用二进制格式来压缩XML文档
在相当多的情况下,咱们写入XML文件默认是使用文本格式来写入的,如果XML内容是通过网络传输,或者希望节省空间,特别是对于XML文档较大的情况,是得考虑尽可能地压缩XML文件的大小. XmlDicti ...
【SQL】SQL2012离线帮助文档安装不上的处理手记
注:解决方法在最后,心急的童鞋可以直接往下滚动. 我SQL实例装的是2008 R2版,由于该版自带的SSMS(Microsoft SQL Server Management Studio 管理工具)存 ...
JRoll 2 使用文档（史上最强大的下拉刷新，滚动，无限加载插件）
概述说明 JRoll,一款能滚起上万条数据,具有滑动加速.回弹.缩放.滚动条.滑动事件等功能,兼容CommonJS/AMD/CMD模块规范,开源,免费的轻量级html5滚动插件. JRoll第二版是 ...
Swagger文档添加file上传参数写法
想在swagger ui的yaml文档里面写一个文件上传的接口,找了半天不知道怎么写,终于搜到了,如下: /tools/upload: post: tags: - "tool" s ...
小胖求学系列之-文档生成利器(上)-smart-doc
最近小胖上课总是挂着黑眼圈,同桌小张问:你昨晚通宵啦?小胖有气无力的说到:最近开发的项目接口文档没写,昨晚补文档补了很久,哎,昨晚只睡了2个小时.小张说:不是有生成文档工具吗,类似swagger2.s ...
html的文档设置标记上（格式标记）4-5
<html> <head> <title>第四课的标题及第五课的标题</title> <meta charset="utf-8" ...

随机推荐

ArcGIS 安装中，SQL的使用出现错误的解决
1. SQL Server Configuration Manager 中 SQL Server Services出现 “远程调用失败..” 的问题解决方法是卸载
邁向IT專家成功之路的三十則鐵律鐵律十九：IT人待業之道-寬心
說來很多人可能不相信,筆者從來不把失業當作是一件嚴重的事,相反的我會把它當作是一個很好的轉機.針對一個隨時做好準備的IT人,三個月或半年沒有上班完全沒有甚麼好擔心的.只是如何善用待業的時間,說實在的真 ...
BEGINNING SHAREPOINT® 2013 DEVELOPMENT 第2章节--SharePoint 2013 App 模型概览 SharePoint 2013 App 模型
BEGINNING SHAREPOINT® 2013 DEVELOPMENT 第2章节--SharePoint 2013 App 模型概览 SharePoint 2013 App 模型你能够通过两个 ...
【翻译自mos文章】使用asmcmd命令在本地和远程 asm 实例之间拷贝asm file的方法
使用asmcmd命令在本地和远程 asm 实例之间拷贝asm file的方法參考原文: How to Copy asm files between remote ASM instances usi ...
重新认识一遍JavaScript - 2
1.JavaScript没有Java和C中的int.double,怎么识别这些类型的呢?或者说不支持问:你认为呢? 答:var 支持所有数据类型(int.double.string),取决于你输入的 ...
【Python】输出程序运行的百分比
对于一些大型的Python程序.我们须要在命令行输出其百分比,显得更加友好,以免被人误会程序陷入死循环.假死的窗口. 关键是利用到不换行的输出符\r,\r的输出.将直接覆盖掉此行的内容. 比方例如以下 ...
判断是否是iso8859-1编码
if (null == keyword || keyword.equals("关键字")) keyword = ""; if(keywor ...
NoSQL数据库-MongoDB和Redis
http://blog.csdn.net/tea_wu/article/details/19050277 http://www.uml.org.cn/sjjm/201212205.asp
深入struts2.0(七)--ActionInvocation接口以及3DefaultActionInvocation类
1.1.1 ActionInvocation类 ActionInvocation定义为一个接口.主要作用是表现action的运行状态.它拥有拦截器和action的实例.通过重复的运行inv ...
mediawiki 管理员/行政员设置
mediawiki行政员找回 mediawiki 1.22.6默认安装完毕后,无管理员/行政员.默认都是user组成员.这样不便于wiki系统维护. 注: 默认情况下.行政员组(bureaucrat) ...

cassandra压缩——从文档看，本质上也应该是在做块压缩

Compression

When to compress data

Compression

Configuring Compression

Benefits and Uses

Operational Impact

Advanced Use

cassandra压缩——从文档看，本质上也应该是在做块压缩的更多相关文章

随机推荐

热门专题