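String Compression (Part 2): LZ4
This article comes from cnblogs (博客园). Author: T-BARBARIANS. When reposting, please cite the original link: https://www.cnblogs.com/t-bar/p/16451185.html Thanks!
The previous article covered ZSTD, the excellent compressor that originated at Facebook: how to compress and decompress with it, how compression and decompression perform, and how to use multi-threaded compression.
In this article we look at LZ4 from the same angles and see how it performs.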
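1. LZ4 Compression and Decompression
LZ4 provides two compression functions. The prototype of the default compression function is:
int LZ4_compress_default(const char* src, char* dst, int srcSize, int dstCapacity);
The prototype of the fast compression function is:
int LZ4_compress_fast (const char* src, char* dst, int srcSize, int dstCapacity, int acceleration);
The acceleration parameter of the fast compression function ranges over [1, LZ4_ACCELERATION_MAX], where LZ4_ACCELERATION_MAX is 65537. Put simply, the larger the acceleration value, the faster the compression, but the lower the compression ratio. I will back this up with experimental data later.
In addition, calling LZ4_compress_fast with acceleration = 1 is equivalent to calling LZ4_compress_default, which uses acceleration = 1 by default.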
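To make the accepted range concrete, here is a minimal sketch of how a caller could normalize an acceleration value before passing it to LZ4_compress_fast. The helper name normalize_acceleration and the two constants are hypothetical and only mirror the limits quoted above. The clamping itself reflects my understanding of how recent lz4 releases treat out-of-range values (too small falls back to the default of 1, too large is capped), which is an assumption worth checking against the lz4.h of your installed version.

```cpp
#include <cassert>

// Assumed limits, copied from the values quoted in the text above.
// They are NOT read from lz4.h here; verify them for your lz4 version.
const int kAccelerationDefault = 1;
const int kAccelerationMax = 65537;

// Hypothetical helper: map any int to a value inside [1, kAccelerationMax].
int normalize_acceleration(int acceleration) {
    int result = acceleration;
    if (result < 1) {
        result = kAccelerationDefault;   // too small: fall back to the default
    } else if (result > kAccelerationMax) {
        result = kAccelerationMax;       // too large: cap at the maximum
    }
    // Postcondition: the result is always a valid acceleration level.
    assert(result >= 1 && result <= kAccelerationMax);
    return result;
}
```

With this helper, normalize_acceleration(0) gives 1 and normalize_acceleration(70000) gives 65537, while values already inside the range pass through unchanged.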
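LZ4 also provides two decompression functions. The prototype of the safe decompression function is:
int LZ4_decompress_safe (const char* src, char* dst, int compressedSize, int dstCapacity);
The prototype of the fast decompression function is:
int LZ4_decompress_fast (const char* src, char* dst, int originalSize);
Using the fast decompression function is not recommended. LZ4_decompress_fast takes no parameter for the size of the compressed input, so it cannot tell where the source buffer ends and is considered unsafe. LZ4 recommends LZ4_decompress_safe instead.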
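LZ4_decompress_safe needs a destination capacity, and a raw LZ4 block does not record the original length, so the caller has to keep that number somewhere. A common approach is to store it next to the compressed bytes. Below is a minimal sketch of one way to do that, assuming a 4-byte little-endian length prefix. This layout and the function names pack_with_length and read_original_length are my own illustration, not part of LZ4.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical application-level container (NOT an LZ4 format):
// 4 bytes of original length, little-endian, followed by the compressed bytes.
std::vector<char> pack_with_length(std::uint32_t original_size,
                                   const std::vector<char>& compressed) {
    std::vector<char> out;
    out.reserve(4 + compressed.size());
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<char>((original_size >> shift) & 0xFF));
    }
    out.insert(out.end(), compressed.begin(), compressed.end());
    assert(out.size() == 4 + compressed.size());
    return out;
}

// Reads the length prefix back. Returns false if the buffer is too short
// to contain a prefix, so truncated input is rejected instead of misread.
bool read_original_length(const std::vector<char>& packed,
                          std::uint32_t& original_size) {
    if (packed.size() < 4) {
        return false;
    }
    original_size = 0;
    for (int i = 0; i < 4; ++i) {
        const std::uint32_t byte = static_cast<unsigned char>(packed[i]);
        original_size |= byte << (8 * i);
    }
    return true;
}
```

On the decompression side, the caller would read the prefix, allocate that many bytes, and pass packed.data() + 4 as src, packed.size() - 4 as compressedSize, and the recovered length as dstCapacity to LZ4_decompress_safe. If the data comes from an untrusted source, the recovered length should be checked against a sensible upper limit before allocating.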
As before, let's first look at an example of LZ4 compression and decompression.
#include <stdlib.h>
#include <string.h>
#include <iostream>
#include <lz4.h>

using namespace std;

int main()
{
    // Sample text. Continuation lines start at column 0 on purpose:
    // any leading spaces would become part of the string.
    char peppa_pig_buf[2048] = "Narrator: It is raining today. So, Peppa and George cannot \
play outside.Peppa: Daddy, it's stopped raining. Can we go out to play?Daddy: Alright, \
run along you two.Narrator: Peppa loves jumping in muddy puddles.Peppa: I love muddy puddles.\
Mummy: Peppa. If you jumping in muddy puddles, you must wear your boots.Peppa: Sorry, Mummy.\
Narrator: George likes to jump in muddy puddles, too.Peppa: George. If you jump in muddy \
puddles, you must wear your boots.Narrator: Peppa likes to look after her little brother, \
George.Peppa: George, let's find some more pud dles.Narrator: Peppa and George are having \
a lot of fun. Peppa has found a lttle puddle. George hasfound a big puddle.Peppa: Look, \
George. There's a really big puddle.Narrator: George wants to jump into the big puddle first.\
Peppa: Stop, George. | must check if it's safe for you. Good. It is safe for you. \
Sorry, George. It'sonly mud.Narrator: Peppa and George love jumping in muddy puddles.\
Peppa: Come on, George. Let's go and show Daddy.Daddy: Goodness me.Peppa: Daddy. Daddy. \
Guess what we' ve been doing.Daddy: Let me think... Have you been wa tching television?\
Peppa: No. No. Daddy.Daddy: Have you just had a bath?Peppa: No. No.Daddy: | know. \
You've been jumping in muddy puddles.Peppa: Yes. Yes. Daddy. We've been jumping in muddy \
puddles.Daddy: Ho. Ho. And look at the mess you're in.Peppa: Oooh....Daddy: Oh, well, \
it's only mud. Let's clean up quickly before Mummy sees the mess.Peppa: Daddy, \
when we've cleaned up, will you and Mummy Come and play, too?Daddy: Yes, we can all play \
in the garden.Narrator: Peppa and George are wearing their boots. Mummy and Daddy are \
wearing their boots.Peppa loves jumping up and down in muddy puddles. Everyone loves jumping \
up and down inmuddy puddles.Mummy: Oh, Daddy pig, look at the mess you're in. .Peppa: \
It's only mud.";

    // The LZ4 API takes and returns int sizes, so the sizes are kept as int.
    int peppa_pig_text_size = (int)strlen(peppa_pig_buf);

    // compress: LZ4_compressBound returns the worst-case compressed size
    int com_space_size = LZ4_compressBound(peppa_pig_text_size);

    char *com_ptr = (char *)malloc(com_space_size);
    if(NULL == com_ptr) {
        cout << "compress malloc failed" << endl;
        return -1;
    }

    //int com_size = LZ4_compress_default(peppa_pig_buf, com_ptr, peppa_pig_text_size, com_space_size);
    int com_size = LZ4_compress_fast(peppa_pig_buf, com_ptr, peppa_pig_text_size, com_space_size, 1);
    if(com_size <= 0) {
        // 0 means the compression failed (for example, dst was too small)
        cout << "compress failed, error code:" << com_size << endl;
        free(com_ptr);
        return -1;
    }
    cout << "peppa pig text size:" << peppa_pig_text_size << endl;
    cout << "compress text size:" << com_size << endl;
    cout << "compress ratio:" << (float)peppa_pig_text_size / (float)com_size << endl << endl;

    // decompress
    char *decom_ptr = (char *)malloc(peppa_pig_text_size);
    if(NULL == decom_ptr) {
        cout << "decompress malloc failed" << endl;
        free(com_ptr);
        return -1;
    }

    int decom_size = LZ4_decompress_safe(com_ptr, decom_ptr, com_size, peppa_pig_text_size);
    if(decom_size < 0) {
        // a negative value means the compressed input was malformed
        cout << "decompress failed, error code:" << decom_size << endl;
        free(com_ptr);
        free(decom_ptr);
        return -1;
    }
    cout << "decompress text size:" << decom_size << endl;

    // compare the decompressed bytes with the original text
    if(decom_size != peppa_pig_text_size || memcmp(peppa_pig_buf, decom_ptr, peppa_pig_text_size) != 0) {
        cout << "decompress text is not equal peppa pig text" << endl;
    }

    free(com_ptr);
    free(decom_ptr);
    return 0;
}
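Execution result:
The result shows that the Peppa Pig text is 1848 bytes long before compression and 1125 bytes after compression (ZSTD produced 759 bytes in the previous article), a compression ratio of about 1.6, and that the decompressed length equals the original length. On the same text, this ratio is lower than ZSTD's 2.4. In terms of compressed size, LZ4 does worse than ZSTD.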
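The ratios quoted above are plain arithmetic on the reported sizes. As a small sketch (the function names compression_ratio and space_saving are hypothetical), the ratio as used in this article is the original size divided by the compressed size, and the space saving is the fraction of bytes removed:

```cpp
#include <cassert>
#include <cstddef>

// Compression ratio as used in this article: original size / compressed size.
double compression_ratio(std::size_t original_size, std::size_t compressed_size) {
    assert(compressed_size > 0);
    return static_cast<double>(original_size) / static_cast<double>(compressed_size);
}

// Fraction of space saved: 1 - compressed size / original size.
double space_saving(std::size_t original_size, std::size_t compressed_size) {
    assert(original_size > 0);
    return 1.0 - static_cast<double>(compressed_size) / static_cast<double>(original_size);
}
```

With the sizes reported above, compression_ratio(1848, 1125) is about 1.64 for LZ4 and compression_ratio(1848, 759) is about 2.43 for ZSTD, which corresponds to space savings of roughly 39% and 59%.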
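Figure 1 below shows how the compressed length of the text changes as acceleration increases: the larger the acceleration, the longer the compressed output.
Figure 1
Figure 2 shows how the compression ratio changes as acceleration increases: the larger the acceleration, the lower the compression ratio.
Figure 2
Why is that? It is the same trade-off mentioned in the previous article: you cannot have both the fish (performance) and the bear's paw (compression ratio). What is gained in compression speed is paid for in compression ratio.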
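One detail of the compression example worth spelling out is how the output buffer is sized. The example calls LZ4_compressBound, which returns the worst-case compressed size for a given input length. The sketch below assumes it follows the LZ4_COMPRESSBOUND macro from lz4.h (input size + input size / 255 + 16, or 0 when the input is too large). The function name worst_case_compressed_size and the constant are my own, and the formula should be checked against your lz4.h rather than taken as authoritative.

```cpp
#include <cassert>

// Assumed to match LZ4_MAX_INPUT_SIZE in lz4.h.
const int kLz4MaxInputSize = 0x7E000000;

// Hypothetical stand-in for LZ4_compressBound, assuming the formula
// inputSize + inputSize/255 + 16 used by the LZ4_COMPRESSBOUND macro.
int worst_case_compressed_size(int input_size) {
    if (input_size < 0 || input_size > kLz4MaxInputSize) {
        return 0;   // input size not supported
    }
    const int bound = input_size + input_size / 255 + 16;
    assert(bound >= input_size);
    return bound;
}
```

For the 1848-byte sample text this gives 1848 + 7 + 16 = 1871 bytes of reserved space, of which the compressed output in the example actually used 1125.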
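2. Exploring LZ4 Compression Performance
Next, let's look into LZ4's compression performance and how it changes across acceleration levels.
The test method: use LZ4_compress_fast to compress the same text repeatedly for 10 seconds. Each run uses a different acceleration level, which yields the average number of compressions per second at each level. The test code is as follows: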
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <iostream>
#include <lz4.h>

using namespace std;

int main()
{
    // Sample text. Continuation lines start at column 0 on purpose:
    // any leading spaces would become part of the string.
    char peppa_pig_buf[2048] = "Narrator: It is raining today. So, Peppa and George cannot \
play outside.Peppa: Daddy, it's stopped raining. Can we go out to play?Daddy: Alright, \
run along you two.Narrator: Peppa loves jumping in muddy puddles.Peppa: I love muddy puddles.\
Mummy: Peppa. If you jumping in muddy puddles, you must wear your boots.Peppa: Sorry, Mummy.\
Narrator: George likes to jump in muddy puddles, too.Peppa: George. If you jump in muddy \
puddles, you must wear your boots.Narrator: Peppa likes to look after her little brother, \
George.Peppa: George, let's find some more pud dles.Narrator: Peppa and George are having \
a lot of fun. Peppa has found a lttle puddle. George hasfound a big puddle.Peppa: Look, \
George. There's a really big puddle.Narrator: George wants to jump into the big puddle first.\
Peppa: Stop, George. | must check if it's safe for you. Good. It is safe for you. \
Sorry, George. It'sonly mud.Narrator: Peppa and George love jumping in muddy puddles.\
Peppa: Come on, George. Let's go and show Daddy.Daddy: Goodness me.Peppa: Daddy. Daddy. \
Guess what we' ve been doing.Daddy: Let me think... Have you been wa tching television?\
Peppa: No. No. Daddy.Daddy: Have you just had a bath?Peppa: No. No.Daddy: | know. \
You've been jumping in muddy puddles.Peppa: Yes. Yes. Daddy. We've been jumping in muddy \
puddles.Daddy: Ho. Ho. And look at the mess you're in.Peppa: Oooh....Daddy: Oh, well, \
it's only mud. Let's clean up quickly before Mummy sees the mess.Peppa: Daddy, \
when we've cleaned up, will you and Mummy Come and play, too?Daddy: Yes, we can all play \
in the garden.Narrator: Peppa and George are wearing their boots. Mummy and Daddy are \
wearing their boots.Peppa loves jumping up and down in muddy puddles. Everyone loves jumping \
up and down inmuddy puddles.Mummy: Oh, Daddy pig, look at the mess you're in. .Peppa: \
It's only mud.";

    // The LZ4 API takes and returns int sizes, so the sizes are kept as int.
    int com_size;
    int peppa_pig_text_size = (int)strlen(peppa_pig_buf);
    int com_space_size = LZ4_compressBound(peppa_pig_text_size);

    timeval st, et;
    char *com_ptr = NULL;

    int test_times = 6;
    int acceleration = 1;

    // compress performance test: one 10-second run per acceleration level
    while(test_times >= 1) {

        // reset the counter so that each acceleration level is measured on its own
        int cnt = 0;

        gettimeofday(&st, NULL);
        while(1) {

            com_ptr = (char *)malloc(com_space_size);
            if(NULL == com_ptr) {
                cout << "compress malloc failed" << endl;
                return -1;
            }

            com_size = LZ4_compress_fast(peppa_pig_buf, com_ptr, peppa_pig_text_size, com_space_size, acceleration);
            if(com_size <= 0) {
                cout << "compress failed, error code:" << com_size << endl;
                free(com_ptr);
                return -1;
            }

            free(com_ptr);

            cnt++;
            gettimeofday(&et, NULL);
            // compare full timestamps; tv_sec alone could end the run up to a second early
            double elapsed = (et.tv_sec - st.tv_sec) + (et.tv_usec - st.tv_usec) / 1000000.0;
            if(elapsed >= 10.0) {
                break;
            }
        }

        cout << "acceleration:" << acceleration << ", compress per second:" << cnt/10 << " times" << endl;

        ++acceleration;
        --test_times;
    }

    return 0;
}
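Execution result:
The results boil down to two points. First, with acceleration at its default value of 1 (the value LZ4_compress_default uses), throughput is over 200,000 compressions per second. Second, throughput rises as acceleration increases, at the cost of a lower compression ratio.
The relationship between acceleration and compression rate is shown in the figure below:
Figure 3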
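The measurement loop used in this section can also be factored into a small reusable helper. The following is only a sketch under my own assumptions: the name measure_ops_per_second is hypothetical, and it uses std::chrono::steady_clock, a monotonic clock, in place of gettimeofday. The counter is local to the function, so every measurement starts from zero.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: calls `op` repeatedly for roughly `seconds` seconds
// and returns the average number of completed calls per second.
template <typename Op>
double measure_ops_per_second(Op op, double seconds) {
    assert(seconds > 0.0);
    typedef std::chrono::steady_clock Clock;
    const Clock::time_point start = Clock::now();
    std::uint64_t count = 0;   // local counter: each measurement starts at zero
    double elapsed = 0.0;
    do {
        op();
        ++count;
        elapsed = std::chrono::duration<double>(Clock::now() - start).count();
    } while (elapsed < seconds);
    // Divide by the real elapsed time instead of the nominal duration.
    return static_cast<double>(count) / elapsed;
}
```

For the benchmark in this section, op would be a lambda that calls LZ4_compress_fast with a fixed acceleration, and the helper would be invoked once per acceleration level with seconds set to 10.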
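3. Exploring LZ4 Decompression Performance
Next, let's look at LZ4's decompression performance.
The test method: first compress the text with LZ4_compress_fast and acceleration = 1, then use the safe decompression function LZ4_decompress_safe to decompress the same data repeatedly for 10 seconds, which yields the average number of decompressions per second. The test code is as follows: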
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <iostream>
#include <lz4.h>

using namespace std;

int main()
{
    // Sample text. Continuation lines start at column 0 on purpose:
    // any leading spaces would become part of the string.
    char peppa_pig_buf[2048] = "Narrator: It is raining today. So, Peppa and George cannot \
play outside.Peppa: Daddy, it's stopped raining. Can we go out to play?Daddy: Alright, \
run along you two.Narrator: Peppa loves jumping in muddy puddles.Peppa: I love muddy puddles.\
Mummy: Peppa. If you jumping in muddy puddles, you must wear your boots.Peppa: Sorry, Mummy.\
Narrator: George likes to jump in muddy puddles, too.Peppa: George. If you jump in muddy \
puddles, you must wear your boots.Narrator: Peppa likes to look after her little brother, \
George.Peppa: George, let's find some more pud dles.Narrator: Peppa and George are having \
a lot of fun. Peppa has found a lttle puddle. George hasfound a big puddle.Peppa: Look, \
George. There's a really big puddle.Narrator: George wants to jump into the big puddle first.\
Peppa: Stop, George. | must check if it's safe for you. Good. It is safe for you. \
Sorry, George. It'sonly mud.Narrator: Peppa and George love jumping in muddy puddles.\
Peppa: Come on, George. Let's go and show Daddy.Daddy: Goodness me.Peppa: Daddy. Daddy. \
Guess what we' ve been doing.Daddy: Let me think... Have you been wa tching television?\
Peppa: No. No. Daddy.Daddy: Have you just had a bath?Peppa: No. No.Daddy: | know. \
You've been jumping in muddy puddles.Peppa: Yes. Yes. Daddy. We've been jumping in muddy \
puddles.Daddy: Ho. Ho. And look at the mess you're in.Peppa: Oooh....Daddy: Oh, well, \
it's only mud. Let's clean up quickly before Mummy sees the mess.Peppa: Daddy, \
when we've cleaned up, will you and Mummy Come and play, too?Daddy: Yes, we can all play \
in the garden.Narrator: Peppa and George are wearing their boots. Mummy and Daddy are \
wearing their boots.Peppa loves jumping up and down in muddy puddles. Everyone loves jumping \
up and down inmuddy puddles.Mummy: Oh, Daddy pig, look at the mess you're in. .Peppa: \
It's only mud.";

    int cnt = 0;
    timeval st, et;

    // compress once; the LZ4 API takes and returns int sizes
    int peppa_pig_text_size = (int)strlen(peppa_pig_buf);
    int com_space_size = LZ4_compressBound(peppa_pig_text_size);

    char *com_ptr = (char *)malloc(com_space_size);
    if(NULL == com_ptr) {
        cout << "compress malloc failed" << endl;
        return -1;
    }

    int com_size = LZ4_compress_fast(peppa_pig_buf, com_ptr, peppa_pig_text_size, com_space_size, 1);
    if(com_size <= 0) {
        cout << "compress failed, error code:" << com_size << endl;
        free(com_ptr);
        return -1;
    }

    // decompress
    int decom_size;
    char *decom_ptr = NULL;

    // decompress performance test: decompress the same block for 10 seconds
    gettimeofday(&st, NULL);
    while(1) {

        decom_ptr = (char *)malloc(peppa_pig_text_size);
        if(NULL == decom_ptr) {
            cout << "decompress malloc failed" << endl;
            free(com_ptr);
            return -1;
        }

        decom_size = LZ4_decompress_safe(com_ptr, decom_ptr, com_size, peppa_pig_text_size);
        if(decom_size <= 0) {
            // a negative value means the compressed input was malformed
            cout << "decompress failed, error code:" << decom_size << endl;
            free(com_ptr);
            free(decom_ptr);
            return -1;
        }

        free(decom_ptr);

        cnt++;
        gettimeofday(&et, NULL);
        // compare full timestamps; tv_sec alone could end the run up to a second early
        double elapsed = (et.tv_sec - st.tv_sec) + (et.tv_usec - st.tv_usec) / 1000000.0;
        if(elapsed >= 10.0) {
            break;
        }
    }

    free(com_ptr);
    cout << "decompress per second:" << cnt/10 << " times" << endl;

    return 0;
}
Execution result:
The result shows that LZ4 decompresses this text roughly 540,000 times per second, which is a very respectable rate.
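4. LZ4 vs. ZSTD
Using the same input text, I ran the compression, decompression, compression-performance, and decompression-performance tests with both ZSTD and LZ4. The data is collected in Table 1.
Table 1
Leaving aside which algorithm is better, the experimental results show that ZSTD leans toward compression ratio, while LZ4 (acceleration = 1) leans toward compression speed.
5. Summary
No algorithm can easily deliver both very fast compression and a very high compression ratio. You have to trade one for the other, or find a suitable balance between them.
If its performance is acceptable, ZSTD with its higher compression ratio saves more storage space. If compression ratio matters less and you want higher compression speed, LZ4 is a good choice.
Finally, having read this far, do you think a string of any length can be compressed by algorithms such as ZSTD and LZ4? Stay tuned for the next installment!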
写在前面 本文是本人根据<AT&T 汇编语言与 GCC 内嵌汇编简介>进一步整理,修改了一些错误,并删除我并不能复现代码相关的部分.该文章一是我对 AT&T 的学习记录 ...