速度之王 — LZ4压缩算法（二）

LZ4 (Extremely Fast Compression algorithm)

项目：http://code.google.com/p/lz4/

作者：Yann Collet

本文作者：zhangskd @ csdn blog

LZ4格式

The compressed block is composed of sequences.

每个数据块可以压缩成若干个序列，格式如下：

(1) literals

length of literals. If it is 0, then there is no literal. If it is 15, then we need to add some more bytes to indicate the

full length. Each additional byte then represent a value of 0 to 255, which is added to the previous value to produce

a total length. When the byte value is 255, another byte is output.

literals are uncompressed bytes, to be copied as-is.

(2) match

offset. It represents the position of the match to be copied from.

Note that 0 is an invalid value, never used. 1 means "current position - 1 byte".

The maximum offset value is really 65535. The value is stored using "little endian" format.

matchlength. There is an baselength to apply, which is the minimum length of a match called minmatch.

This minimum is 4. As a consequence, a value of 0 means a match length of 4 bytes, and a value of 15 means a

match length of 19+ bytes. (Similar to literal length)

(3) rules

1. The last 5 bytes are always literals.

2. The last match cannot start within the last 12 bytes.

So a file within less than 13 bytes can only be represented as literals.

(4) scan strategy

a single-cell wide hash table.

Each position in the input data block gets "hashed", using the first 4 bytes (minimatch). Then the position is stored

at the hashed position. Obviously, the smaller the table, the more collisions we get, reducing compression

effectiveness. The decoder do not care of the method used to find matches, and requires no addtional memory.

(5) Streaming format

实现

(1) 哈希表

Each position in the input data block gets "hashed", using the first 4 bytes (minimatch). Then the position is stored

at the hashed position. Obviously, the smaller the table, the more collisions we get, reducing compression

effectiveness. The decoder do not care of the method used to find matches, and requires no addtional memory.

LZ4使用哈希表来查找匹配字符串。这个哈希表的映射关系(key, value)：

key为4个字节的二进制值。

value为这4个字节在块中的位置。

/* Default value is 14, for 16KB, which nicely fits into Intel x86 L1 cache。

 * Increasing memory usage improves compression ratio

 * Reduced memory usage can improve speed, due to cache effect

 */

#define MEMORY_USAGE 14

#define LZ4_HASHLOG (MEMORY_USAGE - 2) /* 哈希桶位数12 */

#define HASHTABLESIZE (1 << MEMORY_USAGE) /* 哈希表大小2^14 = 16K */

#define HASHNBCELLS4 (1 << LZ4_HASHLOG) /* 哈希桶个数2^12 = 4K */

选择哈希表的大小时，要做一个权衡：

1. 侧重压缩比，则哈希表可以大一些。

2. 侧重压缩速度，则哈希表应该适中，以便能装入L1 cache。

默认的哈希表使用的内存为16KB，能装进L1 cache，这也是LZ4压缩速度快的一个原因。

当前主流的Intel X86 L1 Data Cache为32KB，所以建议哈希表不要超过此大小。

typedef enum { byPtr, byU32, byU16} tableType_t;

哈希表存储的数据为“位置”，分三种情况：

1. inputSize小于64KB时，使用byU16，表示16位的偏移值即可。

2. inputSize大于64KB时：

2.1 指针大小为8字节，使用byU32，表示32位的偏移值，如果用指针不划算。

2.2 指针大小为4字节，使用byPtr，表示32位的指针。

采用整数哈希算法。

2654435761U是2到2^32的黄金分割素数，2654435761 / 4294967296 = 0.618033987。

计算哈希值，输入为4个字节，输出可分为2字节值、4字节值两种哈希值。

FORCE_INLINE int LZ4_hashSequence(U32 sequence, tableType_t tableType)

{

    if (tableType == byU16)

        /* 哈希表为16K，如果哈希value为16位 => 哈希key为13位 */

        return (((sequence) * 2654435761U) >> ((MINMATCH * 8) - (LZ4_HASHLOG + 1)));

    else

        /* 哈希表为16K，如果哈希value为32位 => 哈希key为12位 */

        return (((sequence) * 2654435761U) >> ((MINMATCH * 8) - LZ4_HASHLOG));

}

FORCE_INLINE int LZ4_hashPosition(const BYTE *p, tableType_t tableType) \

    { return LZ4_hashSequence(A32(p), tableType); }

把地址存入到哈希表中。

FORCE_INLINE void LZ4_putPositionOnHash(const BYTE *p, U32 h, void *tableBase, tableType_t tableType,

       const BYTE *srcBase)

{

    switch(tableType)

    {

    case byPtr: { const BYTE **hashTable = (const BYTE **) tableBase; hashTable[h] = p; break; }

    case byU32: { U32 *hashTable = (U32 *) tableBase; hashTable[h] = (U32) (p - srcBase); break; }

    case byU16: { U16 *hashTable = (U16 *) tableBase; hashTable[h] = (U16) (p - srcBase); break; }

    }

}

计算指针p指向的4字节的哈希值，然后把它的位置存入哈希表中。

FORCE_INLINE void LZ4_putPosition(const BYTE *p, void *tableBase, tableType_t tableType, const BYTE *srcBase)

{

    U32 h = LZ4_hashPosition(p, tableType); /* 计算哈希值 */

    LZ4_putPositionOnHash(p, h, tableBase, tableType, srcBase); /* 把地址存入哈希表 */

}

根据哈希值，获取地址。

FORCE_INLINE const BYTE *LZ4_getPositionOnHash(U32 h, void *tableBase, tableType_t tableType,

    const BYTE *srcBase)

{

    if (tableType == byPtr) { const BYTE **hashTable = (const BYTE **) tableBase; return hashTable[h]; }

    if (tableType == byU32) { U32 *hashTable = (U32 *) tableBase; return hashTable[h] + srcBase; }

    { U16 *hashTable = (U16 *) tableBase; return hashTable[h] + srcBase; } /* default, to ensure a return */

}

根据指针p指向的4字节，计算哈希值，并查找此哈希桶是否已有赋值。

如果此哈希桶已有赋值，则说明此时的4字节和上次的4字节很可能是一样的（如果是冲突，则是不一样的）。

FORCE_INLINE const BYTE *LZ4_getPosition(const BYTE *p, void *tableBase, tableType, const BYTE *srcBase)

{

    U32 h = LZ4_hashPosition(p, tableType);

    return LZ4_getPositionOnHash(h, tableBase, tableType, srcBase);

}

(2) 压缩

LZ4_compress()是压缩的一个入口函数，先申请哈希表，然后调用LZ4_compress_generic()。

/* LZ4_compress():

 * Compress inputSize bytes from source into dest.

 * Destination buffer must be already allocated, and must be sized to handle worst cases situations

 * (input data not compressible)

 * Worst case size evaluation is provided by function LZ4_compressBound()

 * inputSize: Max support value is LZ4_MAX_INPUT_VALUE

 * return: the number of bytes written in buffer dest or 0 if the compression fails.

 */

int LZ4_compress(const char *source, char *dest, int inputSize)

{

#if (HEAPMODE) /* 在堆中给哈希表分配内存 */

    void *ctx = ALLOCATOR(HASHNBCELLS4, 4); /* Aligned on 4-bytes boundaries */

#else /* 在栈中给哈希表分配内存，比较快，默认 */

    U32 ctx[1U << (MEMORY_USAGE - 2) ] = {0}; /* Ensure data is aligned on 4-bytes boundaries */

#endif

    int result;

    /* 输入小于64K+11，则用16位来表示滑动窗口，否则用32位*/

    if (inputSize < (int) LZ4_64KLIMIT)

        result = LZ4_compress_generic((void *)ctx, source, dest, inputSize, 0, notLimited, byU16, noPrefix);

    else

        result = LZ4_compress_generic((void *)ctx, source, dest, inputSize, 0, notLimited,

                            (sizeof(void *) == 8 ? byU32 : byPtr, noPrefix);

#if (HEAPMODE)

    FREE(ctx);

#endif

    return result;

}

#define MINMATCH 4 /* 以4字节为单位查找哈希表 */

#define COPYLENGTH 8

#define LASTLITERALS 5

#define MFLIMIT (COPYLENGTH + MINMATCH) /* 对于最后的12个字节，不进行查找匹配 */

const int LZ4_minLength = (MFLIMIT + 1); /* 一个块要>=13个字符，才会进行查找匹配 */

#define LZ4_64KLIMIT ((1<<16) + (MFLIMIT - 1)) /* 64K + 11 */

/* Increasing this value will make the compression run slower on incompressible data。

 * 用于控制查找匹配时的前进幅度，如果一直没找到匹配，则加大前进幅度。

 */

#define SKIPSTRENGTH 6

LZ4_compress_generic()是主要的压缩函数，根据指定的参数，可以执行多种不同的压缩方案。

匹配算法

1. 当前的地址为ip，它的哈希值为h。

2. 下个地址为forwardIp，它的哈希值为forwardH (下个循环赋值给ip、h)。

3. 按照哈希值h，获取哈希表中的值ref。

3.1 ref为初始值，没有匹配，A32(ip) != A32(ref)，继续。

3.2 ref不为初始值，有匹配。

3.2.1 ref不在滑动窗口内，放弃，继续。

3.2.2 ref对应的U32和ip对应的U32不一样，是冲突，继续。

3.3.3 ref在滑动窗口内，且对应的U32一样，找到了match，退出。

4. 保存ip和h的对应关系。

FORCE_INLINE int LZ4_compress_generic(void *ctx, const char *source, char *dest, int inputSize,

        int maxOutputSize, limitedOutput_directive limitedOutput, tableType_t tableType,

        prefix64k_directive prefix)

{

    const BYTE *ip = (const BYTE *) source;

    /* 用作哈希表中的srcBase */

    const BYTE *const base = (prefix == withPrefix) ? ((LZ4_Data_Structure *)ctx)->base : (const BYTE *)source);

    /* 前向窗口的起始地址 */

    const BYTE *const lowLimit = ((prefix == withPrefix) ? ((LZ4_Data_Structure *)ctx)->bufferStart : (const BYTE *)source);

    const BYTE *anchor = (const BYTE *)source;

    const BYTE *const iend = ip + inputSize; /* 输入的结束地址 */

    const BYTE *const mflimit = iend - MFLIMIT; /* iend - 12，超过此处不允许再启动一次匹配 */

    const BYTE *const matchlimit = iend - LASTLITERALS; /* iend - 5，最后5个字符不允许匹配 */

    BYTE *op = (BYTE *) dest; /* 用于操作输出缓存 */

    BYTE *const oend = op + maxOutputSize; /* 输出缓存的边界，如果有的话 */

    int length;

    const int skipStrength = SKIPSTRENGTH; /* 6 */

    U32 forwardH;

    /* Init conditions */

    if ((U32) inputSize > (U32) LZ4_MAX_INPUT_SIZE) return 0; /* 输入长度过大 */

    /* must continue from end of previous block */

    if ((prefix == withPrefix) && (ip != ((LZ4_Data_Structure *)ctx)->nextBlock)) return 0;

    /* do it now, due to potential early exit. 保存下一个块的起始地址 */

    if (prefix == withPrefix) ((LZ4_Data_Structure *)ctx)->nextBlock = iend;

    if ((tableType == byU16) && (inputSize >= LZ4_64KLIMIT)) return 0; /* Size too large (not within 64K limit) */

    if (inputSize < LZ4_minlength) goto _last_literals; /* 如果输入长度小于13，则不查找匹配 */

    /* First Byte */

    LZ4_putPosition(ip, ctx, tableType, base); /* 计算以第一个字节开头的U32的哈希值，保存其位置 */

    ip++; forwardH = LZ4_hashPosition(ip, tableType); /* 计算以第二个字节开头的U32的哈希值 */

    /* Main loop，每次循环查找一个匹配，产生一个序列 */

    for ( ; ; )

    {

        int findMatchAttempts = (1U << skipStrength) + 3;

        const BYTE *forwardIp = ip;

        const BYTE *ref;

        BYTE *token;

        /* Find a match，查找一个匹配，或者到了尽头mflimit */

        do {

            U32 h = forwardH; /* 当前ip对应的哈希值 */

            int step = findMatchAttempts++ >> skipStrength; /* forwardIp的偏移，一般是1 */

            ip = forwardIp;

            forwardIp = ip + step; /* 前向缓存中下个将检查的地址 */

            if unlikely(forwardIp > mflimit) { goto _last_literals; } /* >=12字节才会去匹配 */

            forwardH = LZ4_hashPosition(forwardIp, tableType); /* forwardIp的哈希值 */ 

            /* 这里是查找的关键：按照哈希值h，获取地址ref。

             * 1. 没有匹配，ref为srcBase。

             * 2. 有匹配。

             *     2.1 不在滑动窗口内，继续。

             *     2.2 对应的U32不一样，是冲突，继续。

             *     2.3 在滑动窗口内，且对应的U32一样，找到了match，退出。

             */

            ref = LZ4_getPositionOnHash(h, ctx, tableType, base);

            LZ4_putPositionOnHash(ip, h, ctx, tableType, base); /* 保存ip、h这个对应关系 */

        } while ((ref + MAX_DISTANCE < ip) || (A32(ref) != A32(ip)));

        /* 找到匹配之后，看能否向前扩大匹配 */

        while((ip > anchor) && (ref > lowLimit) && unlikely(ip[-1] == ref[-1])) { ip--; ref--; }

        /* Encode Literal length，赋值Literal length */

        length = (int) (ip - anchor);

        token = op++;

        /* Check output limit */

        if ((limitedOutput) & unlikely(op + length + 2 + (1 + LASTLITERALS) + (length>>8) > oend)) return 0;

        if (length >= (int) RUN_MASK) {

            int len = length - RUN_MASK;

            *token = (RUN_MASK << ML_BITS);

            for(; len >= 255; len -= 255) *op++ = 255;

            *op++ = (BYTE) len;

        } else

            *token = (BYTE) (length << ML_BITS);

        /* Copy Literals，复制不可编码字符 */

        { BYTE * end = (op) + (length); LZ4_WILDCOPY(op, anchor, end); op = end; }

_next_match: /* 向后扩展匹配 */

        /* Encode Offset，赋值offset，op += 2 */

        LZ4_WRITE_LITTLEENDIAN_16(op, (U16) (ip - ref));

        /* Start Counting */

        ip += MINMATCH; ref += MINMATCH; /* MinMatch already verified */

        anchor = ip;

        while likely(ip < matchlimit - (STEPSIZE - 1)) {

            size_t diff = AARCH(ref) ^ AARCH(ip); /* 异或，值为零表示相同 */

            if (! diff) { ip += STEPSIZE; ref += STEPSIZE; continue; }

            ip += LZ4_NbCommonBytes(diff); /* STEPSIZE不同，看其中有多少个字节是相同的 */

            goto _endCount;

        }

        if (LZ4_ARCH64)

            if ((ip < (matchlimit - 3)) && (A32(ref) == A32(ip))) { ip += 4; ref += 4; }

        if ((ip < matchlimit - 1)) && (A16(ref) == A16(ip))) { ip += 2; ref += 2; }

        if ((ip < matchlimit) && (*ref == *ip)) ip++;

_endCount:

        /* Ecode MatchLength，赋值match length */

        length = (int) (ip - anchor);

        /* Check output limit */

        if ((limitedOutput) && unlikely(op + (1 + LASTLITERALS) + (length >> 8) > oend)) return 0;

        if (length >= (int) ML_MASK) {

            *token += ML_MASK;

            length -= ML_MASK;

            for (; length > 509; length -= 510) { *op++ = 255; *op++ = 255; }

            if (length >= 255) { length -= 255; *op++ = 255; }

            *op++ = (BYTE) (length);

        } else

            *token += (BYTE) (length);

        /* Test end of chunk */

        if (ip > mflimit) { anchor = ip; break; } /* 不再进行匹配了 */

        /* Fill table，顺便保存 */

        LZ4_putPosition(ip - 2, ctx, tableType, base);   

        /* Test next position，尝试着找匹配 */

        ref = LZ4_getPosition(ip, ctx, tableType, base);

        LZ4_putPosition(ip, ctx, tableType, base);

        /* 如果找到匹配，说明没有literals，可以直接跳过查找、赋值literal length */

        if ((ref + MAX_DISTANCE >= ip) && (A32(ref) == A32(ip))) { token = op++; *token = 0; goto _next_match; }   

        /* Prepare next loop，准备进行下个循环 */

        anchor = ip++;

        forwardH = LZ4_hashPosition(ip, tableType);

    }

_last_literals:

    /* Encode Last Literals */

    {

        int lastRun = (int) (iend - anchor); /* 最后原字符串长度 */

        if ((limitedOutput) && (((char *)op - dest) + lastRun + 1 + ((lastRun + 255 - RUN_MASK) / 255) >

                (U32) maxOutputSize))

            return 0; /* check output limit */

        if (lastRun >= (int) RUN_MASK) {

            *op ++ = (RUN_MASK << ML_BITS);

            lastRun -= RUN_MASK;

            for (; lastRun >= 255; lastRun -=255) *op++ = 255;

            *op++ = (BYTE) lastRun;

        } else

            *op++ = (BYTE) (lastRun << ML_BITS);

        memcpy(op, anchor, iend - anchor); /* 复制literals */

        op += iend - anchor;

    }

    /* End */

    return (int) (((char *)op) - dest); /* 返回压缩后的长度 */

}

#define LZ4_MAX_INPUT_SIZE 0x7E000000 /* 2 113 929 216 bytes */

#define ML_BITS 4 /* Token: 4-low-bits, match length */

#define ML_MASK ((1U << ML_BITS) - 1)

#define RUN_BITS (8 - ML_BITS) /* Token: 4-high-bits, literal length */

#define RUN_MASK ((1U << RUN_BITS) - 1)

#define MAXD_LOG 16 /* 滑动窗口的位数 */

#define MAX_DISTANCE ((1 << MAXD_LOG) - 1) /* 滑动窗口的最大值 */

速度之王 — LZ4压缩算法（二）的更多相关文章

速度之王 — LZ4压缩算法（一）
LZ4 (Extremely Fast Compression algorithm) 项目:http://code.google.com/p/lz4/ 作者:Yann Collet 本文作者:zhan ...
速度之王 — LZ4压缩算法（三）
LZ4使用 make / make clean 得到可执行程序:lz4.lz4c Usage: ./lz4 [arg] [input] [output] input : a filename Argu ...
lucene底层数据结构——FST，针对field使用列存储，delta encode压缩doc ids数组，LZ4压缩算法
参考: http://www.slideshare.net/lucenerevolution/what-is-inaluceneagrandfinal http://www.slideshare.ne ...
Lucene4.2源码解析之fdt和fdx文件的读写——fdx文件存储一个个的Block，每个Block管理着一批Chunk，通过docID读取到document需要完成Segment、Block、Chunk、document四级查询，引入了LZ4算法对fdt的chunk docs进行了实时压缩/解压
前言通常在搜索打分完毕后,IndexSearcher会返回一个docID序列,但是仅仅有docID我们是无法看到存储在索引中的document,这时候就需要通过docID来得到完整Document信 ...
Druid.io系列（二）：基本概念与架构
原文链接: https://blog.csdn.net/njpjsoftdev/article/details/52955788 在介绍Druid架构之前,我们先结合有关OLAP的基本原理来理解Dr ...
AssetBundle打包
为热更新打基础(xlua\tolua) 素材.源码链接:http://www.sikiedu.com/course/74/task/1812/show 一.AssetBundle的定义和作用 1,As ...
FreeNAS-9.10虚拟机测试安装
虚拟机安装NreeNAS-9.10步骤需求:网络监控磁盘要扩容测试环境: CPU 内存系统盘共享盘网卡 2核 2G 20G 20G 桥接系统版本:FreeNAS-9.10 一.配置虚拟机 ...
MGR架构~ 整体性能架构的调优
一简介:MGR集群架构的调优二过程:本文将从各个角度来具体阐述下三硬件 1 硬件选择相同配置的服务器,磁盘,内存,cpu性能越高越好四网络 1 0丢包和最好万兆网卡五 MGR本身 ...
AssetBundle资源打包与加载
AssetBundle资源打包 1.AssetLabels资源标签文件名:资源打包成AssetBundle后的文件名,类似于压缩包的名字后缀:自定义文件名和后缀名都是小写格式(大写会自动转为小 ...

随机推荐

“出错了”和报告Bug的艺术
"出错了." 没有那句话能像"出错了"一样让程序员/开发者如此沮丧,心里翻江倒海,怒火一点即燃,还要死掉一大片脑细胞. 这句生硬的开场白通常标志着让开发者恐惧的 ...
64位Linux下安装mysql-5.7.13-linux-glibc2.5-x86_64 || 转载：http://www.cnblogs.com/gaojupeng/p/5727069.html
由于公司临时让将Oracle的数据移植到mysql上面,所以让我在公司服务器上面安装一下mysql.下面就是我的安装过程以及一些错误解决思路.其实对于不同版本安装大体都有差不多. 1. 从官网下载 m ...
JAVA面向对象-----instanceof 关键字
instanceof 关键字 1:快速演示instanceof Person p=new Person(); System.out.println( p instanceof Person); 2:i ...
XML之DOM解析模型
<?xml version= "1.0" encoding = "UTF-8"> <articles> <article cate ...
android开发之调试技巧
我们都知道,android的调试打了断点之后运行时要使用debug as->android application 但是这样的运行效率非常低,那么我们有没有快速的方法呢? 当然有. 我们打完断点 ...
Android开发模式之MVC，MVP和MVVM的简单介绍与区别
相信大家对MVC,MVP和MVVM都不陌生,作为三个最耳熟能详的Android框架,它们的应用可以是非常广泛的,但是对于一些新手来说,可能对于区分它们三个都有困难,更别说在实际的项目中应用了,有些时候 ...
gcov辅助脚本
gcov辅助脚本(金庆的专栏)代码覆盖测试查看结果时,需要进入代码所在目录,调用gcov,然后vi查看.因为代码目录结构复杂,进出子目录太麻烦,所以用以下脚本直接生成与查看.一般是用TSVN列出有更改 ...
《java入门第一季》之泛型方法和泛型接口
一.泛型方法. /* * 泛型方法:把泛型定义在方法上.格式:public <泛型类型> 返回类型方法名(泛型类型 t) public <T> void show(T t){ ...
Android源码浅析（三）——Android AOSP 5.1.1源码的同步sync和编译make，搭建Samba服务器进行更便捷的烧录刷机
Android源码浅析(三)--Android AOSP 5.1.1源码的同步sync和编译make,搭建Samba服务器进行更便捷的烧录刷机最近比较忙,而且又要维护自己的博客,视频和公众号,也就没 ...
【一天一道LeetCode】#90. Subsets II
一天一道LeetCode 本系列文章已全部上传至我的github,地址:ZeeCoder's Github 欢迎大家关注我的新浪微博,我的新浪微博欢迎转载,转载请注明出处 (一)题目 Given a ...

速度之王 — LZ4压缩算法（二）

LZ4格式

实现

速度之王 — LZ4压缩算法（二）的更多相关文章

随机推荐

热门专题