Redis ziplist Source Code Analysis

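I. Introduction to ziplist

The previous article showed that quicklist uses ziplist (the compressed list) as its underlying storage. ziplist has enough material of its own to deserve a separate article, so before diving into the source, let's look at its main characteristics:

1. A ziplist is a specially encoded doubly linked list; the special encoding exists to save memory.

2. A ziplist can hold both strings and integers, and integers are stored as real binary integers rather than as character sequences (which saves space).

3. Push and pop at either the head or the tail are nominally O(1), but every operation reallocates the ziplist's memory, and an operation at the head also has to move a large block of memory, so the real cost grows with the size of the ziplist.

These characteristics show up one by one in the code analysis below (ziplist.h and ziplist.c).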

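II. ziplist data structure

Let's first look at the overall layout of a ziplist:

<zlbytes> <zltail> <zllen> <entry> <entry> ... <entry> <zlend>

This is the structure of a ziplist as a whole. Because both the ziplist and its entries are variable-length, the code contains no struct definitions for them. Here is an illustrative definition to make the layout easier to follow: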

struct ziplist<T> {
    unsigned int zlbytes;        // total size of the ziplist in bytes, including the header, all entries and zlend
    unsigned int zloffset;       // offset from the start of the ziplist to the last entry, used for fast reverse traversal
    unsigned short int zllength; // number of entries
    T[] entry;                   // the entries
    unsigned char zlend;         // end marker, always 0xFF
}

struct entry {
    char[var] prevlen;  // byte length of the previous entry
    char[var] encoding; // encoding type of the element
    char[] content;     // element content
}

The code reads and writes these ziplist fields through macros:

#define ZIPLIST_BYTES(zl)       (*((uint32_t*)(zl)))
#define ZIPLIST_TAIL_OFFSET(zl) (*((uint32_t*)((zl)+sizeof(uint32_t))))
#define ZIPLIST_LENGTH(zl)      (*((uint16_t*)((zl)+sizeof(uint32_t)*2)))
#define ZIPLIST_HEADER_SIZE     (sizeof(uint32_t)*2+sizeof(uint16_t))
#define ZIPLIST_END_SIZE        (sizeof(uint8_t))
#define ZIPLIST_ENTRY_HEAD(zl)  ((zl)+ZIPLIST_HEADER_SIZE)
#define ZIPLIST_ENTRY_TAIL(zl)  ((zl)+intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl)))
#define ZIPLIST_ENTRY_END(zl)   ((zl)+intrev32ifbe(ZIPLIST_BYTES(zl))-1)
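To make the header fields concrete, here is a minimal sketch that reads them from a hand-written buffer. Everything prefixed with sk_ is hypothetical and not part of Redis; the sketch assumes the header is stored little-endian (which is what intrev32ifbe implies) and decodes it byte by byte instead of casting pointers.

```c
#include <assert.h>
#include <stdint.h>

/* Hand-written ziplist holding the string "hi" followed by the integer 5.
 * The byte values follow the encoding rules described in this article. */
const unsigned char sk_demo_zl[] = {
    0x11, 0x00, 0x00, 0x00, /* zlbytes = 17, stored little-endian      */
    0x0e, 0x00, 0x00, 0x00, /* tail offset = 14                        */
    0x02, 0x00,             /* zllen = 2                               */
    0x00, 0x02, 'h', 'i',   /* entry 1: prevlen 0, string of length 2  */
    0x04, 0xf6,             /* entry 2: prevlen 4, immediate integer 5 */
    0xff                    /* zlend                                   */
};

/* Decode a little-endian 32-bit field byte by byte (hypothetical helper). */
uint32_t sk_load_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Counterparts of ZIPLIST_BYTES, ZIPLIST_TAIL_OFFSET and ZIPLIST_LENGTH. */
uint32_t sk_zl_bytes(const unsigned char *zl)       { return sk_load_le32(zl); }
uint32_t sk_zl_tail_offset(const unsigned char *zl) { return sk_load_le32(zl + 4); }
uint16_t sk_zl_length(const unsigned char *zl) {
    return (uint16_t)(zl[8] | (zl[9] << 8));
}

/* Step from the entry at offset `off` to the previous entry.
 * Assumption: the prevlen field is the 1-byte form (value < 254). */
uint32_t sk_prev_offset(const unsigned char *zl, uint32_t off) {
    return off - zl[off];
}

/* Usage example: the header describes the buffer consistently. */
void sk_header_selfcheck(void) {
    assert(sk_zl_bytes(sk_demo_zl) == sizeof(sk_demo_zl));
    assert(sk_demo_zl[sk_zl_bytes(sk_demo_zl) - 1] == 0xff);
    assert(sk_prev_offset(sk_demo_zl, sk_zl_tail_offset(sk_demo_zl)) == 10);
}
```

The last assertion is the reverse traversal in miniature: start at the tail offset, subtract that entry's prevlen, and you land on the first entry right after the 10-byte header.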

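The entry layout is somewhat more complex: prevlen and encoding use special encodings to save space, and this is where the essence of ziplist lies.

First, the source code that writes prevlen: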

/* Encode the length of the previous entry and write it to "p". Return the
 * number of bytes needed to encode this length if "p" is NULL.
 *
 * The length of the previous entry is encoded into the prevlen field of the
 * current entry as follows:
 *   1. If len < 254, prevlen takes one byte, which holds the length itself.
 *   2. If len >= 254, prevlen takes five bytes: the first byte is always 254
 *      and bytes two to five hold the actual length.
 */
static unsigned int zipPrevEncodeLength(unsigned char *p, unsigned int len) {
    if (p == NULL) { // only compute how many bytes len needs
        return (len < ZIP_BIGLEN) ? 1 : sizeof(len)+1;
    } else {
        if (len < ZIP_BIGLEN) {
            p[0] = len;
            return 1;
        } else {
            p[0] = ZIP_BIGLEN;
            memcpy(p+1,&len,sizeof(len));
            memrev32ifbe(p+1);
            return 1+sizeof(len);
        }
    }
}
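The two prevlen forms are easy to try out in isolation. The following is a standalone sketch, not the Redis code: the sk_ names are made up for illustration, and the length is written byte by byte in little-endian order so that the example does not depend on the host's endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_BIGLEN 254 /* same threshold as ZIP_BIGLEN above */

/* Write the prevlen field for `len` to `p` and return the bytes used.
 * With p == NULL only the required size is returned. */
unsigned int sk_prevlen_encode(unsigned char *p, uint32_t len) {
    if (len < SK_BIGLEN) {
        if (p != NULL) p[0] = (unsigned char)len;
        return 1;
    }
    if (p != NULL) {
        p[0] = SK_BIGLEN;                      /* marker byte */
        p[1] = (unsigned char)(len & 0xff);    /* little-endian length */
        p[2] = (unsigned char)((len >> 8) & 0xff);
        p[3] = (unsigned char)((len >> 16) & 0xff);
        p[4] = (unsigned char)((len >> 24) & 0xff);
    }
    return 5;
}

/* Read a prevlen field from `p`; returns the number of bytes consumed. */
unsigned int sk_prevlen_decode(const unsigned char *p, uint32_t *len) {
    if (p[0] < SK_BIGLEN) {
        *len = p[0];
        return 1;
    }
    *len = (uint32_t)p[1] | ((uint32_t)p[2] << 8) |
           ((uint32_t)p[3] << 16) | ((uint32_t)p[4] << 24);
    return 5;
}

/* Usage example: 253 still fits in one byte, 254 needs the 5-byte form. */
void sk_prevlen_selfcheck(void) {
    unsigned char buf[5];
    uint32_t out = 0;
    assert(sk_prevlen_encode(buf, 253) == 1 && buf[0] == 253);
    assert(sk_prevlen_encode(buf, 254) == 5 && buf[0] == 254);
    assert(sk_prevlen_decode(buf, &out) == 5 && out == 254);
}
```

The boundary at 254 matters later: an entry that grows from 253 to 254 bytes forces the next entry's prevlen field to grow from one byte to five.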

Next comes the encoding field. Redis encodes an element differently depending on its value (long long or string). The integer encoding again exists to save space and is chosen in zipTryEncoding:

/* Check if string pointed to by 'entry' can be encoded as an integer.
 * Stores the integer value in 'v' and its encoding in 'encoding'.
 *
 * When the content can be converted to a long long, encoding takes one byte
 * whose top two bits are always 11. The remaining six bits depend on the
 * size of the value (bit patterns are written in binary):
 *   a. 11000000: content is an int16, 2 bytes long.
 *   b. 11010000: content is an int32, 4 bytes long.
 *   c. 11100000: content is an int64, 8 bytes long.
 *   d. 11110000: content is an int24, 3 bytes long.
 *   e. 11111110: content is an int8, 1 byte long.
 *   f. 11111111: marks the end of the ziplist.
 *   g. 1111xxxx: a very small integer from 0 to 12 stored in the encoding
 *      byte itself. Because 0000, 1110 and 1111 are already taken, xxxx
 *      ranges from 0001 to 1101 (1 to 13), and the program subtracts 1 after
 *      reading the four bits. For example, 0001 = 1 yields 1-1 = 0.
 */
static int zipTryEncoding(unsigned char *entry, unsigned int entrylen, long long *v, unsigned char *encoding) {
    long long value;

    if (entrylen >= 32 || entrylen == 0) return 0;
    if (string2ll((char*)entry,entrylen,&value)) {
        /* Great, the string can be encoded. Check what's the smallest
         * of our encoding types that can hold this value. */
        if (value >= 0 && value <= 12) {
            *encoding = ZIP_INT_IMM_MIN+value;
        } else if (value >= INT8_MIN && value <= INT8_MAX) {
            *encoding = ZIP_INT_8B;
        } else if (value >= INT16_MIN && value <= INT16_MAX) {
            *encoding = ZIP_INT_16B;
        } else if (value >= INT24_MIN && value <= INT24_MAX) {
            *encoding = ZIP_INT_24B;
        } else if (value >= INT32_MIN && value <= INT32_MAX) {
            *encoding = ZIP_INT_32B;
        } else {
            *encoding = ZIP_INT_64B;
        }
        *v = value;
        return 1;
    }
    return 0;
}
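The selection logic can be sketched on its own, taking an already parsed integer instead of a string. The SK_ constants below mirror the bit patterns listed in the comment above; the function names are hypothetical and only meant to show which encoding byte and payload size a given value ends up with.

```c
#include <assert.h>
#include <stdint.h>

/* Encoding bytes from the list above (names are local to this sketch). */
enum {
    SK_INT_16B     = 0xc0, /* 11000000 */
    SK_INT_32B     = 0xd0, /* 11010000 */
    SK_INT_64B     = 0xe0, /* 11100000 */
    SK_INT_24B     = 0xf0, /* 11110000 */
    SK_INT_8B      = 0xfe, /* 11111110 */
    SK_INT_IMM_MIN = 0xf1  /* 11110001, immediate value 0 */
};

/* Pick the smallest integer encoding that can hold `v`. */
unsigned char sk_int_encoding(long long v) {
    if (v >= 0 && v <= 12) return (unsigned char)(SK_INT_IMM_MIN + v);
    if (v >= INT8_MIN && v <= INT8_MAX) return SK_INT_8B;
    if (v >= INT16_MIN && v <= INT16_MAX) return SK_INT_16B;
    if (v >= -8388608LL && v <= 8388607LL) return SK_INT_24B; /* 24-bit range */
    if (v >= INT32_MIN && v <= INT32_MAX) return SK_INT_32B;
    return SK_INT_64B;
}

/* Number of content bytes that follow the encoding byte. */
unsigned int sk_int_size(unsigned char enc) {
    switch (enc) {
    case SK_INT_8B:  return 1;
    case SK_INT_16B: return 2;
    case SK_INT_24B: return 3;
    case SK_INT_32B: return 4;
    case SK_INT_64B: return 8;
    default:         return 0; /* immediate: the value lives in enc itself */
    }
}

/* Usage example: an immediate value is recovered as (low 4 bits) - 1. */
void sk_int_selfcheck(void) {
    unsigned char enc = sk_int_encoding(5);
    assert(enc == 0xf6);
    assert((enc & 0x0f) - 1 == 5);
    assert(sk_int_size(enc) == 0);
}
```

So the integers 0 to 12 cost no content bytes at all: the whole entry is just prevlen plus one encoding byte.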

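The function above decides whether the content can be encoded as a long long. If it cannot, the content is stored as a string, and the encoding field is written by zipEncodeLength: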

/* Encode the length 'rawlen' writing it in 'p'. If p is NULL it just returns
 * the amount of bytes required to encode such a length.
 *
 * For string content this function builds and writes the encoding field
 * (content that converts to a long long gets its encoding from
 * zipTryEncoding). The layout depends on the length of the string (bit
 * patterns are written in binary):
 *   a. 00xxxxxx: the leading 00 marks a string of at most 63 bytes; the
 *      remaining 6 bits hold the length. encoding takes 1 byte.
 *   b. 01xxxxxx xxxxxxxx: the leading 01 marks a medium string (more than 63
 *      and at most 16383 bytes); the remaining 14 bits hold the length.
 *      encoding takes 2 bytes.
 *   c. 10000000 xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx: a large string; the
 *      first byte is always 128 (0x80) and the next four bytes hold the
 *      length. encoding takes 5 bytes.
 */
static unsigned int zipEncodeLength(unsigned char *p, unsigned char encoding, unsigned int rawlen) {
    unsigned char len = 1, buf[5];

    if (ZIP_IS_STR(encoding)) {
        /* Although encoding is given it may not be set for strings,
         * so we determine it here using the raw length. */
        if (rawlen <= 0x3f) {
            if (!p) return len;
            buf[0] = ZIP_STR_06B | rawlen;
        } else if (rawlen <= 0x3fff) {
            len += 1;
            if (!p) return len;
            buf[0] = ZIP_STR_14B | ((rawlen >> 8) & 0x3f);
            buf[1] = rawlen & 0xff;
        } else {
            len += 4;
            if (!p) return len;
            buf[0] = ZIP_STR_32B;
            buf[1] = (rawlen >> 24) & 0xff;
            buf[2] = (rawlen >> 16) & 0xff;
            buf[3] = (rawlen >> 8) & 0xff;
            buf[4] = rawlen & 0xff;
        }
    } else {
        /* Implies integer encoding, so length is always 1. */
        if (!p) return len;
        buf[0] = encoding;
    }

    /* Store this length at p */
    memcpy(p,buf,len);
    return len;
}
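As a quick check of the three string layouts, here is a small standalone encoder and decoder for the string-length header. It is a sketch under the rules described above, with hypothetical sk_ names; unlike the prevlen field, the multi-byte lengths here are written most significant byte first, matching the shifts in zipEncodeLength.

```c
#include <assert.h>
#include <stdint.h>

/* Write the string-length header for `rawlen` into `p` (at least 5 bytes)
 * and return how many bytes were used. */
unsigned int sk_strlen_encode(unsigned char *p, uint32_t rawlen) {
    if (rawlen <= 0x3f) {                 /* 00xxxxxx */
        p[0] = (unsigned char)rawlen;
        return 1;
    }
    if (rawlen <= 0x3fff) {               /* 01xxxxxx xxxxxxxx */
        p[0] = (unsigned char)(0x40 | ((rawlen >> 8) & 0x3f));
        p[1] = (unsigned char)(rawlen & 0xff);
        return 2;
    }
    p[0] = 0x80;                          /* 10000000 + 4 length bytes */
    p[1] = (unsigned char)((rawlen >> 24) & 0xff);
    p[2] = (unsigned char)((rawlen >> 16) & 0xff);
    p[3] = (unsigned char)((rawlen >> 8) & 0xff);
    p[4] = (unsigned char)(rawlen & 0xff);
    return 5;
}

/* Read a string-length header; returns the bytes consumed, or 0 if the top
 * two bits are 11, which means the entry holds an integer instead. */
unsigned int sk_strlen_decode(const unsigned char *p, uint32_t *rawlen) {
    switch (p[0] & 0xc0) {
    case 0x00:
        *rawlen = p[0] & 0x3f;
        return 1;
    case 0x40:
        *rawlen = ((uint32_t)(p[0] & 0x3f) << 8) | p[1];
        return 2;
    case 0x80:
        *rawlen = ((uint32_t)p[1] << 24) | ((uint32_t)p[2] << 16) |
                  ((uint32_t)p[3] << 8) | (uint32_t)p[4];
        return 5;
    default:
        return 0;
    }
}

/* Usage example: 63 fits in one byte, 64 already needs two. */
void sk_strlen_selfcheck(void) {
    unsigned char buf[5];
    uint32_t out = 0;
    assert(sk_strlen_encode(buf, 63) == 1 && buf[0] == 0x3f);
    assert(sk_strlen_encode(buf, 64) == 2 && buf[0] == 0x40 && buf[1] == 0x40);
    assert(sk_strlen_decode(buf, &out) == 2 && out == 64);
}
```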

III. ziplist CRUD operations

1. Creating a ziplist

When an lpush command runs and the current quicklistNode is newly created, a new ziplist has to be created as well:

/* Add new entry to head node of quicklist.
 *
 * Returns 0 if used existing head.
 * Returns 1 if new head created.
 *
 * Adds a new element at the head node of the quicklist: returns 0 if the
 * element went into the existing head, 1 otherwise.
 */
int quicklistPushHead(quicklist *quicklist, void *value, size_t sz) {
    quicklistNode *orig_head = quicklist->head;

    // If head is not NULL and has room for the new element, add the element
    // to head; otherwise create a new quicklistNode.
    if (likely(
            _quicklistNodeAllowInsert(quicklist->head, quicklist->fill, sz))) {
        quicklist->head->zl =
            ziplistPush(quicklist->head->zl, value, sz, ZIPLIST_HEAD);
        quicklistNodeUpdateSz(quicklist->head);
    } else {
        // Create a new quicklistNode
        quicklistNode *node = quicklistCreateNode();
        // Add the new element to a freshly created ziplist
        node->zl = ziplistPush(ziplistNew(), value, sz, ZIPLIST_HEAD);
        // Store the ziplist's size in the sz field of the quicklistNode
        quicklistNodeUpdateSz(node);
        // Link the new node into the quicklist, in front of the old head
        _quicklistInsertNodeBefore(quicklist, quicklist->head, node);
    }
    quicklist->count++;
    quicklist->head->count++;
    return (orig_head != quicklist->head);
}

/* Create a new empty ziplist. */
unsigned char *ziplistNew(void) {
    unsigned int bytes = ZIPLIST_HEADER_SIZE+1;
    unsigned char *zl = zmalloc(bytes);
    ZIPLIST_BYTES(zl) = intrev32ifbe(bytes);
    ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(ZIPLIST_HEADER_SIZE);
    ZIPLIST_LENGTH(zl) = 0;
    zl[bytes-1] = ZIP_END;
    return zl;
}
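An empty ziplist is therefore just 11 bytes: the 10-byte header plus the end marker. The sketch below writes that layout into a caller-supplied buffer so the bytes can be inspected directly. It is an illustration only: the sk_ names are hypothetical, no allocator is involved, and the fields are written in little-endian order by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_HEADER_SIZE 10   /* two uint32_t fields plus one uint16_t */
#define SK_END         0xff /* end marker */

/* Store a 32-bit value in little-endian byte order (hypothetical helper). */
void sk_store_le32(unsigned char *p, uint32_t v) {
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Fill `zl` (at least 11 bytes) with an empty ziplist; returns its size. */
size_t sk_ziplist_init(unsigned char *zl) {
    size_t bytes = SK_HEADER_SIZE + 1;
    sk_store_le32(zl, (uint32_t)bytes);   /* zlbytes                      */
    sk_store_le32(zl + 4, SK_HEADER_SIZE);/* tail offset = first entry    */
    zl[8] = 0;                            /* zllen = 0 (low byte)         */
    zl[9] = 0;                            /* zllen = 0 (high byte)        */
    zl[bytes - 1] = SK_END;               /* end marker right after header */
    return bytes;
}

/* Usage example: in an empty list the tail offset points at the end marker. */
void sk_init_selfcheck(void) {
    static const unsigned char expected[11] = {
        0x0b, 0, 0, 0, 0x0a, 0, 0, 0, 0, 0, 0xff
    };
    unsigned char zl[11];
    assert(sk_ziplist_init(zl) == 11);
    assert(memcmp(zl, expected, sizeof(expected)) == 0);
    assert(zl[zl[4]] == SK_END);
}
```

The last assertion shows a detail that the insert code relies on: when the list is empty, the byte at the tail offset is the end marker itself.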

2. Adding an entry

Entries are added through ziplistPush:

unsigned char *ziplistPush(unsigned char *zl, unsigned char *s, unsigned int slen, int where) {
    unsigned char *p;
    p = (where == ZIPLIST_HEAD) ? ZIPLIST_ENTRY_HEAD(zl) : ZIPLIST_ENTRY_END(zl);
    return __ziplistInsert(zl,p,s,slen);
}

/* Insert item at "p". Adds one element to zl. */
static unsigned char *__ziplistInsert(unsigned char *zl, unsigned char *p, unsigned char *s, unsigned int slen) {
    size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl)), reqlen;
    unsigned int prevlensize, prevlen = 0;
    size_t offset;
    int nextdiff = 0;
    unsigned char encoding = 0;
    long long value = 123456789; /* initialized to avoid warning. Using a value
                                    that is easy to see if for some reason
                                    we use it uninitialized. */
    zlentry tail;

    /* Find out prevlen for the entry that is inserted. */
    if (p[0] != ZIP_END) {
        ZIP_DECODE_PREVLEN(p, prevlensize, prevlen);
    } else {
        // p is the end marker, so the new entry is appended. ZIPLIST_ENTRY_TAIL
        // points at ZIP_END only when the list is empty; otherwise the current
        // tail entry becomes the predecessor of the new entry.
        unsigned char *ptail = ZIPLIST_ENTRY_TAIL(zl);
        if (ptail[0] != ZIP_END) {
            prevlen = zipRawEntryLength(ptail);
        }
    }

    /* See if the entry can be encoded */
    // Check whether the value can be encoded as a long long. If so, the number
    // is stored in value and the smallest suitable encoding in encoding.
    if (zipTryEncoding(s,slen,&value,&encoding)) {
        /* 'encoding' is set to the appropriate integer encoding */
        reqlen = zipIntSize(encoding);
    } else {
        /* 'encoding' is untouched, however zipEncodeLength will use the
         * string length to figure out how to encode it. */
        reqlen = slen;
    }
    /* We need space for both the length of the previous entry and
     * the length of the payload. */
    reqlen += zipPrevEncodeLength(NULL,prevlen);
    reqlen += zipEncodeLength(NULL,encoding,slen);

    /* When the insert position is not equal to the tail, we need to
     * make sure that the next entry can hold this entry's length in
     * its prevlen field. */
    nextdiff = (p[0] != ZIP_END) ? zipPrevLenByteDiff(p,reqlen) : 0;
    // reqlen is the size the new entry needs. nextdiff is the difference
    // between the prevlen size the entry at the insert position needs once it
    // follows the new entry and the prevlen size it has now.

    /* Store offset because a realloc may change the address of zl. */
    offset = p-zl;
    zl = ziplistResize(zl,curlen+reqlen+nextdiff);
    p = zl+offset;

    /* Apply memory move when necessary and update tail offset. */
    if (p[0] != ZIP_END) {
        /* Subtract one because of the ZIP_END bytes */
        // Move the existing data backward to make room for the new entry
        memmove(p+reqlen,p-nextdiff,curlen-offset-1+nextdiff);

        /* Encode this entry's raw length in the next entry. */
        // Write the new entry's length into the prevlen of the next entry
        zipPrevEncodeLength(p+reqlen,reqlen);

        /* Update offset for tail */
        // Shift ZIPLIST_TAIL_OFFSET so that it still points at the original tail entry
        ZIPLIST_TAIL_OFFSET(zl) =
            intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+reqlen);

        /* When the tail contains more than one entry, we need to take
         * "nextdiff" in account as well. Otherwise, a change in the
         * size of prevlen doesn't have an effect on the *tail* offset. */
        zipEntry(p+reqlen, &tail);
        // If the entry at the insert position is not the tail entry,
        // ZIPLIST_TAIL_OFFSET must also be adjusted by nextdiff
        if (p[reqlen+tail.headersize+tail.len] != ZIP_END) {
            ZIPLIST_TAIL_OFFSET(zl) =
                intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff);
        }
    } else {
        /* This element will be the new tail. */
        // ZIPLIST_TAIL_OFFSET now points at the new entry, which is the tail
        ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(p-zl);
    }

    /* When nextdiff != 0, the raw length of the next entry has changed, so
     * we need to cascade the update throughout the ziplist */
    if (nextdiff != 0) {
        // When nextdiff is non-zero, the prevlen of the following entries is
        // updated in a loop; in the worst case every entry has to be updated.
        offset = p-zl;
        zl = __ziplistCascadeUpdate(zl,p+reqlen);
        p = zl+offset;
    }

    /* Write the entry */
    // Fill in the new entry
    p += zipPrevEncodeLength(p,prevlen);
    p += zipEncodeLength(p,encoding,slen);
    if (ZIP_IS_STR(encoding)) {
        memcpy(p,s,slen);
    } else {
        zipSaveInteger(p,value,encoding);
    }
    ZIPLIST_INCR_LENGTH(zl,1);
    return zl;
}

As the code shows, inserting at the head means calling memmove to shift every existing entry in the ziplist backward, which is fairly expensive.
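The nextdiff handling in the insert code is easier to follow with numbers. The sketch below is a simplified model, not the Redis implementation: the sk_ functions are hypothetical, entries are represented only by their raw byte lengths, and every following entry is assumed to currently use the 1-byte prevlen form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed by a prevlen field that stores `len`. */
unsigned int sk_prevlen_size(uint32_t len) {
    return len < 254 ? 1 : 5;
}

/* Difference between the prevlen size the next entry needs after an insert
 * and the size it has now (the role zipPrevLenByteDiff plays above). */
int sk_next_diff(unsigned int cur_prevlen_size, uint32_t new_len) {
    return (int)sk_prevlen_size(new_len) - (int)cur_prevlen_size;
}

/* Count how many following entries must grow their prevlen field from 1 to
 * 5 bytes when an entry of `new_len` bytes is inserted in front of entries
 * whose raw lengths are lens[0..n-1]. */
size_t sk_cascade_count(const uint32_t *lens, size_t n, uint32_t new_len) {
    size_t grown = 0;
    uint32_t prev = new_len;
    for (size_t i = 0; i < n; i++) {
        if (sk_prevlen_size(prev) == 1) break; /* fits, the chain stops */
        grown++;
        prev = lens[i] + 4; /* this entry is now four bytes longer */
    }
    return grown;
}

/* Usage example: a 300-byte entry in front of two 250-byte entries. */
void sk_cascade_selfcheck(void) {
    const uint32_t lens[] = {250, 250, 100, 250};
    assert(sk_next_diff(1, 300) == 4);
    assert(sk_cascade_count(lens, 4, 300) == 3);
    assert(sk_cascade_count(lens, 4, 10) == 0);
}
```

In the example, each 250-byte entry grows to 254 bytes and pushes the next entry over the threshold, and the chain only stops at the entry that follows the 100-byte one. This is the worst-case behaviour that the comment about updating every entry refers to.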

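3. Deleting an entry

Deletion is implemented in ziplistDelete. Its logic is the reverse of insertion, so it is not repeated here.

That concludes the analysis of the main ziplist code. The implementation is ingenious and saves as much memory as it can, but operations at the head move a lot of memory and are costly, while operations at the tail do not have to shift any existing entries and are far more efficient.

This article draws on Qian Wenpin's book 《Redis深度历险：核心原理与应用实践》 (Redis Deep Adventure: Core Principles and Application Practice). Many thanks!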
