Redis Data Structures: dict (Part 2)

This article and the ones that follow are all based on Redis v3.2.8.

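In the previous article, "Redis Data Structures: dict", we got a rough picture of the dict structure. This article looks in detail at how dict maintains that structure.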

As usual, let's open the Redis source file dict.c.

I. Maintaining the dict data structure

1. dictCreate - creating a new hash table

/* Reset a hash table already initialized with ht_init().
 * NOTE: This function should only be called by ht_destroy(). */
static void _dictReset(dictht *ht)
{
    ht->table = NULL; // no bucket array yet
    ht->size = 0;
    ht->sizemask = 0;
    ht->used = 0;
}

/* Create a new hash table */
dict *dictCreate(dictType *type,
        void *privDataPtr)
{
    dict *d = zmalloc(sizeof(*d)); // allocate the dict itself
    _dictInit(d,type,privDataPtr); // initialize the dict
    return d;
}

/* Initialize the hash table */
int _dictInit(dict *d, dictType *type,
        void *privDataPtr)
{
    _dictReset(&d->ht[0]);
    _dictReset(&d->ht[1]);
    d->type = type;
    d->privdata = privDataPtr;
    d->rehashidx = -1;
    d->iterators = 0;
    return DICT_OK;
}

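As the code above shows, dictCreate allocates the dict structure and gives each field an initial value. Neither hash table, ht[0] nor ht[1], gets a bucket array at this point: both table pointers are set to NULL. Space is only actually allocated when the first element is inserted.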
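To make the lazy allocation concrete, here is a minimal C sketch. Its assumptions are as follows. toy_ht, toy_entry, toy_insert and TOY_INITIAL_SIZE are hypothetical names, not Redis's API. Each key also serves as its own hash. In real Redis the first allocation happens inside _dictKeyIndex, through _dictExpandIfNeeded and dictExpand, and the sketch folds that step into a single helper.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy model of a dictht; not the real Redis struct. */
typedef struct toy_entry {
    unsigned long key;
    struct toy_entry *next;
} toy_entry;

typedef struct {
    toy_entry **table;       /* stays NULL until the first insert */
    unsigned long size;
    unsigned long sizemask;
    unsigned long used;
} toy_ht;

#define TOY_INITIAL_SIZE 4   /* assumed to mirror DICT_HT_INITIAL_SIZE */

/* Like _dictReset: fields are zeroed, no bucket array is allocated. */
static void toy_reset(toy_ht *ht) {
    ht->table = NULL;
    ht->size = ht->sizemask = ht->used = 0;
}

/* Allocate the bucket array on demand, only while the table is still empty. */
static int toy_ensure_table(toy_ht *ht) {
    if (ht->size != 0) return 0;
    ht->table = calloc(TOY_INITIAL_SIZE, sizeof(*ht->table));
    if (ht->table == NULL) return -1;
    ht->size = TOY_INITIAL_SIZE;
    ht->sizemask = TOY_INITIAL_SIZE - 1;
    return 0;
}

/* Insert at the head of the bucket chain; the key doubles as its hash. */
static int toy_insert(toy_ht *ht, unsigned long key) {
    if (toy_ensure_table(ht) != 0) return -1;
    toy_entry *e = malloc(sizeof(*e));
    if (e == NULL) return -1;
    unsigned long idx = key & ht->sizemask;
    e->key = key;
    e->next = ht->table[idx];
    ht->table[idx] = e;
    ht->used++;
    return 0;
}

/* Free every chain and the bucket array, then return to the empty state. */
static void toy_free(toy_ht *ht) {
    for (unsigned long i = 0; i < ht->size; i++) {
        toy_entry *e = ht->table[i];
        while (e) {
            toy_entry *next = e->next;
            free(e);
            e = next;
        }
    }
    free(ht->table);
    toy_reset(ht);
}
```

Right after toy_reset the table is NULL and size is 0. The first toy_insert is what allocates the 4 buckets.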

2. dictFind - looking up a key

dictEntry *dictFind(dict *d, const void *key)
{
    dictEntry *he;
    unsigned int h, idx, table;

    if (d->ht[0].used + d->ht[1].used == 0) return NULL; /* dict is empty */
    if (dictIsRehashing(d)) _dictRehashStep(d);
    h = dictHashKey(d, key);
    for (table = 0; table <= 1; table++) {
        idx = h & d->ht[table].sizemask;
        he = d->ht[table].table[idx];
        while(he) {
            if (key==he->key || dictCompareKeys(d, key, he->key))
                return he;
            he = he->next;
        }
        if (!dictIsRehashing(d)) return NULL;
    }
    return NULL;
}

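From the code above, dictFind behaves as follows, depending on whether the dict is currently being rehashed:

If both tables are empty (ht[0].used + ht[1].used == 0), it returns NULL immediately.
If a rehash is in progress, it calls _dictRehashStep(d). We will look at its implementation later.
It calls dictHashKey to compute the hash value of the key.
The for loop iterates over the two hash tables defined earlier. It searches the first hash table, ht[0], first. It locates the bucket in the table array by ANDing the hash value with sizemask, then walks that bucket's dictEntry linked list. While walking the list it compares keys with dictCompareKeys(d, key, he->key), which calls the dict type's keyCompare function, or falls back to pointer comparison if none is set. If a match is found, that entry is returned. Otherwise it moves on to the next step.
Next it checks whether a rehash is in progress. If not, the result from ht[0] is final, and NULL is returned if the key was not found. Otherwise it makes a second pass over ht[1], following exactly the same procedure as for ht[0].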
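The two-table search can be sketched in plain C. This is a simplified model, not Redis code. The lk_* structures and functions are hypothetical. djb2 is used as an assumed hash function, and a NULL-terminated string compared with strcmp stands in for keyCompare. rehashidx == -1 means "not rehashing", as in Redis.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified structures, modeled on dictEntry/dictht/dict. */
typedef struct lk_entry {
    const char *key;
    int val;
    struct lk_entry *next;
} lk_entry;

typedef struct {
    lk_entry **table;
    unsigned long sizemask;
    unsigned long used;
} lk_ht;

typedef struct {
    lk_ht ht[2];
    long rehashidx;          /* -1 means "not rehashing", as in Redis */
} lk_dict;

/* Assumed hash function (djb2); Redis uses its own hash functions. */
static unsigned int lk_hash(const char *s) {
    unsigned int h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Search ht[0], and ht[1] only while a rehash is in progress. */
static lk_entry *lk_find(lk_dict *d, const char *key) {
    if (d->ht[0].used + d->ht[1].used == 0) return NULL;    /* empty dict */
    unsigned int h = lk_hash(key);
    for (int t = 0; t <= 1; t++) {
        unsigned long idx = h & d->ht[t].sizemask;          /* bucket index */
        for (lk_entry *he = d->ht[t].table[idx]; he != NULL; he = he->next)
            if (he->key == key || strcmp(he->key, key) == 0) return he;
        if (d->rehashidx == -1) return NULL;                /* ht[1] unused */
    }
    return NULL;
}
```

An entry that already sits in ht[1] is only found while rehashidx != -1. Outside a rehash, the search stops after ht[0].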

3. dictAdd and dictReplace - inserting into the dict

/* Add an element to the target hash table */
int dictAdd(dict *d, void *key, void *val)
{
    dictEntry *entry = dictAddRaw(d,key);

    if (!entry) return DICT_ERR;
    dictSetVal(d, entry, val);
    return DICT_OK;
}

/* Low level add. This function adds the entry but instead of setting
 * a value returns the dictEntry structure to the user, that will make
 * sure to fill the value field as he wishes.
 *
 * This function is also directly exposed to the user API to be called
 * mainly in order to store non-pointers inside the hash value, example:
 *
 * entry = dictAddRaw(dict,mykey);
 * if (entry != NULL) dictSetSignedIntegerVal(entry,1000);
 *
 * Return values:
 *
 * If key already exists NULL is returned.
 * If key was added, the hash entry is returned to be manipulated by the caller.
 */
dictEntry *dictAddRaw(dict *d, void *key)
{
    int index;
    dictEntry *entry;
    dictht *ht;

    if (dictIsRehashing(d)) _dictRehashStep(d);

    /* Get the index of the new element, or -1 if
     * the element already exists. */
    if ((index = _dictKeyIndex(d, key)) == -1)
        return NULL;

    /* Allocate the memory and store the new entry.
     * Insert the element in top, with the assumption that in a database
     * system it is more likely that recently added entries are accessed
     * more frequently. */
    ht = dictIsRehashing(d) ? &d->ht[1] : &d->ht[0];
    entry = zmalloc(sizeof(*entry));
    entry->next = ht->table[index]; // new entry goes at the head of the bucket's list
    ht->table[index] = entry;
    ht->used++;

    /* Set the hash entry fields. */
    dictSetKey(d, entry, key);
    return entry;
}

_dictKeyIndex

/* Returns the index of a free slot that can be populated with
 * a hash entry for the given 'key'.
 * If the key already exists, -1 is returned.
 *
 * Note that if we are in the process of rehashing the hash table, the
 * index is always returned in the context of the second (new) hash table. */
static int _dictKeyIndex(dict *d, const void *key)
{
    unsigned int h, idx, table;
    dictEntry *he;

    /* Expand the hash table if needed */
    if (_dictExpandIfNeeded(d) == DICT_ERR)
        return -1;
    /* Compute the key hash value */
    h = dictHashKey(d, key);
    for (table = 0; table <= 1; table++) {
        idx = h & d->ht[table].sizemask;
        /* Search if this slot does not already contain the given key */
        he = d->ht[table].table[idx];
        while(he) {
            if (key==he->key || dictCompareKeys(d, key, he->key))
                return -1;
            he = he->next;
        }
        if (!dictIsRehashing(d)) break;
    }
    return idx;
}

/* Add an element, discarding the old if the key already exists.
 * Return 1 if the key was added from scratch, 0 if there was already an
 * element with such key and dictReplace() just performed a value update
 * operation. */
int dictReplace(dict *d, void *key, void *val)
{
    dictEntry *entry, auxentry;

    /* Try to add the element. If the key
     * does not exist dictAdd will succeed. */
    if (dictAdd(d, key, val) == DICT_OK)
        return 1;
    /* It already exists, get the entry */
    entry = dictFind(d, key);
    /* Set the new value and free the old one. Note that it is important
     * to do that in this order, as the value may just be exactly the same
     * as the previous one. In this context, think to reference counting,
     * you want to increment (set), and then decrement (free), and not the
     * reverse. */
    auxentry = *entry;
    dictSetVal(d, entry, val);
    dictFreeVal(d, &auxentry);
    return 0;
}

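Both dictAdd and dictReplace insert entries. How do they differ?

dictAdd inserts a new key/value pair. If the key already exists, the insertion fails and DICT_ERR is returned.
dictReplace is built on top of dictAdd. It also inserts a key/value pair, but when the key already exists it updates the value instead. In that case it performs two lookups: one inside dictAdd (through _dictKeyIndex) and a second one in dictFind.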
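The comment in dictReplace says the new value must be set before the old one is freed, because the two may be the same object. The sketch below illustrates that reasoning with reference counting. rc_obj, rc_entry and rc_replace are hypothetical. They only loosely model Redis objects and are not the dict API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted value, loosely modeled on Redis objects. */
typedef struct {
    int refcount;
    int payload;
} rc_obj;

static rc_obj *rc_new(int payload) {
    rc_obj *o = malloc(sizeof(*o));
    if (o == NULL) abort();
    o->refcount = 1;
    o->payload = payload;
    return o;
}

static void rc_incr(rc_obj *o) { o->refcount++; }

static void rc_decr(rc_obj *o) {
    if (--o->refcount == 0) free(o);
}

/* A dict entry whose value slot owns one reference. */
typedef struct {
    rc_obj *val;
} rc_entry;

/* Same order as dictReplace: take the new reference first, then drop the old one. */
static void rc_replace(rc_entry *e, rc_obj *newval) {
    rc_obj *old = e->val;    /* like: auxentry = *entry;        */
    rc_incr(newval);         /* like: dictSetVal (increment)    */
    e->val = newval;
    rc_decr(old);            /* like: dictFreeVal(d, &auxentry) */
}
```

Suppose rc_replace decremented the old value first. Replacing a value with itself, while the entry holds the only reference, would then free the object and increment freed memory. With the increment-then-decrement order, the count simply goes 1, 2, 1.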

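The code and its comments give a rough picture of how dictAdd and dictReplace work:

dictAdd and dictReplace also call _dictRehashStep(d), through dictAddRaw, pushing the rehash forward by one step.
If a rehash is in progress, the new entry goes into ht[1]. Otherwise it goes into ht[0].
Within the target bucket, a new entry is always inserted at the head of the dictEntry linked list, on the assumption that recently added entries are more likely to be accessed frequently.
_dictKeyIndex may expand the hash table by calling _dictExpandIfNeeded(d). For an empty dict, this allocates the initial table of DICT_HT_INITIAL_SIZE buckets. Once ht[0].used reaches ht[0].size, growth begins, provided that resizing is allowed or the ratio exceeds dict_force_resize_ratio. The new size is the smallest power of two not less than twice ht[0].used, which is exactly double the current size in the usual used == size case. No expansion is started while a rehash is already in progress.
_dictKeyIndex finds the position in the dict where the new element should go. As the loop over ht[0] and ht[1] shows, it only searches ht[0] when no rehash is in progress. Otherwise it searches both ht[0] and ht[1], and the index it returns refers to ht[1].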
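The sizing rule can be sketched as a pure function. This is a simplified model under stated assumptions. It assumes DICT_HT_INITIAL_SIZE = 4 and dict_force_resize_ratio = 5, as in v3.2.8's dict.c. The function names are hypothetical, and the overflow guard in the real _dictNextPower is omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define INITIAL_SIZE 4        /* mirrors DICT_HT_INITIAL_SIZE */
#define FORCE_RESIZE_RATIO 5  /* mirrors dict_force_resize_ratio */

/* Smallest power of two >= size, starting from INITIAL_SIZE (like _dictNextPower). */
static unsigned long next_power(unsigned long size) {
    unsigned long i = INITIAL_SIZE;
    while (i < size) i *= 2;
    return i;
}

/* Returns the bucket count a new table would get, or 0 if no expansion
 * is needed. Models _dictExpandIfNeeded followed by dictExpand. */
static unsigned long expand_target(unsigned long size, unsigned long used,
                                   bool rehashing, bool can_resize) {
    if (rehashing) return 0;                       /* already rehashing */
    if (size == 0) return next_power(INITIAL_SIZE); /* first allocation */
    if (used >= size && (can_resize || used / size > FORCE_RESIZE_RATIO))
        return next_power(used * 2);
    return 0;
}
```

The can_resize flag stands in for the global dict_can_resize switch. When it is off, a table only grows once the ratio is strictly greater than 5. For example, with 4 buckets, 20 entries stay put and 24 entries trigger growth to 64 buckets.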

4. dictDelete - deleting from the dict

/* Search and remove an element */
static int dictGenericDelete(dict *d, const void *key, int nofree)
{
    unsigned int h, idx;
    dictEntry *he, *prevHe;
    int table;

    if (d->ht[0].size == 0) return DICT_ERR; /* d->ht[0].table is NULL */
    if (dictIsRehashing(d)) _dictRehashStep(d);
    h = dictHashKey(d, key);

    for (table = 0; table <= 1; table++) {
        idx = h & d->ht[table].sizemask;
        he = d->ht[table].table[idx];
        prevHe = NULL;
        while(he) {
            if (key==he->key || dictCompareKeys(d, key, he->key)) {
                /* Unlink the element from the list */
                if (prevHe)
                    prevHe->next = he->next;
                else
                    d->ht[table].table[idx] = he->next;
                if (!nofree) {
                    dictFreeKey(d, he);
                    dictFreeVal(d, he);
                }
                zfree(he);
                d->ht[table].used--;
                return DICT_OK;
            }
            prevHe = he;
            he = he->next;
        }
        if (!dictIsRehashing(d)) break;
    }
    return DICT_ERR; /* not found */
}

int dictDelete(dict *ht, const void *key) {
    return dictGenericDelete(ht,key,0);
}

int dictDeleteNoFree(dict *ht, const void *key) {
    return dictGenericDelete(ht,key,1);
}

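The dictDelete code shows that:

dictDelete also pushes the rehash forward by one step (_dictRehashStep).
If no rehash is in progress, it only looks for the key in ht[0]. Otherwise it searches both ht[0] and ht[1].
After the entry is unlinked, dictDelete calls the key and value destructors (keyDestructor and valDestructor) through dictFreeKey/dictFreeVal. dictDeleteNoFree (nofree = 1) skips them and only frees the dictEntry itself.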
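The prevHe bookkeeping is ordinary singly linked list removal. The minimal sketch below shows it on a single bucket. The del_entry type, the chain_delete and push helpers, and the destructor counter are all hypothetical. The nofree flag plays the same role as in dictGenericDelete: destructors are skipped, but the entry itself is still freed.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct del_entry {
    int key;
    struct del_entry *next;
} del_entry;

static int destructor_calls = 0;   /* counts simulated key/value destructor runs */

/* Hypothetical stand-in for dictFreeKey/dictFreeVal. */
static void run_destructors(del_entry *he) {
    (void)he;
    destructor_calls++;
}

/* Unlink the first entry matching key, tracking the previous node like prevHe. */
static int chain_delete(del_entry **bucket, int key, int nofree) {
    del_entry *prev = NULL;
    for (del_entry *he = *bucket; he != NULL; prev = he, he = he->next) {
        if (he->key != key) continue;
        if (prev) prev->next = he->next;   /* unlink from middle or tail */
        else *bucket = he->next;           /* unlink the chain head */
        if (!nofree) run_destructors(he);
        free(he);                          /* the entry itself is always freed */
        return 0;
    }
    return -1;                             /* not found, like DICT_ERR */
}

/* Push a new entry at the head of the chain, as dictAddRaw does. */
static del_entry *push(del_entry *head, int key) {
    del_entry *e = malloc(sizeof(*e));
    if (e == NULL) abort();
    e->key = key;
    e->next = head;
    return e;
}
```

When the head of the chain is removed there is no previous node, so the bucket pointer itself is updated. This corresponds to the `d->ht[table].table[idx] = he->next` branch in the Redis code.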

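dictFind, dictAdd/dictReplace and dictDelete all call _dictRehashStep(d) to push the rehash forward by one step. dictCreate does not. The purpose is to spread the rehashing work across individual lookup, insert and delete operations, instead of doing it all at once inside a single operation.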

5. The _dictRehashStep implementation

/* This function performs just a step of rehashing, and only if there are
 * no safe iterators bound to our hash table. When we have iterators in the
 * middle of a rehashing we can't mess with the two hash tables otherwise
 * some element can be missed or duplicated.
 *
 * This function is called by common lookup or update operations in the
 * dictionary so that the hash table automatically migrates from H1 to H2
 * while it is actively used. */
static void _dictRehashStep(dict *d) {
    if (d->iterators == 0) dictRehash(d,1);
}

/* Performs N steps of incremental rehashing. Returns 1 if there are still
 * keys to move from the old to the new hash table, otherwise 0 is returned.
 *
 * Note that a rehashing step consists in moving a bucket (that may have more
 * than one key as we use chaining) from the old to the new hash table, however
 * since part of the hash table may be composed of empty spaces, it is not
 * guaranteed that this function will rehash even a single bucket, since it
 * will visit at max N*10 empty buckets in total, otherwise the amount of
 * work it does would be unbound and the function may block for a long time. */
int dictRehash(dict *d, int n) {
    int empty_visits = n*10; /* Max number of empty buckets to visit. */
    if (!dictIsRehashing(d)) return 0;

    while(n-- && d->ht[0].used != 0) {
        dictEntry *de, *nextde;

        /* Note that rehashidx can't overflow as we are sure there are more
         * elements because ht[0].used != 0 */
        assert(d->ht[0].size > (unsigned long)d->rehashidx);
        while(d->ht[0].table[d->rehashidx] == NULL) { // skip empty buckets
            d->rehashidx++;
            if (--empty_visits == 0) return 1; // empty-bucket visit limit reached: stop for now
        }
        de = d->ht[0].table[d->rehashidx]; // head of the ht[0] bucket being migrated
        /* Move all the keys in this bucket from the old to the new hash HT */
        while(de) {
            unsigned int h;

            nextde = de->next;
            /* Get the index in the new hash table */
            h = dictHashKey(d, de->key) & d->ht[1].sizemask; // this entry's index in ht[1]
            de->next = d->ht[1].table[h]; // insert at the head of the ht[1] list
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++; // bucket done; move on to the next one
    }

    /* Check if we already rehashed the whole table... */
    // ht[0] has no entries left: everything is now in ht[1], so the rehash is complete
    if (d->ht[0].used == 0) {
        zfree(d->ht[0].table); // free ht[0]'s bucket array, promote ht[1] to ht[0], then reset ht[1]
        d->ht[0] = d->ht[1];
        _dictReset(&d->ht[1]);
        d->rehashidx = -1; // no rehash in progress any more
        return 0;
    }

    /* More to rehash... */
    return 1; // otherwise the rehash is still in progress
}

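_dictRehashStep can be understood as incremental rehashing. It performs a single step through dictRehash(d,1), and only when no safe iterators are bound to the dict (d->iterators == 0).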

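Each call to dictRehash advances the rehash by at most N steps. It does fewer if the whole rehash finishes first, or if the empty-bucket limit described below is hit. Each step moves every dictEntry in one ht[0] bucket (that is, one dictEntry linked list) to ht[1], and recomputes each entry's new position from ht[1]'s sizemask. rehashidx records the position of the next ht[0] bucket that has not yet been migrated.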

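If the bucket that rehashidx points to is empty when dictRehash is called, there is nothing to migrate there. dictRehash then keeps scanning forward through ht[0].table until it finds the next non-empty bucket. If it keeps finding empty buckets, it gives up after visiting at most N*10 of them, and this round of rehashing ends.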

Finally, once all the data in ht[0] has been migrated to ht[1] (that is, d->ht[0].used == 0), the whole rehash is finished. ht[1]'s contents become ht[0], and ht[1] is reset to empty.

The dict diagram in the previous article shows exactly this process at rehashidx = 2: the first two buckets (ht[0].table[0] and ht[0].table[1]) have already been migrated to ht[1].
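The migration loop can be replayed outside Redis with the sketch below. It re-implements the same algorithm on hypothetical rh_* structures. It uses an identity hash, an assumption that makes bucket positions easy to predict. The safe-iterator check done by _dictRehashStep is left out, and Redis's zmalloc/zfree are replaced by the C standard allocator.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct rh_entry { unsigned long key; struct rh_entry *next; } rh_entry;
typedef struct { rh_entry **table; unsigned long size, sizemask, used; } rh_ht;
typedef struct { rh_ht ht[2]; long rehashidx; } rh_dict;

/* Identity hash keeps the example predictable (assumption, not Redis's hash). */
static unsigned long rh_hash(unsigned long key) { return key; }

/* Move up to n non-empty buckets from ht[0] to ht[1], visiting at most
 * n*10 empty buckets. Returns 1 if work remains, 0 when rehashing is done. */
static int rh_rehash(rh_dict *d, int n) {
    int empty_visits = n * 10;
    if (d->rehashidx == -1) return 0;
    while (n-- && d->ht[0].used != 0) {
        assert(d->ht[0].size > (unsigned long)d->rehashidx);
        while (d->ht[0].table[d->rehashidx] == NULL) {      /* skip empty buckets */
            d->rehashidx++;
            if (--empty_visits == 0) return 1;
        }
        rh_entry *de = d->ht[0].table[d->rehashidx];
        while (de) {                                         /* move the whole chain */
            rh_entry *next = de->next;
            unsigned long h = rh_hash(de->key) & d->ht[1].sizemask;
            de->next = d->ht[1].table[h];                    /* head insert into ht[1] */
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = next;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }
    if (d->ht[0].used == 0) {                                /* all entries moved */
        free(d->ht[0].table);
        d->ht[0] = d->ht[1];
        d->ht[1] = (rh_ht){NULL, 0, 0, 0};
        d->rehashidx = -1;
        return 0;
    }
    return 1;
}
```

In the test scenario, ht[0] has 4 buckets. Keys 1 and 5 are chained in bucket 1 and key 3 is in bucket 3. The first rh_rehash(&d, 1) skips empty bucket 0 and moves bucket 1, leaving rehashidx = 2, which resembles the state in that diagram.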

Summary

A rehash can either expand the table or shrink it.

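A dict has two hash tables, ht[0] and ht[1]. The code shows that a dict is not rehashed in one go. The work is done incrementally, over many small steps. Concretely, dict uses two different strategies: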

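1. _dictRehashStep: normally all data lives in the dict's ht[0], and ht[1] is only used during a rehash, when the data in ht[0] is gradually moved into ht[1]. _dictRehashStep migrates one bucket each time a lookup, insert or delete touches the dict, provided no safe iterator is active.

2. dictRehashMilliseconds: each call rehashes in batches of 100 buckets for a fixed amount of time, and pauses the rehash once that time is used up.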
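The code for dictRehashMilliseconds is not listed above. The sketch below is a rough model of its shape in v3.2.8: rehash in batches of 100 buckets and stop once the time budget is exceeded. It drives a hypothetical fake_rehash counter instead of a real dict. The now_ms helper uses POSIX clock_gettime, whereas Redis uses its own timeInMilliseconds helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for dictRehash(d, n): pretend *remaining buckets are
 * left, move up to n of them, and return 1 while work remains. */
static int fake_rehash(long *remaining, int n) {
    *remaining -= n;
    if (*remaining <= 0) { *remaining = 0; return 0; }
    return 1;
}

/* Monotonic wall-clock milliseconds (POSIX). */
static long long now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Same loop shape as dictRehashMilliseconds: batches of 100 until done or
 * until more than ms milliseconds have elapsed. The count is approximate:
 * it adds 100 per batch that reported remaining work. */
static int rehash_for_ms(long *remaining, int ms) {
    long long start = now_ms();
    int rehashes = 0;
    while (fake_rehash(remaining, 100)) {
        rehashes += 100;
        if (now_ms() - start > ms) break;
    }
    return rehashes;
}
```

Because the time check only runs between batches, a single call can overrun its budget by up to one batch. This is why the batch size is kept small.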

Why rehash?

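1. Intuitively: as the hash table fills up, more elements collide and the linked lists in ht[0] grow longer, so lookups become slower. The table therefore needs to be rehashed.

2. From the code's point of view: the hash table uses the load factor loadfactor = used/size to describe how full it is. When the load factor is too high, operations become slower. When it is too low, the table is sparsely filled and wastes memory. Since Redis keeps all its data in memory, saving memory matters. The load factor therefore has to be kept within a certain range, so that operations stay close to O(1) while memory is used efficiently.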
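The load factor and the two thresholds can be written out as a small sketch. Its assumptions are as follows. The grow trigger is the 1:1 check from _dictExpandIfNeeded, ignoring the dict_can_resize switch. The shrink trigger is modeled on htNeedsResize() in server.c, with a minimum fill of 10% (HASHTABLE_MIN_FILL), and is not a verbatim copy of it. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define INITIAL_SIZE 4        /* mirrors DICT_HT_INITIAL_SIZE */
#define MIN_FILL_PERCENT 10   /* assumed to mirror HASHTABLE_MIN_FILL (10%) */

/* loadfactor = used / size; 0 for a table that has no buckets yet. */
static double load_factor(unsigned long used, unsigned long size) {
    return size ? (double)used / (double)size : 0.0;
}

/* Too full: load factor has reached 1, the trigger _dictExpandIfNeeded checks. */
static bool too_full(unsigned long used, unsigned long size) {
    return size != 0 && used >= size;
}

/* Too sparse: shrink candidate, using an integer fill percentage.
 * Tables at the initial size are never shrunk. */
static bool too_sparse(unsigned long used, unsigned long size) {
    return size > INITIAL_SIZE && used * 100 / size < MIN_FILL_PERCENT;
}
```

With 64 buckets, 5 entries (7% full) make the table a shrink candidate, while 7 entries (10% full, with the integer division truncating) do not.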

-EOF-
