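Consistent hashing in Jedis

Source: http://my.oschina.net/u/866190/blog/192286

Jedis is a Java client for Redis that routes load across nodes through sharding. I had long been curious about how Jedis implements sharding, so I read through the source. Its sharding turns out to be a consistent hashing algorithm, and the source shows that the implementation is fairly simple. The main class is redis.clients.util.Sharded<R, S>; I have added comments at the key points: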

public class Sharded<R, S extends ShardInfo<R>> { // S wraps the information about one Redis node, such as its name and weight

    public static final int DEFAULT_WEIGHT = 1; // default weight is 1

    private TreeMap<Long, S> nodes; // holds the virtual nodes

    private final Hashing algo; // hash algorithm

    ......

    public Sharded(List<S> shards, Hashing algo, Pattern tagPattern) {
        this.algo = algo;
        this.tagPattern = tagPattern;
        initialize(shards);
    }

    private void initialize(List<S> shards) {
        // TreeMap is a sorted map backed by a red-black tree and ordered by key.
        // Note that the key is a Long; with a 32-bit hash that gives at most 2^32 positions.
        nodes = new TreeMap<Long, S>();

        for (int i = 0; i != shards.size(); ++i) {
            final S shardInfo = shards.get(i);
            if (shardInfo.getName() == null) {
                for (int n = 0; n < 160 * shardInfo.getWeight(); n++) {
                    // Each real Redis node is associated with many virtual nodes. Hashing the
                    // virtual nodes spreads them evenly over the 2^32 integers.
                    nodes.put(this.algo.hash("SHARD-" + i + "-NODE-" + n), shardInfo);
                }
            } else {
                for (int n = 0; n < 160 * shardInfo.getWeight(); n++) {
                    // Same as above, but the virtual node name is derived from the shard name.
                    nodes.put(this.algo.hash(shardInfo.getName() + "*" + shardInfo.getWeight() + n), shardInfo);
                }
            }
            resources.put(shardInfo, shardInfo.createResource());
        }
    }

    /**
     * Hashes the key and looks up the real node S that owns it.
     *
     * @param key the key to route
     * @return the shard that owns the key
     */
    public S getShardInfo(byte[] key) {
        SortedMap<Long, S> tail = nodes.tailMap(algo.hash(key)); // virtual nodes whose hash is >= the key's hash
        if (tail.isEmpty()) { // nothing at or after the key: wrap around to the first virtual node
            return nodes.get(nodes.firstKey());
        }
        return tail.get(tail.firstKey()); // first virtual node at or after the key
    }

    ......

}

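The whole algorithm can be summarized as follows. First, picture a ring of 2^32 integers. Each virtual node is mapped onto the ring by its hash value, which indirectly places the real nodes on the ring too, because every virtual node is associated with one real node. Then, to route a piece of cached data, hash its key and search the ring clockwise from that value for the nearest virtual node. The real node behind that virtual node is where the key is routed.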
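The summary above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the Jedis code: the class name `RingSketch`, the node names, the `name#n` virtual node naming, and the MD5-based 32-bit hash are all hypothetical choices made so the ring really does have 2^32 positions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Minimal consistent-hash ring (illustrative sketch, not the Jedis implementation).
class RingSketch {
    // Ring position of a virtual node -> name of the real node behind it.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // Assumed hash: first 4 bytes of MD5 read as an unsigned value in [0, 2^32).
    static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return ((d[0] & 0xFFL) << 24) | ((d[1] & 0xFFL) << 16) | ((d[2] & 0xFFL) << 8) | (d[3] & 0xFFL);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Places `virtualNodes` virtual nodes for one real node on the ring.
    void addNode(String name, int virtualNodes) {
        for (int n = 0; n < virtualNodes; n++) {
            ring.put(hash(name + "#" + n), name);
        }
    }

    // Walks clockwise from the key's hash to the first virtual node, wrapping at the end.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes on the ring");
        }
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        RingSketch ring = new RingSketch();
        ring.addNode("redis-a", 160);
        ring.addNode("redis-b", 160);
        for (String key : new String[] {"user:1", "user:2", "order:42"}) {
            System.out.println(key + " -> " + ring.nodeFor(key));
        }
    }
}
```

`ceilingEntry` plays the same role as `tailMap(...).firstKey()` in `getShardInfo`: both find the first virtual node at or after the key's hash.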

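The core idea of consistent hashing is to add a layer of virtual nodes so that changes to the real nodes do not break the overall consistency of the mapping: when a real node is added or removed, only the keys next to its virtual nodes move. Solving a problem by adding a layer is nothing new to us. Examples include layered design in software development, the operating system coordinating applications with hardware, and the Java virtual machine providing cross-platform portability.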
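To make the consistency claim concrete, the following sketch removes one of three nodes and counts which keys change owner. It is an illustration only: the class `RemapSketch`, the node names, the key names, and the MD5-based hash are assumptions, not anything taken from Jedis.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Sketch: which keys move when one real node leaves the ring (not Jedis code).
class RemapSketch {
    // Assumed hash: first 4 bytes of MD5 read as an unsigned 32-bit ring position.
    static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return ((d[0] & 0xFFL) << 24) | ((d[1] & 0xFFL) << 16) | ((d[2] & 0xFFL) << 8) | (d[3] & 0xFFL);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    static TreeMap<Long, String> buildRing(String[] nodes, int virtualNodes) {
        TreeMap<Long, String> ring = new TreeMap<>();
        for (String node : nodes) {
            for (int n = 0; n < virtualNodes; n++) {
                ring.put(hash(node + "#" + n), node);
            }
        }
        return ring;
    }

    static String owner(TreeMap<Long, String> ring, String key) {
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // Returns {keys that changed owner, keys that moved although their old owner survived}.
    static int[] removeNodeAndCount(int keyCount) {
        TreeMap<Long, String> before = buildRing(new String[] {"redis-a", "redis-b", "redis-c"}, 160);
        TreeMap<Long, String> after = new TreeMap<>(before);
        after.values().removeIf("redis-c"::equals); // drop every virtual node of redis-c
        int moved = 0;
        int movedFromSurvivor = 0;
        for (int i = 0; i < keyCount; i++) {
            String key = "key-" + i;
            String oldOwner = owner(before, key);
            if (!oldOwner.equals(owner(after, key))) {
                moved++;
                if (!oldOwner.equals("redis-c")) {
                    movedFromSurvivor++;
                }
            }
        }
        return new int[] {moved, movedFromSurvivor};
    }

    public static void main(String[] args) {
        int[] result = removeNodeAndCount(10000);
        System.out.println("keys moved: " + result[0]);
        System.out.println("keys moved from a surviving node: " + result[1]);
    }
}
```

The second counter stays at zero: a key whose nearest virtual node belongs to a surviving node still finds that same virtual node after the removal, so only the keys that lived on the removed node are rerouted.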

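Another question worth asking is how many virtual nodes one real node should have. Readers who looked closely at the code above will have noticed the value 160. This is an empirical value: too many virtual nodes hurts performance, while too few leaves the distribution unbalanced. Adjusting the weight value gives real nodes different weights, which is easy to understand: the more virtual nodes a real node has, the higher the probability that a key lands on it.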
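The effect of the weight can be checked with a small experiment. The sketch below is hypothetical (the class `WeightSketch`, the node names "light" and "heavy", and the MD5-based hash are assumptions); it only borrows the `160 * weight` rule and the `name*weight+n` naming from the code above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Sketch: a node with weight 2 gets twice the virtual nodes and so a larger share of keys.
class WeightSketch {
    // Assumed hash: first 4 bytes of MD5 read as an unsigned 32-bit ring position.
    static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return ((d[0] & 0xFFL) << 24) | ((d[1] & 0xFFL) << 16) | ((d[2] & 0xFFL) << 8) | (d[3] & 0xFFL);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns {keys on "light" (weight 1), keys on "heavy" (weight 2)}.
    static int[] countKeys(int keyCount) {
        String[] names = {"light", "heavy"};
        int[] weights = {1, 2};
        TreeMap<Long, String> ring = new TreeMap<>();
        for (int i = 0; i < names.length; i++) {
            for (int n = 0; n < 160 * weights[i]; n++) {
                ring.put(hash(names[i] + "*" + weights[i] + n), names[i]);
            }
        }
        int[] counts = new int[2];
        for (int k = 0; k < keyCount; k++) {
            Map.Entry<Long, String> e = ring.ceilingEntry(hash("key-" + k));
            String owner = (e != null ? e : ring.firstEntry()).getValue();
            counts[owner.equals("light") ? 0 : 1]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countKeys(10000);
        System.out.println("light (weight 1): " + counts[0] + " keys");
        System.out.println("heavy (weight 2): " + counts[1] + " keys");
    }
}
```

With enough keys the heavier node should receive roughly twice as many as the lighter one; the exact split depends on where the virtual nodes happen to land.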

References

http://blog.csdn.net/sparkliang/article/details/5279393

http://my.oschina.net/u/90679/blog/188750

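The MurmurHash2 algorithm in Redis Dict

Source: http://my.oschina.net/fuckphp/blog/270258

Redis uses hash algorithms in many places, for example when inserting a new key into the key space or when implementing hash-based data structures. Today I mainly want to record the two hash algorithms used in dict: the djb2 hash function and MurmurHash2.

The djb2 algorithm: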

unsigned long hash(unsigned char *str)
{
    // hash seed
    unsigned long hash = 5381;
    int c;

    // walk through every character of the string
    while ((c = *str++))
        // hash << 5 multiplies hash by 32 (2^5); adding hash once more gives hash * 33.
        // Then add the character's ASCII code, and repeat for the next character.
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */

    return hash;
}
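For readers who want to experiment, here is a hedged Java port of the function above. It is a sketch with two deliberate differences from the C version: it uses Java `int` arithmetic, which wraps modulo 2^32 (the C code uses `unsigned long`, whose width depends on the platform), and it hashes every byte of the array instead of stopping at a NUL terminator. The class name `Djb2Sketch` is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// djb2 ported to Java as a 32-bit hash (illustrative sketch).
class Djb2Sketch {
    static int djb2(byte[] data) {
        int hash = 5381; // seed
        for (byte b : data) {
            // (hash << 5) + hash is hash * 33; b & 0xFF reads the byte as unsigned.
            hash = ((hash << 5) + hash) + (b & 0xFF);
        }
        return hash;
    }

    public static void main(String[] args) {
        byte[] input = "hello".getBytes(StandardCharsets.US_ASCII);
        // Printed as unsigned to mirror the C return type.
        System.out.println(Integer.toUnsignedString(djb2(input)));
    }
}
```

For the single character "a" (ASCII 97) the result is 5381 * 33 + 97 = 177670, which is easy to verify by hand.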

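As for why the seed is 5381, searching turned up the following; the number is treated as a magic constant:

5381 is odd
5381 is prime
5381 is a deficient number
Its binary digits are evenly distributed: 001/010/100/000/101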
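The listed properties are easy to check mechanically. The sketch below (class and method names are hypothetical) verifies them with plain arithmetic. Note that the "deficient" property follows from primality: the only proper divisor of a prime is 1, which is smaller than the number itself.

```java
// Checks the arithmetic properties claimed for the seed 5381 (illustrative sketch).
class SeedPropertiesSketch {
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    // Sum of the divisors of n that are smaller than n.
    // n is deficient when this sum is less than n.
    static int properDivisorSum(int n) {
        int sum = 0;
        for (int d = 1; d <= n / 2; d++) {
            if (n % d == 0) {
                sum += d;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int seed = 5381;
        System.out.println("odd:       " + (seed % 2 == 1));
        System.out.println("prime:     " + isPrime(seed));
        System.out.println("deficient: " + (properDivisorSum(seed) < seed));
        System.out.println("binary:    " + Integer.toBinaryString(seed));
    }
}
```

The binary form printed is 1010100000101, which matches the grouping 001/010/100/000/101 once two leading zeros are added.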

Since I know nothing about algorithms, I really do not understand how these properties affect the hash result, and I hope an expert can explain.

Redis implements the djb hash as follows (the code below is in src/dict.c):

// hash seed, 5381 by default
static uint32_t dict_hash_function_seed = 5381;

// set the hash seed
void dictSetHashFunctionSeed(uint32_t seed) {
    dict_hash_function_seed = seed;
}

// get the hash seed
uint32_t dictGetHashFunctionSeed(void) {
    return dict_hash_function_seed;
}

/* And a case insensitive hash function (based on djb hash) */
unsigned int dictGenCaseHashFunction(const unsigned char *buf, int len) {
    // start from the hash seed
    unsigned int hash = (unsigned int)dict_hash_function_seed;

    // walk through the string
    while (len--)
        // djb step: multiply by 33, then add the ASCII code of the lowercased character
        hash = ((hash << 5) + hash) + (tolower(*buf++)); /* hash * 33 + c */
    return hash;
}

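Redis makes one small change to the djb hash: it converts each character of the input string to lowercase, so that the hash result does not depend on letter case.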
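The case insensitivity can be demonstrated with a small Java port. This is a sketch, not Redis code: the class `CaseInsensitiveDjbSketch` is hypothetical, the seed is passed as a parameter instead of living in a global, and lowercasing is done by hand for ASCII letters only, which is an assumption about how `tolower` behaves in the default C locale.

```java
import java.nio.charset.StandardCharsets;

// Case-insensitive djb hash in the spirit of dictGenCaseHashFunction (illustrative sketch).
class CaseInsensitiveDjbSketch {
    static int caseHash(byte[] buf, int seed) {
        int hash = seed;
        for (byte b : buf) {
            int c = b & 0xFF;
            if (c >= 'A' && c <= 'Z') {
                c += 'a' - 'A'; // lowercase ASCII letters only
            }
            hash = ((hash << 5) + hash) + c; // hash * 33 + c
        }
        return hash;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Key", "KEY", "key"}) {
            int h = caseHash(s.getBytes(StandardCharsets.US_ASCII), 5381);
            System.out.println(s + " -> " + Integer.toUnsignedString(h));
        }
    }
}
```

All three spellings print the same value, because every byte is lowercased before it is mixed into the hash.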

The MurmurHash2 algorithm:

uint32_t MurmurHash2( const void * key, int len, uint32_t seed )
{
    // 'm' and 'r' are mixing constants generated offline.
    // They're not really 'magic', they just happen to work well.
    const uint32_t m = 0x5bd1e995;
    const int r = 24;

    // Initialize the hash to a 'random' value
    uint32_t h = seed ^ len;

    // Mix 4 bytes at a time into the hash
    const unsigned char * data = (const unsigned char *)key;

    while(len >= 4)
    {
        // each iteration reads 4 bytes as one 32-bit integer
        uint32_t k = *(uint32_t*)data;

        k *= m;
        k ^= k >> r;
        k *= m;

        h *= m;
        h ^= k;

        data += 4;
        len -= 4;
    }

    // Handle the last few bytes of the input array
    // (the trailing 1 to 3 bytes are folded into the hash with shifts)
    switch(len)
    {
    case 3: h ^= data[2] << 16;
    case 2: h ^= data[1] << 8;
    case 1: h ^= data[0];
            h *= m;
    };

    // Do a few final mixes of the hash to ensure the last few
    // bytes are well-incorporated.
    h ^= h >> 13;
    h *= m;
    h ^= h >> 15;

    return h;
}

unsigned int dictGenHashFunction(const void *key, int len) {
    /* 'm' and 'r' are mixing constants generated offline.
     They're not really 'magic', they just happen to work well. */
    uint32_t seed = dict_hash_function_seed;
    const uint32_t m = 0x5bd1e995;
    const int r = 24;

    /* Initialize the hash to a 'random' value */
    uint32_t h = seed ^ len;

    /* Mix 4 bytes at a time into the hash */
    const unsigned char *data = (const unsigned char *)key;

    while(len >= 4) {
        uint32_t k = *(uint32_t*)data;

        k *= m;
        k ^= k >> r;
        k *= m;

        h *= m;
        h ^= k;

        data += 4;
        len -= 4;
    }

    /* Handle the last few bytes of the input array */
    switch(len) {
    case 3: h ^= data[2] << 16;
    case 2: h ^= data[1] << 8;
    case 1: h ^= data[0]; h *= m;
    };

    /* Do a few final mixes of the hash to ensure the last few
     * bytes are well-incorporated. */
    h ^= h >> 13;
    h *= m;
    h ^= h >> 15;

    return (unsigned int)h;
}
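The same mixing steps can be written in Java, which may help when comparing hash values across languages. This is a hedged sketch, not code from Redis or from the reference MurmurHash2 distribution. The class name `MurmurHash2Sketch` is hypothetical, and the 4-byte reads assume little-endian byte order, because the C code reads the bytes through a `uint32_t` pointer and therefore depends on the platform. Unsigned right shifts (`>>>`) stand in for the C shifts on `uint32_t`.

```java
import java.nio.charset.StandardCharsets;

// 32-bit MurmurHash2 ported to Java (illustrative sketch, little-endian reads assumed).
class MurmurHash2Sketch {
    static int murmurHash2(byte[] data, int seed) {
        final int m = 0x5bd1e995; // mixing constants from the C code above
        final int r = 24;
        int len = data.length;
        int h = seed ^ len;
        int i = 0;

        // Mix 4 bytes at a time into the hash.
        while (len >= 4) {
            int k = (data[i] & 0xFF)
                    | ((data[i + 1] & 0xFF) << 8)
                    | ((data[i + 2] & 0xFF) << 16)
                    | ((data[i + 3] & 0xFF) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
            i += 4;
            len -= 4;
        }

        // Fold in the trailing 1 to 3 bytes; the cases fall through on purpose.
        switch (len) {
            case 3: h ^= (data[i + 2] & 0xFF) << 16;
            case 2: h ^= (data[i + 1] & 0xFF) << 8;
            case 1: h ^= (data[i] & 0xFF);
                    h *= m;
        }

        // Final mixes so the last bytes are well incorporated.
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    public static void main(String[] args) {
        byte[] key = "hello".getBytes(StandardCharsets.US_ASCII);
        // 5381 is the default seed shown in the Redis excerpt above.
        System.out.println(Integer.toUnsignedString(murmurHash2(key, 5381)));
    }
}
```

One property that can be checked by hand: with an empty input and seed 0, `h` starts at 0 and every later step maps 0 to 0, so the result is 0.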

References:

http://lenky.info/archives/2012/12/2150

Redis 2.8.9 source code: src/dict.h, src/dict.c

Redis Design and Implementation (first edition)

djb hash function

http://code.google.com/p/smhasher/
