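Redis (8): zset/zadd/zrange/zremrangebyscore command source code analysis
In the previous articles we walked through how Redis implements the string, hash, list and set data types, so Redis should no longer feel mysterious.
This article covers the last of the five basic Redis data types in this series: zset.
After this one, the basic features of Redis should hold few surprises. If possible, later articles will analyze how the advanced features of Redis are implemented (such as replication, Sentinel mode and Cluster mode).
Back to the topic: zset, also called a sorted set, is the ordered version of set. As the previous article showed, the read operations on a set are quite limited and many of them rely on random selection, precisely because a set is unordered. Once the set is ordered, its query capability improves greatly: 1. elements can be located by index; 2. elements can be queried by range. These are the benefits of ordering.
So let us first think about how ordering could be implemented. There are two approaches: 1. order by insertion sequence, 1, 2, 3, ...; 2. order by a user-defined sort value. The first is simple to implement and cheap on insert, but limited in function. The second is more flexible: every insert may have to find its sorted position, but queries stay predictable and the ordering is not dictated by the system.
As before, we go from the feature list, to the data structures, to the implementation of each feature, to see how Redis implements the zset sorted set.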
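0. zset commands in Redis
zset: a Redis sorted set is a collection of string elements in which duplicate members are not allowed. Every element is associated with a score of type double, and the members are ordered by score from smallest to largest.
Typical use cases: a task queue that a background job scans periodically; leaderboards.
The official manual lists the following commands:
1> ZADD key score1 member1 [score2 member2]
Function: add one or more members to a sorted set, or update the score of members that already exist
Returns: the number of newly added elements (members that already existed are not counted)
2> ZCARD key
Function: get the number of members in a sorted set
Returns: the element count, or 0
3> ZCOUNT key min max
Function: count the members whose score lies in the given range
Returns: the number of elements in the range
4> ZINCRBY key increment member
Function: add increment to the score of the given member
Returns: the score of member after the increment
5> ZINTERSTORE destination numkeys key [key ...]
Function: compute the intersection of one or more sorted sets and store the result in destination
Returns: the number of elements in the intersection
6> ZLEXCOUNT key min max
Function: count the members within the given lexicographical range
Returns: the number of elements in the range
7> ZRANGE key start stop [WITHSCORES]
Function: return the members within the given index range
Returns: the list of elements in the range
8> ZRANGEBYLEX key min max [LIMIT offset count]
Function: return the members within the given lexicographical range
Returns: the list of elements in the range
9> ZRANGEBYSCORE key min max [WITHSCORES] [LIMIT]
Function: return the members within the given score range
Returns: the list of elements in the range
10> ZRANK key member
Function: return the index (rank) of the given member
Returns: the rank of member, or nil
11> ZREM key member [member ...]
Function: remove one or more members from a sorted set
Returns: the number of elements removed
12> ZREMRANGEBYLEX key min max
Function: remove all members within the given lexicographical range
Returns: the number of elements removed
13> ZREMRANGEBYRANK key start stop
Function: remove all members within the given rank range
Returns: the number of elements removed
14> ZREMRANGEBYSCORE key min max
Function: remove all members within the given score range
Returns: the number of elements removed
15> ZREVRANGE key start stop [WITHSCORES]
Function: return the members within the given index range, ordered by score from highest to lowest
Returns: the list of elements in the range and their scores
16> ZREVRANGEBYSCORE key max min [WITHSCORES]
Function: return the members within the given score range, ordered by score from highest to lowest
Returns: the list of elements in the range and their scores
17> ZREVRANK key member
Function: return the rank of the given member, with members ordered by score from highest to lowest
Returns: the rank of member, or nil
18> ZSCORE key member
Function: return the score of the given member
Returns: the score of member
19> ZUNIONSTORE destination numkeys key [key ...]
Function: compute the union of one or more sorted sets and store the result in destination
Returns: the number of elements stored in destination
20> ZSCAN key cursor [MATCH pattern] [COUNT count]
Function: iterate over the elements of a sorted set (members and scores)
Returns: a list of elements
21> ZPOPMAX/ZPOPMIN/BZPOPMAX/BZPOPMIN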
1. zset data structures
zset is implemented with ziplist, zskiplist and dict.
/* ZSETs use a specialized version of Skiplists */
typedef struct zskiplistNode {
    sds ele;                            // member string
    double score;                       // sort key
    struct zskiplistNode *backward;     // previous node on level 0
    struct zskiplistLevel {
        struct zskiplistNode *forward;  // next node on this level
        unsigned int span;              // number of level-0 nodes this pointer skips over
    } level[];
} zskiplistNode;
// The skip list
typedef struct zskiplist {
    struct zskiplistNode *header, *tail;
    unsigned long length;
    int level;
} zskiplist;
// Main zset structure: dict + zskiplist
typedef struct zset {
    dict *dict;
    zskiplist *zsl;
} zset;
// In suitable cases a zset first stores its data in a ziplist;
// zlentry is the decoded form of a single ziplist entry
typedef struct zlentry {
    unsigned int prevrawlensize, prevrawlen;
    unsigned int lensize, len;
    unsigned int headersize;
    unsigned char encoding;
    unsigned char *p;
} zlentry;
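2. zadd: adding members
The add path shows all of these data structures at work.
// Usage: ZADD key [NX|XX] [CH] [INCR] score member [score member ...]
// t_zset.c
void zaddCommand(client *c) {
    // The ZADD variants share one implementation and are told apart by flags
    zaddGenericCommand(c,ZADD_NONE);
}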
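The option handling at the top of zaddGenericCommand can be sketched in isolation. The following is a minimal sketch, not the Redis code: the flag constants `F_NX`/`F_XX`/`F_CH`/`F_INCR` and the helpers `parse_zadd_options` and `check_zadd_args` are hypothetical names, and the arguments are plain C strings instead of `robj` pointers. It mirrors the rules in the listing: options are scanned from index 2, the remaining arguments must form score-member pairs, NX and XX are mutually exclusive, and INCR accepts a single pair.

```c
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical flag values, standing in for ZADD_NX, ZADD_XX, ZADD_CH, ZADD_INCR. */
enum { F_NX = 1 << 0, F_XX = 1 << 1, F_CH = 1 << 2, F_INCR = 1 << 3 };

/* Scan option keywords starting at argv[2] (after "ZADD key") and OR them
 * into *flags. Returns the index of the first score argument. */
int parse_zadd_options(int argc, const char **argv, int *flags) {
    int scoreidx = 2;
    while (scoreidx < argc) {
        const char *opt = argv[scoreidx];
        if (!strcasecmp(opt, "nx")) *flags |= F_NX;
        else if (!strcasecmp(opt, "xx")) *flags |= F_XX;
        else if (!strcasecmp(opt, "ch")) *flags |= F_CH;
        else if (!strcasecmp(opt, "incr")) *flags |= F_INCR;
        else break;               /* first non-option: scores start here */
        scoreidx++;
    }
    return scoreidx;
}

/* Validate what is left after the options. Returns the number of
 * score-member pairs, or -1 when the command must be rejected. */
int check_zadd_args(int argc, int scoreidx, int flags) {
    int elements = argc - scoreidx;
    if (elements % 2) return -1;                        /* unpaired score or member */
    elements /= 2;
    if ((flags & F_NX) && (flags & F_XX)) return -1;    /* mutually exclusive */
    if ((flags & F_INCR) && elements > 1) return -1;    /* INCR takes one pair */
    return elements;
}
```

For example, `ZADD k NX CH 1 a 2 b` yields a score index of 4 and two pairs, while `ZADD k nx xx 1 a` is rejected.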
void zaddGenericCommand(client *c, int flags) {
    static char *nanerr = "resulting score is not a number (NaN)";
    robj *key = c->argv[1];
    robj *zobj;
    sds ele;
    double score = 0, *scores = NULL, curscore = 0.0;
    int j, elements;
    int scoreidx = 0;
    /* The following vars are used in order to track what the command actually
     * did during the execution, to reply to the client and to trigger the
     * notification of keyspace change. */
    int added = 0;      /* Number of new elements added. */
    int updated = 0;    /* Number of elements with updated score. */
    int processed = 0;  /* Number of elements processed, may remain zero with
                           options like XX. */

    /* Parse options. At the end 'scoreidx' is set to the argument position
     * of the score of the first score-element pair. */
    // Option keywords are parsed starting at the third argument (index 2)
    // and OR-ed into flags
    scoreidx = 2;
    while(scoreidx < c->argc) {
        char *opt = c->argv[scoreidx]->ptr;
        // NX: never update existing elements, only add new ones
        if (!strcasecmp(opt,"nx")) flags |= ZADD_NX;
        // XX: only update existing elements, never add new ones
        else if (!strcasecmp(opt,"xx")) flags |= ZADD_XX;
        // CH: change the return value from the number of new elements added to
        // the total number of elements changed. Changed elements are new
        // elements plus existing elements whose score was updated, so elements
        // given with the same score as before are not counted. Normally ZADD
        // only counts the new elements added.
        else if (!strcasecmp(opt,"ch")) flags |= ZADD_CH;
        // INCR: increment the score of the given element, like ZINCRBY;
        // only one element may be given in this mode
        else if (!strcasecmp(opt,"incr")) flags |= ZADD_INCR;
        else break;
        scoreidx++;
    }

    /* Turn options into simple to check vars. */
    int incr = (flags & ZADD_INCR) != 0;
    int nx = (flags & ZADD_NX) != 0;
    int xx = (flags & ZADD_XX) != 0;
    int ch = (flags & ZADD_CH) != 0;

    /* After the options, we expect to have an even number of args, since
     * we expect any number of score-element pairs. */
    // Once the options are stripped, the remaining arguments must come in
    // score-element pairs (2n arguments); anything else is a syntax error
    elements = c->argc-scoreidx;
    if (elements % 2) {
        addReply(c,shared.syntaxerr);
        return;
    }
    elements /= 2; /* Now this holds the number of score-element pairs. */

    /* Check for incompatible options. */
    // Mutually exclusive options
    if (nx && xx) {
        addReplyError(c,
            "XX and NX options at the same time are not compatible");
        return;
    }
    // Syntax check: INCR works on a single element only
    if (incr && elements > 1) {
        addReplyError(c,
            "INCR option supports a single increment-element pair");
        return;
    }

    /* Start parsing all the scores, we need to emit any syntax error
     * before executing additions to the sorted set, as the command should
     * either execute fully or nothing at all. */
    // Parse every score as a double and store it in scores
    scores = zmalloc(sizeof(double)*elements);
    for (j = 0; j < elements; j++) {
        if (getDoubleFromObjectOrReply(c,c->argv[scoreidx+j*2],&scores[j],NULL)
            != C_OK) goto cleanup;
    }

    /* Lookup the key and create the sorted set if does not exist. */
    // Look up the key for writing
    zobj = lookupKeyWrite(c->db,key);
    if (zobj == NULL) {
        if (xx) goto reply_to_client; /* No key + XX option: nothing to do. */
        // Create the object for a new key
        // Default zset_max_ziplist_entries = OBJ_ZSET_MAX_ZIPLIST_ENTRIES: 128
        // Default zset_max_ziplist_value = OBJ_ZSET_MAX_ZIPLIST_VALUE: 64
        // So with the defaults this mainly checks whether the first member
        // is longer than 64 bytes
        if (server.zset_max_ziplist_entries == 0 ||
            server.zset_max_ziplist_value < sdslen(c->argv[scoreidx+1]->ptr))
        {
            // 2. General case: a dict+skiplist zset
            zobj = createZsetObject();
        } else {
            // 1. Small elements: create a ziplist-encoded zset
            zobj = createZsetZiplistObject();
        }
        // Add the object to the db; zobj is a pointer, so every later change
        // made through zobj changes the value stored in the db
        dbAdd(c->db,key,zobj);
    } else {
        if (zobj->type != OBJ_ZSET) {
            addReply(c,shared.wrongtypeerr);
            goto cleanup;
        }
    }
    // Add the elements one by one
    for (j = 0; j < elements; j++) {
        score = scores[j];

        ele = c->argv[scoreidx+1+j*2]->ptr;
        // The add differs by the encoding of zobj (ziplist, skiplist)
        // 3. Add to a ZIPLIST-encoded zset
        if (zobj->encoding == OBJ_ENCODING_ZIPLIST) {
            unsigned char *eptr;
            // 3.1. Check whether the element already exists (add or update)
            if ((eptr = zzlFind(zobj->ptr,ele,&curscore)) != NULL) {
                if (nx) continue;
                if (incr) {
                    score += curscore;
                    if (isnan(score)) {
                        addReplyError(c,nanerr);
                        goto cleanup;
                    }
                }

                /* Remove and re-insert when score changed. */
                if (score != curscore) {
                    // 3.2. Update: delete first, then insert again
                    zobj->ptr = zzlDelete(zobj->ptr,eptr);
                    zobj->ptr = zzlInsert(zobj->ptr,ele,score);
                    server.dirty++;
                    updated++;
                }
                processed++;
            } else if (!xx) {
                /* Optimize: check if the element is too large or the list
                 * becomes too long *before* executing zzlInsert. */
                zobj->ptr = zzlInsert(zobj->ptr,ele,score);
                // 5. Past either threshold, convert ziplist -> skiplist
                // Defaults: more than 128 elements, or this element longer than 64 bytes
                // Can both checks convert twice? No: 1. zsetConvert returns at once
                // when the encoding already matches; 2. on the next iteration the
                // object is skiplist-encoded, so this branch is not taken again
                if (zzlLength(zobj->ptr) > server.zset_max_ziplist_entries)
                    zsetConvert(zobj,OBJ_ENCODING_SKIPLIST);
                if (sdslen(ele) > server.zset_max_ziplist_value)
                    zsetConvert(zobj,OBJ_ENCODING_SKIPLIST);
                server.dirty++;
                added++;
                processed++;
            }
        }
        // 4. Add to a skiplist-encoded zset
        else if (zobj->encoding == OBJ_ENCODING_SKIPLIST) {
            zset *zs = zobj->ptr;
            zskiplistNode *znode;
            dictEntry *de;
            // Check whether ele already exists with a hash lookup (O(1) on average)
            de = dictFind(zs->dict,ele);
            if (de != NULL) {
                if (nx) continue;
                curscore = *(double*)dictGetVal(de);

                if (incr) {
                    score += curscore;
                    if (isnan(score)) {
                        addReplyError(c,nanerr);
                        /* Don't need to check if the sorted set is empty
                         * because we know it has at least one element. */
                        goto cleanup;
                    }
                }

                /* Remove and re-insert when score changes. */
                // Delete from the skiplist first, then insert again
                if (score != curscore) {
                    zskiplistNode *node;
                    serverAssert(zslDelete(zs->zsl,curscore,ele,&node));
                    znode = zslInsert(zs->zsl,score,node->ele);
                    /* We reused the node->ele SDS string, free the node now
                     * since zslInsert created a new one. */
                    node->ele = NULL;
                    zslFreeNode(node);
                    /* Note that we did not removed the original element from
                     * the hash table representing the sorted set, so we just
                     * update the score. */
                    // Point the dict value at the score of the new node
                    dictGetVal(de) = &znode->score; /* Update score ptr. */
                    server.dirty++;
                    updated++;
                }
                processed++;
            } else if (!xx) {
                ele = sdsdup(ele);
                znode = zslInsert(zs->zsl,score,ele);
                // Besides the skiplist, the element also goes into the dict,
                // which gives member -> score lookups without walking the list
                serverAssert(dictAdd(zs->dict,ele,&znode->score) == DICT_OK);
                server.dirty++;
                added++;
                processed++;
            }
        } else {
            serverPanic("Unknown sorted set encoding");
        }
    }

reply_to_client:
    if (incr) { /* ZINCRBY or INCR option. */
        if (processed)
            addReplyDouble(c,score);
        else
            addReply(c,shared.nullbulk);
    } else { /* ZADD. */
        addReplyLongLong(c,ch ? added+updated : added);
    }

cleanup:
    zfree(scores);
    if (added || updated) {
        signalModifiedKey(c->db,key);
        notifyKeyspaceEvent(NOTIFY_ZSET,
            incr ? "zincr" : "zadd", key, c->db->id);
    }
}
// 1. Small elements: create a ziplist-encoded zset
// object.c, create a ziplist-encoded zset
robj *createZsetZiplistObject(void) {
    unsigned char *zl = ziplistNew();
    robj *o = createObject(OBJ_ZSET,zl);
    o->encoding = OBJ_ENCODING_ZIPLIST;
    return o;
}
// 2. Create a general zset instance
// object.c
robj *createZsetObject(void) {
    zset *zs = zmalloc(sizeof(*zs));
    robj *o;
    // zsetDictType differs slightly from the dict type used by hash
    zs->dict = dictCreate(&zsetDictType,NULL);
    // First encounter with the skiplist: see zslCreate below
    zs->zsl = zslCreate();
    o = createObject(OBJ_ZSET,zs);
    o->encoding = OBJ_ENCODING_SKIPLIST;
    return o;
}
// server.c, the dict type used by zset, different from the one used by hash
/* Sorted sets hash (note: a skiplist is used in addition to the hash table) */
dictType zsetDictType = {
    dictSdsHash,               /* hash function */
    NULL,                      /* key dup */
    NULL,                      /* val dup */
    dictSdsKeyCompare,         /* key compare */
    NULL,                      /* Note: SDS string shared & freed by skiplist */
    NULL                       /* val destructor */
};
// Create the skiplist object
/* Create a new skiplist. */
zskiplist *zslCreate(void) {
    int j;
    zskiplist *zsl;

    zsl = zmalloc(sizeof(*zsl));
    zsl->level = 1;
    zsl->length = 0;
    // Create the header node; ZSKIPLIST_MAXLEVEL is 32
    zsl->header = zslCreateNode(ZSKIPLIST_MAXLEVEL,0,NULL);
    // Initialize every level of the header
    for (j = 0; j < ZSKIPLIST_MAXLEVEL; j++) {
        zsl->header->level[j].forward = NULL;
        zsl->header->level[j].span = 0;
    }
    zsl->header->backward = NULL;
    zsl->tail = NULL;
    return zsl;
}
/* Create a skiplist node with the specified number of levels.
 * The SDS string 'ele' is referenced by the node after the call. */
zskiplistNode *zslCreateNode(int level, double score, sds ele) {
    zskiplistNode *zn =
        zmalloc(sizeof(*zn)+level*sizeof(struct zskiplistLevel));
    zn->score = score;
    zn->ele = ele;
    return zn;
}
// 3. Add to a ZIPLIST-encoded zset
// 3.1. Check whether the element already exists (add or update)
// t_zset.c, look up the given ele
unsigned char *zzlFind(unsigned char *zl, sds ele, double *score) {
    unsigned char *eptr = ziplistIndex(zl,0), *sptr;
    // Scan the whole ziplist
    // The lookup is by member, so it is a linear scan that cannot make use
    // of the score ordering
    while (eptr != NULL) {
        // eptr plays the role of the key
        // sptr plays the role of the score
        sptr = ziplistNext(zl,eptr);
        serverAssert(sptr != NULL);

        if (ziplistCompare(eptr,(unsigned char*)ele,sdslen(ele))) {
            /* Matching element, pull out score. */
            // Key found: decode the next entry, which is its score
            if (score != NULL) *score = zzlGetScore(sptr);
            return eptr;
        }
        /* Move to next element. */
        // Two steps reach the next element (key and score are stored adjacently)
        eptr = ziplistNext(zl,sptr);
    }
    return NULL;
}
// t_zset.c, read the score of an element
double zzlGetScore(unsigned char *sptr) {
    unsigned char *vstr;
    unsigned int vlen;
    long long vlong;
    char buf[128];
    double score;

    serverAssert(sptr != NULL);
    serverAssert(ziplistGet(sptr,&vstr,&vlen,&vlong));
    // The score is stored either as a string (e.g. with a decimal point)
    // or as an integer
    if (vstr) {
        memcpy(buf,vstr,vlen);
        buf[vlen] = '\0';
        // Convert the string to a double
        score = strtod(buf,NULL);
    } else {
        score = vlong;
    }

    return score;
}
// 3.2. Update: delete first, then insert again
// t_zset.c
/* Delete (element,score) pair from ziplist. Use local copy of eptr because we
 * don't want to modify the one given as argument. */
unsigned char *zzlDelete(unsigned char *zl, unsigned char *eptr) {
    unsigned char *p = eptr;

    /* TODO: add function to ziplist API to delete N elements from offset. */
    zl = ziplistDelete(zl,&p);
    zl = ziplistDelete(zl,&p);
    return zl;
}
// Add an ele-score pair to the ziplist
/* Insert (element,score) pair in ziplist. This function assumes the element is
 * not yet present in the list. */
unsigned char *zzlInsert(unsigned char *zl, sds ele, double score) {
    unsigned char *eptr = ziplistIndex(zl,0), *sptr;
    double s;
    // The lookup above scanned the ziplist linearly, which might suggest it
    // is unordered; the insert, however, does keep the entries sorted
    while (eptr != NULL) {
        sptr = ziplistNext(zl,eptr);
        serverAssert(sptr != NULL);
        s = zzlGetScore(sptr);
        // Find the first entry with a larger score and insert ele-score before it
        if (s > score) {
            /* First element with score larger than score for element to be
             * inserted. This means we should take its spot in the list to
             * maintain ordering. */
            zl = zzlInsertAt(zl,eptr,ele,score);
            break;
        } else if (s == score) {
            /* Ensure lexicographical ordering for elements. */
            // Equal scores are ordered lexicographically by member
            if (zzlCompareElements(eptr,(unsigned char*)ele,sdslen(ele)) > 0) {
                zl = zzlInsertAt(zl,eptr,ele,score);
                break;
            }
        }

        /* Move to next element. */
        eptr = ziplistNext(zl,sptr);
    }

    /* Push on tail of list when it was not yet inserted. */
    // No position found during the scan: the new pair sorts last, append it at the tail
    if (eptr == NULL)
        zl = zzlInsertAt(zl,NULL,ele,score);
    return zl;
}
// Insert ele-score in front of eptr
unsigned char *zzlInsertAt(unsigned char *zl, unsigned char *eptr, sds ele, double score) {
    unsigned char *sptr;
    char scorebuf[128];
    int scorelen;
    size_t offset;

    scorelen = d2string(scorebuf,sizeof(scorebuf),score);
    if (eptr == NULL) {
        // Append directly at the tail
        zl = ziplistPush(zl,(unsigned char*)ele,sdslen(ele),ZIPLIST_TAIL);
        zl = ziplistPush(zl,(unsigned char*)scorebuf,scorelen,ZIPLIST_TAIL);
    } else {
        /* Keep offset relative to zl, as it might be re-allocated. */
        offset = eptr-zl;
        // Insert ele at the position of eptr; the following entries shift back
        zl = ziplistInsert(zl,eptr,(unsigned char*)ele,sdslen(ele));
        eptr = zl+offset;

        /* Insert score after the element. */
        // eptr now points at the newly inserted ele; the entry after it marks
        // the position where the score has to be inserted
        serverAssert((sptr = ziplistNext(zl,eptr)) != NULL);
        zl = ziplistInsert(zl,sptr,(unsigned char*)scorebuf,scorelen);
    }
    return zl;
}
// 4. Add to a skiplist-encoded zset
// 4.1. Add the element
// t_zset.c, add ele-score to the skiplist
/* Insert a new node in the skiplist. Assumes the element does not already
 * exist (up to the caller to enforce that). The skiplist takes ownership
 * of the passed SDS string 'ele'. */
zskiplistNode *zslInsert(zskiplist *zsl, double score, sds ele) {
    // ZSKIPLIST_MAXLEVEL is 32
    zskiplistNode *update[ZSKIPLIST_MAXLEVEL], *x;
    unsigned int rank[ZSKIPLIST_MAXLEVEL];
    int i, level;

    serverAssert(!isnan(score));
    x = zsl->header;
    // zsl->level starts at 1
    // Walk down from the highest level of the header
    for (i = zsl->level-1; i >= 0; i--) {
        /* store rank that is crossed to reach the insert position */
        // rank[i] is the number of nodes crossed to reach update[i],
        // the predecessor of the new node on level i
        rank[i] = i == (zsl->level-1) ? 0 : rank[i+1];
        // While the next node on this level sorts before the new element,
        // keep advancing on this level
        while (x->level[i].forward &&
                (x->level[i].forward->score < score ||
                    (x->level[i].forward->score == score &&
                    sdscmp(x->level[i].forward->ele,ele) < 0)))
        {
            rank[i] += x->level[i].span;
            x = x->level[i].forward;
        }
        update[i] = x;
    }
    /* we assume the element is not already inside, since we allow duplicated
     * scores, reinserting the same element should never happen since the
     * caller of zslInsert() should test in the hash table if the element is
     * already inside or not. */
    // Draw a random level that decides how many levels the new node takes part in
    // If it exceeds the current list level, initialize the new upper levels
    // with the header as predecessor
    level = zslRandomLevel();
    if (level > zsl->level) {
        for (i = zsl->level; i < level; i++) {
            rank[i] = 0;
            update[i] = zsl->header;
            update[i]->level[i].span = zsl->length;
        }
        zsl->level = level;
    }
    // Build the new node (one node with 'level' levels) and link it in on each level
    x = zslCreateNode(level,score,ele);
    for (i = 0; i < level; i++) {
        // Splice x in after its predecessor on level i
        x->level[i].forward = update[i]->level[i].forward;
        update[i]->level[i].forward = x;

        /* update span covered by update[i] as x is inserted here */
        x->level[i].span = update[i]->level[i].span - (rank[0] - rank[i]);
        update[i]->level[i].span = (rank[0] - rank[i]) + 1;
    }

    /* increment span for untouched levels */
    // Levels above the new node are not linked to it, but the pointers of
    // their predecessors now skip over one more node, so span grows by 1
    for (i = level; i < zsl->level; i++) {
        update[i]->level[i].span++;
    }
    // With the forward links in place, set the backward pointer
    x->backward = (update[0] == zsl->header) ? NULL : update[0];
    if (x->level[0].forward)
        x->level[0].forward->backward = x;
    else
        // No successor: x is the last node, so it becomes the tail
        zsl->tail = x;
    zsl->length++;
    return x;
}
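There is not much to say about the ziplist insert. The skiplist insert deserves a few words; it has four broad steps:
1. Find the position: starting from the top level, keep advancing on the current level while the next node sorts before the new element, otherwise drop to the next level down (every level is walked to its own insertion point);
2. Draw a new random level, which decides the height of the new node;
3. Link the new node into the existing skiplist on each of its levels;
4. Set the backward pointer (level 0 forms a doubly linked list).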
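The four steps can be sketched on a stripped-down skiplist. This is a simplified illustration rather than the Redis implementation: it omits `span`, the member string and the tie-break on equal scores, uses a fixed-size `forward` array, and takes the node level from the caller instead of a random draw so the result is reproducible. The names `node`, `skiplist`, `sl_create`, `sl_insert` and `MAXLEVEL` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define MAXLEVEL 8   /* hypothetical cap; Redis uses ZSKIPLIST_MAXLEVEL */

typedef struct node {
    double score;
    struct node *backward;            /* previous node on level 0 */
    struct node *forward[MAXLEVEL];   /* fixed array to keep the sketch short */
} node;

typedef struct {
    node *header, *tail;
    unsigned long length;
    int level;
} skiplist;

static node *node_new(double score) {
    node *n = calloc(1, sizeof(*n));  /* all pointers start as NULL */
    assert(n != NULL);
    n->score = score;
    return n;
}

skiplist *sl_create(void) {
    skiplist *sl = malloc(sizeof(*sl));
    assert(sl != NULL);
    sl->header = node_new(0);
    sl->tail = NULL;
    sl->length = 0;
    sl->level = 1;
    return sl;
}

/* Insert 'score' as a node of the given height (1..MAXLEVEL). */
node *sl_insert(skiplist *sl, double score, int level) {
    node *update[MAXLEVEL], *x = sl->header;
    int i;
    assert(level >= 1 && level <= MAXLEVEL);

    /* Step 1: walk down from the top level, remembering the predecessor
     * of the insertion point on every level. */
    for (i = sl->level - 1; i >= 0; i--) {
        while (x->forward[i] && x->forward[i]->score < score)
            x = x->forward[i];
        update[i] = x;
    }
    /* Step 2: the level is given by the caller here; on levels the list
     * did not have yet, the header is the predecessor. */
    for (i = sl->level; i < level; i++) update[i] = sl->header;
    if (level > sl->level) sl->level = level;

    /* Step 3: splice the new node in on each of its levels. */
    x = node_new(score);
    for (i = 0; i < level; i++) {
        x->forward[i] = update[i]->forward[i];
        update[i]->forward[i] = x;
    }
    /* Step 4: maintain the backward pointer and the tail on level 0. */
    x->backward = (update[0] == sl->header) ? NULL : update[0];
    if (x->forward[0]) x->forward[0]->backward = x;
    else sl->tail = x;
    sl->length++;
    return x;
}
```

Inserting 3.0 (level 1), 1.0 (level 3) and 2.0 (level 2) in that order leaves level 0 as 1.0, 2.0, 3.0, with 3.0 as the tail and the list level raised to 3.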
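The skiplist is still somewhat abstract, so the operation above is best pictured as a diagram.
[Figure: skiplist insertion (image not preserved)]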
// As a supplement, here is how the random level is computed
// t_zset.c
/* Returns a random level for the new skiplist node we are going to create.
 * The return value of this function is between 1 and ZSKIPLIST_MAXLEVEL
 * (both inclusive), with a powerlaw-alike distribution where higher
 * levels are less likely to be returned. */
int zslRandomLevel(void) {
    int level = 1;
    // Repeated random draws decide the level; ZSKIPLIST_P is 0.25
    // Each draw promotes the node one level with probability about 1/4;
    // otherwise (about 3/4) the loop ends, so level k or higher is reached
    // with probability about 0.25^(k-1)
    while ((random()&0xFFFF) < (ZSKIPLIST_P * 0xFFFF))
        level += 1;
    return (level<ZSKIPLIST_MAXLEVEL) ? level : ZSKIPLIST_MAXLEVEL;
}
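To make the promotion rule concrete, the loop can be replayed with fixed draws instead of `random()`. The helper `level_from_draws` below is hypothetical and exists only for this illustration; `SKETCH_P` and `SKETCH_MAXLEVEL` mirror the values 0.25 and 32 quoted in the listing. A draw promotes the node when its low 16 bits are below 0.25 * 0xFFFF = 16383.75.

```c
#include <stddef.h>

#define SKETCH_MAXLEVEL 32   /* mirrors ZSKIPLIST_MAXLEVEL */
#define SKETCH_P 0.25        /* mirrors ZSKIPLIST_P */

/* Hypothetical helper: the same loop as zslRandomLevel(), except that the
 * random values come from an array, so the result is reproducible.
 * The loop also stops when the array is exhausted. */
int level_from_draws(const unsigned *draws, size_t n) {
    int level = 1;
    size_t i = 0;
    while (i < n && (draws[i] & 0xFFFF) < (SKETCH_P * 0xFFFF)) {
        level += 1;   /* this draw fell into the lowest quarter: promote */
        i++;
    }
    return (level < SKETCH_MAXLEVEL) ? level : SKETCH_MAXLEVEL;
}
```

A first draw of 0x5000 (20480) ends the loop at level 1, while the draws 0x1000, 0x5000 give level 2. Since each promotion succeeds about one time in four, roughly three quarters of all nodes stay at level 1.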
We looked at the insert first mainly to understand how a skiplist is built. A zset update, however, deletes the old node and then inserts a new one, so it is worth looking at how a node is deleted from the skiplist as well.
// t_zset.c, delete the given node from the skiplist
/* Delete an element with matching score/element from the skiplist.
 * The function returns 1 if the node was found and deleted, otherwise
 * 0 is returned.
 *
 * If 'node' is NULL the deleted node is freed by zslFreeNode(), otherwise
 * it is not freed (but just unlinked) and *node is set to the node pointer,
 * so that it is possible for the caller to reuse the node (including the
 * referenced SDS string at node->ele). */
int zslDelete(zskiplist *zsl, double score, sds ele, zskiplistNode **node) {
    zskiplistNode *update[ZSKIPLIST_MAXLEVEL], *x;
    int i;

    x = zsl->header;
    // Same search as in the insert: walk every level to find the node
    // closest to (just before) the target
    for (i = zsl->level-1; i >= 0; i--) {
        while (x->level[i].forward &&
                (x->level[i].forward->score < score ||
                    (x->level[i].forward->score == score &&
                     sdscmp(x->level[i].forward->ele,ele) < 0)))
        {
            x = x->level[i].forward;
        }
        update[i] = x;
    }
    /* We may have multiple elements with the same score, what we need
     * is to find the element with both the right score and object. */
    // Compare exactly; delete only when both score and member match
    x = x->level[0].forward;
    if (x && score == x->score && sdscmp(x->ele,ele) == 0) {
        // Perform the deletion
        zslDeleteNode(zsl, x, update);
        if (!node)
            zslFreeNode(x);
        else
            *node = x;
        return 1;
    }
    return 0; /* not found */
}
// Delete the node x
// update holds the predecessor of x on every level
/* Internal function used by zslDelete, zslDeleteByScore and zslDeleteByRank */
void zslDeleteNode(zskiplist *zsl, zskiplistNode *x, zskiplistNode **update) {
    int i;
    for (i = 0; i < zsl->level; i++) {
        if (update[i]->level[i].forward == x) {
            update[i]->level[i].span += x->level[i].span - 1;
            update[i]->level[i].forward = x->level[i].forward;
        } else {
            // Not equal: x does not reach this level, so the pointer only
            // skips over one node fewer
            update[i]->level[i].span -= 1;
        }
    }
    // Fix the level-0 backward pointer, or the tail when x was the last node
    if (x->level[0].forward) {
        x->level[0].forward->backward = x->backward;
    } else {
        zsl->tail = x->backward;
    }
    // Lower the skiplist level while the topmost level is empty
    while(zsl->level > 1 && zsl->header->level[zsl->level-1].forward == NULL)
        zsl->level--;
    zsl->length--;
}
[Figure: skiplist deletion process (image not preserved)]
Finally, let us look at one more case: what happens when the zset changes encoding, that is, how a ziplist is converted into a skiplist.
// t_zset.c, encoding conversion
void zsetConvert(robj *zobj, int encoding) {
    zset *zs;
    zskiplistNode *node, *next;
    sds ele;
    double score;
    // Same encoding: nothing to do
    if (zobj->encoding == encoding) return;
    // ziplist -> skiplist conversion
    if (zobj->encoding == OBJ_ENCODING_ZIPLIST) {
        unsigned char *zl = zobj->ptr;
        unsigned char *eptr, *sptr;
        unsigned char *vstr;
        unsigned int vlen;
        long long vlong;

        if (encoding != OBJ_ENCODING_SKIPLIST)
            serverPanic("Unknown target encoding");

        zs = zmalloc(sizeof(*zs));
        zs->dict = dictCreate(&zsetDictType,NULL);
        zs->zsl = zslCreate();

        eptr = ziplistIndex(zl,0);
        serverAssertWithInfo(NULL,zobj,eptr != NULL);
        sptr = ziplistNext(zl,eptr);
        serverAssertWithInfo(NULL,zobj,sptr != NULL);

        while (eptr != NULL) {
            score = zzlGetScore(sptr);
            serverAssertWithInfo(NULL,zobj,ziplistGet(eptr,&vstr,&vlen,&vlong));
            if (vstr == NULL)
                ele = sdsfromlonglong(vlong);
            else
                ele = sdsnewlen((char*)vstr,vlen);
            // Insert each pair into the skiplist and the dict in turn
            node = zslInsert(zs->zsl,score,ele);
            serverAssert(dictAdd(zs->dict,ele,&node->score) == DICT_OK);
            // zzlNext advances eptr and sptr together
            zzlNext(zl,&eptr,&sptr);
        }

        zfree(zobj->ptr);
        zobj->ptr = zs;
        zobj->encoding = OBJ_ENCODING_SKIPLIST;
    }
    // skiplist -> ziplist, the reverse conversion
    else if (zobj->encoding == OBJ_ENCODING_SKIPLIST) {
        unsigned char *zl = ziplistNew();
        if (encoding != OBJ_ENCODING_ZIPLIST)
            serverPanic("Unknown target encoding");

        /* Approach similar to zslFree(), since we want to free the skiplist at
         * the same time as creating the ziplist. */
        zs = zobj->ptr;
        dictRelease(zs->dict);
        node = zs->zsl->header->level[0].forward;
        zfree(zs->zsl->header);
        zfree(zs->zsl);
        // Convert while iterating forward
        while (node) {
            zl = zzlInsertAt(zl,NULL,node->ele,node->score);
            next = node->level[0].forward;
            zslFreeNode(node);
            node = next;
        }

        zfree(zs);
        zobj->ptr = zl;
        zobj->encoding = OBJ_ENCODING_ZIPLIST;
    } else {
        serverPanic("Unknown sorted set encoding");
    }
}
// On a ziplist, advance the ele and score pointers together
/* Move to next entry based on the values in eptr and sptr. Both are set to
 * NULL when there is no next entry. */
void zzlNext(unsigned char *zl, unsigned char **eptr, unsigned char **sptr) {
    unsigned char *_eptr, *_sptr;
    serverAssert(*eptr != NULL && *sptr != NULL);

    _eptr = ziplistNext(zl,*sptr);
    if (_eptr != NULL) {
        _sptr = ziplistNext(zl,_eptr);
        serverAssert(_sptr != NULL);
    } else {
        /* No next entry. */
        _sptr = NULL;
    }

    *eptr = _eptr;
    *sptr = _sptr;
}
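That concludes the add path. It is not very complicated in itself: it mainly handles ziplist and skiplist separately (note that a reverse conversion, from skiplist back to ziplist, also exists). Laying out how all the pieces relate does make the walkthrough look a little cluttered.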
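The two places where the add path decides on an encoding can be condensed into two small predicates. This is a sketch under stated assumptions: `sketch_limits`, `initial_encoding` and `needs_conversion` are hypothetical names, and the limits stand in for `server.zset_max_ziplist_entries` and `server.zset_max_ziplist_value` with the defaults 128 and 64 quoted in the listing.

```c
#include <stddef.h>

/* Hypothetical names standing in for OBJ_ENCODING_ZIPLIST / OBJ_ENCODING_SKIPLIST. */
enum sketch_encoding { ENC_ZIPLIST, ENC_SKIPLIST };

typedef struct {
    size_t max_ziplist_entries;  /* default 128 in the listing */
    size_t max_ziplist_value;    /* default 64 in the listing */
} sketch_limits;

/* Encoding picked when the key does not exist yet: only the length of the
 * first member is examined. */
enum sketch_encoding initial_encoding(const sketch_limits *lim,
                                      size_t first_member_len) {
    if (lim->max_ziplist_entries == 0 ||
        lim->max_ziplist_value < first_member_len)
        return ENC_SKIPLIST;
    return ENC_ZIPLIST;
}

/* After inserting into a ziplist-encoded zset: does either threshold
 * require the ziplist -> skiplist conversion? */
int needs_conversion(const sketch_limits *lim, size_t entries_after_insert,
                     size_t member_len) {
    return entries_after_insert > lim->max_ziplist_entries ||
           member_len > lim->max_ziplist_value;
}
```

With the defaults, a first member of 64 bytes still starts as a ziplist, a member of 65 bytes forces the skiplist, and the 129th element triggers the conversion.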
3. zrange: range queries
Redis provides several range queries: zrange, zrangebyscore, zrangebylex and so on. Each queries in a different way, but there is no need to dwell on all of them; a general understanding is enough. We pick one, the range query by index, and walk through its implementation.
// Usage: ZRANGE key start stop [WITHSCORES]
// t_zset.c
void zrangeCommand(client *c) {
    zrangeGenericCommand(c,0);
}

void zrangeGenericCommand(client *c, int reverse) {
    robj *key = c->argv[1];
    robj *zobj;
    int withscores = 0;
    long start;
    long end;
    int llen;
    int rangelen;

    if ((getLongFromObjectOrReply(c, c->argv[2], &start, NULL) != C_OK) ||
        (getLongFromObjectOrReply(c, c->argv[3], &end, NULL) != C_OK)) return;

    if (c->argc == 5 && !strcasecmp(c->argv[4]->ptr,"withscores")) {
        withscores = 1;
    } else if (c->argc >= 5) {
        addReply(c,shared.syntaxerr);
        return;
    }

    if ((zobj = lookupKeyReadOrReply(c,key,shared.emptymultibulk)) == NULL
         || checkType(c,zobj,OBJ_ZSET)) return;

    /* Sanitize indexes. */
    // A negative index counts from the end; it does not change the output
    // order, which is controlled by the reverse argument
    llen = zsetLength(zobj);
    if (start < 0) start = llen+start;
    if (end < 0) end = llen+end;
    if (start < 0) start = 0;

    /* Invariant: start >= 0, so this test will be true when end < 0.
     * The range is empty when start > end or start >= length. */
    if (start > end || start >= llen) {
        addReply(c,shared.emptymultibulk);
        return;
    }
    if (end >= llen) end = llen-1;
    rangelen = (end-start)+1;

    /* Return the result in form of a multi-bulk reply */
    addReplyMultiBulkLen(c, withscores ? (rangelen*2) : rangelen);
    // Again, ZIPLIST and SKIPLIST encodings are handled separately
    if (zobj->encoding == OBJ_ENCODING_ZIPLIST) {
        unsigned char *zl = zobj->ptr;
        unsigned char *eptr, *sptr;
        unsigned char *vstr;
        unsigned int vlen;
        long long vlong;
        // The ziplist stores ele-score pairs, so the stride is 2
        if (reverse)
            eptr = ziplistIndex(zl,-2-(2*start));
        else
            eptr = ziplistIndex(zl,2*start);

        serverAssertWithInfo(c,zobj,eptr != NULL);
        sptr = ziplistNext(zl,eptr);
        // Iterate and output one element at a time
        while (rangelen--) {
            serverAssertWithInfo(c,zobj,eptr != NULL && sptr != NULL);
            serverAssertWithInfo(c,zobj,ziplistGet(eptr,&vstr,&vlen,&vlong));
            if (vstr == NULL)
                addReplyBulkLongLong(c,vlong);
            else
                addReplyBulkCBuffer(c,vstr,vlen);

            if (withscores)
                addReplyDouble(c,zzlGetScore(sptr));
            // The ziplist can be iterated forwards and backwards; either way
            // it comes down to adding to or subtracting from an offset
            if (reverse)
                zzlPrev(zl,&eptr,&sptr);
            else
                zzlNext(zl,&eptr,&sptr);
        }

    } else if (zobj->encoding == OBJ_ENCODING_SKIPLIST) {
        zset *zs = zobj->ptr;
        zskiplist *zsl = zs->zsl;
        zskiplistNode *ln;
        sds ele;

        /* Check if starting point is trivial, before doing log(N) lookup. */
        // Reverse iteration starts from tail, forward iteration from header
        if (reverse) {
            ln = zsl->tail;
            if (start > 0)
                // Finding the node at a given rank looks like a plain loop,
                // but the skiplist version is worth a closer look
                ln = zslGetElementByRank(zsl,llen-start);
        } else {
            ln = zsl->header->level[0].forward;
            if (start > 0)
                ln = zslGetElementByRank(zsl,start+1);
        }

        while(rangelen--) {
            serverAssertWithInfo(c,zobj,ln != NULL);
            ele = ln->ele;
            addReplyBulkCBuffer(c,ele,sdslen(ele));
            if (withscores)
                addReplyDouble(c,ln->score);
            // Step forwards or backwards along level 0
            ln = reverse ? ln->backward : ln->level[0].forward;
        }
    } else {
        serverPanic("Unknown sorted set encoding");
    }
}
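The index sanitizing at the top of zrangeGenericCommand is easy to check in isolation. The sketch below copies those few lines into a hypothetical helper, `normalize_range`, which returns the number of elements in the range (0 when it is empty) and writes the first index through `out_start`.

```c
/* Hypothetical helper mirroring the "Sanitize indexes" block above.
 * Returns the range length, or 0 when the range is empty; *out_start is
 * written only for a non-empty range. */
long normalize_range(long start, long end, long llen, long *out_start) {
    if (start < 0) start = llen + start;   /* negative: count from the end */
    if (end < 0) end = llen + end;
    if (start < 0) start = 0;              /* still negative: clamp to 0 */

    /* Empty when start > end or start >= length. */
    if (start > end || start >= llen) return 0;
    if (end >= llen) end = llen - 1;       /* clamp end to the last index */

    *out_start = start;
    return (end - start) + 1;
}
```

For a set of 5 elements, the pair (0, -1) selects all 5, (-2, -1) selects the last 2 starting at index 3, and (3, 1) is empty.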
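// Find an element by its rank
/* Finds an element by its rank. The rank argument needs to be 1-based. */
zskiplistNode* zslGetElementByRank(zskiplist *zsl, unsigned long rank) {
    zskiplistNode *x;
    unsigned long traversed = 0;
    int i;

    x = zsl->header;
    // Not as simple as one might expect: instead of counting node by node on
    // level 0, the search starts at the top level and takes the longest jumps
    // that do not overshoot the target rank
    for (i = zsl->level-1; i >= 0; i--) {
        while (x->level[i].forward && (traversed + x->level[i].span) <= rank)
        {
            // span is the number of level-0 nodes this forward pointer skips
            // over, so the sum of the spans followed is the rank of x
            traversed += x->level[i].span;
            x = x->level[i].forward;
        }
        if (traversed == rank) {
            return x;
        }
    }
    return NULL;
}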
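Looking up elements by range is simple overall: it just iterates and outputs. Only the maintenance of span in the skiplist takes some careful thought.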
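What span buys can be seen on a tiny hand-wired list. This is an illustrative sketch, not Redis code: the types `rnode` and `rlist`, the helpers `link_nodes` and `build_example`, and the fixed two-level layout are assumptions made for the example. Level 0 links a, b, c, d with span 1 each; level 1 links header to b and b to d with span 2 each. The lookup loop itself is the same as in zslGetElementByRank.

```c
#include <stddef.h>

#define LEVELS 2   /* fixed height, for the sketch only */

typedef struct rnode {
    char name;
    struct {
        struct rnode *forward;
        unsigned long span;   /* level-0 nodes skipped by this pointer */
    } level[LEVELS];
} rnode;

typedef struct { rnode *header; int level; } rlist;

/* Same loop as zslGetElementByRank(); rank is 1-based. */
rnode *get_by_rank(rlist *l, unsigned long rank) {
    rnode *x = l->header;
    unsigned long traversed = 0;
    for (int i = l->level - 1; i >= 0; i--) {
        while (x->level[i].forward && traversed + x->level[i].span <= rank) {
            traversed += x->level[i].span;   /* rank of the node we land on */
            x = x->level[i].forward;
        }
        if (traversed == rank) return x;
    }
    return NULL;
}

static void link_nodes(rnode *from, int lvl, rnode *to, unsigned long span) {
    from->level[lvl].forward = to;
    from->level[lvl].span = span;
}

/* Level 1: header --2--> b --2--> d
 * Level 0: header -> a -> b -> c -> d   (span 1 each) */
rlist build_example(void) {
    static rnode h, a, b, c, d;   /* static: zero-initialized, outlive the call */
    a.name = 'a'; b.name = 'b'; c.name = 'c'; d.name = 'd';
    link_nodes(&h, 0, &a, 1); link_nodes(&a, 0, &b, 1);
    link_nodes(&b, 0, &c, 1); link_nodes(&c, 0, &d, 1);
    link_nodes(&h, 1, &b, 2); link_nodes(&b, 1, &d, 2);
    rlist l = { &h, LEVELS };
    return l;
}
```

Looking up rank 4 follows only the two level-1 pointers (2 + 2 = 4) and never touches level 0, while rank 3 jumps to b on level 1 and then takes one step on level 0.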
4. zremrangebyscore: deleting elements by score
zremrangebyscore is first of all a delete command, but it is built on a query by score, so both can be analyzed at once.
// t_zset.c,
void zremrangebyscoreCommand(client *c) {
    // All range deletions share zremrangeGenericCommand
    // ZRANGE_RANK/ZRANGE_SCORE/ZRANGE_LEX
    zremrangeGenericCommand(c,ZRANGE_SCORE);
}
void zremrangeGenericCommand(client *c, int rangetype) {
    robj *key = c->argv[1];
    robj *zobj;
    int keyremoved = 0;
    unsigned long deleted = 0;
    // Score and lex ranges are kept in dedicated structs
    zrangespec range;
    zlexrangespec lexrange;
    long start, end, llen;

    /* Step 1: Parse the range. */
    // Parse the arguments; except for the rank variant, the range is stored
    // in a dedicated struct
    if (rangetype == ZRANGE_RANK) {
        if ((getLongFromObjectOrReply(c,c->argv[2],&start,NULL) != C_OK) ||
            (getLongFromObjectOrReply(c,c->argv[3],&end,NULL) != C_OK))
            return;
    } else if (rangetype == ZRANGE_SCORE) {
        if (zslParseRange(c->argv[2],c->argv[3],&range) != C_OK) {
            addReplyError(c,"min or max is not a float");
            return;
        }
    } else if (rangetype == ZRANGE_LEX) {
        if (zslParseLexRange(c->argv[2],c->argv[3],&lexrange) != C_OK) {
            addReplyError(c,"min or max not valid string range item");
            return;
        }
    }

    /* Step 2: Lookup & range sanity checks if needed. */
    if ((zobj = lookupKeyWriteOrReply(c,key,shared.czero)) == NULL ||
        checkType(c,zobj,OBJ_ZSET)) goto cleanup;

    if (rangetype == ZRANGE_RANK) {
        /* Sanitize indexes. */
        llen = zsetLength(zobj);
        if (start < 0) start = llen+start;
        if (end < 0) end = llen+end;
        if (start < 0) start = 0;

        /* Invariant: start >= 0, so this test will be true when end < 0.
         * The range is empty when start > end or start >= length. */
        if (start > end || start >= llen) {
            addReply(c,shared.czero);
            goto cleanup;
        }
        if (end >= llen) end = llen-1;
    }

    /* Step 3: Perform the range deletion operation. */
    if (zobj->encoding == OBJ_ENCODING_ZIPLIST) {
        // Each range type has its own delete function, so what is shared
        // here is only the parsing, lookup, notification and reply logic
        switch(rangetype) {
        case ZRANGE_RANK:
            zobj->ptr = zzlDeleteRangeByRank(zobj->ptr,start+1,end+1,&deleted);
            break;
        case ZRANGE_SCORE:
            // 3.1. We only look at deletion by score -- ziplist
            zobj->ptr = zzlDeleteRangeByScore(zobj->ptr,&range,&deleted);
            break;
        case ZRANGE_LEX:
            zobj->ptr = zzlDeleteRangeByLex(zobj->ptr,&lexrange,&deleted);
            break;
        }
        if (zzlLength(zobj->ptr) == 0) {
            dbDelete(c->db,key);
            keyremoved = 1;
        }
    } else if (zobj->encoding == OBJ_ENCODING_SKIPLIST) {
        zset *zs = zobj->ptr;
        switch(rangetype) {
        case ZRANGE_RANK:
            deleted = zslDeleteRangeByRank(zs->zsl,start+1,end+1,zs->dict);
            break;
        case ZRANGE_SCORE:
            // 3.2. Deletion by score range on the skiplist
            deleted = zslDeleteRangeByScore(zs->zsl,&range,zs->dict);
            break;
        case ZRANGE_LEX:
            deleted = zslDeleteRangeByLex(zs->zsl,&lexrange,zs->dict);
            break;
        }
        if (htNeedsResize(zs->dict)) dictResize(zs->dict);
        if (dictSize(zs->dict) == 0) {
            dbDelete(c->db,key);
            keyremoved = 1;
        }
    } else {
        serverPanic("Unknown sorted set encoding");
    }

    /* Step 4: Notifications and reply. */
    if (deleted) {
        char *event[3] = {"zremrangebyrank","zremrangebyscore","zremrangebylex"};
        signalModifiedKey(c->db,key);
        notifyKeyspaceEvent(NOTIFY_ZSET,event[rangetype],key,c->db->id);
        if (keyremoved)
            notifyKeyspaceEvent(NOTIFY_GENERIC,"del",key,c->db->id);
    }
    server.dirty += deleted;
    addReplyLongLong(c,deleted);

cleanup:
    if (rangetype == ZRANGE_LEX) zslFreeLexRange(&lexrange);
}
// server.h, holds the arguments of a range query
/* Struct to hold a inclusive/exclusive range spec by score comparison. */
typedef struct {
    double min, max;
    int minex, maxex; /* are min or max exclusive? */
} zrangespec;
// 3.1. Range deletion on the ziplist
// t_zset.c
unsigned char *zzlDeleteRangeByScore(unsigned char *zl, zrangespec *range, unsigned long *deleted) {
    unsigned char *eptr, *sptr;
    double score;
    unsigned long num = 0;
    if (deleted != NULL) *deleted = 0;
    // Find the first entry inside the range and iterate from there
    eptr = zzlFirstInRange(zl,range);
    if (eptr == NULL) return zl;

    /* When the tail of the ziplist is deleted, eptr will point to the sentinel
     * byte and ziplistNext will return NULL. */
    while ((sptr = ziplistNext(zl,eptr)) != NULL) {
        score = zzlGetScore(sptr);
        // The score is already known to satisfy min, so only max is checked
        if (zslValueLteMax(score,range)) {
            /* Delete both the element and the score. */
            zl = ziplistDelete(zl,&eptr);
            zl = ziplistDelete(zl,&eptr);
            num++;
        } else {
            /* No longer in range. */
            break;
        }
    }

    if (deleted != NULL) *deleted = num;
    return zl;
}
/* Find pointer to the first element contained in the specified range.
 * Returns NULL when no element is contained in the range. */
unsigned char *zzlFirstInRange(unsigned char *zl, zrangespec *range) {
    unsigned char *eptr = ziplistIndex(zl,0), *sptr;
    double score;

    /* If everything is out of range, return early. */
    // Comparing the first and the last score is enough to tell whether the
    // list overlaps the range at all
    if (!zzlIsInRange(zl,range)) return NULL;

    while (eptr != NULL) {
        sptr = ziplistNext(zl,eptr);
        serverAssert(sptr != NULL);

        score = zzlGetScore(sptr);
        // score >= min (or > min when min is exclusive)
        if (zslValueGteMin(score,range)) {
            /* Check if score <= max. */
            if (zslValueLteMax(score,range))
                return eptr;
            return NULL;
        }

        /* Move to next element. */
        eptr = ziplistNext(zl,sptr);
    }

    return NULL;
}
// Check whether any part of zl falls inside range
// Checking the first and the last score is enough
/* Returns if there is a part of the zset is in range. Should only be used
 * internally by zzlFirstInRange and zzlLastInRange. */
int zzlIsInRange(unsigned char *zl, zrangespec *range) {
    unsigned char *p;
    double score;

    /* Test for ranges that will always be empty. */
    if (range->min > range->max ||
            (range->min == range->max && (range->minex || range->maxex)))
        return 0;

    p = ziplistIndex(zl,-1); /* Last score. */
    if (p == NULL) return 0; /* Empty sorted set */
    score = zzlGetScore(p);
    // The largest score must satisfy min
    if (!zslValueGteMin(score,range))
        return 0;

    p = ziplistIndex(zl,1); /* First score. */
    serverAssert(p != NULL);
    score = zzlGetScore(p);
    // The smallest score must satisfy max
    if (!zslValueLteMax(score,range))
        return 0;

    return 1;
}
// 3.2. Delete the elements in range from the skiplist
/* Delete all the elements with score between min and max from the skiplist.
 * Min and max are inclusive, so a score >= min || score <= max is deleted.
 * Note that this function takes the reference to the hash table view of the
 * sorted set, in order to remove the elements from the hash table too. */
unsigned long zslDeleteRangeByScore(zskiplist *zsl, zrangespec *range, dict *dict) {
    zskiplistNode *update[ZSKIPLIST_MAXLEVEL], *x;
    unsigned long removed = 0;
    int i;

    x = zsl->header;
    // On every level, find the last node that lies below range->min
    for (i = zsl->level-1; i >= 0; i--) {
        while (x->level[i].forward && (range->minex ?
            x->level[i].forward->score <= range->min :
            x->level[i].forward->score < range->min))
                x = x->level[i].forward;
        update[i] = x;
    }

    /* Current node is the last with score < or <= min. */
    x = x->level[0].forward;
    // Walk level 0, unlinking and freeing one node at a time
    // If no element matches, the loop body never runs
    /* Delete nodes while in range. */
    while (x &&
           (range->maxex ? x->score < range->max : x->score <= range->max))
    {
        // Remember the next node before x is freed
        zskiplistNode *next = x->level[0].forward;
        zslDeleteNode(zsl,x,update);
        // Remove the entry from the dict as well
        dictDelete(dict,x->ele);
        zslFreeNode(x); /* Here is where x->ele is actually released. */
        removed++;
        x = next;
    }
    return removed;
}
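The deletion logic is fairly clear. ziplist and skiplist are handled separately, but the overall idea is the same: find the first element that satisfies the condition, then iterate until the first element that no longer does.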
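That shared idea can be sketched on a plain sorted array of scores. This is an illustration under simplifying assumptions, not the ziplist or skiplist code: `range_spec` carries the same fields as zrangespec, the two comparison helpers follow the inclusive/exclusive semantics visible in zslDeleteRangeByScore, and `delete_range_by_score` is a hypothetical function working on a `double` array.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    double min, max;
    int minex, maxex;   /* are min or max exclusive? */
} range_spec;           /* same fields as zrangespec */

static int value_gte_min(double v, const range_spec *r) {
    return r->minex ? (v > r->min) : (v >= r->min);
}

static int value_lte_max(double v, const range_spec *r) {
    return r->maxex ? (v < r->max) : (v <= r->max);
}

/* Remove every score inside 'r' from the ascending array scores[0..*len).
 * Returns the number of removed scores and updates *len. */
size_t delete_range_by_score(double *scores, size_t *len, const range_spec *r) {
    size_t first = 0, last;

    /* Find the first element that satisfies the lower bound. */
    while (first < *len && !value_gte_min(scores[first], r)) first++;

    /* Iterate until the first element that violates the upper bound. */
    last = first;
    while (last < *len && value_lte_max(scores[last], r)) last++;

    /* Close the gap left by [first, last). */
    memmove(scores + first, scores + last, (*len - last) * sizeof(*scores));
    *len -= (last - first);
    return last - first;
}
```

On the scores 1, 2, 3, 4, 5, the closed range [2, 4] removes three elements, while the half-open range (2, 4] removes only 3 and 4.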
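Although set and zset have a lot in common by definition, their implementations are entirely different. Since much of this overlaps with what earlier articles covered, there is little more to add. That wraps up the analysis of zset.
What other interesting parts of the zset implementation have you come across? Discussion is welcome.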
hadoop安装包解压 tar -xvf hadoop-2.7.7.tar.gz 解压成功ll查看文件 配置环境变量 1. vi /home/wj/hadoop-2.7.7/etc/hadoop/h ...