As Redis runs, the AOF file keeps growing (a single key may be covered by many AOF entries), so restoring data from the AOF can waste a lot of unnecessary time. The solution Redis provides is AOF rewrite: based on the current contents of the db, one entry is generated per key. An AOF rewrite is triggered when:
1) the user issues the BGREWRITEAOF command;
2) the AOF file size exceeds the configured threshold.

1. AOF Rewrite Trigger Points

First, let's look at the handler for BGREWRITEAOF:

void bgrewriteaofCommand(redisClient *c) {
    if (server.aof_child_pid != -1) {
        addReplyError(c,"Background append only file rewriting already in progress");
    } else if (server.rdb_child_pid != -1) {
        server.aof_rewrite_scheduled = 1;
        addReplyStatus(c,"Background append only file rewriting scheduled");
    } else if (rewriteAppendOnlyFileBackground() == REDIS_OK) {
        addReplyStatus(c,"Background append only file rewriting started");
    } else {
        addReply(c,shared.err);
    }
}

aof_child_pid holds the pid of the child process doing the AOF rewrite, and rdb_child_pid holds the pid of the child doing the RDB dump.
1) If an AOF rewrite is already in progress, return an error to the client.
2) If an RDB dump is in progress, set aof_rewrite_scheduled to 1 to avoid putting extra pressure on the disk; the rewrite will be started later, once no AOF rewrite or RDB dump is running.
3) If neither an AOF rewrite nor an RDB dump is running, call rewriteAppendOnlyFileBackground to start the rewrite.
4) Otherwise (the background rewrite failed to start), return an error directly.
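From redis-cli the command simply returns one of the status lines seen in the handler above, for example (prompt abbreviated):

    redis> BGREWRITEAOF
    Background append only file rewriting started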
Next, let's see how serverCron triggers an AOF rewrite. The first trigger point handles the case above: a rewrite that was deferred to avoid conflicting with an RDB dump.

    /* Start a scheduled AOF rewrite if this was requested by the user while
     * a BGSAVE was in progress. */
    if (server.rdb_child_pid == -1 && server.aof_child_pid == -1 &&
        server.aof_rewrite_scheduled)
    {
        rewriteAppendOnlyFileBackground();
    }

It checks that no AOF rewrite or RDB dump is currently running and that aof_rewrite_scheduled is set, and then calls rewriteAppendOnlyFileBackground to start the rewrite.
The second trigger point fires when the AOF file has grown beyond a configured percentage.

    /* Trigger an AOF rewrite if needed */
    if (server.rdb_child_pid == -1 &&
        server.aof_child_pid == -1 &&
        server.aof_rewrite_perc &&
        server.aof_current_size > server.aof_rewrite_min_size)
    {
        long long base = server.aof_rewrite_base_size ?
                        server.aof_rewrite_base_size : 1;
        long long growth = (server.aof_current_size*100/base) - 100;
        if (growth >= server.aof_rewrite_perc) {
            redisLog(REDIS_NOTICE,"Starting automatic rewriting of AOF on %lld%% growth",growth);
            rewriteAppendOnlyFileBackground();
        }
    }

When the AOF file exceeds the configured minimum size and has grown by more than the configured percentage relative to its size after the last rewrite (aof_rewrite_base_size), an AOF rewrite is triggered. For example, with a base size of 64MB and a current size of 160MB, growth is 160*100/64 - 100 = 150, which exceeds the default threshold of 100.
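Both thresholds compared here come from the configuration. For reference, these are the corresponding redis.conf directives with their usual default values (setting the percentage to 0 disables the automatic trigger, which is exactly the aof_rewrite_perc check in the condition above):

    auto-aof-rewrite-percentage 100
    auto-aof-rewrite-min-size 64mb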

2. AOF Rewrite

The overall rewrite flow is: fork a child process, which effectively gets a snapshot of the current dataset, while the parent keeps recording subsequent commands into aof_rewrite_buf; the child walks every db, writes a temporary AOF file, and exits; the parent waits for the child and, once it has finished, appends the contents of aof_rewrite_buf to that file and finally renames the temporary file to the real AOF file.
Now the code, starting with rewriteAppendOnlyFileBackground.

    pid_t childpid;
    long long start;

    // <MM>
    // Avoid running more than one rewrite at the same time
    // </MM>
    if (server.aof_child_pid != -1) return REDIS_ERR;

If another AOF rewrite child is already running, return an error immediately.

    start = ustime();
    if ((childpid = fork()) == 0) {
        char tmpfile[256];

        /* Child */
        // <MM>
        // The child must not accept connections
        // </MM>
        closeListeningSockets(0);
        redisSetProcTitle("redis-aof-rewrite");
        // <MM>
        // Build the temporary AOF file name
        // </MM>
        snprintf(tmpfile,256,"temp-rewriteaof-bg-%d.aof", (int) getpid());
        if (rewriteAppendOnlyFile(tmpfile) == REDIS_OK) {
            size_t private_dirty = zmalloc_get_private_dirty();

            if (private_dirty) {
                redisLog(REDIS_NOTICE,
                    "AOF rewrite: %zu MB of memory used by copy-on-write",
                    private_dirty/(1024*1024));
            }
            exitFromChild(0);
        } else {
            exitFromChild(1);
        }

Record the current time, used to measure how long fork takes, then call fork and enter the child's code path. The child first closes the listening sockets so it will not accept client connections, and sets its process title. It then builds the name of the temporary file the rewrite will write to, and calls rewriteAppendOnlyFile to do the actual rewrite. If the rewrite succeeds, the child logs the amount of copy-on-write dirty memory and exits with code 0; if it fails, the child exits with code 1.
Now the parent's code path:

    } else {
        /* Parent */
        server.stat_fork_time = ustime()-start;
        server.stat_fork_rate = (double) zmalloc_used_memory() * 1000000 /
            server.stat_fork_time / (1024*1024*1024); /* GB per second. */
        latencyAddSampleIfNeeded("fork",server.stat_fork_time/1000);
        if (childpid == -1) {
            redisLog(REDIS_WARNING,
                "Can't rewrite append only file in background: fork: %s",
                strerror(errno));
            return REDIS_ERR;
        }
        redisLog(REDIS_NOTICE,
            "Background append only file rewriting started by pid %d",childpid);
        server.aof_rewrite_scheduled = 0;
        server.aof_rewrite_time_start = time(NULL);
        server.aof_child_pid = childpid;
        updateDictResizePolicy();
        /* We set appendseldb to -1 in order to force the next call to the
         * feedAppendOnlyFile() to issue a SELECT command, so the differences
         * accumulated by the parent into server.aof_rewrite_buf will start
         * with a SELECT statement and it will be safe to merge. */
        server.aof_selected_db = -1;
        replicationScriptCacheFlush();
        return REDIS_OK;
    }

The parent first records the fork latency and samples it. If fork failed, it logs the failure and returns an error. If fork succeeded, it clears aof_rewrite_scheduled and records the rewrite start time and aof_child_pid (the field Redis uses to tell whether an AOF rewrite is in progress). It then calls updateDictResizePolicy to adjust the rehashing policy of the key space: with a child process alive, rehashing of the dict is disabled to keep copy-on-write from duplicating large numbers of memory pages.
aof_selected_db is set to -1 so that the next AOF record will be preceded by a SELECT db command; since that record is also written into aof_rewrite_buf, the buffer can later be safely appended to the rewritten file. I haven't looked into replicationScriptCacheFlush yet; I'll fill that in later.
Now for the child's side of the rewrite, in rewriteAppendOnlyFile. Broadly, it iterates over every key, serializes it, and writes the result into the AOF file.

    dictIterator *di = NULL;
    dictEntry *de;
    rio aof;
    FILE *fp;
    char tmpfile[256];
    int j;
    long long now = mstime();

    /* Note that we have to use a different temp name here compared to the
     * one used by rewriteAppendOnlyFileBackground() function. */
    snprintf(tmpfile,256,"temp-rewriteaof-%d.aof", (int) getpid());
    fp = fopen(tmpfile,"w");
    if (!fp) {
        redisLog(REDIS_WARNING, "Opening the temp file for AOF rewrite in rewriteAppendOnlyFile(): %s", strerror(errno));
        return REDIS_ERR;
    }

Get the current time, build the temporary file name, and create that file.

    rioInitWithFile(&aof,fp);
    if (server.aof_rewrite_incremental_fsync)
        rioSetAutoSync(&aof,REDIS_AOF_AUTOSYNC_BYTES);

rio is a stream-oriented I/O interface; different implementations can sit underneath it, and currently file and in-memory buffer backends are provided. Here it is initialized on top of the file. If aof_rewrite_incremental_fsync is configured, the AOF is fsynced incrementally while it is written — a sync every 32MB of output — so that one large, concentrated sync does not saturate the disk.
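As a rough sketch of the idea (field names simplified and several members omitted; see rio.h for the real definition), rio bundles a few function pointers with a union holding backend-specific state, and rioInitWithFile simply fills in the file-backed implementations:

    /* Simplified sketch of the rio abstraction, NOT the exact Redis definition. */
    typedef struct _rio {
        /* Backend primitives; rioWrite()/rioRead() dispatch through these. */
        size_t (*read)(struct _rio *r, void *buf, size_t len);
        size_t (*write)(struct _rio *r, const void *buf, size_t len);
        off_t  (*tell)(struct _rio *r);

        union {
            struct { sds ptr; off_t pos; } buffer;                /* in-memory target */
            struct { FILE *fp; off_t buffered, autosync; } file;  /* file target */
        } io;
    } rio;

With a layout like this, rioSetAutoSync only needs to record the threshold in the file backend's autosync field, and the file write wrapper fsyncs each time another REDIS_AOF_AUTOSYNC_BYTES (32MB) have been written since the last sync.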
Next comes a loop that walks every Redis db and rewrites each one. Let's look directly inside the loop:

        char selectcmd[] = "*2\r\n$6\r\nSELECT\r\n";
        redisDb *db = server.db+j;
        dict *d = db->dict;
        if (dictSize(d) == 0) continue;
        di = dictGetSafeIterator(d);
        if (!di) {
            fclose(fp);
            return REDIS_ERR;
        }

        /* SELECT the new DB */
        if (rioWrite(&aof,selectcmd,sizeof(selectcmd)-1) == 0) goto werr;
        if (rioWriteBulkLongLong(&aof,j) == 0) goto werr;

First, the SELECT command for this db is prepared. If the db is empty it is skipped and the loop moves on to the next db. Then a safe iterator over the db is obtained; if that fails, the function returns an error. Finally, the SELECT db command is written to the file.
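For example, for db 0 the bytes actually appended are just the RESP encoding of SELECT 0 (escape sequences written out for readability):

    *2\r\n$6\r\nSELECT\r\n$1\r\n0\r\n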
Next is another loop, over every key in the db, generating the corresponding commands.

        while ((de = dictNext(di)) != NULL) {
            // ...
        }
        dictReleaseIterator(di);

Inside the loop:

            sds keystr;
            robj key, *o;
            long long expiretime;

            keystr = dictGetKey(de);
            o = dictGetVal(de);
            initStaticStringObject(key,keystr);

            expiretime = getExpire(db,&key);

            /* If this key is already expired skip it */
            if (expiretime != -1 && expiretime < now) continue;

de is a dict entry holding both the key and the value. The key and value are fetched first, and the key is wrapped in a robj. Then the key's expiration time is looked up; if the key has already expired, it is skipped.

            /* Save the key and associated value */
            if (o->type == REDIS_STRING) {
                /* Emit a SET command */
                char cmd[]="*3\r\n$3\r\nSET\r\n";
                if (rioWrite(&aof,cmd,sizeof(cmd)-1) == 0) goto werr;
                /* Key and value */
                if (rioWriteBulkObject(&aof,&key) == 0) goto werr;
                if (rioWriteBulkObject(&aof,o) == 0) goto werr;
            } else if (o->type == REDIS_LIST) {
                if (rewriteListObject(&aof,&key,o) == 0) goto werr;
            } else if (o->type == REDIS_SET) {
                if (rewriteSetObject(&aof,&key,o) == 0) goto werr;
            } else if (o->type == REDIS_ZSET) {
                if (rewriteSortedSetObject(&aof,&key,o) == 0) goto werr;
            } else if (o->type == REDIS_HASH) {
                if (rewriteHashObject(&aof,&key,o) == 0) goto werr;
            } else {
                redisPanic("Unknown object type");
            }

Next, the object is serialized into the appropriate command according to its type, and the command is written to the AOF file. The per-type serialization routines are not covered in detail here.
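As a concrete illustration, a plain string key foo holding the value bar is emitted as the RESP encoding of SET foo bar:

    *3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n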

            /* Save the expire time */
            if (expiretime != -1) {
                char cmd[]="*3\r\n$9\r\nPEXPIREAT\r\n";
                if (rioWrite(&aof,cmd,sizeof(cmd)-1) == 0) goto werr;
                if (rioWriteBulkObject(&aof,&key) == 0) goto werr;
                if (rioWriteBulkLongLong(&aof,expiretime) == 0) goto werr;
            }

If the key has an expiration time, it is likewise serialized as a command (PEXPIREAT) and written to the AOF file.
After all dbs have been rewritten, the function finishes up.

    /* Make sure data will not remain on the OS's output buffers */
    if (fflush(fp) == EOF) goto werr;
    if (fsync(fileno(fp)) == -1) goto werr;
    if (fclose(fp) == EOF) goto werr;

    /* Use RENAME to make sure the DB file is changed atomically only
     * if the generate DB file is ok. */
    if (rename(tmpfile,filename) == -1) {
        redisLog(REDIS_WARNING,"Error moving temp append only file on the final destination: %s", strerror(errno));
        unlink(tmpfile);
        return REDIS_ERR;
    }
    redisLog(REDIS_NOTICE,"SYNC append only file rewrite performed");
    return REDIS_OK;

fflush and fsync push the data down to disk, and the file is then closed. Only after that is the temporary file renamed into place, which guarantees the published AOF file is always complete and never left in a partially written state. Finally a log line is emitted and the function returns.
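The write-to-a-temp-file-then-rename pattern used here is worth spelling out on its own. A minimal, generic sketch (save_atomically is a hypothetical helper, error handling trimmed):

    #include <stdio.h>
    #include <unistd.h>

    /* Hypothetical helper illustrating the temp-file + fsync + rename pattern:
     * the final path is only ever replaced by a file that is complete and durable. */
    int save_atomically(const char *final_path) {
        char tmp[256];
        FILE *fp;

        snprintf(tmp, sizeof(tmp), "%s.tmp.%d", final_path, (int)getpid());
        if ((fp = fopen(tmp, "w")) == NULL) return -1;

        /* ... write the whole payload into fp ... */

        /* Push the data out of stdio and kernel buffers before publishing it. */
        if (fflush(fp) == EOF || fsync(fileno(fp)) == -1) {
            fclose(fp);
            unlink(tmp);
            return -1;
        }
        if (fclose(fp) == EOF) { unlink(tmp); return -1; }

        /* rename(2) swaps the file in atomically: readers see either the old
         * complete file or the new complete file, never a partial one. */
        if (rename(tmp, final_path) == -1) {
            unlink(tmp);
            return -1;
        }
        return 0;
    }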

werr:
    fclose(fp);
    unlink(tmpfile);
    redisLog(REDIS_WARNING,"Write error writing append only file on disk: %s", strerror(errno));
    if (di) dictReleaseIterator(di);
    return REDIS_ERR;

After the file has been opened, a failure in any of the steps above jumps to werr for error handling: close the file, delete the temporary file, release the dict iterator if it hasn't been released yet, and return an error.
At this point the child's rewrite job is done and the rewritten file exists, but the commands logged while the rewrite was running are not in it yet; the remaining work is finished in the main process.

3. Appending the AOF Rewrite Buffer

In multi-process programming, the parent must reap a child after it exits, otherwise the child becomes a zombie. It is again in serverCron that the main process cleans up after the rewrite child.

    /* Check if a background saving or AOF rewrite in progress terminated. */
    if (server.rdb_child_pid != -1 || server.aof_child_pid != -1) {
        int statloc;
        pid_t pid;

        if ((pid = wait3(&statloc,WNOHANG,NULL)) != 0) {
            int exitcode = WEXITSTATUS(statloc);
            int bysignal = 0;

            if (WIFSIGNALED(statloc)) bysignal = WTERMSIG(statloc);

            if (pid == server.rdb_child_pid) {
                backgroundSaveDoneHandler(exitcode,bysignal);
            } else if (pid == server.aof_child_pid) {
                backgroundRewriteDoneHandler(exitcode,bysignal);
            } else {
                redisLog(REDIS_WARNING,
                    "Warning, detected child with unmatched pid: %ld",
                    (long)pid);
            }
            updateDictResizePolicy();
        }
    } else {

If an RDB dump or AOF rewrite is in progress, the main process calls wait3 non-blockingly so it can collect the child's exit status once it terminates. If the process that exited is the AOF rewrite child, backgroundRewriteDoneHandler is called to do the final cleanup work. Let's look at that function.
The normal-exit case is when the child was not killed by a signal and its exit code is 0.

        int newfd, oldfd;
        char tmpfile[256];
        long long now = ustime();
        mstime_t latency;

        redisLog(REDIS_NOTICE,
            "Background AOF rewrite terminated with success");

        /* Flush the differences accumulated by the parent to the
         * rewritten AOF. */
        latencyStartMonitor(latency);
        snprintf(tmpfile,256,"temp-rewriteaof-bg-%d.aof",
            (int)server.aof_child_pid);
        newfd = open(tmpfile,O_WRONLY|O_APPEND);
        if (newfd == -1) {
            redisLog(REDIS_WARNING,
                "Unable to open the temporary AOF produced by the child: %s", strerror(errno));
            goto cleanup;
        }

First a log line is written, then the temporary file produced by the rewrite is opened for appending.

        // <MM>
        // Append the rewrite buffer to the file
        // </MM>
        if (aofRewriteBufferWrite(newfd) == -1) {
            redisLog(REDIS_WARNING,
                "Error trying to flush the parent diff to the rewritten AOF: %s", strerror(errno));
            close(newfd);
            goto cleanup;
        }
        latencyEndMonitor(latency);
        latencyAddSampleIfNeeded("aof-rewrite-diff-write",latency);

        redisLog(REDIS_NOTICE,
            "Parent diff successfully flushed to the rewritten AOF (%lu bytes)", aofRewriteBufferSize());

Next, the AOF rewrite buffer is appended to the file.
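Conceptually, aofRewriteBufferWrite is just a loop of write(2) calls: while the child runs, feedAppendOnlyFile accumulates every new command into a list of fixed-size blocks, and the flush walks that list. A simplified sketch of the shape (assuming the adlist API from adlist.h; the real code lives in aof.c and operates on server.aof_rewrite_buf_blocks):

    /* Simplified sketch of flushing the rewrite buffer, not the exact aof.c code. */
    typedef struct aofrwblock {
        unsigned long used, free;
        char buf[1024*1024*10];     /* fixed-size chunk of buffered commands */
    } aofrwblock;

    ssize_t rewrite_buffer_write(list *blocks, int fd) {
        listNode *ln;
        listIter li;
        ssize_t count = 0;

        listRewind(blocks, &li);
        while ((ln = listNext(&li)) != NULL) {
            aofrwblock *block = listNodeValue(ln);

            if (block->used) {
                ssize_t nwritten = write(fd, block->buf, block->used);
                if (nwritten != (ssize_t)block->used) return -1;
                count += nwritten;
            }
        }
        return count;
    }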

        /* The only remaining thing to do is to rename the temporary file to
         * the configured file and switch the file descriptor used to do AOF
         * writes. We don't want close(2) or rename(2) calls to block the
         * server on old file deletion.
         *
         * There are two possible scenarios:
         *
         * 1) AOF is DISABLED and this was a one time rewrite. The temporary
         * file will be renamed to the configured file. When this file already
         * exists, it will be unlinked, which may block the server.
         *
         * 2) AOF is ENABLED and the rewritten AOF will immediately start
         * receiving writes. After the temporary file is renamed to the
         * configured file, the original AOF file descriptor will be closed.
         * Since this will be the last reference to that file, closing it
         * causes the underlying file to be unlinked, which may block the
         * server.
         *
         * To mitigate the blocking effect of the unlink operation (either
         * caused by rename(2) in scenario 1, or by close(2) in scenario 2), we
         * use a background thread to take care of this. First, we
         * make scenario 1 identical to scenario 2 by opening the target file
         * when it exists. The unlink operation after the rename(2) will then
         * be executed upon calling close(2) for its descriptor. Everything to
         * guarantee atomicity for this switch has already happened by then, so
         * we don't care what the outcome or duration of that close operation
         * is, as long as the file descriptor is released again. */
        if (server.aof_fd == -1) {
            // <MM>
            // AOF is not enabled; this rewrite was triggered by the command
            // </MM>
            /* AOF disabled */

            /* Don't care if this fails: oldfd will be -1 and we handle that.
             * One notable case of -1 return is if the old file does
             * not exist. */
            oldfd = open(server.aof_filename,O_RDONLY|O_NONBLOCK);
        } else {
            /* AOF enabled */
            oldfd = -1; /* We'll set this to the current AOF filedes later. */
        }

        /* Rename the temporary file. This will not unlink the target file if
         * it exists, because we reference it with "oldfd". */
        latencyStartMonitor(latency);
        if (rename(tmpfile,server.aof_filename) == -1) {
            redisLog(REDIS_WARNING,
                "Error trying to rename the temporary AOF file: %s", strerror(errno));
            close(newfd);
            if (oldfd != -1) close(oldfd);
            goto cleanup;
        }
        latencyEndMonitor(latency);
        latencyAddSampleIfNeeded("aof-rename",latency);

        if (server.aof_fd == -1) {
            /* AOF disabled, we don't need to set the AOF file descriptor
             * to this new file, so we can close it. */
            close(newfd);
        } else {
            /* AOF enabled, replace the old fd with the new one. */
            oldfd = server.aof_fd;
            server.aof_fd = newfd;
            if (server.aof_fsync == AOF_FSYNC_ALWAYS)
                aof_fsync(newfd);
            else if (server.aof_fsync == AOF_FSYNC_EVERYSEC)
                aof_background_fsync(newfd);
            server.aof_selected_db = -1; /* Make sure SELECT is re-issued */
            aofUpdateCurrentSize();
            server.aof_rewrite_base_size = server.aof_current_size;

            /* Clear regular AOF buffer since its contents was just written to
             * the new AOF from the background rewrite buffer. */
            sdsfree(server.aof_buf);
            server.aof_buf = sdsempty();
        }

The temporary file is then renamed to the final AOF file name; if AOF is enabled, the server's AOF file descriptor is switched to the new file and the regular aof_buf is cleared.

        server.aof_lastbgrewrite_status = REDIS_OK;

        redisLog(REDIS_NOTICE, "Background AOF rewrite finished successfully");

        /* Change state from WAIT_REWRITE to ON if needed */
        if (server.aof_state == REDIS_AOF_WAIT_REWRITE)
            server.aof_state = REDIS_AOF_ON;

        /* Asynchronously close the overwritten AOF. */
        if (oldfd != -1) bioCreateBackgroundJob(REDIS_BIO_CLOSE_FILE,(void*)(long)oldfd,NULL,NULL);

        redisLog(REDIS_VERBOSE,
            "Background AOF rewrite signal handler took %lldus", ustime()-now);

Finally, the status is updated and the old AOF file is closed asynchronously, so the potentially slow unlink triggered by that last close does not stall the server.
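bioCreateBackgroundJob hands the close(2) to one of the background I/O threads in bio.c, which work off a job queue. The effect is roughly what this minimal sketch does, except that Redis keeps persistent worker threads instead of spawning one per job:

    #include <pthread.h>
    #include <unistd.h>

    /* Minimal sketch: perform the final close(2) off the main thread so that
     * the unlink of the now-unreferenced old AOF cannot block the event loop. */
    static void *close_in_background(void *arg) {
        close((int)(long)arg);
        return NULL;
    }

    static void async_close(int fd) {
        pthread_t tid;
        pthread_attr_t attr;

        pthread_attr_init(&attr);
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
        pthread_create(&tid, &attr, close_in_background, (void *)(long)fd);
        pthread_attr_destroy(&attr);
    }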
If the rewrite child exited abnormally, i.e. it was killed by a signal or its exit code is non-zero, only a log line is recorded.

    } else if (!bysignal && exitcode != 0) {
        server.aof_lastbgrewrite_status = REDIS_ERR;

        redisLog(REDIS_WARNING,
            "Background AOF rewrite terminated with error");
    } else {
        server.aof_lastbgrewrite_status = REDIS_ERR;

        redisLog(REDIS_WARNING,
            "Background AOF rewrite terminated by signal %d", bysignal);
    }

If appending the rewrite buffer or renaming the file fails, cleanup work is needed; it is handled under the cleanup label (which the success path also falls through to):

cleanup:
    aofRewriteBufferReset();
    aofRemoveTempFile(server.aof_child_pid);
    server.aof_child_pid = -1;
    server.aof_rewrite_time_last = time(NULL)-server.aof_rewrite_time_start;
    server.aof_rewrite_time_start = -1;
    /* Schedule a new rewrite if we are waiting for it to switch the AOF ON. */
    if (server.aof_state == REDIS_AOF_WAIT_REWRITE)
        server.aof_rewrite_scheduled = 1;

This mainly resets state so that the next rewrite can run.
That is the complete AOF rewrite flow; the RDB-related parts will be covered next.
