http://pauladamsmith.com/articles/redis-under-the-hood.html#redis-under-the-hood

How does the Redis server work?

I was curious to learn more about Redis’s internals, so I’ve been familiarizing myself with the source, largely by reading and jumping around in Emacs. After I had peeled back enough of the onion’s layers, I realized I was trying to keep track of too many details in my head, and it wasn’t clear how it all hung together. I decided to write out in narrative form how an instance of the Redis server starts up and initializes itself, and how it handles the request/response cycle with a client, as a way of explaining it to myself, hopefully in a clear fashion. Luckily, Redis has a nice, clean code base that is easy to read and follow along. Armed with a TAGS file, my $EDITOR, and GDB, I set out to see how it all works under the hood. (Incidentally, I was working with the Redis code base as of commit b4f2e41. Of course, internals such as I outline below are subject to change. However, the broad architecture of the server is unlikely to change very much, and I tried to keep that in mind as I went along.)

This article examines server startup and takes a high-level view of the request/response processing cycle. In a subsequent article, I’ll dive into greater detail and trace a simple SET/GET command pair as it makes its way through Redis.

Startup

Let’s begin with the main() function in redis.c.

Beginning global server state initialization

First, initServerConfig() is called. This partially initializes a variable server, which has the type struct redisServer, that serves as the global server state.

// redis.h:338
struct redisServer {
    pthread_t mainthread;
    int port;
    int fd;
    redisDb *db;
    // ...
};

// redis.c:69
struct redisServer server; /* server global state */

There are a huge number of members in this struct, but they generally fall into the following categories:

  • general server state
  • statistics
  • configuration from config file
  • replication
  • sort parameters
  • virtual memory config, state, I/O threads, & stats
  • zip structure
  • event loop helpers
  • pub/sub

For example, this struct includes members that map to options in the configuration file (usually named redis.conf) such as the port the server listens on and how verbose logging should be, pointers to linked lists of connected clients and slave Redis servers as well as the Redis database(s) itself, and counters for statistics like the number of commands processed since startup.

initServerConfig() provides default values for the members that can be configured by the user via the redis.conf config file.

Setting up command table

The next thing main() does is sort the table of Redis commands. These are defined in a global variable readonlyCommandTable which is an array of struct redisCommands.

// redis.c:70
struct redisCommand *commandTable;
struct redisCommand readonlyCommandTable[] = {
    {"get",getCommand,2,REDIS_CMD_INLINE,NULL,1,1,1},
    {"set",setCommand,3,REDIS_CMD_BULK|REDIS_CMD_DENYOOM,NULL,0,0,0},
    {"setnx",setnxCommand,3,REDIS_CMD_BULK|REDIS_CMD_DENYOOM,NULL,0,0,0},
    {"setex",setexCommand,4,REDIS_CMD_BULK|REDIS_CMD_DENYOOM,NULL,0,0,0},
    {"append",appendCommand,3,REDIS_CMD_BULK|REDIS_CMD_DENYOOM,NULL,1,1,1},
    // ...
};

// redis.h:458
typedef void redisCommandProc(redisClient *c);
// ...
struct redisCommand {
    char *name;
    redisCommandProc *proc;
    int arity;
    int flags;
    // ...
};

A redisCommand struct keeps track of the command’s name (the mnemonic, e.g., “get”), a pointer to the actual C function that performs the command, the command’s arity, command flags such as whether it returns a bulk response, and a number of VM-specific members.

The read-only table is ordered in source code so that commands are grouped by category, such as string commands, list commands, set commands, etc., to make it easier for a programmer to scan the table for similar commands. The sorted table of commands is pointed to by the global variable commandTable, and is used to look up Redis commands with a standard binary search (lookupCommand(), which returns a pointer to a redisCommand).
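The sort-then-search idea can be sketched with the standard library’s qsort() and bsearch(). This is a minimal illustration, not the Redis code: struct command, command_table, sort_command_table() and lookup_command() are hypothetical names, the struct is cut down to two members, and the table is sorted in place for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for struct redisCommand. */
struct command {
    const char *name;
    int arity;
};

/* Grouped by category for the reader's benefit, so not in sorted order. */
static struct command command_table[] = {
    {"get", 2}, {"set", 3}, {"append", 3},   /* strings */
    {"rpush", 3}, {"lpop", 2},               /* lists */
    {"sadd", 3},                             /* sets */
};
#define NUM_COMMANDS (sizeof(command_table) / sizeof(command_table[0]))

static int compare_commands(const void *a, const void *b) {
    const struct command *ca = a, *cb = b;
    return strcmp(ca->name, cb->name);
}

/* Sort once at startup so that every later lookup can use binary search. */
void sort_command_table(void) {
    qsort(command_table, NUM_COMMANDS, sizeof(struct command), compare_commands);
}

/* Returns the matching entry, or NULL for an unknown command. */
struct command *lookup_command(const char *name) {
    struct command key = {name, 0};
    return bsearch(&key, command_table, NUM_COMMANDS,
                   sizeof(struct command), compare_commands);
}
```

After one call to sort_command_table(), lookup_command("get") returns the entry with arity 2, and an unknown name returns NULL.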

Loading config file

main() moves on to processing command-line options given by the user starting up the redis-server executable. Currently, there is only a single argument that Redis takes, aside from the usual version (-v) and help (-h) flags: a path to a config file. If the path is given, Redis loads the config file and overrides any defaults already set by initServerConfig() by calling loadServerConfig(). This function is fairly straightforward, looping over each line in the config file and converting values that match directive names to appropriate types for the matching member in the server struct. At this point, Redis will daemonize and detach from the controlling terminal if it has been configured to do so.
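A line-by-line loader of this kind might look roughly like the sketch below. It is an illustration under assumptions, not loadServerConfig() itself: struct config, init_config() and apply_config_line() are hypothetical names, and only three directives (port, daemonize, dbfilename) are handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of the configurable server state. */
struct config {
    int port;
    int daemonize;
    char dbfilename[64];
};

/* Defaults, set before any config file is read. */
void init_config(struct config *cfg) {
    cfg->port = 6379;
    cfg->daemonize = 0;
    strcpy(cfg->dbfilename, "dump.rdb");
}

/* Apply one config line. Blank lines and '#' comments are ignored.
 * Returns 0 on success, -1 on an unknown directive or a bad value. */
int apply_config_line(struct config *cfg, const char *line) {
    char name[32], value[64];

    while (*line == ' ' || *line == '\t') line++;
    if (*line == '\0' || *line == '\n' || *line == '#') return 0;
    if (sscanf(line, "%31s %63s", name, value) != 2) return -1;

    if (strcmp(name, "port") == 0) {
        int port;
        if (sscanf(value, "%d", &port) != 1 || port < 1 || port > 65535) return -1;
        cfg->port = port;
    } else if (strcmp(name, "daemonize") == 0) {
        if (strcmp(value, "yes") == 0) cfg->daemonize = 1;
        else if (strcmp(value, "no") == 0) cfg->daemonize = 0;
        else return -1;
    } else if (strcmp(name, "dbfilename") == 0) {
        strcpy(cfg->dbfilename, value);   /* value holds at most 63 chars, so it fits */
    } else {
        return -1;
    }
    return 0;
}
```

A caller would run init_config() first and then feed each line of the file to apply_config_line(), so that anything not mentioned in the file keeps its default.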

initServer()

initServer() finishes the job of initializing the server struct that was begun by initServerConfig(). First, it sets up signal handling (SIGHUP and SIGPIPE signals are ignored — there is an opportunity to improve Redis by adding the ability to reload its config file when it receives a SIGHUP, in the fashion of other daemons), including printing a stacktrace if the server receives a SIGSEGV (and other related signals), see segvHandler().

A number of doubly-linked lists (see adlist.h) are created to keep track of clients, slaves, monitors (a client that has sent the MONITOR command), and an object free list.

Shared objects

One interesting thing Redis does is create a number of shared objects, which are accessible via the global shared struct. Common Redis objects that are required by many different commands, response strings and error messages, for example, can be shared without having to allocate them each time, saving memory, with the tradeoff of a bit more initialization effort at startup time.

// redis.c:662
shared.crlf = createObject(REDIS_STRING,sdsnew("\r\n"));
shared.ok = createObject(REDIS_STRING,sdsnew("+OK\r\n"));
shared.err = createObject(REDIS_STRING,sdsnew("-ERR\r\n"));
shared.emptybulk = createObject(REDIS_STRING,sdsnew("$0\r\n\r\n"));
// ...

Shared integers

The greatest impact in terms of memory savings with shared objects comes from a large pool of shared integers.

// redis.c:705
for (j = 0; j < REDIS_SHARED_INTEGERS; j++) {
    shared.integers[j] = createObject(REDIS_STRING,(void*)(long)j);
    shared.integers[j]->encoding = REDIS_ENCODING_INT;
}

createSharedObjects() creates an array of the first 10,000 non-negative integers as Redis objects (strings with an integer encoding). Various Redis objects like strings, sets, and lists often contain many small integers (for IDs or counters), and they can reuse the same objects already allocated in memory, for a large potential savings. One could imagine the constant that defines the number of shared integers to be created, REDIS_SHARED_INTEGERS, being exposed to configuration to give users the opportunity, based on knowledge of their applications and needs, to increase the number of shared integers for greater memory savings. The tradeoff is that Redis statically allocates slightly more memory at startup time, but this would likely be a small amount compared to the overall typical database sizes and potential savings.
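To make the reuse concrete, here is a minimal sketch of a shared-integer pool. It is an assumption-laden illustration rather than the Redis object system: struct int_object, int_object_for() and release_int_object() are hypothetical, and the reference counting is simplified.

```c
#include <assert.h>
#include <stdlib.h>

#define SHARED_INTEGERS 10000   /* mirrors REDIS_SHARED_INTEGERS in the article */

/* Hypothetical minimal object: a reference count plus an integer payload. */
struct int_object {
    int refcount;
    long value;
};

static struct int_object shared_integers[SHARED_INTEGERS];

/* Build the pool once at startup. The pool itself holds one reference. */
void create_shared_integers(void) {
    for (long j = 0; j < SHARED_INTEGERS; j++) {
        shared_integers[j].refcount = 1;
        shared_integers[j].value = j;
    }
}

/* Hand out the shared object for small non-negative values,
 * and fall back to a fresh allocation for everything else. */
struct int_object *int_object_for(long value) {
    if (value >= 0 && value < SHARED_INTEGERS) {
        shared_integers[value].refcount++;
        return &shared_integers[value];
    }
    struct int_object *o = malloc(sizeof(*o));
    if (o == NULL) return NULL;
    o->refcount = 1;
    o->value = value;
    return o;
}

/* Drop one reference. Shared objects never reach zero because the pool
 * keeps its own reference, so only heap objects are ever freed here. */
void release_int_object(struct int_object *o) {
    if (--o->refcount == 0) free(o);
}
```

Two requests for the value 42 return the same pointer, while two requests for 123456 each pay for their own allocation.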

Event loop

initServer() continues by creating the core event loop, calling aeCreateEventLoop() (see ae.c) and assigning the result to the el member of server.

One key aspect of Redis’s implementation is the use of locally provided wrappers to simplify and hide the complexity of common tasks without adding build-time dependencies. For example, zmalloc.h defines a number of wrappers for the *alloc() family of functions, which keep track of how much memory Redis has allocated. sds.h defines an API to a dynamic strings library (basically, a string that keeps track of its length and whether its memory can be freed).

ae.h provides a platform-independent wrapper for setting up I/O event notification loops, which uses epoll on Linux, kqueue on BSD, and falls back to select if the respective first choice is not available. Redis’s event loop polls for new connections and I/O events (reading requests from and writing responses to a socket), and is triggered when a new event arrives. This is what makes Redis so responsive: it can serve thousands of clients simultaneously without blocking while individual requests are processed and responded to.
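As an illustration of the wrapper idea, the sketch below shows one common way an allocator wrapper can track usage: store the requested size in a small header in front of each block. The names tracked_malloc(), tracked_free() and tracked_used_memory() are hypothetical, and this is not claimed to be how zmalloc.c is written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static size_t used_memory = 0;

/* Header stored in front of every block. The extra union members exist
 * only to keep the pointer handed to the caller suitably aligned. */
union header {
    size_t size;
    long double align_ld;
    long long align_ll;
    void *align_ptr;
};

/* Allocate size bytes and record them in the running total. */
void *tracked_malloc(size_t size) {
    union header *h = malloc(sizeof(*h) + size);
    if (h == NULL) return NULL;
    h->size = size;
    used_memory += size;
    return h + 1;   /* the caller sees only the memory after the header */
}

/* Free a block from tracked_malloc() and subtract its size from the total. */
void tracked_free(void *ptr) {
    if (ptr == NULL) return;
    union header *h = (union header *)ptr - 1;
    used_memory -= h->size;
    free(h);
}

size_t tracked_used_memory(void) {
    return used_memory;
}
```

Because every allocation goes through the wrapper, a single counter is enough to answer "how much memory is in use?" at any moment, which is the kind of figure a server can log or compare against a configured limit.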

Databases

initServer() also initializes a number of redisDb objects, which are structs that encapsulate the details of a particular Redis database, including tracking expiring keys, keys that are blocking (either from a B{L,R}POP command or from I/O), and keys that are being watched for check-and-set. (By default there are 16 separate databases, which can be thought of as namespaces within a Redis server.)

TCP socket

initServer() is where the socket that Redis listens on for connections (by default, bound to port 6379) is set up. Another Redis-local wrapper, anet.h, defines anetTcpServer() and a number of other functions that simplify the usual complexity of setting up a new socket, binding, and listening to a port.

// redis.c:791
server.fd = anetTcpServer(server.neterr, server.port, server.bindaddr);
if (server.fd == -1) {
    redisLog(REDIS_WARNING, "Opening TCP port: %s", server.neterr);
    exit(1);
}

Server cron

initServer() further allocates various dicts and lists for databases and for pub/sub, resets stats and various flags, and notes the UNIX timestamp of the server start time. It registers serverCron() with the event loop as a time event, executing that function once every 100 milliseconds. (This is a bit tricky, because initially, serverCron() is set to run in 1 millisecond by initServer(), in order to have the cron cycle start right away with the server startup, but then the return value of serverCron(), which is 100, is plugged in to the calculation of the next time the time event process should be handled.)
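The "return value decides the next run" mechanism can be sketched in a few lines. This is a simplified model with a caller-supplied clock, not the ae.c implementation: struct time_event, process_time_event() and cron() are hypothetical names, and a negative return value is assumed to mean "do not reschedule".

```c
#include <assert.h>

/* A time event handler returns the number of milliseconds until it wants
 * to run again, or a negative value to stop being scheduled. */
typedef int time_proc(void *data);

struct time_event {
    long long when_ms;   /* absolute time of the next firing */
    time_proc *proc;
    void *data;
    int active;
};

/* Fire the event if it is due at now_ms, then reschedule it using the
 * handler's return value. Returns 1 if the handler ran, 0 otherwise. */
int process_time_event(struct time_event *te, long long now_ms) {
    if (!te->active || now_ms < te->when_ms) return 0;
    int next = te->proc(te->data);
    if (next < 0) te->active = 0;
    else te->when_ms = now_ms + next;
    return 1;
}

/* Stand-in for serverCron(): count the call and ask to run again in 100 ms. */
int cron(void *data) {
    int *calls = data;
    (*calls)++;
    return 100;
}
```

Registering the event with when_ms set 1 ms after startup reproduces the behaviour described above: the first call happens almost immediately, and every later call lands 100 ms after the previous one.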

serverCron() performs a number of periodic tasks for Redis, including:

  • verbose logging of database size (# of keys and memory used) and connected clients
  • resizing hash tables
  • closing idle/timed-out client connections
  • performing any post-background save or AOF rewrite cleanup
  • kicking off a background save if the save conditions as configured have been met (so many keys changed in so many seconds)
  • calculating LRU information and dealing with expired keys (Redis only expires a few timed-out keys per cron cycle, using an adaptive, statistical method to avoid tying up the server, but will get more aggressive if expiring keys can help avoid out-of-memory situations)
  • swapping out values to disk if virtual memory is enabled
  • syncing with a master if this server is a slave

Registering connection handler with event loop

Crucially, initServer() hooks up the event loop with the server’s TCP socket by registering the socket’s descriptor together with the acceptHandler() function, which is called when a new connection is accepted. (More on this below in the “Processing a request” section.)

// redis.c:821
if (aeCreateFileEvent(server.el, server.fd, AE_READABLE,
    acceptHandler, NULL) == AE_ERR) oom("creating file event");

Opening the AOF

initServer() creates or opens the append-only file (AOF), if the server was configured to use it.

// redis.c:824
if (server.appendonly) {
    server.appendfd = open(server.appendfilename,O_WRONLY|O_APPEND|O_CREAT,0644);
    // ...
}

Finally, initServer() initializes Redis’s virtual memory system, again, if the server had been configured for it.

Back up to main()

If the server was configured to daemonize, Redis will now try to write out a pid file (the path of which is configurable, but defaults to /var/run/redis.pid).

At this point, the server has started up, and Redis will log this fact to its log file. However, there’s still a bit more to do in main() before Redis is fully ready.

Restoring data

If there is an AOF or a database dump file (e.g., dump.rdb), it will be loaded, restoring data to the server from a previous session. (If both exist, the AOF takes priority.)

// redis.c:1452
if (server.appendonly) {
    if (loadAppendOnlyFile(server.appendfilename) == REDIS_OK)
        redisLog(REDIS_NOTICE,"DB loaded from append only file: %ld seconds",time(NULL)-start);
} else {
    if (rdbLoad(server.dbfilename) == REDIS_OK)
        redisLog(REDIS_NOTICE,"DB loaded from disk: %ld seconds",time(NULL)-start);
}

The server is now ready to start accepting requests.

Event loop setup

To finish up, Redis registers a function to be called each time it enters the event loop, beforeSleep() (since the process essentially goes to sleep while it waits to be notified of events). beforeSleep() does two things: it deals with serving clients that have requested keys that were swapped to disk if the virtual memory system was enabled, and it flushes the AOF to disk. The writing of the AOF is handled by flushAppendOnlyFile(). The function encapsulates some tricky logic about flushing the buffer that holds pending AOF writes (the frequency of which is configurable by the user).

Entering the event loop

Redis now enters the main event loop by calling aeMain(), with the argument server.el (remember, this member contains a pointer to an aeEventLoop). If there are any time (i.e., server cron) or file events to process each time through the loop, their respective handler functions will be called. aeProcessEvents() encapsulates this logic — time events are handled by custom logic, whereas file events are handled by the underlying epoll or kqueue or select I/O event notification system.

Because of Redis’s need to respond to time events as well as file or I/O events, it implements a custom event/polling loop, aeMain(). By checking to see if any time events need processing, and utilizing file event notification, the event loop can efficiently sleep until there is work to be done, and not tie up the CPU in a tight while loop.
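One way to picture a single pass through such a loop is the sketch below, which uses the portable select() fallback: the time left until the next time event becomes the timeout, so the process sleeps until either a descriptor is readable or a timer is due. The names wait_readable(), loop_once(), EV_FILE and EV_TIME are hypothetical, and the real aeProcessEvents() handles many descriptors and timers rather than one of each.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

#define EV_FILE 1   /* the descriptor became readable */
#define EV_TIME 2   /* the wait timed out, so the time event is due */

/* Sleep until fd is readable or timeout_ms has elapsed.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, long timeout_ms) {
    fd_set rfds;
    struct timeval tv;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int ready = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (ready < 0) return -1;
    return (ready > 0 && FD_ISSET(fd, &rfds)) ? 1 : 0;
}

/* One loop iteration: never sleep longer than the time remaining until
 * the next time event. Returns EV_FILE, EV_TIME, or 0 on error. */
int loop_once(int fd, long ms_until_timer) {
    int readable = wait_readable(fd, ms_until_timer);
    if (readable > 0) return EV_FILE;
    if (readable == 0) return EV_TIME;
    return 0;
}
```

With nothing to read, loop_once() blocks in the kernel for the given interval and reports that the timer is due. With data pending, it returns right away, long before the timeout, which is why the loop neither spins nor misses events.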

Processing a request & returning a response

We are now inside Redis’s main event polling loop, listening on a port and waiting for clients to connect. It’s time to look at how Redis processes a command request.

Handling a new connection

Back under initServer(), Redis registered acceptHandler() to be called when there was an I/O event associated with the file descriptor of the socket the server is listening to (i.e., the socket has data waiting to be read or written). acceptHandler() creates a client object — pointer to a redisClient, a struct defined in redis.h — to represent a new client connection.

// networking.c:347
cfd = anetAccept(server.neterr, fd, cip, &cport);
if (cfd == AE_ERR) {
    redisLog(REDIS_VERBOSE,"Accepting client connection: %s", server.neterr);
    return;
}
redisLog(REDIS_VERBOSE,"Accepted %s:%d", cip, cport);
if ((c = createClient(cfd)) == NULL) {
    redisLog(REDIS_WARNING,"Error allocating resoures for the client");
    close(cfd); /* May be already closed, just ingore errors */
    return;
}

createClient() is called to allocate and initialize the client object. It selects database 0 by default (since there has to be at least one Redis db per server), and associates the client file descriptor generated by accept(2) in acceptHandler() with the client object. Other flags and members are initialized, and finally the client is appended to the global list of clients being tracked by server.clients. The key thing Redis does in createClient() is registering a handler with the event loop, the function readQueryFromClient(), for when there is data from the client connection to be read.

// networking.c:20
if (aeCreateFileEvent(server.el,fd,AE_READABLE, readQueryFromClient, c) == AE_ERR)
{
    close(fd);
    zfree(c);
    return NULL;
}

Reading a command from a client

readQueryFromClient() is called by the main event loop when the client makes a command request. (If you are debugging with GDB, this is a good function to set as a breakpoint.) It reads as much as it can of the command, up to 1024 bytes, into a temporary buffer, then appends it to a client-specific query buffer. This allows Redis to process commands where the payload (command name plus arguments) is larger than 1024 bytes, or where for I/O reasons the payload has been split up into multiple read events. It then calls processInputBuffer(), passing the client object as an argument.

// networking.c:754
void readQueryFromClient(aeEventLoop *el, int fd, void *privdata, int mask) {
    redisClient *c = (redisClient*) privdata;
    char buf[REDIS_IOBUF_LEN];
    int nread;
    // ...
    nread = read(fd, buf, REDIS_IOBUF_LEN);
    // ...
    if (nread) {
        size_t oldlen = sdslen(c->querybuf);
        c->querybuf = sdscatlen(c->querybuf, buf, nread);
        c->lastinteraction = time(NULL);
        /* Scan this new piece of the query for the newline. We do this
         * here in order to make sure we perform this scan just one time
         * per piece of buffer, leading to an O(N) scan instead of O(N*N) */
        if (c->bulklen == -1 && c->newline == NULL)
            c->newline = strchr(c->querybuf+oldlen,'\n');
    } else {
        return;
    }
    processInputBuffer(c);
}
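The accumulate-then-scan pattern can be tried in isolation with a plain growable buffer. The sketch below is an illustration only: struct query_buffer, query_append() and query_newline() are hypothetical stand-ins for the sds-based query buffer, and the scan restarts from the beginning of the buffer rather than from the newly appended piece.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-client buffer that collects pieces of a request. */
struct query_buffer {
    char *data;   /* NUL-terminated for convenience; NULL while empty */
    size_t len;
};

/* Append one chunk, as delivered by a single read(). Returns 0 on success. */
int query_append(struct query_buffer *q, const char *chunk, size_t n) {
    char *grown = realloc(q->data, q->len + n + 1);
    if (grown == NULL) return -1;
    memcpy(grown + q->len, chunk, n);
    q->data = grown;
    q->len += n;
    q->data[q->len] = '\0';
    return 0;
}

/* Index of the first '\n' in the buffer, or -1 if no complete line
 * has arrived yet. */
long query_newline(const struct query_buffer *q) {
    if (q->data == NULL) return -1;
    const char *nl = memchr(q->data, '\n', q->len);
    if (nl == NULL) return -1;
    return (long)(nl - q->data);
}
```

If a request such as "GET foo\r\n" arrives as two reads ("GET fo" and then "o\r\n"), the first append leaves query_newline() at -1, and only the second append makes a complete line available for parsing.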

processInputBuffer() parses the raw query from the client into arguments for execution of a Redis command. It first has to contend with the possibility that the client is blocked in a B{L,R}POP command, and bails early if that is the case. The function then parses the raw query buffer into arguments, creating Redis string objects of each and storing them in an array on the client object. The query is in the form of the Redis protocol. processInputBuffer() is really a protocol parser, calling back to processCommand() to fully parse the query. Somewhat confusingly, the source code comments describe parsing the “multi bulk command type,” an alternative protocol originally meant for commands like MSET, but it is now the main Redis protocol for all commands. It is binary-safe and easy to parse and debug. (Note: this code has been refactored for the upcoming 2.2 release and is a bit easier to follow.) Now it’s time to actually execute the command the client has sent, by calling processCommand() on the client object.
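For a feel of what the multi bulk format looks like on the wire, here is a small standalone parser sketch. A request is "*<argc>\r\n" followed, for each argument, by "$<length>\r\n<bytes>\r\n". The names struct arg, parse_header() and parse_multibulk() are hypothetical, the parser expects the whole request to be present in one buffer, and it is not the incremental parser Redis uses.

```c
#include <assert.h>
#include <string.h>

#define MAX_ARGS 16

/* One parsed argument: a slice of the request buffer (not NUL-terminated). */
struct arg {
    const char *ptr;
    size_t len;
};

/* Parse "<prefix><digits>\r\n" starting at *pos. On success, store the
 * number in *out, advance *pos past the CRLF and return 0. */
static int parse_header(const char *buf, size_t len, size_t *pos,
                        char prefix, long *out) {
    size_t p = *pos;
    long n = 0;
    size_t digits = 0;

    if (p >= len || buf[p] != prefix) return -1;
    p++;
    /* Cap the digit count so the number cannot overflow a long. */
    while (p < len && digits < 9 && buf[p] >= '0' && buf[p] <= '9') {
        n = n * 10 + (buf[p] - '0');
        p++;
        digits++;
    }
    if (digits == 0 || p + 1 >= len || buf[p] != '\r' || buf[p + 1] != '\n')
        return -1;
    *pos = p + 2;
    *out = n;
    return 0;
}

/* Parse one complete request. Returns the argument count, or -1 if the
 * input is malformed or incomplete. */
int parse_multibulk(const char *buf, size_t len, struct arg *argv, int max_args) {
    size_t pos = 0;
    long argc, bulklen;

    if (parse_header(buf, len, &pos, '*', &argc) != 0) return -1;
    if (argc > max_args) return -1;
    for (long i = 0; i < argc; i++) {
        if (parse_header(buf, len, &pos, '$', &bulklen) != 0) return -1;
        if (pos + (size_t)bulklen + 2 > len) return -1;
        if (buf[pos + bulklen] != '\r' || buf[pos + bulklen + 1] != '\n') return -1;
        argv[i].ptr = buf + pos;
        argv[i].len = (size_t)bulklen;
        pos += (size_t)bulklen + 2;
    }
    return (int)argc;
}
```

Because every argument is preceded by its byte length, the parser never has to search the payload for a terminator, which is what makes the format binary-safe: an argument may contain "\r\n" or NUL bytes without confusing the parser.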

processCommand() takes the arguments of a command from a client and executes it. Before it gets to the actual execution of the command, it performs a number of checks. If any check fails, it appends an error message to the client object’s reply list and returns to the caller, processInputBuffer(). The checks, in order:

  • After handling the QUIT command as a special case (in order to shut down the client safely), processCommand() looks up the command name in the commandTable that was set up during Redis’s startup cycle. If it’s an unknown command, or the client got the arity of the command wrong, it’s an error.
  • While it’s not commonly used, Redis can be configured to require a password to authenticate a client before it will accept commands. This is the stage where Redis checks whether the client is authenticated, and sets an error if not.
  • If Redis is configured to use up to a maximum amount of memory, it will at this point try to free up memory if it can (by freeing objects from the free list and removing expired keys). If the server is still over the limit, it won’t process commands with the REDIS_CMD_DENYOOM flag set (mainly writes like SET, INCR, RPUSH, ZADD, etc.); again, an error.
  • Finally, a client can only issue SUBSCRIBE or UNSUBSCRIBE commands while there are outstanding channels subscribed to; otherwise, it’s an error.

If all the checks have been passed, the command will be executed by calling call() with the client object and the command object as arguments.
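The first few checks can be sketched as a single function that returns the reason for rejection. Everything here is a simplified assumption: check_command(), struct server_state and the enum values are hypothetical, the pub/sub check is omitted, and the arity rule shown (a positive arity means an exact argument count, a negative one means "at least that many") is my understanding of the convention rather than something stated in this article.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CMD_DENYOOM 1   /* hypothetical value standing in for REDIS_CMD_DENYOOM */

struct command {
    const char *name;
    int arity;
    int flags;
};

struct server_state {
    int requirepass;         /* nonzero if a password is configured */
    int over_memory_limit;   /* nonzero if still over the limit after freeing */
};

enum check_result { CHECK_OK, ERR_UNKNOWN, ERR_ARITY, ERR_AUTH, ERR_OOM };

/* cmd is the result of the table lookup (NULL if the name was not found),
 * argc counts the command name plus its arguments. */
enum check_result check_command(const struct command *cmd, int argc,
                                int authenticated,
                                const struct server_state *srv) {
    if (cmd == NULL) return ERR_UNKNOWN;
    /* Assumed convention: arity > 0 is exact, arity < 0 is a minimum. */
    if ((cmd->arity > 0 && cmd->arity != argc) || argc < -cmd->arity)
        return ERR_ARITY;
    /* The AUTH command itself must be allowed through. */
    if (srv->requirepass && !authenticated && strcmp(cmd->name, "auth") != 0)
        return ERR_AUTH;
    if (srv->over_memory_limit && (cmd->flags & CMD_DENYOOM))
        return ERR_OOM;
    return CHECK_OK;
}
```

Ordering the checks from cheapest and most general (does the command exist?) to most situational (is memory exhausted?) means a malformed request is rejected before any server state is consulted.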

Executing the command and responding

call() gets a pointer to a function of type redisCommandProc from the proc member of the command object. That function takes a single argument, a client object. The Redis command procedure is then called.

// redis.c:864
void call(redisClient *c, struct redisCommand *cmd) {
    long long dirty;

    dirty = server.dirty;
    cmd->proc(c);
    dirty = server.dirty-dirty;
    // ...
}

Each Redis command is responsible for setting the reply for the client. This is possible because the signature of a Redis command procedure is of a single argument, which is the client object. Likewise, each command procedure is responsible for decoding, or deserializing, arguments from the command, and for encoding, or serializing, Redis objects in memory for response to the client.

Write commands, like SET and ZADD, make the server “dirty”; in other words, the server is marked as having pages in memory that have changed. This is important for the automatic save process, which keeps track of how many keys have changed in a certain period, and for the writing to the AOF. The function calls feedAppendOnlyFile() if use of the AOF has been enabled, which writes out the command buffer from the client to the AOF, so the command can be replayed. (It translates commands that set a relative key expiration to an absolute expiration, but otherwise it basically copies the command as it came in from the client; see catAppendOnlyGenericCommand().) If any slaves are connected, call() will send the command to each of them to be executed locally; see replicationFeedSlaves(). Likewise, if any clients are connected and have issued the MONITOR command, Redis will send a representation of the command, prefixed with a timestamp; see replicationFeedMonitors().

// redis.c:871 (call() cont.'d)
    // ...
    if (server.appendonly && dirty)
        feedAppendOnlyFile(cmd,c->db->id,c->argv,c->argc);
    if ((dirty || cmd->flags & REDIS_CMD_FORCE_REPLICATION) &&
        listLength(server.slaves))
        replicationFeedSlaves(server.slaves,c->db->id,c->argv,c->argc);
    if (listLength(server.monitors))
        replicationFeedMonitors(server.monitors,c->db->id,c->argv,c->argc);
    server.stat_numcommands++;
}
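The before/after comparison of the dirty counter is easy to try in isolation. The sketch below is a toy model under assumptions: struct server_state, struct client, set_command() and get_command() are hypothetical, and a counter named aof_writes stands in for the real call to feedAppendOnlyFile().

```c
#include <assert.h>

struct server_state {
    long long dirty;   /* number of changes made so far */
    int appendonly;    /* nonzero if the AOF is enabled */
    int aof_writes;    /* hypothetical counter: commands fed to the AOF */
};

struct client {
    struct server_state *server;
};

typedef void command_proc(struct client *c);

/* A write command bumps the dirty counter; a read command does not. */
void set_command(struct client *c) { c->server->dirty++; }
void get_command(struct client *c) { (void)c; }

/* Run the command and use the change in the dirty counter to decide
 * whether it needs to be propagated. */
void call(struct client *c, command_proc *proc) {
    long long dirty = c->server->dirty;
    proc(c);
    dirty = c->server->dirty - dirty;   /* changes made by this command only */
    if (c->server->appendonly && dirty)
        c->server->aof_writes++;        /* stand-in for feedAppendOnlyFile() */
}
```

The appeal of this design is that call() needs no per-command knowledge of which commands are writes: a command that changes nothing leaves the counter untouched and is simply not propagated.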

Control returns to the caller, processCommand(), which resets the client object for subsequent commands.

As mentioned, each Redis command procedure is itself responsible for setting the response to be sent to the client. After readQueryFromClient() exits and Redis returns to the event loop in aeMain(), aeProcessEvents() will pick up the waiting response in the write buffer and will copy it to the socket the client is connected on.

And that’s it! The response has been sent, and both client and server are back to a state where they can respectively emit and process more Redis commands.

Summary

Redis starts up by initializing a global server state variable, and reading in an optional configuration file to override any defaults. It sets up a global command table that connects command names with the actual function that implements the command. It creates an event loop using the best available underlying system library for event/readiness notification, and registers a handler function for when there is a new client socket connection to accept. It also registers a periodic (i.e., time-based) event handler for dealing with cron-like tasks like key expiry that need to be addressed outside the regular client-handling path. Once a client has connected, a function is registered on the event loop for being notified when the client has data to be read (i.e., a query for a command). The client’s query is parsed and a command handler is called to execute the command and write the response back to the client (the writing of data to the client is also handled by the event-notification loop). The client object is reset and server is ready to process more queries.

Next time — tracing a SET and GET

I’ll follow up this article with one that takes a close look at the processing of two commands, SET and GET, by stepping through the implementation of each command’s procedure and examining the data structures Redis uses to store and index data. March 15, 2011: here it is.

Paul Smith (follow me on the Twitter)

Thanks to pietern on #redis for feedback on a draft of this article

October 18, 2010
