Zookeeper 源码（七）请求处理

以单机启动为例讲解 Zookeeper 是如何处理请求的。先回顾一下单机时的请求处理链。

// 单机包含 3 个请求链：PrepRequestProcessor -> SyncRequestProcessor -> FinalRequestProcessor

protected void setupRequestProcessors() {

    RequestProcessor finalProcessor = new FinalRequestProcessor(this);

    RequestProcessor syncProcessor = new SyncRequestProcessor(this,

            finalProcessor);

    ((SyncRequestProcessor)syncProcessor).start();

    firstProcessor = new PrepRequestProcessor(this, syncProcessor);

    ((PrepRequestProcessor)firstProcessor).start();

}

请求的调用链如下：

PrepRequestProcessor.processRequest() <- ZooKeeperServer.submitRequest() <- ZooKeeperServer.processPacket() <- NettyServerCnxn.receiveMessage() <- CnxnChannelHandler.processMessage() <- CnxnChannelHandler.messageReceived()

RequestProcessor 接口

public interface RequestProcessor {

    public static class RequestProcessorException extends Exception {

        public RequestProcessorException(String msg, Throwable t) {

            super(msg, t);

        }

    }

    // 处理请求

    void processRequest(Request request) throws RequestProcessorException;

    // 关闭当前及子处理器，处理器可能是线程

    void shutdown();

}

一、PrepRequestProcessor

PrepRequestProcessor 是服务器的请求预处理器，能够识别出当前客户端是否是事务请求，对于事务请求，进行一系列预处理，如创建请求事务头，事务体，会话检查，ACL 检查等。

(1) PrepRequestProcessor 构造函数

public class PrepRequestProcessor extends ZooKeeperCriticalThread implements RequestProcessor {

    // 已提交请求队列

    LinkedBlockingQueue<Request> submittedRequests = new LinkedBlockingQueue<Request>();

    // 下个处理器

    private final RequestProcessor nextProcessor;

    // Zookeeper 服务器

    ZooKeeperServer zks;

    public PrepRequestProcessor(ZooKeeperServer zks, RequestProcessor nextProcessor) {

        // 初始化线程

        super("ProcessThread(sid:" + zks.getServerId() + " cport:"

                + zks.getClientPort() + "):", zks.getZooKeeperServerListener());

        this.nextProcessor = nextProcessor;

        this.zks = zks;

    }

}

说明：类的核心属性有 submittedRequests 和 nextProcessor，前者表示已经提交的请求，而后者表示提交的下个处理器。

(2) RequestProcessor 接口实现

// 接收请求

public void processRequest(Request request) {

    submittedRequests.add(request);

}

// 关闭线程

public void shutdown() {

    LOG.info("Shutting down");

    submittedRequests.clear();

    submittedRequests.add(Request.requestOfDeath);

    nextProcessor.shutdown();

}

既然请求都提交到 submittedRequests 中了，必然有地方消费 submittedRequests，下面看一下线程的处理过程。

(3) run(核心)

public void run() {

    try {

        while (true) {

            Request request = submittedRequests.take();

            long traceMask = ZooTrace.CLIENT_REQUEST_TRACE_MASK;

            if (request.type == OpCode.ping) {          // 请求类型为 PING

                traceMask = ZooTrace.CLIENT_PING_TRACE_MASK;

            }

            if (Request.requestOfDeath == request) {    // 结束线程

                break;

            }

            pRequest(request);                          // 处理请求(核心)

        }

    } catch (RequestProcessorException e) {

        if (e.getCause() instanceof XidRolloverException) {

            LOG.info(e.getCause().getMessage());

        }

        handleException(this.getName(), e);

    } catch (Exception e) {

        handleException(this.getName(), e);

    }

    LOG.info("PrepRequestProcessor exited loop!");

}

说明：run 函数是对 Thread 类 run 函数的重写，其核心逻辑相对简单，即不断从队列中取出 request 进行处理，其会调用 pRequest 函数，while 自旋这样做的好处是充分利用 CPU，避免线程频繁切换线程。

二、SyncRequestProcessor

在分析了 PrepRequestProcessor 处理器后，接着来分析 SyncRequestProcessor，该处理器将请求存入磁盘，其将请求批量的存入磁盘以提高效率，请求在写入磁盘之前是不会被转发到下个处理器的。

SyncRequestProcessor 除了会定期的把 request 持久化到本地磁盘，同时他还要维护本机的 txnlog 和 snapshot，这里的基本逻辑是：

每隔 snapCount/2 个 request 会重新生成一个 snapshot 并滚动一次 txnlog，同时为了避免所有的 zookeeper server 在同一个时间生成 snapshot 和滚动日志，这里会再加上一个随机数，snapCount 的默认值是 10w 个 request

(1) 重要属性

public class SyncRequestProcessor extends ZooKeeperCriticalThread implements RequestProcessor {

    private final ZooKeeperServer zks;

    // queuedRequests 接收外界传递的请求队列

    private final LinkedBlockingQueue<Request> queuedRequests = new LinkedBlockingQueue<Request>();

    private final RequestProcessor nextProcessor;

    // 快照处理线程

    private Thread snapInProcess = null;

    volatile private boolean running;

    // 等待被刷新到磁盘的请求队列

    private final LinkedList<Request> toFlush = new LinkedList<Request>();

    private final Random r = new Random(System.nanoTime());

    // 快照个数

    private static int snapCount = ZooKeeperServer.getSnapCount();

    // 关闭线程

    private final Request requestOfDeath = Request.requestOfDeath;

}

(2) run(核心方法)

public void run() {

    try {

        // 1. 初始化，日志数量为 0

        int logCount = 0;

        // 确保所有的服务器在同一时间不是使用的同一个快照

        int randRoll = r.nextInt(snapCount/2);

        while (true) {

            Request si = null;

            // 2. 没有需要刷新到磁盘的请求，则 take 取出数据，会阻塞

            if (toFlush.isEmpty()) {

                si = queuedRequests.take();

            // 3. 有则 poll 取出数据，不会阻塞

            } else {

                si = queuedRequests.poll();

                // 没有请求则先将已有的请求刷新到磁盘

                if (si == null) {

                    flush(toFlush);

                    continue;

                }

            }

            if (si == requestOfDeath) {

                break;

            }

            if (si != null) {

                // 4. 将请求添加至日志文件，只有事务性请求才会返回 true

                if (zks.getZKDatabase().append(si)) {

                    logCount++;

                    if (logCount > (snapCount / 2 + randRoll)) {

                        randRoll = r.nextInt(snapCount/2);

                        // 4.1 生成滚动日志 roll the log

                        zks.getZKDatabase().rollLog();

                        // 4.2 生成快照日志，如果 snapInProcess 线程仍在进行快照则忽略本次快照

                        if (snapInProcess != null && snapInProcess.isAlive()) {

                            LOG.warn("Too busy to snap, skipping");

                        } else {

                            snapInProcess = new ZooKeeperThread("Snapshot Thread") {

                                    public void run() {

                                        try {

                                            zks.takeSnapshot();

                                        } catch(Exception e) {

                                            LOG.warn("Unexpected exception", e);

                                        }

                                    }

                                };

                            snapInProcess.start();

                        }

                        logCount = 0;

                    }

                // 5. 查看此时 toFlush 是否为空，如果为空，说明近段时间读多写少，直接交给下一个处理器处理

                } else if (toFlush.isEmpty()) {

                    if (nextProcessor != null) {

                        nextProcessor.processRequest(si);

                        if (nextProcessor instanceof Flushable) {

                            ((Flushable)nextProcessor).flush();

                        }

                    }

                    continue;

                }

                toFlush.add(si);

                if (toFlush.size() > 1000) {

                    flush(toFlush);

                }

            }

        }

    } catch (Throwable t) {

        handleException(this.getName(), t);

    } finally{

        running = false;

    }

    LOG.info("SyncRequestProcessor exited!");

}

(3) flush(刷新到磁盘)

private void flush(LinkedList<Request> toFlush) throws IOException, RequestProcessorException {

    if (toFlush.isEmpty())

        return;

    // 1. 提交至 ZK 数据库

    zks.getZKDatabase().commit();

    // 2. 将所有的请求提交到下个处理器处理

    while (!toFlush.isEmpty()) {

        Request i = toFlush.remove();

        if (nextProcessor != null) {

            nextProcessor.processRequest(i);

        }

    }

    if (nextProcessor != null && nextProcessor instanceof Flushable) {

        // 刷新到磁盘

        ((Flushable)nextProcessor).flush();

    }

}

说明：该函数主要用于将toFlush队列中的请求刷新到磁盘中。

三、FinalRequestProcessor

FinalRequestProcessor 负责把已经 commit 的写操作应用到本机，对于读操作则从本机中读取数据并返回给 client，这个 processor 是责任链中的最后一个

FinalRequestProcessor 是一个同步处理的 processor，主要的处理逻辑就在方法 processRequest 中：

如果 request.hdr != null，则表明 request 是写操作，则调用 zks.processTxn(hdr, txn) 来把 request 关联的写操作执行到内存状态中
如果是写操作，则调用 zks.getZKDatabase().addCommittedProposal(request);

把 request 加入到 ZKDatabase.committedLog 队列中，这个队列主要是为了快速和 follower 同步而保留的
为各类操作准备响应数据，对于写操作则根据 processTxn 的结果来回复，如果是读操作，则读取内存中的状态
发送响应数据给 client

processRequest 的处理逻辑非常长，我们一点点分析。

(1) 处理事务请求

public void processRequest(Request request) {

    ProcessTxnResult rc = null;

    synchronized (zks.outstandingChanges) {

        // 1. 请求委托 ZookeeperServer 处理，zks 会针对事务和非事务请求会分别处理

        rc = zks.processTxn(request);

        // 2. request.hdr!=null 则是事务请求，即写操作，outstandingChanges 保存有所有的事务请求记录

        //    PrepRequestProcessor 会将事务请求添加到集合中，FinalRequestProcessor 则事务请求已经处理完毕需要移除

        if (request.getHdr() != null) {

            // 事务请求头

            TxnHeader hdr = request.getHdr();

            Record txn = request.getTxn();

            long zxid = hdr.getZxid();

            // zk 有严格的执行顺序，如果小于 zxid 则认为已经处理完毕

            while (!zks.outstandingChanges.isEmpty()

                   && zks.outstandingChanges.get(0).zxid <= zxid) {

                ChangeRecord cr = zks.outstandingChanges.remove(0);

                if (cr.zxid < zxid) {

                    LOG.warn("Zxid outstanding " + cr.zxid + " is less than current " + zxid);

                }

                if (zks.outstandingChangesForPath.get(cr.path) == cr) {

                    zks.outstandingChangesForPath.remove(cr.path);

                }

            }

        }

        // 3. 如果是事务请求，则把 request 加入到 ZKDatabase.committedLog 队列中

        if (request.isQuorum()) {

            zks.getZKDatabase().addCommittedProposal(request);

        }

    }

}

processRequest 将请求委托给了 zk 处理，我们看一下 ZookeeperServer 是如何处理请求的。

public ProcessTxnResult processTxn(Request request) {

    return processTxn(request, request.getHdr(), request.getTxn());

}

private ProcessTxnResult processTxn(Request request, TxnHeader hdr,

                                    Record txn) {

    ProcessTxnResult rc;

    int opCode = request != null ? request.type : hdr.getType();

    long sessionId = request != null ? request.sessionId : hdr.getClientId();

    if (hdr != null) {

        // 写操作(事务请求)

        rc = getZKDatabase().processTxn(hdr, txn);

    } else {

        // 读操作(非事务请求)

        rc = new ProcessTxnResult();

    }

    if (opCode == OpCode.createSession) {

        if (hdr != null && txn instanceof CreateSessionTxn) {

            CreateSessionTxn cst = (CreateSessionTxn) txn;

            sessionTracker.addGlobalSession(sessionId, cst.getTimeOut());

        } else if (request != null && request.isLocalSession()) {

            request.request.rewind();

            int timeout = request.request.getInt();

            request.request.rewind();

            sessionTracker.addSession(request.sessionId, timeout);

        } else {

            LOG.warn("*****>>>>> Got " + txn.getClass() + " " + txn.toString());

        }

    } else if (opCode == OpCode.closeSession) {

        sessionTracker.removeSession(sessionId);

    }

    return rc;

}

(2) 请求响应

// 1. 对于写操作(事务请求)根据 processTxn() 的结果来获取响应数据

case OpCode.create: {

    lastOp = "CREA";

    rsp = new CreateResponse(rc.path);

    err = Code.get(rc.err);

    break;

}

// 2. 对于读操作(非事务请求)从内存数据库中获取响应数据

case OpCode.getData: {

    lastOp = "GETD";

    GetDataRequest getDataRequest = new GetDataRequest();

    ByteBufferInputStream.byteBuffer2Record(request.request,

            getDataRequest);

    DataNode n = zks.getZKDatabase().getNode(getDataRequest.getPath());

    if (n == null) {

        throw new KeeperException.NoNodeException();

    }

    Long aclL;

    synchronized(n) {

        aclL = n.acl;

    }

    PrepRequestProcessor.checkACL(zks, zks.getZKDatabase().convertLong(aclL),

            ZooDefs.Perms.READ,

            request.authInfo);

    Stat stat = new Stat();

    // 直接从内存数据库中获取响应数据

    byte b[] = zks.getZKDatabase().getData(getDataRequest.getPath(), stat,

            getDataRequest.getWatch() ? cnxn : null);

    rsp = new GetDataResponse(b, stat);

    break;

}

参考：

《Zookeeper请求处理》：https://www.cnblogs.com/leesf456/p/6140503.html
《【Zookeeper】源码分析之请求处理链（二）之PrepRequestProcessor》：https://www.cnblogs.com/leesf456/p/6412843.html
《【Zookeeper】源码分析之请求处理链（三）之SyncRequestProcessor》：https://www.cnblogs.com/leesf456/p/6438411.html
《【Zookeeper】源码分析之请求处理链（四）之FinalRequestProcessor》：https://www.cnblogs.com/leesf456/p/6472496.html
从 Paxos 到 Zookeeper : 分布式一致性原理与实践

每天用心记录一点点。内容也许不重要，但习惯很重要！