In the previous article,

"ZooKeeper source code analysis, part 1: the server startup process",

we covered how the ZooKeeper server starts: in standalone mode a ZooKeeperServer is started, while a cluster starts via QuorumPeer. This time we look at how each of them processes requests.

As the previous article showed:

1. In standalone mode, NettyServerCnxnFactory starts a ZooKeeperServer to handle requests:

    public synchronized void startup() {
        if (sessionTracker == null) {
            createSessionTracker();
        }
        startSessionTracker();
        setupRequestProcessors();

        registerJMX();

        state = State.RUNNING;
        notifyAll();
    }

The request processors are wired up as follows:

    protected void setupRequestProcessors() {
        RequestProcessor finalProcessor = new FinalRequestProcessor(this);
        RequestProcessor syncProcessor = new SyncRequestProcessor(this,
                finalProcessor);
        ((SyncRequestProcessor)syncProcessor).start();
        firstProcessor = new PrepRequestProcessor(this, syncProcessor);
        ((PrepRequestProcessor)firstProcessor).start();
    }

As we can see, this builds a chain of three request processors: PrepRequestProcessor runs first and hands each request to SyncRequestProcessor, which in turn hands it to FinalRequestProcessor at the end of the chain. Prep and Sync are started as their own threads.
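The wiring above is a classic chain of responsibility: each processor does its own work and then passes the request to the next one. The following is a minimal sketch of that pattern; the class names (`Stage`, `FinalStage`) are hypothetical and are not ZooKeeper's actual classes.

```java
// Minimal sketch of the processor chain pattern (hypothetical names).
interface Processor {
    void process(String request);
}

// Last stage: no successor, like FinalRequestProcessor.
class FinalStage implements Processor {
    final StringBuilder log;
    FinalStage(StringBuilder log) { this.log = log; }
    public void process(String request) {
        log.append("final(").append(request).append(")");
    }
}

// Intermediate stage: do work, then hand off, like Prep/SyncRequestProcessor.
class Stage implements Processor {
    final String name;
    final Processor next;
    final StringBuilder log;
    Stage(String name, Processor next, StringBuilder log) {
        this.name = name; this.next = next; this.log = log;
    }
    public void process(String request) {
        log.append(name).append(" -> "); // this stage's own work
        next.process(request);           // hand the request down the chain
    }
}

public class ChainSketch {
    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        Processor chain = new Stage("prep",
                new Stage("sync", new FinalStage(log), log), log);
        chain.process("create /a");
        System.out.println(log); // prep -> sync -> final(create /a)
    }
}
```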

  1.1 The first processor in the chain: PrepRequestProcessor

  

    @Override
    public void run() {
        try {
            while (true) {
                Request request = submittedRequests.take();
                long traceMask = ZooTrace.CLIENT_REQUEST_TRACE_MASK;
                if (request.type == OpCode.ping) {
                    traceMask = ZooTrace.CLIENT_PING_TRACE_MASK;
                }
                if (LOG.isTraceEnabled()) {
                    ZooTrace.logRequest(LOG, traceMask, 'P', request, "");
                }
                if (Request.requestOfDeath == request) {
                    break;
                }
                pRequest(request);
            }
        } catch (RequestProcessorException e) {
            if (e.getCause() instanceof XidRolloverException) {
                LOG.info(e.getCause().getMessage());
            }
            handleException(this.getName(), e);
        } catch (Exception e) {
            handleException(this.getName(), e);
        }
        LOG.info("PrepRequestProcessor exited loop!");
    }

As you can see, while (true) loops forever, taking requests off the submittedRequests queue; pRequest(request) is the heart of the processing.
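The loop above is the standard blocking-queue consumer with a "poison pill": take() blocks until a request arrives, and a sentinel object (Request.requestOfDeath, compared by identity) ends the loop. A minimal sketch of that pattern, with hypothetical names:

```java
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of the consume-until-poison-pill loop used by PrepRequestProcessor.run().
public class PillLoop {
    // Sentinel compared by identity, like Request.requestOfDeath.
    static final String POISON = new String("die");

    public static int drain(LinkedBlockingQueue<String> q) {
        int processed = 0;
        try {
            while (true) {
                String r = q.take();     // blocks while the queue is empty
                if (r == POISON) {       // identity check ends the loop
                    break;
                }
                processed++;             // stand-in for pRequest(request)
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed;
    }

    public static void main(String[] args) {
        LinkedBlockingQueue<String> q = new LinkedBlockingQueue<>();
        q.add("create");
        q.add("setData");
        q.add(POISON);
        System.out.println(drain(q)); // 2
    }
}
```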

    /**
     * This method will be called inside the ProcessRequestThread, which is a
     * singleton, so there will be a single thread calling this code.
     *
     * @param request
     */
    protected void pRequest(Request request) throws RequestProcessorException {
        // LOG.info("Prep>>> cxid = " + request.cxid + " type = " +
        // request.type + " id = 0x" + Long.toHexString(request.sessionId));
        request.setHdr(null);
        request.setTxn(null);

        try {
            switch (request.type) {
            case OpCode.createContainer:
            case OpCode.create:
            case OpCode.create2:
                CreateRequest create2Request = new CreateRequest();
                pRequest2Txn(request.type, zks.getNextZxid(), request, create2Request, true);
                break;
            case OpCode.deleteContainer:
            case OpCode.delete:
                DeleteRequest deleteRequest = new DeleteRequest();
                pRequest2Txn(request.type, zks.getNextZxid(), request, deleteRequest, true);
                break;
            case OpCode.setData:
                SetDataRequest setDataRequest = new SetDataRequest();
                pRequest2Txn(request.type, zks.getNextZxid(), request, setDataRequest, true);
                break;
            case OpCode.reconfig:
                ReconfigRequest reconfigRequest = new ReconfigRequest();
                ByteBufferInputStream.byteBuffer2Record(request.request, reconfigRequest);
                pRequest2Txn(request.type, zks.getNextZxid(), request, reconfigRequest, true);
                break;
            case OpCode.setACL:
                SetACLRequest setAclRequest = new SetACLRequest();
                pRequest2Txn(request.type, zks.getNextZxid(), request, setAclRequest, true);
                break;
            case OpCode.check:
                CheckVersionRequest checkRequest = new CheckVersionRequest();
                pRequest2Txn(request.type, zks.getNextZxid(), request, checkRequest, true);
                break;
            case OpCode.multi:
                MultiTransactionRecord multiRequest = new MultiTransactionRecord();
                try {
                    ByteBufferInputStream.byteBuffer2Record(request.request, multiRequest);
                } catch(IOException e) {
                    request.setHdr(new TxnHeader(request.sessionId, request.cxid, zks.getNextZxid(),
                            Time.currentWallTime(), OpCode.multi));
                    throw e;
                }
                List<Txn> txns = new ArrayList<Txn>();
                //Each op in a multi-op must have the same zxid!
                long zxid = zks.getNextZxid();
                KeeperException ke = null;

                //Store off current pending change records in case we need to rollback
                Map<String, ChangeRecord> pendingChanges = getPendingChanges(multiRequest);

                for(Op op: multiRequest) {
                    Record subrequest = op.toRequestRecord();
                    int type;
                    Record txn;

                    /* If we've already failed one of the ops, don't bother
                     * trying the rest as we know it's going to fail and it
                     * would be confusing in the logfiles.
                     */
                    if (ke != null) {
                        type = OpCode.error;
                        txn = new ErrorTxn(Code.RUNTIMEINCONSISTENCY.intValue());
                    }

                    /* Prep the request and convert to a Txn */
                    else {
                        try {
                            pRequest2Txn(op.getType(), zxid, request, subrequest, false);
                            type = request.getHdr().getType();
                            txn = request.getTxn();
                        } catch (KeeperException e) {
                            ke = e;
                            type = OpCode.error;
                            txn = new ErrorTxn(e.code().intValue());

                            LOG.info("Got user-level KeeperException when processing "
                                    + request.toString() + " aborting remaining multi ops."
                                    + " Error Path:" + e.getPath()
                                    + " Error:" + e.getMessage());

                            request.setException(e);

                            /* Rollback change records from failed multi-op */
                            rollbackPendingChanges(zxid, pendingChanges);
                        }
                    }

                    //FIXME: I don't want to have to serialize it here and then
                    // immediately deserialize in next processor. But I'm
                    // not sure how else to get the txn stored into our list.
                    ByteArrayOutputStream baos = new ByteArrayOutputStream();
                    BinaryOutputArchive boa = BinaryOutputArchive.getArchive(baos);
                    txn.serialize(boa, "request") ;
                    ByteBuffer bb = ByteBuffer.wrap(baos.toByteArray());

                    txns.add(new Txn(type, bb.array()));
                }

                request.setHdr(new TxnHeader(request.sessionId, request.cxid, zxid,
                        Time.currentWallTime(), request.type));
                request.setTxn(new MultiTxn(txns));

                break;

            //create/close session don't require request record
            case OpCode.createSession:
            case OpCode.closeSession:
                if (!request.isLocalSession()) {
                    pRequest2Txn(request.type, zks.getNextZxid(), request,
                                 null, true);
                }
                break;

            //All the rest don't need to create a Txn - just verify session
            case OpCode.sync:
            case OpCode.exists:
            case OpCode.getData:
            case OpCode.getACL:
            case OpCode.getChildren:
            case OpCode.getChildren2:
            case OpCode.ping:
            case OpCode.setWatches:
            case OpCode.checkWatches:
            case OpCode.removeWatches:
                zks.sessionTracker.checkSession(request.sessionId,
                        request.getOwner());
                break;
            default:
                LOG.warn("unknown type " + request.type);
                break;
            }
        } catch (KeeperException e) {
            if (request.getHdr() != null) {
                request.getHdr().setType(OpCode.error);
                request.setTxn(new ErrorTxn(e.code().intValue()));
            }
            LOG.info("Got user-level KeeperException when processing "
                    + request.toString()
                    + " Error Path:" + e.getPath()
                    + " Error:" + e.getMessage());
            request.setException(e);
        } catch (Exception e) {
            // log at error level as we are returning a marshalling
            // error to the user
            LOG.error("Failed to process " + request, e);

            StringBuilder sb = new StringBuilder();
            ByteBuffer bb = request.request;
            if(bb != null){
                bb.rewind();
                while (bb.hasRemaining()) {
                    sb.append(Integer.toHexString(bb.get() & 0xff));
                }
            } else {
                sb.append("request buffer is null");
            }

            LOG.error("Dumping request buffer: 0x" + sb.toString());
            if (request.getHdr() != null) {
                request.getHdr().setType(OpCode.error);
                request.setTxn(new ErrorTxn(Code.MARSHALLINGERROR.intValue()));
            }
        }
        request.zxid = zks.getZxid();
        nextProcessor.processRequest(request);
    }

Setting the exception handling aside, this method dispatches on request.type and picks a handling branch for each kind of request. As the comment says, it is called inside the ProcessRequestThread, which is a singleton, so only a single thread runs this code. Taking a create request as an example (the corresponding branch of pRequest2Txn), let's see how it works:

    CreateRequest createRequest = (CreateRequest)record;
    if (deserialize) {
        ByteBufferInputStream.byteBuffer2Record(request.request, createRequest);
    }
    CreateMode createMode = CreateMode.fromFlag(createRequest.getFlags());
    validateCreateRequest(createMode, request);
    String path = createRequest.getPath();
    String parentPath = validatePathForCreate(path, request.sessionId);

    List<ACL> listACL = fixupACL(path, request.authInfo, createRequest.getAcl());
    ChangeRecord parentRecord = getRecordForPath(parentPath);

    checkACL(zks, parentRecord.acl, ZooDefs.Perms.CREATE, request.authInfo);
    int parentCVersion = parentRecord.stat.getCversion();
    if (createMode.isSequential()) {
        path = path + String.format(Locale.ENGLISH, "%010d", parentCVersion);
    }
    validatePath(path, request.sessionId);
    try {
        if (getRecordForPath(path) != null) {
            throw new KeeperException.NodeExistsException(path);
        }
    } catch (KeeperException.NoNodeException e) {
        // ignore this one
    }
    boolean ephemeralParent = (parentRecord.stat.getEphemeralOwner() != 0) &&
            (parentRecord.stat.getEphemeralOwner() != DataTree.CONTAINER_EPHEMERAL_OWNER);
    if (ephemeralParent) {
        throw new KeeperException.NoChildrenForEphemeralsException(path);
    }
    int newCversion = parentRecord.stat.getCversion()+1;
    if (type == OpCode.createContainer) {
        request.setTxn(new CreateContainerTxn(path, createRequest.getData(), listACL, newCversion));
    } else {
        request.setTxn(new CreateTxn(path, createRequest.getData(), listACL, createMode.isEphemeral(),
                newCversion));
    }
    StatPersisted s = new StatPersisted();
    if (createMode.isEphemeral()) {
        s.setEphemeralOwner(request.sessionId);
    }
    parentRecord = parentRecord.duplicate(request.getHdr().getZxid());
    parentRecord.childCount++;
    parentRecord.stat.setCversion(newCversion);
    addChangeRecord(parentRecord);
    addChangeRecord(new ChangeRecord(request.getHdr().getZxid(), path, s, 0, listACL));
    break;
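Note the sequential-node handling above: when CreateMode is sequential, the parent's cversion is appended to the requested path as a zero-padded 10-digit decimal. A small illustration (the lock path is a made-up example):

```java
import java.util.Locale;

// Illustration of the "%010d" sequential suffix used for sequential nodes.
public class SeqName {
    public static void main(String[] args) {
        String path = "/locks/lock-";       // hypothetical client-supplied prefix
        int parentCVersion = 42;            // parent's child version at create time
        String full = path + String.format(Locale.ENGLISH, "%010d", parentCVersion);
        System.out.println(full); // /locks/lock-0000000042
    }
}
```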

addChangeRecord is then called to record the pending change:

    private void addChangeRecord(ChangeRecord c) {
        synchronized (zks.outstandingChanges) {
            zks.outstandingChanges.add(c);
            zks.outstandingChangesForPath.put(c.path, c);
        }
    }


Here outstandingChanges is a list of ChangeRecord objects, and outstandingChangesForPath is a map from path to ChangeRecord. They are defined as:

    final List<ChangeRecord> outstandingChanges = new ArrayList<ChangeRecord>();
    // this data structure must be accessed under the outstandingChanges lock
    final HashMap<String, ChangeRecord> outstandingChangesForPath =
        new HashMap<String, ChangeRecord>();

ChangeRecord is a data structure that lets PrepRequestProcessor and FinalRequestProcessor share information.

    ChangeRecord(long zxid, String path, StatPersisted stat, int childCount,
            List<ACL> acl) {
        this.zxid = zxid;
        this.path = path;
        this.stat = stat;
        this.childCount = childCount;
        this.acl = acl;
    }

  1.2 Next, let's look at FinalRequestProcessor. This request processor actually applies any transaction associated with a request and services any queries. It always sits at the end of the chain of request processors (it has no next processor), hence the name. How does it process a request?

    public void processRequest(Request request) {
        if (LOG.isDebugEnabled()) {
            LOG.debug("Processing request:: " + request);
        }
        // request.addRQRec(">final");
        long traceMask = ZooTrace.CLIENT_REQUEST_TRACE_MASK;
        if (request.type == OpCode.ping) {
            traceMask = ZooTrace.SERVER_PING_TRACE_MASK;
        }
        if (LOG.isTraceEnabled()) {
            ZooTrace.logRequest(LOG, traceMask, 'E', request, "");
        }
        ProcessTxnResult rc = null;
        synchronized (zks.outstandingChanges) {
            // Need to process local session requests
            rc = zks.processTxn(request);

            // request.hdr is set for write requests, which are the only ones
            // that add to outstandingChanges.
            if (request.getHdr() != null) {
                TxnHeader hdr = request.getHdr();
                Record txn = request.getTxn();
                long zxid = hdr.getZxid();
                while (!zks.outstandingChanges.isEmpty()
                        && zks.outstandingChanges.get(0).zxid <= zxid) {
                    ChangeRecord cr = zks.outstandingChanges.remove(0);
                    if (cr.zxid < zxid) {
                        LOG.warn("Zxid outstanding " + cr.zxid
                                + " is less than current " + zxid);
                    }
                    if (zks.outstandingChangesForPath.get(cr.path) == cr) {
                        zks.outstandingChangesForPath.remove(cr.path);
                    }
                }
            }

            // do not add non quorum packets to the queue.
            if (request.isQuorum()) {
                zks.getZKDatabase().addCommittedProposal(request);
            }
        }

        // ZOOKEEPER-558:
        // In some cases the server does not close the connection (e.g., closeconn buffer
        // was not being queued — ZOOKEEPER-558) properly. This happens, for example,
        // when the client closes the connection. The server should still close the session, though.
        // Calling closeSession() after losing the cnxn, results in the client close session response being dropped.
        if (request.type == OpCode.closeSession && connClosedByClient(request)) {
            // We need to check if we can close the session id.
            // Sometimes the corresponding ServerCnxnFactory could be null because
            // we are just playing diffs from the leader.
            if (closeSession(zks.serverCnxnFactory, request.sessionId) ||
                    closeSession(zks.secureServerCnxnFactory, request.sessionId)) {
                return;
            }
        }

        if (request.cnxn == null) {
            return;
        }
        ServerCnxn cnxn = request.cnxn;

        String lastOp = "NA";
        zks.decInProcess();
        Code err = Code.OK;
        Record rsp = null;
        try {
            if (request.getHdr() != null && request.getHdr().getType() == OpCode.error) {
                /*
                 * When local session upgrading is disabled, leader will
                 * reject the ephemeral node creation due to session expire.
                 * However, if this is the follower that issue the request,
                 * it will have the correct error code, so we should use that
                 * and report to user
                 */
                if (request.getException() != null) {
                    throw request.getException();
                } else {
                    throw KeeperException.create(KeeperException.Code
                            .get(((ErrorTxn) request.getTxn()).getErr()));
                }
            }

            KeeperException ke = request.getException();
            if (ke != null && request.type != OpCode.multi) {
                throw ke;
            }

            if (LOG.isDebugEnabled()) {
                LOG.debug("{}",request);
            }
            switch (request.type) {
            case OpCode.ping: {
                zks.serverStats().updateLatency(request.createTime);

                lastOp = "PING";
                cnxn.updateStatsForResponse(request.cxid, request.zxid, lastOp,
                        request.createTime, Time.currentElapsedTime());

                cnxn.sendResponse(new ReplyHeader(-2,
                        zks.getZKDatabase().getDataTreeLastProcessedZxid(), 0), null, "response");
                return;
            }
            case OpCode.createSession: {
                zks.serverStats().updateLatency(request.createTime);

                lastOp = "SESS";
                cnxn.updateStatsForResponse(request.cxid, request.zxid, lastOp,
                        request.createTime, Time.currentElapsedTime());

                zks.finishSessionInit(request.cnxn, true);
                return;
            }
            case OpCode.multi: {
                lastOp = "MULT";
                rsp = new MultiResponse() ;

                for (ProcessTxnResult subTxnResult : rc.multiResult) {

                    OpResult subResult ;

                    switch (subTxnResult.type) {
                    case OpCode.check:
                        subResult = new CheckResult();
                        break;
                    case OpCode.create:
                        subResult = new CreateResult(subTxnResult.path);
                        break;
                    case OpCode.create2:
                    case OpCode.createContainer:
                        subResult = new CreateResult(subTxnResult.path, subTxnResult.stat);
                        break;
                    case OpCode.delete:
                    case OpCode.deleteContainer:
                        subResult = new DeleteResult();
                        break;
                    case OpCode.setData:
                        subResult = new SetDataResult(subTxnResult.stat);
                        break;
                    case OpCode.error:
                        subResult = new ErrorResult(subTxnResult.err) ;
                        break;
                    default:
                        throw new IOException("Invalid type of op");
                    }

                    ((MultiResponse)rsp).add(subResult);
                }

                break;
            }
            case OpCode.create: {
                lastOp = "CREA";
                rsp = new CreateResponse(rc.path);
                err = Code.get(rc.err);
                break;
            }
            case OpCode.create2:
            case OpCode.createContainer: {
                lastOp = "CREA";
                rsp = new Create2Response(rc.path, rc.stat);
                err = Code.get(rc.err);
                break;
            }
            case OpCode.delete:
            case OpCode.deleteContainer: {
                lastOp = "DELE";
                err = Code.get(rc.err);
                break;
            }
            case OpCode.setData: {
                lastOp = "SETD";
                rsp = new SetDataResponse(rc.stat);
                err = Code.get(rc.err);
                break;
            }
            case OpCode.reconfig: {
                lastOp = "RECO";
                rsp = new GetDataResponse(((QuorumZooKeeperServer)zks).self.getQuorumVerifier().toString().getBytes(), rc.stat);
                err = Code.get(rc.err);
                break;
            }
            case OpCode.setACL: {
                lastOp = "SETA";
                rsp = new SetACLResponse(rc.stat);
                err = Code.get(rc.err);
                break;
            }
            case OpCode.closeSession: {
                lastOp = "CLOS";
                err = Code.get(rc.err);
                break;
            }
            case OpCode.sync: {
                lastOp = "SYNC";
                SyncRequest syncRequest = new SyncRequest();
                ByteBufferInputStream.byteBuffer2Record(request.request,
                        syncRequest);
                rsp = new SyncResponse(syncRequest.getPath());
                break;
            }
            case OpCode.check: {
                lastOp = "CHEC";
                rsp = new SetDataResponse(rc.stat);
                err = Code.get(rc.err);
                break;
            }
            case OpCode.exists: {
                lastOp = "EXIS";
                // TODO we need to figure out the security requirement for this!
                ExistsRequest existsRequest = new ExistsRequest();
                ByteBufferInputStream.byteBuffer2Record(request.request,
                        existsRequest);
                String path = existsRequest.getPath();
                if (path.indexOf('\0') != -1) {
                    throw new KeeperException.BadArgumentsException();
                }
                Stat stat = zks.getZKDatabase().statNode(path, existsRequest
                        .getWatch() ? cnxn : null);
                rsp = new ExistsResponse(stat);
                break;
            }
            case OpCode.getData: {
                lastOp = "GETD";
                GetDataRequest getDataRequest = new GetDataRequest();
                ByteBufferInputStream.byteBuffer2Record(request.request,
                        getDataRequest);
                DataNode n = zks.getZKDatabase().getNode(getDataRequest.getPath());
                if (n == null) {
                    throw new KeeperException.NoNodeException();
                }
                Long aclL;
                synchronized(n) {
                    aclL = n.acl;
                }
                PrepRequestProcessor.checkACL(zks, zks.getZKDatabase().convertLong(aclL),
                        ZooDefs.Perms.READ,
                        request.authInfo);
                Stat stat = new Stat();
                byte b[] = zks.getZKDatabase().getData(getDataRequest.getPath(), stat,
                        getDataRequest.getWatch() ? cnxn : null);
                rsp = new GetDataResponse(b, stat);
                break;
            }
            case OpCode.setWatches: {
                lastOp = "SETW";
                SetWatches setWatches = new SetWatches();
                // XXX We really should NOT need this!!!!
                request.request.rewind();
                ByteBufferInputStream.byteBuffer2Record(request.request, setWatches);
                long relativeZxid = setWatches.getRelativeZxid();
                zks.getZKDatabase().setWatches(relativeZxid,
                        setWatches.getDataWatches(),
                        setWatches.getExistWatches(),
                        setWatches.getChildWatches(), cnxn);
                break;
            }
            case OpCode.getACL: {
                lastOp = "GETA";
                GetACLRequest getACLRequest = new GetACLRequest();
                ByteBufferInputStream.byteBuffer2Record(request.request,
                        getACLRequest);
                Stat stat = new Stat();
                List<ACL> acl =
                        zks.getZKDatabase().getACL(getACLRequest.getPath(), stat);
                rsp = new GetACLResponse(acl, stat);
                break;
            }
            case OpCode.getChildren: {
                lastOp = "GETC";
                GetChildrenRequest getChildrenRequest = new GetChildrenRequest();
                ByteBufferInputStream.byteBuffer2Record(request.request,
                        getChildrenRequest);
                DataNode n = zks.getZKDatabase().getNode(getChildrenRequest.getPath());
                if (n == null) {
                    throw new KeeperException.NoNodeException();
                }
                Long aclG;
                synchronized(n) {
                    aclG = n.acl;
                }
                PrepRequestProcessor.checkACL(zks, zks.getZKDatabase().convertLong(aclG),
                        ZooDefs.Perms.READ,
                        request.authInfo);
                List<String> children = zks.getZKDatabase().getChildren(
                        getChildrenRequest.getPath(), null, getChildrenRequest
                                .getWatch() ? cnxn : null);
                rsp = new GetChildrenResponse(children);
                break;
            }
            case OpCode.getChildren2: {
                lastOp = "GETC";
                GetChildren2Request getChildren2Request = new GetChildren2Request();
                ByteBufferInputStream.byteBuffer2Record(request.request,
                        getChildren2Request);
                Stat stat = new Stat();
                DataNode n = zks.getZKDatabase().getNode(getChildren2Request.getPath());
                if (n == null) {
                    throw new KeeperException.NoNodeException();
                }
                Long aclG;
                synchronized(n) {
                    aclG = n.acl;
                }
                PrepRequestProcessor.checkACL(zks, zks.getZKDatabase().convertLong(aclG),
                        ZooDefs.Perms.READ,
                        request.authInfo);
                List<String> children = zks.getZKDatabase().getChildren(
                        getChildren2Request.getPath(), stat, getChildren2Request
                                .getWatch() ? cnxn : null);
                rsp = new GetChildren2Response(children, stat);
                break;
            }
            case OpCode.checkWatches: {
                lastOp = "CHKW";
                CheckWatchesRequest checkWatches = new CheckWatchesRequest();
                ByteBufferInputStream.byteBuffer2Record(request.request,
                        checkWatches);
                WatcherType type = WatcherType.fromInt(checkWatches.getType());
                boolean containsWatcher = zks.getZKDatabase().containsWatcher(
                        checkWatches.getPath(), type, cnxn);
                if (!containsWatcher) {
                    String msg = String.format(Locale.ENGLISH, "%s (type: %s)",
                            new Object[] { checkWatches.getPath(), type });
                    throw new KeeperException.NoWatcherException(msg);
                }
                break;
            }
            case OpCode.removeWatches: {
                lastOp = "REMW";
                RemoveWatchesRequest removeWatches = new RemoveWatchesRequest();
                ByteBufferInputStream.byteBuffer2Record(request.request,
                        removeWatches);
                WatcherType type = WatcherType.fromInt(removeWatches.getType());
                boolean removed = zks.getZKDatabase().removeWatch(
                        removeWatches.getPath(), type, cnxn);
                if (!removed) {
                    String msg = String.format(Locale.ENGLISH, "%s (type: %s)",
                            new Object[] { removeWatches.getPath(), type });
                    throw new KeeperException.NoWatcherException(msg);
                }
                break;
            }
            }
        } catch (SessionMovedException e) {
            // session moved is a connection level error, we need to tear
            // down the connection otw ZOOKEEPER-710 might happen
            // ie client on slow follower starts to renew session, fails
            // before this completes, then tries the fast follower (leader)
            // and is successful, however the initial renew is then
            // successfully fwd/processed by the leader and as a result
            // the client and leader disagree on where the client is most
            // recently attached (and therefore invalid SESSION MOVED generated)
            cnxn.sendCloseSession();
            return;
        } catch (KeeperException e) {
            err = e.code();
        } catch (Exception e) {
            // log at error level as we are returning a marshalling
            // error to the user
            LOG.error("Failed to process " + request, e);
            StringBuilder sb = new StringBuilder();
            ByteBuffer bb = request.request;
            bb.rewind();
            while (bb.hasRemaining()) {
                sb.append(Integer.toHexString(bb.get() & 0xff));
            }
            LOG.error("Dumping request buffer: 0x" + sb.toString());
            err = Code.MARSHALLINGERROR;
        }

        long lastZxid = zks.getZKDatabase().getDataTreeLastProcessedZxid();
        ReplyHeader hdr =
                new ReplyHeader(request.cxid, lastZxid, err.intValue());

        zks.serverStats().updateLatency(request.createTime);
        cnxn.updateStatsForResponse(request.cxid, lastZxid, lastOp,
                request.createTime, Time.currentElapsedTime());

        try {
            cnxn.sendResponse(hdr, rsp, "response");
            if (request.type == OpCode.closeSession) {
                cnxn.sendCloseSession();
            }
        } catch (IOException e) {
            LOG.error("FIXMSG",e);
        }
    }

The first step, under the shared outstandingChanges lock, is to apply the transaction; the session is handled afterwards:

    private ProcessTxnResult processTxn(Request request, TxnHeader hdr,
                                        Record txn) {
        ProcessTxnResult rc;
        int opCode = request != null ? request.type : hdr.getType();
        long sessionId = request != null ? request.sessionId : hdr.getClientId();
        if (hdr != null) {
            rc = getZKDatabase().processTxn(hdr, txn);
        } else {
            rc = new ProcessTxnResult();
        }
        if (opCode == OpCode.createSession) {
            if (hdr != null && txn instanceof CreateSessionTxn) {
                CreateSessionTxn cst = (CreateSessionTxn) txn;
                sessionTracker.addGlobalSession(sessionId, cst.getTimeOut());
            } else if (request != null && request.isLocalSession()) {
                request.request.rewind();
                int timeout = request.request.getInt();
                request.request.rewind();
                sessionTracker.addSession(request.sessionId, timeout);
            } else {
                LOG.warn("*****>>>>> Got "
                        + txn.getClass() + " "
                        + txn.toString());
            }
        } else if (opCode == OpCode.closeSession) {
            sessionTracker.removeSession(sessionId);
        }
        return rc;
    }

Applying the transaction takes different branches for local sessions and the database; for a create, the DataTree ends up creating the node:

    CreateTxn createTxn = (CreateTxn) txn;
    rc.path = createTxn.getPath();
    createNode(
            createTxn.getPath(),
            createTxn.getData(),
            createTxn.getAcl(),
            createTxn.getEphemeral() ? header.getClientId() : 0,
            createTxn.getParentCVersion(),
            header.getZxid(), header.getTime(), null);
    break;

The logic for adding a new node is:

    /**
     * Add a new node to the DataTree.
     * @param path
     *            Path for the new node.
     * @param data
     *            Data to store in the node.
     * @param acl
     *            Node acls
     * @param ephemeralOwner
     *            the session id that owns this node. -1 indicates this is not
     *            an ephemeral node.
     * @param zxid
     *            Transaction ID
     * @param time
     * @param outputStat
     *            A Stat object to store Stat output results into.
     * @throws NodeExistsException
     * @throws NoNodeException
     * @throws KeeperException
     */
    public void createNode(final String path, byte data[], List<ACL> acl,
            long ephemeralOwner, int parentCVersion, long zxid, long time, Stat outputStat)
            throws KeeperException.NoNodeException,
            KeeperException.NodeExistsException {
        int lastSlash = path.lastIndexOf('/');
        String parentName = path.substring(0, lastSlash);
        String childName = path.substring(lastSlash + 1);
        StatPersisted stat = new StatPersisted();
        stat.setCtime(time);
        stat.setMtime(time);
        stat.setCzxid(zxid);
        stat.setMzxid(zxid);
        stat.setPzxid(zxid);
        stat.setVersion(0);
        stat.setAversion(0);
        stat.setEphemeralOwner(ephemeralOwner);
        DataNode parent = nodes.get(parentName);
        if (parent == null) {
            throw new KeeperException.NoNodeException();
        }
        synchronized (parent) {
            Set<String> children = parent.getChildren();
            if (children != null && children.contains(childName)) {
                throw new KeeperException.NodeExistsException();
            }

            if (parentCVersion == -1) {
                parentCVersion = parent.stat.getCversion();
                parentCVersion++;
            }
            parent.stat.setCversion(parentCVersion);
            parent.stat.setPzxid(zxid);
            Long longval = convertAcls(acl);
            DataNode child = new DataNode(data, longval, stat);
            parent.addChild(childName);
            nodes.put(path, child);
            if (ephemeralOwner == CONTAINER_EPHEMERAL_OWNER) {
                containers.add(path);
            } else if (ephemeralOwner != 0) {
                HashSet<String> list = ephemerals.get(ephemeralOwner);
                if (list == null) {
                    list = new HashSet<String>();
                    ephemerals.put(ephemeralOwner, list);
                }
                synchronized (list) {
                    list.add(path);
                }
            }
            if (outputStat != null) {
                child.copyStat(outputStat);
            }
        }
        // now check if its one of the zookeeper node child
        if (parentName.startsWith(quotaZookeeper)) {
            // now check if its the limit node
            if (Quotas.limitNode.equals(childName)) {
                // this is the limit node
                // get the parent and add it to the trie
                pTrie.addPath(parentName.substring(quotaZookeeper.length()));
            }
            if (Quotas.statNode.equals(childName)) {
                updateQuotaForPath(parentName
                        .substring(quotaZookeeper.length()));
            }
        }
        // also check to update the quotas for this node
        String lastPrefix = getMaxPrefixWithQuota(path);
        if(lastPrefix != null) {
            // ok we have some match and need to update
            updateCount(lastPrefix, 1);
            updateBytes(lastPrefix, data == null ? 0 : data.length);
        }
        dataWatches.triggerWatch(path, Event.EventType.NodeCreated);
        childWatches.triggerWatch(parentName.equals("") ? "/" : parentName,
                Event.EventType.NodeChildrenChanged);
    }

The final step triggers the NodeCreated event on the new node and the NodeChildrenChanged event on its parent.

    Set<Watcher> triggerWatch(String path, EventType type, Set<Watcher> supress) {
        WatchedEvent e = new WatchedEvent(type,
                KeeperState.SyncConnected, path);
        HashSet<Watcher> watchers;
        synchronized (this) {
            watchers = watchTable.remove(path);
            if (watchers == null || watchers.isEmpty()) {
                if (LOG.isTraceEnabled()) {
                    ZooTrace.logTraceMessage(LOG,
                            ZooTrace.EVENT_DELIVERY_TRACE_MASK,
                            "No watchers for " + path);
                }
                return null;
            }
            for (Watcher w : watchers) {
                HashSet<String> paths = watch2Paths.get(w);
                if (paths != null) {
                    paths.remove(path);
                }
            }
        }
        for (Watcher w : watchers) {
            if (supress != null && supress.contains(w)) {
                continue;
            }
            w.process(e);
        }
        return watchers;
    }

The WatchManager then invokes each registered watcher to handle the event.
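Notice that triggerWatch removes the watchers from watchTable before firing them: a watch fires at most once per registration, which is ZooKeeper's one-shot watch semantics. A minimal sketch of that behavior (hypothetical names, not the real WatchManager):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the one-shot watch semantics implemented by triggerWatch.
public class WatchSketch {
    interface Watcher {
        void process(String event);
    }

    final Map<String, Set<Watcher>> watchTable = new HashMap<>();

    synchronized void watch(String path, Watcher w) {
        watchTable.computeIfAbsent(path, k -> new HashSet<>()).add(w);
    }

    void trigger(String path, String event) {
        Set<Watcher> watchers;
        synchronized (this) {
            watchers = watchTable.remove(path); // removed before firing: one-shot
        }
        if (watchers == null) {
            return; // nobody watching (or already fired)
        }
        for (Watcher w : watchers) {
            w.process(event);
        }
    }

    public static void main(String[] args) {
        WatchSketch ws = new WatchSketch();
        final int[] fired = {0};
        ws.watch("/a", e -> fired[0]++);
        ws.trigger("/a", "NodeCreated");
        ws.trigger("/a", "NodeCreated"); // second trigger finds no watcher
        System.out.println(fired[0]); // 1
    }
}
```

A client that wants continuous notifications must re-register the watch each time it fires, which is exactly what the ZooKeeper client API requires.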

  1.3 Finally, let's look at SyncRequestProcessor, which persists requests to the transaction log and takes snapshots:

    @Override
    public void run() {
        try {
            int logCount = 0;

            // we do this in an attempt to ensure that not all of the servers
            // in the ensemble take a snapshot at the same time
            int randRoll = r.nextInt(snapCount/2);
            while (true) {
                Request si = null;
                if (toFlush.isEmpty()) {
                    si = queuedRequests.take();
                } else {
                    si = queuedRequests.poll();
                    if (si == null) {
                        flush(toFlush);
                        continue;
                    }
                }
                if (si == requestOfDeath) {
                    break;
                }
                if (si != null) {
                    // track the number of records written to the log
                    if (zks.getZKDatabase().append(si)) {
                        logCount++;
                        if (logCount > (snapCount / 2 + randRoll)) {
                            randRoll = r.nextInt(snapCount/2);
                            // roll the log
                            zks.getZKDatabase().rollLog();
                            // take a snapshot
                            if (snapInProcess != null && snapInProcess.isAlive()) {
                                LOG.warn("Too busy to snap, skipping");
                            } else {
                                snapInProcess = new ZooKeeperThread("Snapshot Thread") {
                                    public void run() {
                                        try {
                                            zks.takeSnapshot();
                                        } catch(Exception e) {
                                            LOG.warn("Unexpected exception", e);
                                        }
                                    }
                                };
                                snapInProcess.start();
                            }
                            logCount = 0;
                        }
                    } else if (toFlush.isEmpty()) {
                        // optimization for read heavy workloads
                        // iff this is a read, and there are no pending
                        // flushes (writes), then just pass this to the next
                        // processor
                        if (nextProcessor != null) {
                            nextProcessor.processRequest(si);
                            if (nextProcessor instanceof Flushable) {
                                ((Flushable)nextProcessor).flush();
                            }
                        }
                        continue;
                    }
                    toFlush.add(si);
                    if (toFlush.size() > 1000) {
                        flush(toFlush);
                    }
                }
            }
        } catch (Throwable t) {
            handleException(this.getName(), t);
        } finally{
            running = false;
        }
        LOG.info("SyncRequestProcessor exited!");
    }

  This processor writes the transaction log and takes snapshots. To avoid blocking the request pipeline, each snapshot is generated on a separate ZooKeeperThread; if the previous snapshot thread is still alive, the new snapshot is skipped.
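The randomized roll threshold in the loop above (snapCount/2 + randRoll) can be illustrated with a short sketch. SnapRollDemo and countAppendsUntilRoll are hypothetical names for the example, and snapCount/seed are arbitrary values, not ZooKeeper defaults:

```java
import java.util.Random;

// Sketch of SyncRequestProcessor's snapshot trigger: after appending each
// request to the log, a roll happens once logCount exceeds
// snapCount/2 + randRoll. randRoll is randomized so that the members of an
// ensemble do not all snapshot at the same time.
public class SnapRollDemo {
    public static int countAppendsUntilRoll(int snapCount, long seed) {
        Random r = new Random(seed);
        int randRoll = r.nextInt(snapCount / 2);  // 0 .. snapCount/2 - 1
        int logCount = 0;
        int appends = 0;
        while (true) {
            appends++;
            logCount++;
            if (logCount > (snapCount / 2 + randRoll)) {
                return appends;  // this is where rollLog() and the snapshot fire
            }
        }
    }

    public static void main(String[] args) {
        int snapCount = 100_000;
        int n = countAppendsUntilRoll(snapCount, 42L);
        // the threshold always lies in (snapCount/2, snapCount], whatever the seed
        System.out.println(n > snapCount / 2 && n <= snapCount);  // true
    }
}
```

So with the default configuration, each server snapshots somewhere between snapCount/2 and snapCount appended transactions, at a point that differs per server.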

    public void takeSnapshot() {
        try {
            txnLogFactory.save(zkDb.getDataTree(), zkDb.getSessionWithTimeOuts());
        } catch (IOException e) {
            LOG.error("Severe unrecoverable error, exiting", e);
            // This is a severe error that we cannot recover from,
            // so we need to exit
            System.exit(10);
        }
    }

FileTxnSnapLog is a helper class that manages the transaction log (txnlog) and snapshot files.

    /**
     * save the datatree and the sessions into a snapshot
     * @param dataTree the datatree to be serialized onto disk
     * @param sessionsWithTimeouts the session timeouts to be
     * serialized onto disk
     * @throws IOException
     */
    public void save(DataTree dataTree,
            ConcurrentHashMap<Long, Integer> sessionsWithTimeouts)
            throws IOException {
        long lastZxid = dataTree.lastProcessedZxid;
        File snapshotFile = new File(snapDir, Util.makeSnapshotName(lastZxid));
        LOG.info("Snapshotting: 0x{} to {}", Long.toHexString(lastZxid),
                snapshotFile);
        snapLog.serialize(dataTree, sessionsWithTimeouts, snapshotFile);
    }

The snapshot is then persisted to a file:

    /**
     * serialize the datatree and session into the file snapshot
     * @param dt the datatree to be serialized
     * @param sessions the sessions to be serialized
     * @param snapShot the file to store snapshot into
     */
    public synchronized void serialize(DataTree dt, Map<Long, Integer> sessions, File snapShot)
            throws IOException {
        if (!close) {
            OutputStream sessOS = new BufferedOutputStream(new FileOutputStream(snapShot));
            CheckedOutputStream crcOut = new CheckedOutputStream(sessOS, new Adler32());
            OutputArchive oa = BinaryOutputArchive.getArchive(crcOut);
            FileHeader header = new FileHeader(SNAP_MAGIC, VERSION, dbId);
            serialize(dt, sessions, oa, header);
            long val = crcOut.getChecksum().getValue();
            oa.writeLong(val, "val");
            oa.writeString("/", "path");
            sessOS.flush();
            crcOut.close();
            sessOS.close();
        }
    }

This completes the request-processing flow for the standalone server.
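The checksum-trailer pattern used in serialize above (stream the data through a CheckedOutputStream, then append the Adler-32 value at the end) can be sketched with only the JDK. The file layout below is a simplification for illustration, not ZooKeeper's actual snapshot format:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.Adler32;
import java.util.zip.CheckedInputStream;
import java.util.zip.CheckedOutputStream;

// Sketch of the checksum-trailer pattern: the payload goes through a
// CheckedOutputStream, then the Adler-32 value is appended so a reader
// can detect corruption.
public class ChecksummedFile {

    public static void write(File f, byte[] payload) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(f)) {
            CheckedOutputStream crcOut = new CheckedOutputStream(fos, new Adler32());
            DataOutputStream data = new DataOutputStream(crcOut);
            data.writeInt(payload.length);
            data.write(payload);
            data.flush();
            // trailer: checksum of everything written so far, bypassing crcOut
            new DataOutputStream(fos).writeLong(crcOut.getChecksum().getValue());
        }
    }

    public static byte[] read(File f) throws IOException {
        try (FileInputStream fis = new FileInputStream(f)) {
            CheckedInputStream crcIn = new CheckedInputStream(fis, new Adler32());
            DataInputStream data = new DataInputStream(crcIn);
            byte[] payload = new byte[data.readInt()];
            data.readFully(payload);
            long computed = crcIn.getChecksum().getValue();
            long stored = new DataInputStream(fis).readLong();  // trailer read outside the checksum
            if (computed != stored) {
                throw new IOException("snapshot checksum mismatch");
            }
            return payload;
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("snap", ".bin");
        f.deleteOnExit();
        byte[] payload = "datatree+sessions".getBytes(StandardCharsets.UTF_8);
        write(f, payload);
        System.out.println(Arrays.equals(read(f), payload));  // true
    }
}
```

On load, ZooKeeper recomputes the checksum the same way, which is how a truncated or corrupted snapshot is rejected during restore.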

  2. The clustered case

 The clustered case differs slightly from standalone mode: QuorumPeer starts the ServerCnxnFactory, which binds the server to its local address:

    @Override
    public void start() {
        LOG.info("binding to port " + localAddress);
        parentChannel = bootstrap.bind(localAddress);
    }

Due to space constraints, the rest of the clustered flow will be described in detail in the next article.

 

Summary

  From the code flow above, we can see that the server accepts requests through either plain Java NIO or the Netty framework. A request first passes through PrepRequestProcessor, which receives it, wraps it into a transaction, and sets up shared state according to the request type. SyncRequestProcessor then serializes it to the transaction log and, periodically, to a snapshot; the request is not handed onward until the log has been flushed to disk. Finally, FinalRequestProcessor terminates the chain: it applies the change to the in-memory database, sends the response back to the client, and triggers any watchers registered on the affected paths.
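The three-stage chain described above can be sketched as a minimal pipeline. The RequestProcessor interface name matches ZooKeeper's, but the stage bodies here are simplified stand-ins for illustration only:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the standalone processor chain:
// PrepRequestProcessor -> SyncRequestProcessor -> FinalRequestProcessor.
// Each stage does its work, then hands the request to the next one.
public class ProcessorChainDemo {

    interface RequestProcessor {
        void processRequest(String request);
    }

    // Builds the chain back-to-front, like setupRequestProcessors does.
    static RequestProcessor chain(List<String> trace) {
        RequestProcessor finalP = req -> trace.add("final: apply+respond " + req);
        RequestProcessor syncP  = req -> { trace.add("sync: log " + req); finalP.processRequest(req); };
        RequestProcessor prepP  = req -> { trace.add("prep: wrap " + req); syncP.processRequest(req); };
        return prepP;  // firstProcessor
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        chain(trace).processRequest("create /app");
        System.out.println(trace);
        // [prep: wrap create /app, sync: log create /app, final: apply+respond create /app]
    }
}
```

Building the chain back-to-front mirrors setupRequestProcessors, where each processor is constructed with a reference to its successor; in the real server the prep and sync stages also run on their own threads.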

