RocketMQ's persistence design rests on two ideas: first, flushing to disk so that most data survives failures; second, efficient handling of the persisted files, where zero-copy (mmap), OS memory pages, and the NIO model provide the processing throughput.


  • Persistence directory layout

  ├──abort: file checked at broker startup; a normal start writes abort and a clean shutdown deletes it, so its presence tells the broker that the previous shutdown was abnormal

  ├──checkpoint: historical checkpoint loaded when the broker starts

  ├──lock: file lock for global resources

  ├──commitlog: the core of broker storage; RocketMQ stores messages centrally on the broker, and they land on disk in the commitlog

  │ ├──00000000000000000000 (example): RocketMQ pre-creates commitlog files and appends messages into them; each file is named after its starting offset, so the very first file is 00000000000000000000

  ├──compaction: (RocketMQ 5.0+)

  │ ├──position-checkpoint: caches the last consumption checkpoint, updated after each round of processing

  ├──config:

  │ ├──consumerFilter.json: message filter rules per topic: ConcurrentMap<String/*Topic*/, FilterDataMapByTopic>

  │ ├──consumerOffset.json: consumption offsets per consumer group: ConcurrentMap<String/* topic@group */, ConcurrentMap<Integer, Long>>

  │ ├──consumerOrderInfo.json: ordering state for ordered messages: ConcurrentHashMap<String/* topic@group*/, ConcurrentHashMap<Integer/*queueId*/, OrderInfo>>

  │ ├──delayOffset.json: pull offsets for the delay queues consumed by pull consumers

  │ ├──subscriptionGroup.json: subscription info per consumer group, i.e. the consumer metadata the broker has received

  │ ├──topics.json: topic metadata

  │ ├──timercheck: timing-wheel config file for timed messages, RocketMQ 5.0+

  │ ├──timermetrics: timing-wheel metrics file for timed messages, RocketMQ 5.0+

  ├──consumequeue: per-topic queue consumption info on the broker

  │ ├──%{topicName}: topic name

  │ │ ├──%{queueId}: queue id

  │ │ │ ├──00000000000000000000: consumption positions

  ├──index: index file directory

  │ ├──00000000000000000000: index file for quickly locating messages in the commitlog

  └──timerwheel: config for timing-wheel-based timed messages

  These files are the foundation of the broker's disaster recovery. A RocketMQ cluster is essentially the combined capability of its brokers; these files make it possible to avoid data loss, and the broker loads the corresponding configuration at startup.

```java
/**
 * Abstract configuration factory. At broker startup each component is loaded
 * in turn and its file is read into an in-memory table, e.g. consumerOffsetTable.
 * Each manager subclass loads its own configuration.
 */
public abstract class ConfigManager {
    private static final Logger log = LoggerFactory.getLogger(LoggerName.COMMON_LOGGER_NAME);

    public abstract String encode();
```
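The persist/load cycle this abstract factory drives can be sketched as a minimal analogue. Class and method names below (`MiniConfigManager`, `OffsetManager`) are illustrative, not RocketMQ's real API; the point is the encode-to-file / decode-from-file round trip performed at shutdown and startup.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Minimal analogue of the ConfigManager pattern: each manager serializes its
// in-memory table via encode() and restores it via decode().
abstract class MiniConfigManager {
    public abstract String encode();           // in-memory state -> string
    public abstract void decode(String text);  // string -> in-memory state
    public abstract Path configFilePath();

    // persisted on shutdown and periodically while running
    public void persist() {
        try {
            Files.writeString(configFilePath(), encode());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // read back into memory at broker startup
    public boolean load() {
        try {
            Path p = configFilePath();
            if (!Files.exists(p)) return false;
            decode(Files.readString(p));
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class OffsetManager extends MiniConfigManager {
    private String table = "";                 // stands in for consumerOffsetTable
    private final Path file;
    OffsetManager(Path file) { this.file = file; }
    @Override public String encode() { return table; }
    @Override public void decode(String s) { this.table = s; }
    @Override public Path configFilePath() { return file; }
    public void set(String t) { this.table = t; }
    public String get() { return table; }
}
```

A second manager instance pointed at the same file recovers the state written by the first, which is exactly what the broker relies on after a restart.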
  • The store module

  RocketMQ's file handling is built on mmap and NIO ByteBuffers underneath; the store layer wraps these into basic components.

  

```java
/**
 * The core object for message handling in the store: the mapped file
 * encapsulates message writes, using NIO to move data from file to disk.
 */
public class DefaultMappedFile extends AbstractMappedFile {
    // OS memory page, 4 KB on most Unix-like systems
    public static final int OS_PAGE_SIZE = 1024 * 4;
    public static final Unsafe UNSAFE = getUnsafe();
    private static final Method IS_LOADED_METHOD;
    public static final int UNSAFE_PAGE_SIZE = UNSAFE == null ? OS_PAGE_SIZE : UNSAFE.pageSize();

    protected static final Logger log = LoggerFactory.getLogger(LoggerName.STORE_LOGGER_NAME);

    // total mapped virtual memory allocated across the process
    protected static final AtomicLong TOTAL_MAPPED_VIRTUAL_MEMORY = new AtomicLong(0);

    // total number of memory-mapped files created
    protected static final AtomicInteger TOTAL_MAPPED_FILES = new AtomicInteger(0);

    protected static final AtomicIntegerFieldUpdater<DefaultMappedFile> WROTE_POSITION_UPDATER;
    protected static final AtomicIntegerFieldUpdater<DefaultMappedFile> COMMITTED_POSITION_UPDATER;
    protected static final AtomicIntegerFieldUpdater<DefaultMappedFile> FLUSHED_POSITION_UPDATER;

    // current write position; the next write starts here
    protected volatile int wrotePosition;
    // commit position: data before it has been committed to the fileChannel;
    // data between committedPosition and wrotePosition is not yet committed
    protected volatile int committedPosition;
    // flush position: data before it is on disk; data between flushedPosition
    // and committedPosition is not yet flushed
    protected volatile int flushedPosition;
    // file size in bytes
    protected int fileSize;
    // file channel of the on-disk file; this is where mmap comes into play
    protected FileChannel fileChannel;
    /**
     * Message will put to here first, and then reput to FileChannel if writeBuffer is not null.
     */
    // with async flush, data is first written to writeBuffer; the CommitRealTime
    // thread commits it to the fileChannel every 200 ms, and the FlushRealTime
    // thread flushes the fileChannel to disk every 500 ms
    protected ByteBuffer writeBuffer = null;
    // off-heap memory pool serving the async flush path: to avoid the cost of
    // repeated allocation and teardown, a block of off-heap memory is requested
    // from the OS and locked up front; writeBuffer is taken from this pool
    protected TransientStorePool transientStorePool = null;
    // file name, the starting byte offset of the file
    protected String fileName;
    // initial offset of the file, reflected in its name: 00000000000000000000
    // means it starts at 0; a commitlog file is 1 GB by default, and once it
    // is full a new file is created, named after its own starting offset
    protected long fileFromOffset;
    protected File file;
    // memory-mapped view of the file; synchronous flush writes directly to it
    protected MappedByteBuffer mappedByteBuffer;
    // timestamp of the most recent operation
    protected volatile long storeTimestamp = 0;
    protected boolean firstCreateInQueue = false;
    private long lastFlushTime = -1L;

    protected MappedByteBuffer mappedByteBufferWaitToClean = null;
    protected long swapMapTime = 0L;
    protected long mappedByteBufferAccessCountSinceLastSwap = 0L;
```

  First, the core DefaultMappedFile uses a FileChannel, whose memory-mapped buffer (mmap) provides the zero-copy path.
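The mmap technique DefaultMappedFile builds on can be demonstrated in isolation with plain java.nio: map a file region into memory, write through the MappedByteBuffer, and let force() play the role of a flush. This is a bare-bones sketch, not RocketMQ code; the file name is arbitrary.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Map a file into memory, write bytes through the mapping, flush, read back.
class MmapDemo {
    static String writeAndReadBack(Path file, String msg) {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            byte[] in = msg.getBytes(StandardCharsets.UTF_8);
            buf.put(in);            // writes land in the page cache, no syscall per put
            buf.force();            // flush the dirty pages to disk
            byte[] out = new byte[in.length];
            buf.flip();             // limit = bytes written, position = 0
            buf.get(out);
            return new String(out, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because writes go straight into the mapped page cache, there is no per-write copy between user-space buffers and the kernel, which is the zero-copy property the article refers to.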

  DefaultMappedFile defines three position pointers:
  wrotePosition: the current write position; the next write starts here.

  committedPosition: the commit position; data before it has been committed to the fileChannel, while data between committedPosition and wrotePosition has not yet been committed.

  flushedPosition: the flush position; data before it is on disk, while data between flushedPosition and committedPosition has not yet been flushed.

  It also defines a ByteBuffer: with NIO-based asynchronous flushing, data is first written into this buffer, a scheduled thread periodically commits it to the fileChannel, and finally the fileChannel is flushed to disk.
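The interplay of the three positions with the async-flush pipeline can be modeled in a few lines. This is a simplified model of the invariant, not RocketMQ's actual class: flushedPosition ≤ committedPosition ≤ wrotePosition at all times.

```java
// Simplified model of DefaultMappedFile's three positions.
// write()  advances wrotePosition      (data lands in writeBuffer)
// commit() advances committedPosition  (writeBuffer -> fileChannel)
// flush()  advances flushedPosition    (fileChannel -> disk)
class PositionModel {
    private int wrote, committed, flushed;

    void write(int bytes) { wrote += bytes; }
    void commit()         { committed = wrote; }
    void flush()          { flushed = committed; }

    int unCommitted() { return wrote - committed; }   // not yet in fileChannel
    int unFlushed()   { return committed - flushed; } // in fileChannel, not on disk
    int getFlushed()  { return flushed; }
}
```

In the real broker, commit() and flush() correspond to the CommitRealTime and FlushRealTime threads running on their 200 ms / 500 ms schedules.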

```java
/**
 * Create the next commitlog file from AllocateRequests on the queue.
 */
public void run() {
    log.info(this.getServiceName() + " service started");

    while (!this.isStopped() && this.mmapOperation()) {

    }
    log.info(this.getServiceName() + " service end");
}
```

  AllocateRequest encapsulates the pre-creation of a commitlog file: requests are pre-built and placed on a queue during processing, and when the store starts it launches the AllocateMappedFileService thread that watches the queue and performs the creation.

```java
/**
 * Core of commitlog pre-creation.
 * @param nextFilePath
 * @param nextNextFilePath
 * @param fileSize
 * @return
 */
public MappedFile putRequestAndReturnMappedFile(String nextFilePath, String nextNextFilePath, int fileSize) {
    int canSubmitRequests = 2;
    if (this.messageStore.isTransientStorePoolEnable()) {
        if (this.messageStore.getMessageStoreConfig().isFastFailIfNoBufferInStorePool()
            && BrokerRole.SLAVE != this.messageStore.getMessageStoreConfig().getBrokerRole()) { //if broker is slave, don't fast fail even no buffer in pool
            canSubmitRequests = this.messageStore.getTransientStorePool().availableBufferNums() - this.requestQueue.size();
        }
    }

    // wrap an AllocateRequest and put it on the queue for the async thread to execute
    AllocateRequest nextReq = new AllocateRequest(nextFilePath, fileSize);
    boolean nextPutOK = this.requestTable.putIfAbsent(nextFilePath, nextReq) == null;

    if (nextPutOK) {
        if (canSubmitRequests <= 0) {
            log.warn("[NOTIFYME]TransientStorePool is not enough, so create mapped file error, " +
                "RequestQueueSize : {}, StorePoolSize: {}", this.requestQueue.size(), this.messageStore.getTransientStorePool().availableBufferNums());
            this.requestTable.remove(nextFilePath);
            return null;
        }
        boolean offerOK = this.requestQueue.offer(nextReq);
        if (!offerOK) {
            log.warn("never expected here, add a request to preallocate queue failed");
        }
        canSubmitRequests--;
    }

    AllocateRequest nextNextReq = new AllocateRequest(nextNextFilePath, fileSize);
    boolean nextNextPutOK = this.requestTable.putIfAbsent(nextNextFilePath, nextNextReq) == null;
    if (nextNextPutOK) {
        if (canSubmitRequests <= 0) {
            log.warn("[NOTIFYME]TransientStorePool is not enough, so skip preallocate mapped file, " +
                "RequestQueueSize : {}, StorePoolSize: {}", this.requestQueue.size(), this.messageStore.getTransientStorePool().availableBufferNums());
            this.requestTable.remove(nextNextFilePath);
        } else {
            boolean offerOK = this.requestQueue.offer(nextNextReq);
            if (!offerOK) {
                log.warn("never expected here, add a request to preallocate queue failed");
            }
        }
    }

    if (hasException) {
        log.warn(this.getServiceName() + " service has exception. so return null");
        return null;
    }
    // block until the AllocateMappedFileService thread has created the file
    AllocateRequest result = this.requestTable.get(nextFilePath);
    try {
        if (result != null) {
            messageStore.getPerfCounter().startTick("WAIT_MAPFILE_TIME_MS");
            boolean waitOK = result.getCountDownLatch().await(waitTimeOut, TimeUnit.MILLISECONDS);
            messageStore.getPerfCounter().endTick("WAIT_MAPFILE_TIME_MS");
            if (!waitOK) {
                log.warn("create mmap timeout " + result.getFilePath() + " " + result.getFileSize());
                return null;
            } else {
                this.requestTable.remove(nextFilePath);
                return result.getMappedFile();
            }
        } else {
            log.error("find preallocate mmap failed, this never happen");
        }
    } catch (InterruptedException e) {
        log.warn(this.getServiceName() + " service has exception. ", e);
    }
```

  When the broker initializes, it starts the AllocateMappedFileService thread that manages MappedFile creation. The message-processing threads and the AllocateMappedFileService thread are connected through the requestQueue.

  When a message is written, AllocateMappedFileService's putRequestAndReturnMappedFile puts a MappedFile-creation request into requestQueue; note that it builds two AllocateRequests and enqueues both.

  The AllocateMappedFileService thread loops over requestQueue, taking AllocateRequests and creating MappedFiles. The message-processing thread waits on a CountDownLatch and returns as soon as the first MappedFile has been created.

  The next time the message-processing thread needs a MappedFile, it can pick up the one that was already pre-created. Pre-creating MappedFiles this way cuts the time spent waiting for file creation.
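The queue-plus-latch handshake described above can be sketched as follows. This is a simplified model with hypothetical names, not RocketMQ's code: the caller enqueues a request and blocks on its latch, while a background thread takes the request, "creates" the file, and counts the latch down.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch of the AllocateMappedFileService handshake.
class PreallocDemo {
    static class AllocateRequest {
        final String path;
        final CountDownLatch latch = new CountDownLatch(1);
        volatile String created;                    // stands in for the MappedFile
        AllocateRequest(String path) { this.path = path; }
    }

    private final BlockingQueue<AllocateRequest> requestQueue = new ArrayBlockingQueue<>(16);

    PreallocDemo() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    AllocateRequest req = requestQueue.take();
                    req.created = "mapped:" + req.path; // pretend to create the file
                    req.latch.countDown();              // wake the waiting caller
                }
            } catch (InterruptedException ignored) { }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // mirrors putRequestAndReturnMappedFile: enqueue, then wait with a timeout
    String putRequestAndWait(String path, long timeoutMs) {
        AllocateRequest req = new AllocateRequest(path);
        requestQueue.offer(req);
        try {
            return req.latch.await(timeoutMs, TimeUnit.MILLISECONDS) ? req.created : null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

Returning null on timeout mirrors the real method's "create mmap timeout" branch; the caller then has to fail the write rather than wait forever.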

  • The full store message flow

  

  As the diagram shows, the store plays a central role on the path from producer to consumer.

  After a producer sends a message, the message itself is persisted; after a consumer consumes it, the consumption progress is persisted.

  Let's walk through the store flow in detail.

  • Message storage: from producer to disk

  Once a message has been created by the producer and sent to the broker, it is persisted first. For asynchronous sends, persistence is handled by a dedicated background thread on a schedule; for synchronous sends, the call blocks until the message has been processed before returning.

  The message first passes through the producer, which assembles it and sends it to the broker over Netty. Here we only care about the broker's processing; for how the producer handles the message beforehand, see the earlier article.

  In the broker, the processors encapsulate the broker's reactions to client actions delivered over Netty. AbstractSendMessageProcessor provides shared functionality on top: message retry, routing to the dead-letter queue, and the before/after hook functions. When a producer issues a sendMessage, the request is routed to SendMessageProcessor, which is the broker-side handler for the client's sendMessage action.

  

```java
public RemotingCommand processRequest(ChannelHandlerContext ctx,
    RemotingCommand request) throws RemotingCommandException {
    SendMessageContext sendMessageContext;
    switch (request.getCode()) {
        case RequestCode.CONSUMER_SEND_MSG_BACK:
            return this.consumerSendMsgBack(ctx, request);
        default:
            // normal send path
            SendMessageRequestHeader requestHeader = parseRequestHeader(request);
            if (requestHeader == null) {
                return null;
            }
            TopicQueueMappingContext mappingContext = this.brokerController.getTopicQueueMappingManager().buildTopicQueueMappingContext(requestHeader, true);
            RemotingCommand rewriteResult = this.brokerController.getTopicQueueMappingManager().rewriteRequestForStaticTopic(requestHeader, mappingContext);
            if (rewriteResult != null) {
                return rewriteResult;
            }
            sendMessageContext = buildMsgContext(ctx, requestHeader, request);
            try {
                // run the before-send hooks
                this.executeSendMessageHookBefore(sendMessageContext);
            } catch (AbortProcessException e) {
                final RemotingCommand errorResponse = RemotingCommand.createResponseCommand(e.getResponseCode(), e.getErrorMessage());
                errorResponse.setOpaque(request.getOpaque());
                return errorResponse;
            }

            RemotingCommand response;
            // handle single vs. batch messages, then run the after-send hooks
            if (requestHeader.isBatch()) {
                response = this.sendBatchMessage(ctx, request, sendMessageContext, requestHeader, mappingContext,
                    (ctx1, response1) -> executeSendMessageHookAfter(response1, ctx1));
            } else {
                response = this.sendMessage(ctx, request, sendMessageContext, requestHeader, mappingContext,
                    (ctx12, response12) -> executeSendMessageHookAfter(response12, ctx12));
            }

            return response;
    }
}
```

  If the message is a retry message, it is sent to the %retry%-topic queue for retrying, and its retry level and retry count are updated.

  The core here is the split between single-message and batch-message handling; the broker's MessageBatch class is the batch-message wrapper.

  

```java
public RemotingCommand sendMessage(final ChannelHandlerContext ctx,
    final RemotingCommand request,
    final SendMessageContext sendMessageContext,
    final SendMessageRequestHeader requestHeader,
    final TopicQueueMappingContext mappingContext,
    final SendMessageCallback sendMessageCallback) throws RemotingCommandException {

    final RemotingCommand response = preSend(ctx, request, requestHeader);
    if (response.getCode() != -1) {
        return response;
    }

    final SendMessageResponseHeader responseHeader = (SendMessageResponseHeader) response.readCustomHeader();
    // the message body
    final byte[] body = request.getBody();

    // queue id requested for the message
    int queueIdInt = requestHeader.getQueueId();
    TopicConfig topicConfig = this.brokerController.getTopicConfigManager().selectTopicConfig(requestHeader.getTopic());

    // a queue id below 0 is treated as invalid; pick a random write queue instead
    if (queueIdInt < 0) {
        queueIdInt = randomQueueId(topicConfig.getWriteQueueNums());
    }

    MessageExtBrokerInner msgInner = new MessageExtBrokerInner();
    msgInner.setTopic(requestHeader.getTopic());
    msgInner.setQueueId(queueIdInt);

    Map<String, String> oriProps = MessageDecoder.string2messageProperties(requestHeader.getProperties());
    // for retry messages, or messages routed to the dead-letter queue after
    // the maximum retries, return directly
    if (!handleRetryAndDLQ(requestHeader, response, request, msgInner, topicConfig, oriProps)) {
        return response;
    }

    msgInner.setBody(body);
    msgInner.setFlag(requestHeader.getFlag());

    String uniqKey = oriProps.get(MessageConst.PROPERTY_UNIQ_CLIENT_MESSAGE_ID_KEYIDX);
    if (uniqKey == null || uniqKey.length() <= 0) {
        uniqKey = MessageClientIDSetter.createUniqID();
        oriProps.put(MessageConst.PROPERTY_UNIQ_CLIENT_MESSAGE_ID_KEYIDX, uniqKey);
    }

    MessageAccessor.setProperties(msgInner, oriProps);

    CleanupPolicy cleanupPolicy = CleanupPolicyUtils.getDeletePolicy(Optional.of(topicConfig));
    if (Objects.equals(cleanupPolicy, CleanupPolicy.COMPACTION)) {
        if (StringUtils.isBlank(msgInner.getKeys())) {
            response.setCode(ResponseCode.MESSAGE_ILLEGAL);
            response.setRemark("Required message key is missing");
            return response;
        }
    }

    msgInner.setTagsCode(MessageExtBrokerInner.tagsString2tagsCode(topicConfig.getTopicFilterType(), msgInner.getTags()));
    msgInner.setBornTimestamp(requestHeader.getBornTimestamp());
    msgInner.setBornHost(ctx.channel().remoteAddress());
    msgInner.setStoreHost(this.getStoreHost());
    msgInner.setReconsumeTimes(requestHeader.getReconsumeTimes() == null ? 0 : requestHeader.getReconsumeTimes());
    String clusterName = this.brokerController.getBrokerConfig().getBrokerClusterName();
    MessageAccessor.putProperty(msgInner, MessageConst.PROPERTY_CLUSTER, clusterName);

    msgInner.setPropertiesString(MessageDecoder.messageProperties2String(msgInner.getProperties()));

    String traFlag = oriProps.get(MessageConst.PROPERTY_TRANSACTION_PREPARED);
    boolean sendTransactionPrepareMessage = false;
    if (Boolean.parseBoolean(traFlag)
        && !(msgInner.getReconsumeTimes() > 0 && msgInner.getDelayTimeLevel() > 0)) { //For client under version 4.6.1
        /**
         * a message that has already been reconsumed is a redelivered message;
         * it must not be accepted as a transactional message
         */
        if (this.brokerController.getBrokerConfig().isRejectTransactionMessage()) {
            response.setCode(ResponseCode.NO_PERMISSION);
            response.setRemark(
                "the broker[" + this.brokerController.getBrokerConfig().getBrokerIP1()
                    + "] sending transaction message is forbidden");
            return response;
        }
        sendTransactionPrepareMessage = true;
    }

    long beginTimeMillis = this.brokerController.getMessageStore().now();

    /**
     * This is where the message itself is handled: depending on the broker's
     * sync or async model, transactional and ordinary messages are processed.
     */
    if (brokerController.getBrokerConfig().isAsyncSendEnable()) {
        CompletableFuture<PutMessageResult> asyncPutMessageFuture;
        // putMessage is the core of store-side message persistence
        if (sendTransactionPrepareMessage) {
            /**
             * @see org.apache.rocketmq.broker.transaction.queue.TransactionalMessageServiceImpl.asyncPrepareMessage
             * wrap the message as a half message
             */
            asyncPutMessageFuture = this.brokerController.getTransactionalMessageService().asyncPrepareMessage(msgInner);
        } else {
            asyncPutMessageFuture = this.brokerController.getMessageStore().asyncPutMessage(msgInner);
        }

        final int finalQueueIdInt = queueIdInt;
        final MessageExtBrokerInner finalMsgInner = msgInner;
        /**
         * On completion, handlePutMessageResult is invoked as an async callback.
         * This mirrors the else branch below; the only difference is that the
         * work runs as a non-blocking async task rather than blocking the caller.
         */
        asyncPutMessageFuture.thenAcceptAsync(putMessageResult -> {
            RemotingCommand responseFuture =
                handlePutMessageResult(putMessageResult, response, request, finalMsgInner, responseHeader, sendMessageContext,
                    ctx, finalQueueIdInt, beginTimeMillis, mappingContext, BrokerMetricsManager.getMessageType(requestHeader));
            if (responseFuture != null) {
                doResponse(ctx, request, responseFuture);
            }
            sendMessageCallback.onComplete(sendMessageContext, response);
        }, this.brokerController.getPutMessageFutureExecutor());
        // Returns null to release the send message thread
        return null;
    } else {
        PutMessageResult putMessageResult = null;
        if (sendTransactionPrepareMessage) {
            putMessageResult = this.brokerController.getTransactionalMessageService().prepareMessage(msgInner);
        } else {
            putMessageResult = this.brokerController.getMessageStore().putMessage(msgInner);
        }
        handlePutMessageResult(putMessageResult, response, request, msgInner, responseHeader, sendMessageContext, ctx, queueIdInt, beginTimeMillis, mappingContext, BrokerMetricsManager.getMessageType(requestHeader));
        sendMessageCallback.onComplete(sendMessageContext, response);
        return response;
    }
}
```

  First comes the assembly: set the message body and queue id, and discard invalid messages such as exhausted retry messages or those destined for the dead-letter queue.

  The message is then classified: in the asynchronous path, a transactional message is asynchronously prepared as a half message, while any other message is stored asynchronously according to its content.

```java
// putMessage is the core of store-side message persistence
if (sendTransactionPrepareMessage) {
    /**
     * @see org.apache.rocketmq.broker.transaction.queue.TransactionalMessageServiceImpl.asyncPrepareMessage
     * wrap the message as a half message
     */
    asyncPutMessageFuture = this.brokerController.getTransactionalMessageService().asyncPrepareMessage(msgInner);
} else {
    asyncPutMessageFuture = this.brokerController.getMessageStore().asyncPutMessage(msgInner);
}
```

  Once the future completes, handlePutMessageResult runs as an asynchronous callback. The synchronous model goes through exactly the same flow; the only difference is that the main thread blocks on handlePutMessageResult instead of using a non-blocking async task.

  In the synchronous wrapper, the wait is bounded by a timeout derived from the flush and slave timeouts in the store configuration, essentially a CompletableFuture.get with a timeout:

```java
@Override
public PutMessageResult putMessage(MessageExtBrokerInner msg) {
    return waitForPutResult(asyncPutMessage(msg));
}
```

```java
// timeout handling for the async future
private PutMessageResult waitForPutResult(CompletableFuture<PutMessageResult> putMessageResultFuture) {
    try {
        int putMessageTimeout =
            Math.max(this.messageStoreConfig.getSyncFlushTimeout(),
                this.messageStoreConfig.getSlaveTimeout()) + 5000;
        return putMessageResultFuture.get(putMessageTimeout, TimeUnit.MILLISECONDS);
    } catch (ExecutionException | InterruptedException e) {
        return new PutMessageResult(PutMessageStatus.UNKNOWN_ERROR, null);
    } catch (TimeoutException e) {
        LOGGER.error("usually it will never timeout, putMessageTimeout is much bigger than slaveTimeout and "
            + "flushTimeout so the result can be got anyway, but in some situations timeout will happen like full gc "
            + "process hangs or other unexpected situations.");
        return new PutMessageResult(PutMessageStatus.UNKNOWN_ERROR, null);
    }
}
```

  The real storage work happens in DefaultMessageStore's asyncPutMessage:

```java
public CompletableFuture<PutMessageResult> asyncPutMessage(MessageExtBrokerInner msg) {

    // run the registered before-put hooks first
    for (PutMessageHook putMessageHook : putMessageHookList) {
        PutMessageResult handleResult = putMessageHook.executeBeforePutMessage(msg);
        if (handleResult != null) {
            return CompletableFuture.completedFuture(handleResult);
        }
    }

    /**
     * validate the message format; abort immediately if it is illegal
     */
    if (msg.getProperties().containsKey(MessageConst.PROPERTY_INNER_NUM)
        && !MessageSysFlag.check(msg.getSysFlag(), MessageSysFlag.INNER_BATCH_FLAG)) {
        LOGGER.warn("[BUG]The message had property {} but is not an inner batch", MessageConst.PROPERTY_INNER_NUM);
        return CompletableFuture.completedFuture(new PutMessageResult(PutMessageStatus.MESSAGE_ILLEGAL, null));
    }

    if (MessageSysFlag.check(msg.getSysFlag(), MessageSysFlag.INNER_BATCH_FLAG)) {
        Optional<TopicConfig> topicConfig = this.getTopicConfig(msg.getTopic());
        if (!QueueTypeUtils.isBatchCq(topicConfig)) {
            LOGGER.error("[BUG]The message is an inner batch but cq type is not batch cq");
            return CompletableFuture.completedFuture(new PutMessageResult(PutMessageStatus.MESSAGE_ILLEGAL, null));
        }
    }

    long beginTime = this.getSystemClock().now();
    // hand the message to the commitlog
    CompletableFuture<PutMessageResult> putResultFuture = this.commitLog.asyncPutMessage(msg);

    /**
     * record how long the store took and update the statistics
     */
    putResultFuture.thenAccept(result -> {
        long elapsedTime = this.getSystemClock().now() - beginTime;
        if (elapsedTime > 500) {
            LOGGER.warn("DefaultMessageStore#putMessage: CommitLog#putMessage cost {}ms, topic={}, bodyLength={}",
                elapsedTime, msg.getTopic(), msg.getBody().length);
        }
        this.storeStatsService.setPutMessageEntireTimeMax(elapsedTime);

        if (null == result || !result.isOk()) {
            // on failure, increment the failed-put counter
            this.storeStatsService.getPutMessageFailedTimes().add(1);
        }
    });

    return putResultFuture;
}
```

  As you can see, asyncPutMessage wraps the result in a CompletableFuture: it first runs the before-hooks, then validates the message format and the topic configuration, and once the work completes it records the elapsed time and failure count in the storeStatsService fields. The core operation is `CompletableFuture<PutMessageResult> putResultFuture = this.commitLog.asyncPutMessage(msg);`, which appends the message, with the mapped file channel doing the actual file handling.

```java
/**
 * Core message-storage code.
 * @param msg
 * @return
 */
public CompletableFuture<PutMessageResult> asyncPutMessage(final MessageExtBrokerInner msg) {
    // Set the storage time
    if (!defaultMessageStore.getMessageStoreConfig().isDuplicationEnable()) {
        msg.setStoreTimestamp(System.currentTimeMillis());
    }

    // Set the message body CRC (consider the most appropriate setting on the client)
    msg.setBodyCRC(UtilAll.crc32(msg.getBody()));
    // Back to Results
    AppendMessageResult result = null;

    StoreStatsService storeStatsService = this.defaultMessageStore.getStoreStatsService();

    String topic = msg.getTopic();
    msg.setVersion(MessageVersion.MESSAGE_VERSION_V1);
    boolean autoMessageVersionOnTopicLen =
        this.defaultMessageStore.getMessageStoreConfig().isAutoMessageVersionOnTopicLen();
    if (autoMessageVersionOnTopicLen && topic.length() > Byte.MAX_VALUE) {
        msg.setVersion(MessageVersion.MESSAGE_VERSION_V2);
    }

    InetSocketAddress bornSocketAddress = (InetSocketAddress) msg.getBornHost();
    if (bornSocketAddress.getAddress() instanceof Inet6Address) {
        msg.setBornHostV6Flag();
    }

    InetSocketAddress storeSocketAddress = (InetSocketAddress) msg.getStoreHost();
    if (storeSocketAddress.getAddress() instanceof Inet6Address) {
        msg.setStoreHostAddressV6Flag();
    }

    // fetch the thread-local state and update the maximum message size
    PutMessageThreadLocal putMessageThreadLocal = this.putMessageThreadLocal.get();
    updateMaxMessageSize(putMessageThreadLocal);
    // build a unique topicQueueKey from topic and queue, formatted as topic-queueId
    String topicQueueKey = generateKey(putMessageThreadLocal.getKeyBuilder(), msg);
    long elapsedTimeInLock = 0;
    MappedFile unlockMappedFile = null;
    // get the most recently used mapped file, i.e. the last one
    MappedFile mappedFile = this.mappedFileQueue.getLastMappedFile();

    // no mapped file yet means this is the first creation: start at offset 0
    long currOffset;
    if (mappedFile == null) {
        currOffset = 0;
    } else {
        // otherwise the message belongs at the file's starting offset
        // (its name) plus the current write position within the file
        currOffset = mappedFile.getFileFromOffset() + mappedFile.getWrotePosition();
    }

    // compute how many acks are required and whether HA handling is needed
    int needAckNums = this.defaultMessageStore.getMessageStoreConfig().getInSyncReplicas();
    boolean needHandleHA = needHandleHA(msg);

    if (needHandleHA && this.defaultMessageStore.getBrokerConfig().isEnableControllerMode()) {
        if (this.defaultMessageStore.getHaService().inSyncReplicasNums(currOffset) < this.defaultMessageStore.getMessageStoreConfig().getMinInSyncReplicas()) {
            return CompletableFuture.completedFuture(new PutMessageResult(PutMessageStatus.IN_SYNC_REPLICAS_NOT_ENOUGH, null));
        }
        if (this.defaultMessageStore.getMessageStoreConfig().isAllAckInSyncStateSet()) {
            // -1 means all ack in SyncStateSet
            needAckNums = MixAll.ALL_ACK_IN_SYNC_STATE_SET;
        }
    } else if (needHandleHA && this.defaultMessageStore.getBrokerConfig().isEnableSlaveActingMaster()) {
        int inSyncReplicas = Math.min(this.defaultMessageStore.getAliveReplicaNumInGroup(),
            this.defaultMessageStore.getHaService().inSyncReplicasNums(currOffset));
        needAckNums = calcNeedAckNums(inSyncReplicas);
        if (needAckNums > inSyncReplicas) {
            // Tell the producer, don't have enough slaves to handle the send request
            return CompletableFuture.completedFuture(new PutMessageResult(PutMessageStatus.IN_SYNC_REPLICAS_NOT_ENOUGH, null));
        }
    }

    // lock the key, which identifies one queue of one topic
    topicQueueLock.lock(topicQueueKey);
    try {

        boolean needAssignOffset = true;
        if (defaultMessageStore.getMessageStoreConfig().isDuplicationEnable()
            && defaultMessageStore.getMessageStoreConfig().getBrokerRole() != BrokerRole.SLAVE) {
            needAssignOffset = false;
        }
        if (needAssignOffset) {
            defaultMessageStore.assignOffset(msg, getMessageNum(msg));
        }

        PutMessageResult encodeResult = putMessageThreadLocal.getEncoder().encode(msg);
        if (encodeResult != null) {
            return CompletableFuture.completedFuture(encodeResult);
        }
        msg.setEncodedBuff(putMessageThreadLocal.getEncoder().getEncoderBuffer());
        // context for storing the message
        PutMessageContext putMessageContext = new PutMessageContext(topicQueueKey);

        putMessageLock.lock(); //spin or ReentrantLock ,depending on store config
        try {
            // time at which the lock was acquired
            long beginLockTimestamp = this.defaultMessageStore.getSystemClock().now();
            this.beginTimeInLock = beginLockTimestamp;

            // Here settings are stored timestamp, in order to ensure an orderly
            // global
            // use the lock-acquisition time as the store timestamp to keep ordering
            if (!defaultMessageStore.getMessageStoreConfig().isDuplicationEnable()) {
                msg.setStoreTimestamp(beginLockTimestamp);
            }

            // if there is no mapped file yet, or the current one is full, create a new one
            if (null == mappedFile || mappedFile.isFull()) {
                mappedFile = this.mappedFileQueue.getLastMappedFile(0); // Mark: NewFile may be cause noise
            }
            if (null == mappedFile) {
                log.error("create mapped file1 error, topic: " + msg.getTopic() + " clientAddr: " + msg.getBornHostString());
                beginTimeInLock = 0;
                return CompletableFuture.completedFuture(new PutMessageResult(PutMessageStatus.CREATE_MAPPED_FILE_FAILED, null));
            }

            // append the message
            result = mappedFile.appendMessage(msg, this.appendMessageCallback, putMessageContext);
            switch (result.getStatus()) {
                case PUT_OK:
                    onCommitLogAppend(msg, result, mappedFile);
                    break;
                case END_OF_FILE:
                    // the file ran out of space: roll to a new file and retry the write
                    onCommitLogAppend(msg, result, mappedFile);
                    unlockMappedFile = mappedFile;
                    // Create a new file, re-write the message
                    mappedFile = this.mappedFileQueue.getLastMappedFile(0);
                    if (null == mappedFile) {
                        // XXX: warn and notify me
                        log.error("create mapped file2 error, topic: " + msg.getTopic() + " clientAddr: " + msg.getBornHostString());
                        beginTimeInLock = 0;
                        return CompletableFuture.completedFuture(new PutMessageResult(PutMessageStatus.CREATE_MAPPED_FILE_FAILED, result));
                    }
                    result = mappedFile.appendMessage(msg, this.appendMessageCallback, putMessageContext);
                    if (AppendMessageStatus.PUT_OK.equals(result.getStatus())) {
                        onCommitLogAppend(msg, result, mappedFile);
                    }
                    break;
                case MESSAGE_SIZE_EXCEEDED:
                case PROPERTIES_SIZE_EXCEEDED:
                    beginTimeInLock = 0;
                    return CompletableFuture.completedFuture(new PutMessageResult(PutMessageStatus.MESSAGE_ILLEGAL, result));
                case UNKNOWN_ERROR:
                    beginTimeInLock = 0;
                    return CompletableFuture.completedFuture(new PutMessageResult(PutMessageStatus.UNKNOWN_ERROR, result));
                default:
                    beginTimeInLock = 0;
                    return CompletableFuture.completedFuture(new PutMessageResult(PutMessageStatus.UNKNOWN_ERROR, result));
            }

            // record the time spent under the lock
            elapsedTimeInLock = this.defaultMessageStore.getSystemClock().now() - beginLockTimestamp;
            beginTimeInLock = 0;
        } finally {
            // release the put-message lock
            putMessageLock.unlock();
        }
    } finally {
        // release the topic-queue lock
        topicQueueLock.unlock(topicQueueKey);
    }

    if (elapsedTimeInLock > 500) {
        log.warn("[NOTIFYME]putMessage in lock cost time(ms)={}, bodyLength={} AppendMessageResult={}", elapsedTimeInLock, msg.getBody().length, result);
    }

    if (null != unlockMappedFile && this.defaultMessageStore.getMessageStoreConfig().isWarmMapedFileEnable()) {
        this.defaultMessageStore.unlockMappedFile(unlockMappedFile);
    }

    PutMessageResult putMessageResult = new PutMessageResult(PutMessageStatus.PUT_OK, result);

    // Statistics
    // update the per-topic statistics
    storeStatsService.getSinglePutMessageTopicTimesTotal(msg.getTopic()).add(result.getMsgNum());
    storeStatsService.getSinglePutMessageTopicSizeTotal(topic).add(result.getWroteBytes());

    // submit the flush request and the replication (HA) request
    return handleDiskFlushAndHA(putMessageResult, msg, needAckNums, needHandleHA);
}
```

  First, some basic fields are set: the store timestamp, bornHost, and storeHost; the thread-local state is fetched and the maximum message size is updated.

  A unique topicQueueKey is built from the topic and queue information, formatted as topic-queueId. Then the most recently used mapped file, i.e. the last one, is fetched, because message writes are appends and persistence is centralized.

  If no previously used mapped file channel is found, this message may be the first, so a new file channel is created; with no prior messages, the starting offset is necessarily 0. If a file channel is found, the commitlog file's name is the file's starting offset, so the position of the current message is the mapped file's name (its fileFromOffset) plus the message's offset within the file.

  HA and ack requirements are then validated.

  The topicQueueKey is locked first; the key identifies one queue under one topic, and the offset for this write is computed.

  The message-store context PutMessageContext is defined:
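The mapping between a global commitlog offset and a file can be computed directly: the file name is the file's starting offset zero-padded to 20 digits, and the position inside the file is the remainder. A small sketch (the 1 GB default file size comes from the text above; the class name is illustrative):

```java
// Map a global commitlog offset to its file name and in-file position.
class OffsetNaming {
    static final long FILE_SIZE = 1024L * 1024 * 1024; // 1 GB default commitlog size

    // file name = starting offset of the file, zero-padded to 20 digits
    static String fileNameFor(long globalOffset) {
        long fileFromOffset = globalOffset - (globalOffset % FILE_SIZE);
        return String.format("%020d", fileFromOffset);
    }

    // write position within that file
    static long positionInFile(long globalOffset) {
        return globalOffset % FILE_SIZE;
    }
}
```

So the first file is 00000000000000000000, the second is 00000000001073741824, and a message at global offset 1073741829 sits 5 bytes into the second file.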

```java
public class PutMessageContext {
    private String topicQueueTableKey; // the key being locked
    private long[] phyPos;
    private int batchSize;             // size of a batch write

    public PutMessageContext(String topicQueueTableKey) {
        this.topicQueueTableKey = topicQueueTableKey;
    }
}
```

  Then putMessageLock is taken. There are two lock implementations: a spin lock and a reentrant lock.

  1. /**
  2. * Spin lock Implementation to put message, suggest using this with low race conditions
  3. */
  4. public class PutMessageSpinLock implements PutMessageLock {
  5. //true: Can lock, false : in lock.
  6. private AtomicBoolean putMessageSpinLock = new AtomicBoolean(true);
  7.  
  8. @Override
  9. public void lock() {
  10. boolean flag;
  11. do {
  12. flag = this.putMessageSpinLock.compareAndSet(true, false);
  13. }
  14. while (!flag);
  15. }
  16.  
  17. @Override
  18. public void unlock() {
  19. this.putMessageSpinLock.compareAndSet(false, true);
  20. }
  21. }
  1. /**
  2. * Exclusive lock implementation to put message
  3. */
  4. public class PutMessageReentrantLock implements PutMessageLock {
  5. private ReentrantLock putMessageNormalLock = new ReentrantLock(); // NonfairSync
  6.  
  7. @Override
  8. public void lock() {
  9. putMessageNormalLock.lock();
  10. }
  11.  
  12. @Override
  13. public void unlock() {
  14. putMessageNormalLock.unlock();
  15. }
  16. }
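The two strategies can be exercised side by side. The sketch below is ours (simplified names, not RocketMQ source): both locks sit behind one interface, selected by a flag mirroring the broker's `useReentrantLockWhenPutMessage`, and either one serializes concurrent appends correctly.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Self-contained sketch of the two put-message lock strategies.
public class LockChoiceDemo {
    interface PutLock { void lock(); void unlock(); }

    static class SpinLock implements PutLock {
        private final AtomicBoolean free = new AtomicBoolean(true);
        public void lock()   { while (!free.compareAndSet(true, false)) { } } // busy-wait
        public void unlock() { free.compareAndSet(false, true); }
    }

    static class ExclusiveLock implements PutLock {
        private final ReentrantLock lock = new ReentrantLock(); // non-fair
        public void lock()   { lock.lock(); }
        public void unlock() { lock.unlock(); }
    }

    public static void main(String[] args) throws InterruptedException {
        boolean useReentrantLockWhenPutMessage = false; // default: spin lock
        PutLock putMessageLock = useReentrantLockWhenPutMessage
                ? new ExclusiveLock() : new SpinLock();

        long[] counter = {0};
        Runnable append = () -> {
            for (int i = 0; i < 10_000; i++) {
                putMessageLock.lock();
                try { counter[0]++; } finally { putMessageLock.unlock(); }
            }
        };
        Thread t1 = new Thread(append), t2 = new Thread(append);
        t1.start(); t2.start(); t1.join(); t2.join();
        System.out.println(counter[0]); // serialized correctly by either lock
    }
}
```

A spin lock wastes CPU while waiting but avoids thread parking, which pays off when the critical section is short, as with async flush; a reentrant lock parks waiters, which suits the longer hold times of sync flush.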

  The broker config item `useReentrantLockWhenPutMessage` (default false, i.e. spin lock) selects between the two: a spin lock is recommended for asynchronous flush, a reentrant lock for synchronous flush. For asynchronous flush it is also recommended to enable `transientStorePoolEnable` and to disable `transferMsgByHeap` to improve pull efficiency; for synchronous flush, consider increasing `sendMessageThreadPoolNums`. All of these settings should be validated by load testing.
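The tuning advice above might translate into a broker.conf fragment like the following. The values are illustrative only, not recommendations; every knob here must be validated against your own workload.

```properties
# Illustrative broker.conf fragment (example values, verify by load testing)
flushDiskType=ASYNC_FLUSH
useReentrantLockWhenPutMessage=false   # spin lock suits async flush
transientStorePoolEnable=true          # direct-memory write pool for async flush
transferMsgByHeap=false                # serve pulls off-heap
sendMessageThreadPoolNums=4            # consider raising this for SYNC_FLUSH
```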

  After the lock is acquired, the lock time is recorded, which guarantees write ordering. If the previous step found no mappedFile, or the last one is already full, a new mappedFile must be created;

  1. /**
  2. * TODO pre-create the next commitlog file
  3. * @return
  4. */
  5. public MappedFile getLastMappedFile(final long startOffset, boolean needCreate) {
  6. long createOffset = -1;
  7. /**
  8. * fetch the most recent mappedFile
  9. */
  10. MappedFile mappedFileLast = getLastMappedFile();
  11.  
  12. //if none exists, this is the first file to be created
  13. if (mappedFileLast == null) {
  14. createOffset = startOffset - (startOffset % this.mappedFileSize);
  15. }
  16.  
  17. /**
  18. * if the last file is full, the next file starts right after the previous file's last offset
  19. */
  20. if (mappedFileLast != null && mappedFileLast.isFull()) {
  21. createOffset = mappedFileLast.getFileFromOffset() + this.mappedFileSize;
  22. }
  23.  
  24. //create the new commitlog file
  25. if (createOffset != -1 && needCreate) {
  26. return tryCreateMappedFile(createOffset);
  27. }
  28.  
  29. return mappedFileLast;
  30. }
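The file-naming math above can be checked in isolation. The snippet below is a sketch of the arithmetic only (the constant and method names are ours): a commitlog file is named after its start offset, zero-padded to 20 digits, and the file covering a given offset starts at `offset - (offset % mappedFileSize)`.

```java
// Sketch of commitlog file naming: name = 20-digit zero-padded start offset.
public class CommitLogNaming {
    static final long MAPPED_FILE_SIZE = 1024L * 1024 * 1024; // 1 GB default

    static String fileNameFor(long offset) {
        // start offset of the file that contains this physical offset
        long createOffset = offset - (offset % MAPPED_FILE_SIZE);
        return String.format("%020d", createOffset);
    }

    public static void main(String[] args) {
        System.out.println(fileNameFor(0));                      // first file
        System.out.println(fileNameFor(MAPPED_FILE_SIZE + 123)); // second file
    }
}
```

Running this prints `00000000000000000000` for the first file and `00000000001073741824` for the second, matching the directory listing described earlier.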

  The message is then appended: `result = mappedFile.appendMessage(msg, this.appendMessageCallback, putMessageContext);`

  1. /**
  2. * TODO append — the unified fileChannel write path; supports single and batch messages
  3. */
  4. public AppendMessageResult appendMessage(final ByteBuffer byteBufferMsg, final CompactionAppendMsgCallback cb) {
  5. assert byteBufferMsg != null;
  6. assert cb != null;
  7.  
  8. //current write position
  9. int currentPos = WROTE_POSITION_UPDATER.get(this);
  10. //the write position must be below the file's maximum size
  11. if (currentPos < this.fileSize) {
  12. //appendMessageBuffer() chooses writeBuffer or mappedByteBuffer; with async flush, writes go to writeBuffer and are committed to the mapped buffer later
  13. ByteBuffer byteBuffer = appendMessageBuffer().slice();
  14. //move to the write position
  15. byteBuffer.position(currentPos);
  16. AppendMessageResult result = cb.doAppend(byteBuffer, this.fileFromOffset, this.fileSize - currentPos, byteBufferMsg);
  17. //atomically advance the write position; WROTE_POSITION_UPDATER tracks the bytes already written to this file
  18. WROTE_POSITION_UPDATER.addAndGet(this, result.getWroteBytes());
  19. //record the last store timestamp
  20. this.storeTimestamp = result.getStoreTimestamp();
  21. return result;
  22. }
  23. log.error("MappedFile.appendMessage return null, wrotePosition: {} fileSize: {}", currentPos, this.fileSize);
  24. return new AppendMessageResult(AppendMessageStatus.UNKNOWN_ERROR);
  25. }

  After the write, the result status is handled; the store exposes an onCommitLogAppend post-append hook. If the write failed because the remaining file space was insufficient (END_OF_FILE), a new file is created and the write is retried:

  1. switch (result.getStatus()) {
  2. case PUT_OK:
  3. onCommitLogAppend(msg, result, mappedFile);
  4. break;
  5. case END_OF_FILE:
  6. //the file has no room left: roll to a new file and retry the write
  7. onCommitLogAppend(msg, result, mappedFile);
  8. unlockMappedFile = mappedFile;
  9. // Create a new file, re-write the message
  10. mappedFile = this.mappedFileQueue.getLastMappedFile(0);
  11. if (null == mappedFile) {
  12. // XXX: warn and notify me
  13. log.error("create mapped file2 error, topic: " + msg.getTopic() + " clientAddr: " + msg.getBornHostString());
  14. beginTimeInLock = 0;
  15. return CompletableFuture.completedFuture(new PutMessageResult(PutMessageStatus.CREATE_MAPPED_FILE_FAILED, result));
  16. }
  17. result = mappedFile.appendMessage(msg, this.appendMessageCallback, putMessageContext);
  18. if (AppendMessageStatus.PUT_OK.equals(result.getStatus())) {
  19. onCommitLogAppend(msg, result, mappedFile);
  20. }
  21. break;
  22. case MESSAGE_SIZE_EXCEEDED:
  23. case PROPERTIES_SIZE_EXCEEDED:
  24. beginTimeInLock = 0;
  25. return CompletableFuture.completedFuture(new PutMessageResult(PutMessageStatus.MESSAGE_ILLEGAL, result));
  26. case UNKNOWN_ERROR:
  27. beginTimeInLock = 0;
  28. return CompletableFuture.completedFuture(new PutMessageResult(PutMessageStatus.UNKNOWN_ERROR, result));
  29. default:
  30. beginTimeInLock = 0;
  31. return CompletableFuture.completedFuture(new PutMessageResult(PutMessageStatus.UNKNOWN_ERROR, result));
  32. }

  Once handled, the lock is released, store statistics are updated, and the flush-to-disk and HA replication requests are submitted:

  1. /**
  2. * Core code that triggers the disk flush and HA replication
  3. * @return
  4. */
  5. private CompletableFuture<PutMessageResult> handleDiskFlushAndHA(PutMessageResult putMessageResult,
  6. MessageExt messageExt, int needAckNums, boolean needHandleHA) {
  7. /**
  8. * synchronous or asynchronous flush task
  9. */
  10. CompletableFuture<PutMessageStatus> flushResultFuture = handleDiskFlush(putMessageResult.getAppendMessageResult(), messageExt);
  11. CompletableFuture<PutMessageStatus> replicaResultFuture;
  12. if (!needHandleHA) {
  13. replicaResultFuture = CompletableFuture.completedFuture(PutMessageStatus.PUT_OK);
  14. } else {
  15. replicaResultFuture = handleHA(putMessageResult.getAppendMessageResult(), putMessageResult, needAckNums);
  16. }
  17.  
  18. return flushResultFuture.thenCombine(replicaResultFuture, (flushStatus, replicaStatus) -> {
  19. if (flushStatus != PutMessageStatus.PUT_OK) {
  20. putMessageResult.setPutMessageStatus(flushStatus);
  21. }
  22. if (replicaStatus != PutMessageStatus.PUT_OK) {
  23. putMessageResult.setPutMessageStatus(replicaStatus);
  24. }
  25. return putMessageResult;
  26. });
  27. }
  1. @Override
  2. public CompletableFuture<PutMessageStatus> handleDiskFlush(AppendMessageResult result, MessageExt messageExt) {
  3. // Synchronization flush
  4. //synchronous flush path
  5. if (FlushDiskType.SYNC_FLUSH == CommitLog.this.defaultMessageStore.getMessageStoreConfig().getFlushDiskType()) {
  6. final GroupCommitService service = (GroupCommitService) this.flushCommitLogService;
  7. if (messageExt.isWaitStoreMsgOK()) {
  8. GroupCommitRequest request = new GroupCommitRequest(result.getWroteOffset() + result.getWroteBytes(), CommitLog.this.defaultMessageStore.getMessageStoreConfig().getSyncFlushTimeout());
  9. //enqueue the flush request (GroupCommitRequest) into commitRequests
  10. flushDiskWatcher.add(request);
  11. service.putRequest(request);
  12. return request.future();
  13. } else {
  14. //just wake the flush thread
  15. service.wakeup();
  16. return CompletableFuture.completedFuture(PutMessageStatus.PUT_OK);
  17. }
  18. }
  19. // Asynchronous flush
  20. //asynchronous: wake the flush thread and return immediately
  21. else {
  22. if (!CommitLog.this.defaultMessageStore.isTransientStorePoolEnable()) {
  23. flushCommitLogService.wakeup();
  24. } else {
  25. commitRealTimeService.wakeup();
  26. }
  27. return CompletableFuture.completedFuture(PutMessageStatus.PUT_OK);
  28. }
  29. }
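The future-combination pattern in handleDiskFlushAndHA can be distilled as follows. This is a simplified sketch with our own enum and method names, not the RocketMQ classes: the overall put result is OK only when both the flush future and the replication future complete with OK, and any failure status wins.

```java
import java.util.concurrent.CompletableFuture;

// Distilled thenCombine pattern: overall status = flush status AND replica status.
public class FlushAndHaCombine {
    enum Status { PUT_OK, FLUSH_DISK_TIMEOUT, FLUSH_SLAVE_TIMEOUT }

    static CompletableFuture<Status> combine(CompletableFuture<Status> flush,
                                             CompletableFuture<Status> replica) {
        return flush.thenCombine(replica, (f, r) -> {
            if (f != Status.PUT_OK) return f; // flush failure reported first
            if (r != Status.PUT_OK) return r; // then replication failure
            return Status.PUT_OK;
        });
    }

    public static void main(String[] args) {
        Status ok = combine(CompletableFuture.completedFuture(Status.PUT_OK),
                            CompletableFuture.completedFuture(Status.PUT_OK)).join();
        Status bad = combine(CompletableFuture.completedFuture(Status.PUT_OK),
                             CompletableFuture.completedFuture(Status.FLUSH_SLAVE_TIMEOUT)).join();
        System.out.println(ok + " " + bad);
    }
}
```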

  Finally, onComplete invokes the registered after-hooks (HookAfter callbacks).

  • Message store: from disk to consumer

    •  Consumer pull

  At startup the consumer launches a thread pool that asynchronously issues pull actions (PullRequest). The client-side flow is not covered here (see the earlier article on avoiding duplicate consumption); this article focuses on how the broker's processor uses the store to persist consumption progress and serve pulls.

  The broker's core pull-handling entry point:

  1. /**
  2. * TODO core broker-processor code for pulling messages
  3. * same pattern as before: the caller wraps this in an async CompletableFuture; the real pull happens in @see DefaultMessageStore#getMessage
  4. */
  5. messageStore.getMessageAsync(group, topic, queueId, requestHeader.getQueueOffset(),
  6. requestHeader.getMaxMsgNums(), messageFilter)
  1. /**
  2. * TODO broker-side read of messages from the persistent store
  3. * @return
  4. */
  5. @Override
  6. public GetMessageResult getMessage(final String group, final String topic, final int queueId, final long offset,
  7. final int maxMsgNums, final int maxTotalMsgSize, final MessageFilter messageFilter) {
  8. //check the store's current state
  9. if (this.shutdown) {
  10. LOGGER.warn("message store has shutdown, so getMessage is forbidden");
  11. return null;
  12. }
  13.  
  14. if (!this.runningFlags.isReadable()) {
  15. LOGGER.warn("message store is not readable, so getMessage is forbidden " + this.runningFlags.getFlagBits());
  16. return null;
  17. }
  18.  
  19. Optional<TopicConfig> topicConfig = getTopicConfig(topic);
  20. CleanupPolicy policy = CleanupPolicyUtils.getDeletePolicy(topicConfig);
  21. //check request topic flag
  22. //if the topic's cleanup policy is COMPACTION, serve the read from compactionStore.getMessage
  23. if (Objects.equals(policy, CleanupPolicy.COMPACTION) && messageStoreConfig.isEnableCompaction()) {
  24. return compactionStore.getMessage(group, topic, queueId, offset, maxMsgNums, maxTotalMsgSize);
  25. } // else skip
  26.  
  27. long beginTime = this.getSystemClock().now();
  28.  
  29. GetMessageStatus status = GetMessageStatus.NO_MESSAGE_IN_QUEUE;
  30. long nextBeginOffset = offset;
  31. long minOffset = 0;
  32. long maxOffset = 0;
  33.  
  34. GetMessageResult getResult = new GetMessageResult();
  35.  
  36. //maximum physical offset of the commitlog (the current write frontier)
  37. final long maxOffsetPy = this.commitLog.getMaxOffset();
  38.  
  39. //TODO look up the consume queue for this topic/queueId
  40. ConsumeQueueInterface consumeQueue = findConsumeQueue(topic, queueId);
  41. if (consumeQueue != null) {
  42. minOffset = consumeQueue.getMinOffsetInQueue();
  43. maxOffset = consumeQueue.getMaxOffsetInQueue();
  44.  
  45. if (maxOffset == 0) {
  46. //the queue is empty or has never been written, so correct the next offset to 0
  47. status = GetMessageStatus.NO_MESSAGE_IN_QUEUE;
  48. nextBeginOffset = nextOffsetCorrection(offset, 0);
  49. } else if (offset < minOffset) {
  50. //requested offset is below the minimum; correct it up to the minimum
  51. status = GetMessageStatus.OFFSET_TOO_SMALL;
  52. nextBeginOffset = nextOffsetCorrection(offset, minOffset);
  53. } else if (offset == maxOffset) {
  54. //requested offset equals the max offset (the consumer is caught up)
  55. status = GetMessageStatus.OFFSET_OVERFLOW_ONE;
  56. nextBeginOffset = nextOffsetCorrection(offset, offset);
  57. } else if (offset > maxOffset) {
  58. //requested offset is beyond the max offset (badly overflowed); correct it down to max
  59. status = GetMessageStatus.OFFSET_OVERFLOW_BADLY;
  60. nextBeginOffset = nextOffsetCorrection(offset, maxOffset);
  61. } else {
  62. //requested offset lies between min and max
  63. //maximum bytes of consume-queue entries to filter in one pull
  64. final int maxFilterMessageSize = Math.max(16000, maxMsgNums * consumeQueue.getUnitSize());
  65. final boolean diskFallRecorded = this.messageStoreConfig.isDiskFallRecorded();
  66.  
  67. //cap the maximum bytes pulled in one request
  68. long maxPullSize = Math.max(maxTotalMsgSize, 100);
  69. if (maxPullSize > MAX_PULL_MSG_SIZE) {
  70. LOGGER.warn("The max pull size is too large maxPullSize={} topic={} queueId={}", maxPullSize, topic, queueId);
  71. maxPullSize = MAX_PULL_MSG_SIZE;
  72. }
  73. status = GetMessageStatus.NO_MATCHED_MESSAGE;
  74. long maxPhyOffsetPulling = 0;
  75. int cqFileNum = 0;
  76.  
  77. while (getResult.getBufferTotalSize() <= 0
  78. && nextBeginOffset < maxOffset
  79. && cqFileNum++ < this.messageStoreConfig.getTravelCqFileNumWhenGetMessage()) {
  80. //iterate the consume queue from nextBeginOffset, the start position of this pull
  81. ReferredIterator<CqUnit> bufferConsumeQueue = consumeQueue.iterateFrom(nextBeginOffset);
  82.  
  83. if (bufferConsumeQueue == null) {
  84. status = GetMessageStatus.OFFSET_FOUND_NULL;
  85. nextBeginOffset = nextOffsetCorrection(nextBeginOffset, this.consumeQueueStore.rollNextFile(consumeQueue, nextBeginOffset));
  86. LOGGER.warn("consumer request topic: " + topic + "offset: " + offset + " minOffset: " + minOffset + " maxOffset: "
  87. + maxOffset + ", but access logic queue failed. Correct nextBeginOffset to " + nextBeginOffset);
  88. break;
  89. }
  90.  
  91. try {
  92. long nextPhyFileStartOffset = Long.MIN_VALUE;
  93. /**
  94. * keep pulling while the current position is below the max offset
  95. */
  96. while (bufferConsumeQueue.hasNext()
  97. && nextBeginOffset < maxOffset) {
  98. CqUnit cqUnit = bufferConsumeQueue.next();
  99. //physical offset of the message in the commitlog
  100. long offsetPy = cqUnit.getPos();
  101. //size of the message in the commitlog
  102. int sizePy = cqUnit.getSize();
  103.  
  104. //estimate from the offsets whether the message is still resident in memory (page cache)
  105. boolean isInMem = estimateInMemByCommitOffset(offsetPy, maxOffsetPy);
  106.  
  107. //stop once the filtered range exceeds maxFilterMessageSize (at least 16000 bytes by default)
  108. if ((cqUnit.getQueueOffset() - offset) * consumeQueue.getUnitSize() > maxFilterMessageSize) {
  109. break;
  110. }
  111.  
  112. //stop if the batch is already full
  113. if (this.isTheBatchFull(sizePy, cqUnit.getBatchNum(), maxMsgNums, maxPullSize, getResult.getBufferTotalSize(), getResult.getMessageCount(), isInMem)) {
  114. break;
  115. }
  116.  
  117. if (getResult.getBufferTotalSize() >= maxPullSize) {
  118. break;
  119. }
  120.  
  121. maxPhyOffsetPulling = offsetPy;
  122.  
  123. //Be careful, here should before the isTheBatchFull
  124. nextBeginOffset = cqUnit.getQueueOffset() + cqUnit.getBatchNum();
  125.  
  126. if (nextPhyFileStartOffset != Long.MIN_VALUE) {
  127. if (offsetPy < nextPhyFileStartOffset) {
  128. continue;
  129. }
  130. }
  131.  
  132. /**
  133. * filter by consume-queue attributes (e.g. tag hash code)
  134. */
  135. if (messageFilter != null
  136. && !messageFilter.isMatchedByConsumeQueue(cqUnit.getValidTagsCodeAsLong(), cqUnit.getCqExtUnit())) {
  137. if (getResult.getBufferTotalSize() == 0) {
  138. status = GetMessageStatus.NO_MATCHED_MESSAGE;
  139. }
  140.  
  141. continue;
  142. }
  143.  
  144. /**
  145. * read the message bytes from the commitlog at this physical offset
  146. */
  147. SelectMappedBufferResult selectResult = this.commitLog.getMessage(offsetPy, sizePy);
  148. if (null == selectResult) {
  149. if (getResult.getBufferTotalSize() == 0) {
  150. status = GetMessageStatus.MESSAGE_WAS_REMOVING;
  151. }
  152.  
  153. nextPhyFileStartOffset = this.commitLog.rollNextFile(offsetPy);
  154. continue;
  155. }
  156.  
  157. //filter by commitlog content (e.g. SQL92 on message properties)
  158. if (messageFilter != null
  159. && !messageFilter.isMatchedByCommitLog(selectResult.getByteBuffer().slice(), null)) {
  160. if (getResult.getBufferTotalSize() == 0) {
  161. status = GetMessageStatus.NO_MATCHED_MESSAGE;
  162. }
  163. // release...
  164. selectResult.release();
  165. continue;
  166. }
  167. //add the matched message to the result
  168. this.storeStatsService.getGetMessageTransferredMsgCount().add(cqUnit.getBatchNum());
  169. getResult.addMessage(selectResult, cqUnit.getQueueOffset(), cqUnit.getBatchNum());
  170. status = GetMessageStatus.FOUND;
  171. nextPhyFileStartOffset = Long.MIN_VALUE;
  172. }
  173. } finally {
  174. bufferConsumeQueue.release();
  175. }
  176. }
  177.  
  178. if (diskFallRecorded) {
  179. long fallBehind = maxOffsetPy - maxPhyOffsetPulling;
  180. brokerStatsManager.recordDiskFallBehindSize(group, topic, queueId, fallBehind);
  181. }
  182.  
  183. long diff = maxOffsetPy - maxPhyOffsetPulling;
  184. long memory = (long) (StoreUtil.TOTAL_PHYSICAL_MEMORY_SIZE
  185. * (this.messageStoreConfig.getAccessMessageInMemoryMaxRatio() / 100.0));
  186. getResult.setSuggestPullingFromSlave(diff > memory);
  187. }
  188. } else {
  189. status = GetMessageStatus.NO_MATCHED_LOGIC_QUEUE;
  190. nextBeginOffset = nextOffsetCorrection(offset, 0);
  191. }
  192.  
  193. //update local statistics
  194. if (GetMessageStatus.FOUND == status) {
  195. this.storeStatsService.getGetMessageTimesTotalFound().add(1);
  196. } else {
  197. this.storeStatsService.getGetMessageTimesTotalMiss().add(1);
  198. }
  199. long elapsedTime = this.getSystemClock().now() - beginTime;
  200. this.storeStatsService.setGetMessageEntireTimeMax(elapsedTime);
  201.  
  202. /**
  203. * even when nothing was pulled, return the corrected next offset to the consumer
  204. */
  205. // lazy init no data found.
  206. if (getResult == null) {
  207. getResult = new GetMessageResult(0);
  208. }
  209.  
  210. getResult.setStatus(status);
  211. getResult.setNextBeginOffset(nextBeginOffset);
  212. getResult.setMaxOffset(maxOffset);
  213. getResult.setMinOffset(minOffset);
  214. return getResult;
  215. }

  First the store's state is checked;

  Core handling: the commitlog's maximum physical offset is fetched first — the current write frontier, computed from the active mapped file's start offset plus its write position;

  The consume queue is then looked up;

  The queue's minimum and maximum offsets bound the requested offset: if maxOffset is 0 the queue is empty (or was never consumed) and the next offset is corrected to 0; an offset below minOffset is corrected up to minOffset; an offset equal to maxOffset means the consumer is exactly caught up; an offset above maxOffset is invalid and corrected down to maxOffset. Only when the offset falls between min and max does the pull proceed, bounded by the max filter size `Math.max(16000, maxMsgNums * consumeQueue.getUnitSize())` and the max pull size `Math.max(maxTotalMsgSize, 100)`, iterating from the requested position:
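The boundary checks can be restated compactly. The helper below is ours (not RocketMQ source): given the requested offset and the queue's min/max, it returns the corrected next-begin offset exactly as the branches above do.

```java
// Compact restatement of getMessage's offset-boundary correction.
public class OffsetCorrection {
    static long correct(long offset, long minOffset, long maxOffset) {
        if (maxOffset == 0)      return 0;         // NO_MESSAGE_IN_QUEUE
        if (offset < minOffset)  return minOffset; // OFFSET_TOO_SMALL
        if (offset == maxOffset) return offset;    // OFFSET_OVERFLOW_ONE (caught up)
        if (offset > maxOffset)  return maxOffset; // OFFSET_OVERFLOW_BADLY
        return offset;                             // in range: pull from here
    }

    public static void main(String[] args) {
        System.out.println(correct(5, 10, 100));   // below min -> 10
        System.out.println(correct(100, 10, 100)); // caught up -> 100
        System.out.println(correct(250, 10, 100)); // overflow  -> 100
    }
}
```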

  1. public SelectMappedBufferResult getMessage(final long offset, final int size) {
  2. int mappedFileSize = this.defaultMessageStore.getMessageStoreConfig().getMappedFileSizeCommitLog();
  3. MappedFile mappedFile = this.mappedFileQueue.findMappedFileByOffset(offset, offset == 0);
  4. if (mappedFile != null) {
  5. //position within the file: the physical offset modulo the commitlog file size (1 GB, i.e. 1024 * 1024 * 1024, by default)
  6. int pos = (int) (offset % mappedFileSize);
  7. return mappedFile.selectMappedBuffer(pos, size);
  8. }
  9. return null;
  10. }
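The offset arithmetic in getMessage above can be verified on its own. This is a sketch of the math only (constant name is ours): the file index and the position inside that file both derive from the physical offset and the mapped-file size.

```java
// Sketch of locating a message: which commitlog file, and where inside it.
public class PhysicalOffsetMath {
    static final long MAPPED_FILE_SIZE = 1024L * 1024 * 1024; // 1 GB commitlog files

    public static void main(String[] args) {
        long offset = 3L * MAPPED_FILE_SIZE + 4096;  // a physical commitlog offset
        long fileIndex = offset / MAPPED_FILE_SIZE;  // which file in the queue
        int pos = (int) (offset % MAPPED_FILE_SIZE); // position inside that file
        System.out.println(fileIndex + " " + pos);
    }
}
```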

  

    • Consumer consumption

   After consumption completes, the core handling happens in ConsumeMessageConcurrentlyService.this.processConsumeResult:

   

  1. /**
  2. * TODO post-consumption result handling
  3. * @param status
  4. * @param context
  5. * @param consumeRequest
  6. */
  7. public void processConsumeResult(
  8. final ConsumeConcurrentlyStatus status,
  9. final ConsumeConcurrentlyContext context,
  10. final ConsumeRequest consumeRequest
  11. ) {
  12. int ackIndex = context.getAckIndex();
  13.  
  14. if (consumeRequest.getMsgs().isEmpty())
  15. return;
  16.  
  17. /**
  18. * success/failure handling; ackIndex defaults to Integer.MAX_VALUE, and the offset for a single message or a batch must be computed
  19. * if the configured ackIndex exceeds the number of messages being handled, it is clamped to size - 1
  20. */
  21. switch (status) {
  22. case CONSUME_SUCCESS:
  23. if (ackIndex >= consumeRequest.getMsgs().size()) {
  24. ackIndex = consumeRequest.getMsgs().size() - 1;
  25. }
  26. int ok = ackIndex + 1;
  27. int failed = consumeRequest.getMsgs().size() - ok;
  28. //record the number of messages that succeeded or failed
  29. this.getConsumerStatsManager().incConsumeOKTPS(consumerGroup, consumeRequest.getMessageQueue().getTopic(), ok);
  30. this.getConsumerStatsManager().incConsumeFailedTPS(consumerGroup, consumeRequest.getMessageQueue().getTopic(), failed);
  31. break;
  32. case RECONSUME_LATER:
  33. ackIndex = -1;
  34. this.getConsumerStatsManager().incConsumeFailedTPS(consumerGroup, consumeRequest.getMessageQueue().getTopic(),
  35. consumeRequest.getMsgs().size());
  36. break;
  37. default:
  38. break;
  39. }
  40.  
  41. /**
  42. * retry handling; broadcast mode never retries, so nothing is done there
  43. * in cluster mode, if consumption failed ackIndex is -1, so every message in this request is retried (indices 0 through the end)
  44. * for batch consumption, ackIndex marks where retries start; since ackIndex is clamped to consumeRequest.getMsgs().size() - 1 above, it never exceeds the last index
  45. */
  46. switch (this.defaultMQPushConsumer.getMessageModel()) {
  47. case BROADCASTING:
  48. for (int i = ackIndex + 1; i < consumeRequest.getMsgs().size(); i++) {
  49. MessageExt msg = consumeRequest.getMsgs().get(i);
  50. log.warn("BROADCASTING, the message consume failed, drop it, {}", msg.toString());
  51. }
  52. break;
  53. case CLUSTERING:
  54. List<MessageExt> msgBackFailed = new ArrayList<>(consumeRequest.getMsgs().size());
  55. for (int i = ackIndex + 1; i < consumeRequest.getMsgs().size(); i++) {
  56. MessageExt msg = consumeRequest.getMsgs().get(i);
  57. // Maybe message is expired and cleaned, just ignore it.
  58. if (!consumeRequest.getProcessQueue().containsMessage(msg)) {
  59. log.info("Message is not found in its process queue; skip send-back-procedure, topic={}, "
  60. + "brokerName={}, queueId={}, queueOffset={}", msg.getTopic(), msg.getBrokerName(),
  61. msg.getQueueId(), msg.getQueueOffset());
  62. continue;
  63. }
  64. /**
  65. * messages that need retrying are sent back via sendMessageBack, with the reconsume count incremented on failure
  66. */
  67. boolean result = this.sendMessageBack(msg, context);
  68. if (!result) {
  69. msg.setReconsumeTimes(msg.getReconsumeTimes() + 1);
  70. msgBackFailed.add(msg);
  71. }
  72. }
  73.  
  74. /**
  75. * messages that could not be sent back are resubmitted locally later; successfully handled messages proceed to commit
  76. */
  77. if (!msgBackFailed.isEmpty()) {
  78. consumeRequest.getMsgs().removeAll(msgBackFailed);
  79.  
  80. this.submitConsumeRequestLater(msgBackFailed, consumeRequest.getProcessQueue(), consumeRequest.getMessageQueue());
  81. }
  82. break;
  83. default:
  84. break;
  85. }
  86.  
  87. /**
  88. * compute the offset to commit; at this point consumeRequest holds only successfully handled messages
  89. */
  90. long offset = consumeRequest.getProcessQueue().removeMessage(consumeRequest.getMsgs());
  91. if (offset >= 0 && !consumeRequest.getProcessQueue().isDropped()) {
  92. //update the consume offset: broadcast mode persists locally, cluster mode updates the broker-side offset
  93. this.defaultMQPushConsumerImpl.getOffsetStore().updateOffset(consumeRequest.getMessageQueue(), offset, true);
  94. }
  95. }

  When consumption finishes, the messageListener callback returns a wrapped status. On success, ackIndex is processed: for single-message consumption the ack covers at most one message; for batch consumption ackIndex is clamped to at most msgs.size() - 1; the number of handled messages is then recorded locally.

  Then the core logic: broadcast messages are never retried, so a failure is merely logged; for cluster messages, an ackIndex of -1 means the batch failed, so each message is sent back (sendMessageBack) to the broker's retry queue and its reconsume count is incremented;

  For successfully consumed messages, the offset is updated via updateOffset. The update mode differs: local-file mode persists broadcast progress locally, remote mode serves cluster consumption where progress lives on the broker. In both cases, however, only the local offsetTable is updated here; the offset is synchronized to the broker by later actions, so a new consumer instance can resume from the broker-saved offset:

  1. /** TODO consumption offset sync mode — important
  2. * cluster-mode offset update; note that both broadcast and cluster mode store the offset in offsetTable here, to be pushed to the broker later
  3. * a common misconception: in cluster mode one queue maps to one consumer (a consumer may pull from several queues); without a rebalance the mapping is stable, so deferring the offset sync to the heartbeat is perfectly fine
  4. * but if a rebalance fires before the update lands, the queue may be marked dropped for this consumer, and the new owner does not resume from the current position
  5. * but from the last successfully committed offset!
  6. * the saved offset may be pushed to the broker during sync or on pull
  7. * @see RemoteBrokerOffsetStore#persistAll(Set)
  8. * the current offset is also passed to the broker when pulling
  9. * @see org.apache.rocketmq.client.impl.consumer.DefaultMQPushConsumerImpl#pullMessage(PullRequest)
  10. */
  11. @Override
  12. public void updateOffset(MessageQueue mq, long offset, boolean increaseOnly) {
  13. if (mq != null) {
  14. AtomicLong offsetOld = this.offsetTable.get(mq);
  15. if (null == offsetOld) {
  16. offsetOld = this.offsetTable.putIfAbsent(mq, new AtomicLong(offset));
  17. }
  18.  
  19. if (null != offsetOld) {
  20. if (increaseOnly) {
  21. MixAll.compareAndIncreaseOnly(offsetOld, offset);
  22. } else {
  23. offsetOld.set(offset);
  24. }
  25. }
  26. }
  27. }

  A separate scheduled thread pool periodically syncs the latest offsets to the broker.
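The `increaseOnly` branch of updateOffset deserves a closer look. The sketch below is modeled on the idea behind MixAll.compareAndIncreaseOnly (our own implementation, not RocketMQ source): a CAS loop ensures the stored offset only ever moves forward, so a late, smaller update from a stale thread can never rewind progress.

```java
import java.util.concurrent.atomic.AtomicLong;

// Increase-only offset update: advance via CAS, never rewind.
public class IncreaseOnlyOffset {
    static boolean compareAndIncreaseOnly(AtomicLong target, long value) {
        long prev = target.get();
        while (value > prev) {                       // only advance, never rewind
            if (target.compareAndSet(prev, value)) return true;
            prev = target.get();                     // lost the race; re-read and retry
        }
        return false;                                // value is not ahead; ignore it
    }

    public static void main(String[] args) {
        AtomicLong offset = new AtomicLong(100);
        compareAndIncreaseOnly(offset, 120); // advances to 120
        compareAndIncreaseOnly(offset, 90);  // ignored: would rewind
        System.out.println(offset.get());
    }
}
```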

  • Index persistence

   After a message has been persisted into the commitlog, the store also builds an index for it. In ReputMessageService:

  1. public void run() {
  2. DefaultMessageStore.LOGGER.info(this.getServiceName() + " service started");
  3.  
  4. while (!this.isStopped()) {
  5. try {
  6. Thread.sleep(1);
  7. this.doReput();
  8. } catch (Exception e) {
  9. DefaultMessageStore.LOGGER.warn(this.getServiceName() + " service has exception. ", e);
  10. }
  11. }
  12.  
  13. DefaultMessageStore.LOGGER.info(this.getServiceName() + " service end");
  14. }

  The core operation is doReput, which creates and flushes index files and builds indexes for the messages in the commitlog:

  1. /**
  2. * method executed by the reput polling thread
  3. */
  4. private void doReput() {
  5. if (this.reputFromOffset < DefaultMessageStore.this.commitLog.getMinOffset()) {
  6. LOGGER.warn("The reputFromOffset={} is smaller than minPyOffset={}, this usually indicate that the dispatch behind too much and the commitlog has expired.",
  7. this.reputFromOffset, DefaultMessageStore.this.commitLog.getMinOffset());
  8. this.reputFromOffset = DefaultMessageStore.this.commitLog.getMinOffset();
  9. }
  10. for (boolean doNext = true; this.isCommitLogAvailable() && doNext; ) {
  11.  
  12. //fetch the messages at reputFromOffset from the commitlog
  13. SelectMappedBufferResult result = DefaultMessageStore.this.commitLog.getData(reputFromOffset);
  14.  
  15. if (result == null) {
  16. break;
  17. }
  18.  
  19. try {
  20. this.reputFromOffset = result.getStartOffset();
  21.  
  22. //wrap each message into a DispatchRequest
  23. for (int readSize = 0; readSize < result.getSize() && reputFromOffset < DefaultMessageStore.this.getConfirmOffset() && doNext; ) {
  24. DispatchRequest dispatchRequest =
  25. DefaultMessageStore.this.commitLog.checkMessageAndReturnSize(result.getByteBuffer(), false, false, false);
  26. int size = dispatchRequest.getBufferSize() == -1 ? dispatchRequest.getMsgSize() : dispatchRequest.getBufferSize();
  27.  
  28. if (reputFromOffset + size > DefaultMessageStore.this.getConfirmOffset()) {
  29. doNext = false;
  30. break;
  31. }
  32.  
  33. if (dispatchRequest.isSuccess()) {
  34. if (size > 0) {
  35. //if the dispatchRequest passes validation and the message checks out, dispatch it
  36. DefaultMessageStore.this.doDispatch(dispatchRequest);
  37.  
  38. if (DefaultMessageStore.this.brokerConfig.isLongPollingEnable()
  39. && DefaultMessageStore.this.messageArrivingListener != null) {
  40. DefaultMessageStore.this.messageArrivingListener.arriving(dispatchRequest.getTopic(),
  41. dispatchRequest.getQueueId(), dispatchRequest.getConsumeQueueOffset() + 1,
  42. dispatchRequest.getTagsCode(), dispatchRequest.getStoreTimestamp(),
  43. dispatchRequest.getBitMap(), dispatchRequest.getPropertiesMap());
  44. notifyMessageArrive4MultiQueue(dispatchRequest);
  45. }
  46.  
  47. this.reputFromOffset += size;
  48. readSize += size;
  49. if (!DefaultMessageStore.this.getMessageStoreConfig().isDuplicationEnable() &&
  50. DefaultMessageStore.this.getMessageStoreConfig().getBrokerRole() == BrokerRole.SLAVE) {
  51. DefaultMessageStore.this.storeStatsService
  52. .getSinglePutMessageTopicTimesTotal(dispatchRequest.getTopic()).add(dispatchRequest.getBatchSize());
  53. DefaultMessageStore.this.storeStatsService
  54. .getSinglePutMessageTopicSizeTotal(dispatchRequest.getTopic())
  55. .add(dispatchRequest.getMsgSize());
  56. }
  57. } else if (size == 0) {
  58. this.reputFromOffset = DefaultMessageStore.this.commitLog.rollNextFile(this.reputFromOffset);
  59. readSize = result.getSize();
  60. }
  61. } else {
  62. if (size > 0) {
  63. LOGGER.error("[BUG]read total count not equals msg total size. reputFromOffset={}", reputFromOffset);
  64. this.reputFromOffset += size;
  65. } else {
  66. doNext = false;
  67. // If user open the dledger pattern or the broker is master node,
  68. // it will not ignore the exception and fix the reputFromOffset variable
  69. if (DefaultMessageStore.this.getMessageStoreConfig().isEnableDLegerCommitLog() ||
  70. DefaultMessageStore.this.brokerConfig.getBrokerId() == MixAll.MASTER_ID) {
  71. LOGGER.error("[BUG]dispatch message to consume queue error, COMMITLOG OFFSET: {}",
  72. this.reputFromOffset);
  73. this.reputFromOffset += result.getSize() - readSize;
  74. }
  75. }
  76. }
  77. }
  78. } finally {
  79. result.release();
  80. }
  81. }
  82. }

  It reads the batch of messages at reputOffset from the commitlog and builds indexes for them, wrapping each qualifying message in a DispatchRequest:

  The core action:

  1. /**
  2. * TODO core code that builds the index from persisted commitlog messages
  3. */
  4. class CommitLogDispatcherBuildIndex implements CommitLogDispatcher {
  5.  
  6. @Override
  7. public void dispatch(DispatchRequest request) {
  8. if (DefaultMessageStore.this.messageStoreConfig.isMessageIndexEnable()) {
  9. //build the index entry
  10. DefaultMessageStore.this.indexService.buildIndex(request);
  11. }
  12. }
  13. }

  First, the current indexFile is obtained:

  1. public void buildIndex(DispatchRequest req) {
  2. //try to get (or create) the index file
  3. IndexFile indexFile = retryGetAndCreateIndexFile();
  4. if (indexFile != null) {
  5. long endPhyOffset = indexFile.getEndPhyOffset();
  6. DispatchRequest msg = req;
  7. String topic = msg.getTopic();
  8. String keys = msg.getKeys();
  9. //indexes are built in commitlog-offset order; a message below the max indexed offset is treated as a duplicate
  10. if (msg.getCommitLogOffset() < endPhyOffset) {
  11. return;
  12. }
  13.  
  14. final int tranType = MessageSysFlag.getTransactionValue(msg.getSysFlag());
  15. switch (tranType) {
  16. case MessageSysFlag.TRANSACTION_NOT_TYPE:
  17. case MessageSysFlag.TRANSACTION_PREPARED_TYPE:
  18. case MessageSysFlag.TRANSACTION_COMMIT_TYPE:
  19. break;
  20. case MessageSysFlag.TRANSACTION_ROLLBACK_TYPE:
  21. return;
  22. }
  23.  
  24. /**
  25. * generate the index entries
  26. */
  27. if (req.getUniqKey() != null) {
  28. indexFile = putKey(indexFile, msg, buildKey(topic, req.getUniqKey()));
  29. if (indexFile == null) {
  30. LOGGER.error("putKey error commitlog {} uniqkey {}", req.getCommitLogOffset(), req.getUniqKey());
  31. return;
  32. }
  33. }
  34.  
  35. if (keys != null && keys.length() > 0) {
  36. String[] keyset = keys.split(MessageConst.KEY_SEPARATOR);
  37. for (int i = 0; i < keyset.length; i++) {
  38. String key = keyset[i];
  39. if (key.length() > 0) {
  40. indexFile = putKey(indexFile, msg, buildKey(topic, key));
  41. if (indexFile == null) {
  42. LOGGER.error("putKey error commitlog {} uniqkey {}", req.getCommitLogOffset(), req.getUniqKey());
  43. return;
  44. }
  45. }
  46. }
  47. }
  48. } else {
  49. LOGGER.error("build index error, stop building index");
  50. }
  51. }
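The key-to-slot mapping behind putKey can be sketched as follows. This is an illustrative snippet in the spirit of IndexFile, not RocketMQ source: the "topic#key" composite format and the slot count are assumptions for demonstration.

```java
// Illustrative index-key hashing: composite key -> non-negative hash -> slot.
public class IndexKeyHash {
    static final int HASH_SLOT_NUM = 5_000_000; // assumed slot count

    static int slotFor(String topic, String key) {
        String indexKey = topic + "#" + key;     // buildKey-style composite key (assumed format)
        int h = indexKey.hashCode();
        int keyHash = (h == Integer.MIN_VALUE) ? 0 : Math.abs(h); // keep non-negative
        return keyHash % HASH_SLOT_NUM;          // slot position within the index file
    }

    public static void main(String[] args) {
        int slot = slotFor("TopicTest", "ORDER-20240101");
        System.out.println(slot >= 0 && slot < HASH_SLOT_NUM);
    }
}
```

Collisions in a slot are expected; each index entry records the full key hash and the physical offset, so a lookup verifies candidates against the commitlog.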

  After an index file is updated, the files under the index directory are named by the timestamp of their last update.

  Message lookup by key goes through the index file:

  1. /**
  2. * TODO core code for looking up messages by index key
  3. * @return
  4. */
  5. @Override
  6. public QueryMessageResult queryMessage(String topic, String key, int maxNum, long begin, long end) {
  7. QueryMessageResult queryMessageResult = new QueryMessageResult();
  8.  
  9. long lastQueryMsgTime = end;
  10.  
  11. for (int i = 0; i < 3; i++) {
  12. //get the physical CommitLog offsets of the messages recorded in the IndexFile
  13. QueryOffsetResult queryOffsetResult = this.indexService.queryOffset(topic, key, maxNum, begin, lastQueryMsgTime);
  14. if (queryOffsetResult.getPhyOffsets().isEmpty()) {
  15. break;
  16. }
  17.  
  18. //sort by physical offset
  19. Collections.sort(queryOffsetResult.getPhyOffsets());
  20.  
  21. queryMessageResult.setIndexLastUpdatePhyoffset(queryOffsetResult.getIndexLastUpdatePhyoffset());
  22. queryMessageResult.setIndexLastUpdateTimestamp(queryOffsetResult.getIndexLastUpdateTimestamp());
  23.  
  24. for (int m = 0; m < queryOffsetResult.getPhyOffsets().size(); m++) {
  25. long offset = queryOffsetResult.getPhyOffsets().get(m);
  26.  
  27. try {
  28. MessageExt msg = this.lookMessageByOffset(offset);
  29. if (0 == m) {
  30. lastQueryMsgTime = msg.getStoreTimestamp();
  31. }
  32.  
  33. //look up the message in the commitlog at this physical offset
  34. SelectMappedBufferResult result = this.commitLog.getData(offset, false);
  35. if (result != null) {
  36. int size = result.getByteBuffer().getInt(0);
  37. result.getByteBuffer().limit(size);
  38. result.setSize(size);
  39. queryMessageResult.addMessage(result);
  40. }
  41. } catch (Exception e) {
  42. LOGGER.error("queryMessage exception", e);
  43. }
  44. }
  45.  
  46. if (queryMessageResult.getBufferTotalSize() > 0) {
  47. break;
  48. }
  49.  
  50. if (lastQueryMsgTime < begin) {
  51. break;
  52. }
  53. }
  54.  
  55. return queryMessageResult;
  56. }
  •  About zero copy

  Before looking at zero copy, consider what a conventional I/O read involves.

  Since the JVM cannot operate in kernel space directly, each I/O involves a switch into the kernel: DMA copies the data from disk into the kernel read buffer, then control switches back to the user process and the data is copied into the application buffer.

  Sending is symmetric: the data is CPU-copied into the socket buffer, then DMA-transferred from the socket buffer to the NIC. A full read-and-send therefore takes 4 copies.

  With mmap-based zero copy, the file's pages are mapped so user space shares the kernel buffer (page cache) directly as virtual memory, eliminating one copy: 3 copies total.

  With sendfile-based zero copy, the kernel buffer is transferred directly toward the NIC, for 2 copies. (RocketMQ uses mmap; Kafka uses sendfile.)

  

  mmap + write (RocketMQ)
  Pros: efficient even when called frequently on small blocks
  Cons: cannot exploit DMA as fully, so it costs more CPU than sendfile; memory-safety control is complex and JVM crash scenarios must be guarded against
  sendfile (Kafka)
  Pros: exploits DMA with little CPU cost, efficient for large block transfers, no memory-safety issues
  Cons: less efficient than mmap for small blocks, and the data cannot be inspected or modified in user space during the transfer

  Consider an example:

  1. ServerSocket serverSocket = new ServerSocket(8999);
  2. while (true){
  3. Socket socket = serverSocket.accept();
  4. DataInputStream dataInputStream = new DataInputStream(socket.getInputStream());
  5. AtomicInteger integer = new AtomicInteger(0);
  6. try {
  7. byte[] buffer = new byte[1024];
  8. while (true){
  9. int read = dataInputStream.read(buffer, 0, buffer.length);
  10. integer.addAndGet(read);
  11. if (read == -1){
  12. System.out.println("received: " + integer.get());
  13. integer = null;
  14. break;
  15. }
  16. }
  17. } catch (IOException e) {
  18. e.printStackTrace();
  19. }
  20. }
  1.      Socket socket = new Socket("localhost", 8999);
  2. String fileName = "E://workSpace//store.log";//37.8 MB (39,703,524 bytes)
  3. InputStream inputStream = new FileInputStream(fileName);
  4. DataOutputStream dataOutputStream = new DataOutputStream(socket.getOutputStream());
  5. try {
  6. byte[] buffer = new byte[1024];
  7. Integer read, total = 0;
  8. long time = System.currentTimeMillis();
  9. while ((read = inputStream.read(buffer)) > 0){
  10. total += read;
  11. dataOutputStream.write(buffer, 0, read);
  12. }
  13. long end = System.currentTimeMillis();
  14. System.out.println("sent " + total + ", elapsed: " + ((end - time)));
  15. } finally {
  16. dataOutputStream.close();
  17. socket.close();
  18. inputStream.close();
  19. }
  1. SocketChannel socketChannel = SocketChannel.open();
  2. socketChannel.connect(new InetSocketAddress("localhost", 8999));
  3. socketChannel.configureBlocking(true);
  4. String fileName = "E://workSpace//store.log";//37.8 MB (39,703,524 bytes)
  5. FileChannel fileChannel = null;
  6. try {
  7. fileChannel = new FileInputStream(fileName).getChannel();
  8. long size = fileChannel.size();
  9. long position = 0;
  10. long total = 0;
  11. long timeMillis = System.currentTimeMillis();
  12. while (position < size) {
  13. long currentNum = fileChannel.transferTo(position, fileChannel.size(), socketChannel);
  14. if (currentNum <= 0) {
  15. break;
  16. }
  17. total += currentNum;
  18. position += currentNum;
  19. }
  20. long timeMillis1 = System.currentTimeMillis();
  21. System.out.println("发送:" + total + ",用时:"+ (timeMillis1 - timeMillis) );
  22. } finally {
  23. fileChannel.close();
  24. socketChannel.close();
  25. }

  The two clients above send the same file over a socket in two ways: traditional stream I/O, and zero copy via FileChannel.transferTo (which maps to sendfile at the OS level).

  

  You will find that, measured from reading the file to finishing the send, the zero-copy version is roughly 70–80% faster than the traditional one.

  So if you need to optimize network transfer or file I/O performance, prefer zero copy wherever possible: it reduces both the number of data copies and the number of context switches.
