Spark Streaming receiving Kafka messages, part 3 -- how the Kafka broker handles fetch requests
First, look at the declaration of the KafkaServer class:
Represents the lifecycle of a single Kafka broker. Handles all functionality required to start up and shutdown a single Kafka node.
In other words, it represents the lifecycle of a single broker and handles everything required to start up and shut down a single Kafka node.
In this class's startup method, a request-handler thread pool is instantiated:
/* start processing requests */
// handles all requests
apis = new KafkaApis(socketServer.requestChannel, replicaManager, adminManager, groupCoordinator, transactionCoordinator,
kafkaController, zkUtils, config.brokerId, config, metadataCache, metrics, authorizer, quotaManagers,
brokerTopicStats, clusterId, time)
// the thread pool that processes requests
requestHandlerPool = new KafkaRequestHandlerPool(config.brokerId, socketServer.requestChannel, apis, time,
config.numIoThreads)
The source of KafkaRequestHandlerPool is as follows:
class KafkaRequestHandlerPool(val brokerId: Int,
val requestChannel: RequestChannel,
val apis: KafkaApis,
time: Time,
numThreads: Int) extends Logging with KafkaMetricsGroup {

/* a meter to track the average free capacity of the request handlers */
private val aggregateIdleMeter = newMeter("RequestHandlerAvgIdlePercent", "percent", TimeUnit.NANOSECONDS)

this.logIdent = "[Kafka Request Handler on Broker " + brokerId + "], "
val runnables = new Array[KafkaRequestHandler](numThreads)
for(i <- 0 until numThreads) { // instantiate every runnable
runnables(i) = new KafkaRequestHandler(i, brokerId, aggregateIdleMeter, numThreads, requestChannel, apis, time)
// create and start a daemon thread for each handler
Utils.daemonThread("kafka-request-handler-" + i, runnables(i)).start()
}
// shut down all handler threads in the pool
def shutdown() {
info("shutting down")
for (handler <- runnables)
handler.initiateShutdown()
for (handler <- runnables)
handler.awaitShutdown()
info("shut down completely")
}
}
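A note on Utils.daemonThread: it is only a small convenience for creating a named daemon thread. A rough sketch of what such a helper does (an illustration, not Kafka's actual implementation):

// Illustrative sketch of a daemon-thread helper; the name daemonThread is borrowed from the call above.
def daemonThread(name: String, runnable: Runnable): Thread = {
  val thread = new Thread(runnable, name)
  thread.setDaemon(true) // daemon threads do not keep the JVM alive during shutdown
  thread
}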
Next, look at the source of KafkaRequestHandler:
class KafkaRequestHandler(id: Int,
brokerId: Int,
val aggregateIdleMeter: Meter,
val totalHandlerThreads: Int,
val requestChannel: RequestChannel,
apis: KafkaApis,
time: Time) extends Runnable with Logging {
this.logIdent = "[Kafka Request Handler " + id + " on Broker " + brokerId + "], "
private val latch = new CountDownLatch(1)

def run() {
while (true) { // this run method loops forever
try {
var req : RequestChannel.Request = null
while (req == null) { // keep polling until a request arrives
// We use a single meter for aggregate idle percentage for the thread pool.
// Since meter is calculated as total_recorded_value / time_window and
// time_window is independent of the number of threads, each recorded idle
// time should be discounted by # threads.
val startSelectTime = time.nanoseconds
req = requestChannel.receiveRequest(300)
val endTime = time.nanoseconds
if (req != null)
req.requestDequeueTimeNanos = endTime
val idleTime = endTime - startSelectTime
aggregateIdleMeter.mark(idleTime / totalHandlerThreads)
}

if (req eq RequestChannel.AllDone) {
debug("Kafka request handler %d on broker %d received shut down command".format(id, brokerId))
latch.countDown()
return
}
trace("Kafka request handler %d on broker %d handling request %s".format(id, brokerId, req))
apis.handle(req) // dispatch the request to KafkaApis
} catch {
case e: FatalExitError =>
latch.countDown()
Exit.exit(e.statusCode)
case e: Throwable => error("Exception when handling request", e)
}
}
}

def initiateShutdown(): Unit = requestChannel.sendRequest(RequestChannel.AllDone)

def awaitShutdown(): Unit = latch.await()
}
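The shutdown handshake here is a classic poison-pill pattern: initiateShutdown enqueues a sentinel request (RequestChannel.AllDone), and the handler counts down a latch once it sees that sentinel. The following is a stripped-down sketch of the same idea using a plain BlockingQueue instead of Kafka's RequestChannel; the 300 ms poll timeout mirrors the code above, but everything else in the demo is made up:

import java.util.concurrent.{CountDownLatch, LinkedBlockingQueue, TimeUnit}

object PoisonPillDemo {
  private val AllDone = "ALL_DONE" // sentinel, plays the role of RequestChannel.AllDone
  private val queue = new LinkedBlockingQueue[String]()
  private val latch = new CountDownLatch(1)

  private val handler = new Thread(new Runnable {
    override def run(): Unit = {
      var running = true
      while (running) {
        val req = queue.poll(300, TimeUnit.MILLISECONDS) // like requestChannel.receiveRequest(300)
        if (req == AllDone) { // sentinel seen: signal the latch and stop
          latch.countDown()
          running = false
        } else if (req != null) {
          println(s"handling $req") // stands in for apis.handle(req)
        }
      }
    }
  }, "demo-request-handler")

  def main(args: Array[String]): Unit = {
    handler.setDaemon(true)
    handler.start()
    queue.put("fetch-1")
    queue.put(AllDone) // initiateShutdown()
    latch.await()      // awaitShutdown()
  }
}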
Now focus on the source of kafka.server.KafkaApis#handle:
/**
* Top-level method that handles all requests and multiplexes to the right api
*/
def handle(request: RequestChannel.Request) {
try {
trace("Handling request:%s from connection %s;securityProtocol:%s,principal:%s".
format(request.requestDesc(true), request.connectionId, request.securityProtocol, request.session.principal))
ApiKeys.forId(request.requestId) match {
case ApiKeys.PRODUCE => handleProduceRequest(request)
case ApiKeys.FETCH => handleFetchRequest(request) // this is the fetch request path we care about
case ApiKeys.LIST_OFFSETS => handleListOffsetRequest(request)
case ApiKeys.METADATA => handleTopicMetadataRequest(request)
case ApiKeys.LEADER_AND_ISR => handleLeaderAndIsrRequest(request)
case ApiKeys.STOP_REPLICA => handleStopReplicaRequest(request)
case ApiKeys.UPDATE_METADATA_KEY => handleUpdateMetadataRequest(request)
case ApiKeys.CONTROLLED_SHUTDOWN_KEY => handleControlledShutdownRequest(request)
case ApiKeys.OFFSET_COMMIT => handleOffsetCommitRequest(request)
case ApiKeys.OFFSET_FETCH => handleOffsetFetchRequest(request)
case ApiKeys.FIND_COORDINATOR => handleFindCoordinatorRequest(request)
case ApiKeys.JOIN_GROUP => handleJoinGroupRequest(request)
case ApiKeys.HEARTBEAT => handleHeartbeatRequest(request)
case ApiKeys.LEAVE_GROUP => handleLeaveGroupRequest(request)
case ApiKeys.SYNC_GROUP => handleSyncGroupRequest(request)
case ApiKeys.DESCRIBE_GROUPS => handleDescribeGroupRequest(request)
case ApiKeys.LIST_GROUPS => handleListGroupsRequest(request)
case ApiKeys.SASL_HANDSHAKE => handleSaslHandshakeRequest(request)
case ApiKeys.API_VERSIONS => handleApiVersionsRequest(request)
case ApiKeys.CREATE_TOPICS => handleCreateTopicsRequest(request)
case ApiKeys.DELETE_TOPICS => handleDeleteTopicsRequest(request)
case ApiKeys.DELETE_RECORDS => handleDeleteRecordsRequest(request)
case ApiKeys.INIT_PRODUCER_ID => handleInitProducerIdRequest(request)
case ApiKeys.OFFSET_FOR_LEADER_EPOCH => handleOffsetForLeaderEpochRequest(request)
case ApiKeys.ADD_PARTITIONS_TO_TXN => handleAddPartitionToTxnRequest(request)
case ApiKeys.ADD_OFFSETS_TO_TXN => handleAddOffsetsToTxnRequest(request)
case ApiKeys.END_TXN => handleEndTxnRequest(request)
case ApiKeys.WRITE_TXN_MARKERS => handleWriteTxnMarkersRequest(request)
case ApiKeys.TXN_OFFSET_COMMIT => handleTxnOffsetCommitRequest(request)
case ApiKeys.DESCRIBE_ACLS => handleDescribeAcls(request)
case ApiKeys.CREATE_ACLS => handleCreateAcls(request)
case ApiKeys.DELETE_ACLS => handleDeleteAcls(request)
case ApiKeys.ALTER_CONFIGS => handleAlterConfigsRequest(request)
case ApiKeys.DESCRIBE_CONFIGS => handleDescribeConfigsRequest(request)
}
} catch {
case e: FatalExitError => throw e
case e: Throwable => handleError(request, e)
} finally {
request.apiLocalCompleteTimeNanos = time.nanoseconds
}
}
Next, look at handleFetchRequest; its core is the call into the replica manager:
// call the replica manager to fetch messages from the local replica
replicaManager.fetchMessages(
fetchRequest.maxWait.toLong, // here this is 0
fetchRequest.replicaId,
fetchRequest.minBytes,
fetchRequest.maxBytes,
versionId <= 2,
authorizedRequestInfo,
replicationQuota(fetchRequest),
processResponseCallback,
fetchRequest.isolationLevel)
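For orientation, these parameters are driven by standard Kafka consumer configs on the client side: fetch.max.wait.ms becomes maxWait, fetch.min.bytes becomes minBytes, fetch.max.bytes becomes maxBytes, and max.partition.fetch.bytes caps each partition's share of the response. A minimal sketch of setting them on a plain KafkaConsumer (the broker address and group id are made up; the values shown are the usual defaults):

import java.util.Properties
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.serialization.StringDeserializer

val props = new Properties()
props.put("bootstrap.servers", "broker1:9092") // illustrative address
props.put("group.id", "demo-group")
props.put("key.deserializer", classOf[StringDeserializer].getName)
props.put("value.deserializer", classOf[StringDeserializer].getName)
props.put("fetch.max.wait.ms", "500")             // -> maxWait on the broker
props.put("fetch.min.bytes", "1")                 // -> minBytes
props.put("fetch.max.bytes", "52428800")          // -> maxBytes (50 MB)
props.put("max.partition.fetch.bytes", "1048576") // per-partition limit, 1 MB

val consumer = new KafkaConsumer[String, String](props)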
The source of fetchMessages is as follows:
/**
* Fetch messages from the leader replica, and wait until enough data can be fetched and return;
* the callback function will be triggered either when timeout or required fetch info is satisfied
*/
def fetchMessages(timeout: Long,
replicaId: Int,
fetchMinBytes: Int,
fetchMaxBytes: Int,
hardMaxBytesLimit: Boolean,
fetchInfos: Seq[(TopicPartition, PartitionData)],
quota: ReplicaQuota = UnboundedQuota,
responseCallback: Seq[(TopicPartition, FetchPartitionData)] => Unit,
isolationLevel: IsolationLevel) {
val isFromFollower = replicaId >= 0
val fetchOnlyFromLeader: Boolean = replicaId != Request.DebuggingConsumerId
val fetchOnlyCommitted: Boolean = ! Request.isValidBrokerId(replicaId)
// read from local logs
val logReadResults = readFromLocalLog(
replicaId = replicaId,
fetchOnlyFromLeader = fetchOnlyFromLeader,
readOnlyCommitted = fetchOnlyCommitted,
fetchMaxBytes = fetchMaxBytes,
hardMaxBytesLimit = hardMaxBytesLimit,
readPartitionInfo = fetchInfos,
quota = quota,
isolationLevel = isolationLevel)

// if the fetch comes from the follower, update its corresponding log end offset
if(Request.isValidBrokerId(replicaId))
updateFollowerLogReadResults(replicaId, logReadResults)

// check if this fetch request can be satisfied right away
val logReadResultValues = logReadResults.map { case (_, v) => v }
val bytesReadable = logReadResultValues.map(_.info.records.sizeInBytes).sum
val errorReadingData = logReadResultValues.foldLeft(false) ((errorIncurred, readResult) =>
errorIncurred || (readResult.error != Errors.NONE))
// respond immediately if 1) fetch request does not want to wait
// 2) fetch request does not require any data
// 3) has enough data to respond
// 4) some error happens while reading data
if (timeout <= 0 || fetchInfos.isEmpty || bytesReadable >= fetchMinBytes || errorReadingData) {
val fetchPartitionData = logReadResults.map { case (tp, result) =>
tp -> FetchPartitionData(result.error, result.highWatermark, result.leaderLogStartOffset, result.info.records,
result.lastStableOffset, result.info.abortedTransactions)
}
responseCallback(fetchPartitionData)
} else {// DelayedFetch
// construct the fetch results from the read results
val fetchPartitionStatus = logReadResults.map { case (topicPartition, result) =>
val fetchInfo = fetchInfos.collectFirst {
case (tp, v) if tp == topicPartition => v
}.getOrElse(sys.error(s"Partition $topicPartition not found in fetchInfos"))
(topicPartition, FetchPartitionStatus(result.info.fetchOffsetMetadata, fetchInfo))
}
val fetchMetadata = FetchMetadata(fetchMinBytes, fetchMaxBytes, hardMaxBytesLimit, fetchOnlyFromLeader,
fetchOnlyCommitted, isFromFollower, replicaId, fetchPartitionStatus)
val delayedFetch = new DelayedFetch(timeout, fetchMetadata, this, quota, isolationLevel, responseCallback)

// create a list of (topic, partition) pairs to use as keys for this delayed fetch operation
val delayedFetchKeys = fetchPartitionStatus.map { case (tp, _) => new TopicPartitionOperationKey(tp) }

// try to complete the request immediately, otherwise put it into the purgatory;
// this is because while the delayed fetch operation is being created, new requests
// may arrive and hence make this operation completable.
delayedFetchPurgatory.tryCompleteElseWatch(delayedFetch, delayedFetchKeys)
}
}
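The decision between answering right away and parking the request in the delayed-fetch purgatory boils down to the four conditions listed in the comment above. Written as a standalone predicate (purely illustrative; this helper does not exist in Kafka):

// Returns true when the fetch can be answered immediately:
// 1) the request does not want to wait, 2) it asks for no partitions,
// 3) enough bytes have already been read, or 4) an error occurred while reading.
def canRespondImmediately(timeoutMs: Long,
                          partitionCount: Int,
                          bytesReadable: Long,
                          fetchMinBytes: Int,
                          errorReadingData: Boolean): Boolean =
  timeoutMs <= 0 || partitionCount == 0 || bytesReadable >= fetchMinBytes || errorReadingData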
Continue into the source of readFromLocalLog:
/**
* Read from multiple topic partitions at the given offset up to maxSize bytes
*/
// It reads from multiple topic partitions at the given offsets, up to a byte limit (the per-partition maximum defaults to 1 MB).
// The isolation level decides what is visible: read committed vs. read uncommitted.
def readFromLocalLog(replicaId: Int,
fetchOnlyFromLeader: Boolean,
readOnlyCommitted: Boolean,
fetchMaxBytes: Int,
hardMaxBytesLimit: Boolean,
readPartitionInfo: Seq[(TopicPartition, PartitionData)],
quota: ReplicaQuota,
isolationLevel: IsolationLevel): Seq[(TopicPartition, LogReadResult)] = {

def read(tp: TopicPartition, fetchInfo: PartitionData, limitBytes: Int, minOneMessage: Boolean): LogReadResult = {
val offset = fetchInfo.fetchOffset
val partitionFetchSize = fetchInfo.maxBytes
val followerLogStartOffset = fetchInfo.logStartOffset

brokerTopicStats.topicStats(tp.topic).totalFetchRequestRate.mark()
brokerTopicStats.allTopicsStats.totalFetchRequestRate.mark()

try {
trace(s"Fetching log segment for partition $tp, offset $offset, partition fetch size $partitionFetchSize, " +
s"remaining response limit $limitBytes" +
(if (minOneMessage) s", ignoring response/partition size limits" else ""))

// decide whether to only fetch from leader
val localReplica = if (fetchOnlyFromLeader)
getLeaderReplicaIfLocal(tp)
else
getReplicaOrException(tp)

val initialHighWatermark = localReplica.highWatermark.messageOffset
val lastStableOffset = if (isolationLevel == IsolationLevel.READ_COMMITTED)
Some(localReplica.lastStableOffset.messageOffset)
else
None

// decide whether to only fetch committed data (i.e. messages below high watermark)
val maxOffsetOpt = if (readOnlyCommitted)
Some(lastStableOffset.getOrElse(initialHighWatermark))
else
None

/* Read the LogOffsetMetadata prior to performing the read from the log.
* We use the LogOffsetMetadata to determine if a particular replica is in-sync or not.
* Using the log end offset after performing the read can lead to a race condition
* where data gets appended to the log immediately after the replica has consumed from it
* This can cause a replica to always be out of sync.
*/
val initialLogEndOffset = localReplica.logEndOffset.messageOffset
val initialLogStartOffset = localReplica.logStartOffset
val fetchTimeMs = time.milliseconds
val logReadInfo = localReplica.log match {
case Some(log) =>
val adjustedFetchSize = math.min(partitionFetchSize, limitBytes)

// Try the read first, this tells us whether we need all of adjustedFetchSize for this partition
val fetch = log.read(offset, adjustedFetchSize, maxOffsetOpt, minOneMessage, isolationLevel)

// If the partition is being throttled, simply return an empty set.
if (shouldLeaderThrottle(quota, tp, replicaId))
FetchDataInfo(fetch.fetchOffsetMetadata, MemoryRecords.EMPTY)
// For FetchRequest version 3, we replace incomplete message sets with an empty one as consumers can make
// progress in such cases and don't need to report a `RecordTooLargeException`
else if (!hardMaxBytesLimit && fetch.firstEntryIncomplete)
FetchDataInfo(fetch.fetchOffsetMetadata, MemoryRecords.EMPTY)
else fetch

case None =>
error(s"Leader for partition $tp does not have a local log")
FetchDataInfo(LogOffsetMetadata.UnknownOffsetMetadata, MemoryRecords.EMPTY)
}

LogReadResult(info = logReadInfo,
highWatermark = initialHighWatermark,
leaderLogStartOffset = initialLogStartOffset,
leaderLogEndOffset = initialLogEndOffset,
followerLogStartOffset = followerLogStartOffset,
fetchTimeMs = fetchTimeMs,
readSize = partitionFetchSize,
lastStableOffset = lastStableOffset,
exception = None)
} catch {
// NOTE: Failed fetch requests metric is not incremented for known exceptions since it
// is supposed to indicate un-expected failure of a broker in handling a fetch request
case e@ (_: UnknownTopicOrPartitionException |
_: NotLeaderForPartitionException |
_: ReplicaNotAvailableException |
_: OffsetOutOfRangeException) =>
LogReadResult(info = FetchDataInfo(LogOffsetMetadata.UnknownOffsetMetadata, MemoryRecords.EMPTY),
highWatermark = -1L,
leaderLogStartOffset = -1L,
leaderLogEndOffset = -1L,
followerLogStartOffset = -1L,
fetchTimeMs = -1L,
readSize = partitionFetchSize,
lastStableOffset = None,
exception = Some(e))
case e: Throwable =>
brokerTopicStats.topicStats(tp.topic).failedFetchRequestRate.mark()
brokerTopicStats.allTopicsStats.failedFetchRequestRate.mark()
error(s"Error processing fetch operation on partition $tp, offset $offset", e)
LogReadResult(info = FetchDataInfo(LogOffsetMetadata.UnknownOffsetMetadata, MemoryRecords.EMPTY),
highWatermark = -1L,
leaderLogStartOffset = -1L,
leaderLogEndOffset = -1L,
followerLogStartOffset = -1L,
fetchTimeMs = -1L,
readSize = partitionFetchSize,
lastStableOffset = None,
exception = Some(e))
}
}
// maxSize: the request-level byte limit (the per-partition default is 1 MB)
var limitBytes = fetchMaxBytes
val result = new mutable.ArrayBuffer[(TopicPartition, LogReadResult)]
var minOneMessage = !hardMaxBytesLimit // hardMaxBytesLimit
readPartitionInfo.foreach { case (tp, fetchInfo) =>
val readResult = read(tp, fetchInfo, limitBytes, minOneMessage)
val messageSetSize = readResult.info.records.sizeInBytes
// Once we read from a non-empty partition, we stop ignoring request and partition level size limits
if (messageSetSize > 0)
minOneMessage = false
limitBytes = math.max(0, limitBytes - messageSetSize)
result += (tp -> readResult)
}
result
}
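The loop at the end is a shrinking byte budget: each partition is read with the remaining limitBytes, and as soon as any partition returns data, minOneMessage is switched off so later partitions can no longer exceed the per-request limit. A condensed sketch of that bookkeeping, using plain Int sizes in place of real log reads (illustrative only):

// partitionSizes stands in for the bytes each partition would return from read().
def spendByteBudget(partitionSizes: Seq[Int], fetchMaxBytes: Int, hardMaxBytesLimit: Boolean): Seq[Int] = {
  var limitBytes = fetchMaxBytes
  var minOneMessage = !hardMaxBytesLimit
  partitionSizes.map { size =>
    // a real read is capped by limitBytes unless minOneMessage still allows one oversized message
    val granted = if (minOneMessage) size else math.min(size, limitBytes)
    if (granted > 0) minOneMessage = false // stop ignoring size limits after the first non-empty partition
    limitBytes = math.max(0, limitBytes - granted)
    granted
  }
}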
The source of Log.read is as follows:
/**
* Read messages from the log.
*
* @param startOffset The offset to begin reading at
* @param maxLength The maximum number of bytes to read
* @param maxOffset The offset to read up to, exclusive. (i.e. this offset NOT included in the resulting message set)
* @param minOneMessage If this is true, the first message will be returned even if it exceeds `maxLength` (if one exists)
* @param isolationLevel The isolation level of the fetcher. The READ_UNCOMMITTED isolation level has the traditional
* read semantics (e.g. consumers are limited to fetching up to the high watermark). In
* READ_COMMITTED, consumers are limited to fetching up to the last stable offset. Additionally,
* in READ_COMMITTED, the transaction index is consulted after fetching to collect the list
* of aborted transactions in the fetch range which the consumer uses to filter the fetched
* records before they are returned to the user. Note that fetches from followers always use
* READ_UNCOMMITTED.
*
* @throws OffsetOutOfRangeException If startOffset is beyond the log end offset or before the log start offset
* @return The fetch data information including fetch starting offset metadata and messages read.
*/
def read(startOffset: Long, maxLength: Int, maxOffset: Option[Long] = None, minOneMessage: Boolean = false,
isolationLevel: IsolationLevel): FetchDataInfo = {
trace("Reading %d bytes from offset %d in log %s of length %d bytes".format(maxLength, startOffset, name, size)) // Because we don't use lock for reading, the synchronization is a little bit tricky.
// We create the local variables to avoid race conditions with updates to the log.
val currentNextOffsetMetadata = nextOffsetMetadata
val next = currentNextOffsetMetadata.messageOffset
if (startOffset == next) {
val abortedTransactions =
if (isolationLevel == IsolationLevel.READ_COMMITTED) Some(List.empty[AbortedTransaction])
else None
return FetchDataInfo(currentNextOffsetMetadata, MemoryRecords.EMPTY, firstEntryIncomplete = false,
abortedTransactions = abortedTransactions)
}

var segmentEntry = segments.floorEntry(startOffset)

// return error on attempt to read beyond the log end offset or read below log start offset
if (startOffset > next || segmentEntry == null || startOffset < logStartOffset)
throw new OffsetOutOfRangeException("Request for offset %d but we only have log segments in the range %d to %d.".format(startOffset, logStartOffset, next))

// Do the read on the segment with a base offset less than the target offset
// but if that segment doesn't contain any messages with an offset greater than that
// continue to read from successive segments until we get some messages or we reach the end of the log
while(segmentEntry != null) {
val segment = segmentEntry.getValue

// If the fetch occurs on the active segment, there might be a race condition where two fetch requests occur after
// the message is appended but before the nextOffsetMetadata is updated. In that case the second fetch may
// cause OffsetOutOfRangeException. To solve that, we cap the reading up to exposed position instead of the log
// end of the active segment.
val maxPosition = {
if (segmentEntry == segments.lastEntry) {
val exposedPos = nextOffsetMetadata.relativePositionInSegment.toLong
// Check the segment again in case a new segment has just rolled out.
if (segmentEntry != segments.lastEntry)
// New log segment has rolled out, we can read up to the file end.
segment.size
else
exposedPos
} else {
segment.size
}
}
// read the data from this segment
val fetchInfo = segment.read(startOffset, maxOffset, maxLength, maxPosition, minOneMessage)
if (fetchInfo == null) {
segmentEntry = segments.higherEntry(segmentEntry.getKey)
} else {
return isolationLevel match {
case IsolationLevel.READ_UNCOMMITTED => fetchInfo
case IsolationLevel.READ_COMMITTED => addAbortedTransactions(startOffset, segmentEntry, fetchInfo)
}
}
}

// okay we are beyond the end of the last segment with no data fetched although the start offset is in range,
// this can happen when all messages with offset larger than start offsets have been deleted.
// In this case, we will return the empty set with log end offset metadata
FetchDataInfo(nextOffsetMetadata, MemoryRecords.EMPTY)
}
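The key step above is segments.floorEntry(startOffset): segments are kept in a sorted map keyed by their base offset, and floorEntry returns the entry with the largest base offset that is <= startOffset, i.e. the segment that could contain the requested offset. A self-contained sketch of that lookup on a ConcurrentSkipListMap (the offsets and file names are made up):

import java.util.concurrent.ConcurrentSkipListMap

object SegmentLookupDemo {
  def main(args: Array[String]): Unit = {
    // base offset -> segment file name; in Kafka the value would be a LogSegment
    val segments = new ConcurrentSkipListMap[java.lang.Long, String]()
    segments.put(0L, "00000000000000000000.log")
    segments.put(500L, "00000000000000000500.log")
    segments.put(1200L, "00000000000000001200.log")

    val startOffset = 800L
    val entry = segments.floorEntry(startOffset) // largest base offset <= 800, i.e. 500
    println(s"offset $startOffset lives in segment ${entry.getValue}")
  }
}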
The read method of LogSegment:
/**
* Read a message set from this segment beginning with the first offset >= startOffset. The message set will include
* no more than maxSize bytes and will end before maxOffset if a maxOffset is specified.
*
* @param startOffset A lower bound on the first offset to include in the message set we read
* @param maxSize The maximum number of bytes to include in the message set we read
* @param maxOffset An optional maximum offset for the message set we read
* @param maxPosition The maximum position in the log segment that should be exposed for read
* @param minOneMessage If this is true, the first message will be returned even if it exceeds `maxSize` (if one exists)
*
* @return The fetched data and the offset metadata of the first message whose offset is >= startOffset,
* or null if the startOffset is larger than the largest offset in this log
*/
@threadsafe
def read(startOffset: Long, maxOffset: Option[Long], maxSize: Int, maxPosition: Long = size,
minOneMessage: Boolean = false): FetchDataInfo = {
if (maxSize < 0)
throw new IllegalArgumentException("Invalid max size for log read (%d)".format(maxSize)) val logSize = log.sizeInBytes // this may change, need to save a consistent copy
val startOffsetAndSize = translateOffset(startOffset)
// the start offset is already past the end of this segment, so return null
// if the start position is already off the end of the log, return null
if (startOffsetAndSize == null)
return null
// start position
val startPosition = startOffsetAndSize.position
val offsetMetadata = new LogOffsetMetadata(startOffset, this.baseOffset, startPosition)
// the adjusted maximum read size
val adjustedMaxSize =
if (minOneMessage) math.max(maxSize, startOffsetAndSize.size)
else maxSize

// return a log segment but with zero size in the case below
if (adjustedMaxSize == 0)
return FetchDataInfo(offsetMetadata, MemoryRecords.EMPTY)

// calculate the length of the message set to read based on whether or not they gave us a maxOffset
val fetchSize: Int = maxOffset match {
case None =>
// no max offset, just read until the max position
min((maxPosition - startPosition).toInt, adjustedMaxSize)
case Some(offset) =>
// there is a max offset, translate it to a file position and use that to calculate the max read size;
// when the leader of a partition changes, it's possible for the new leader's high watermark to be less than the
// true high watermark in the previous leader for a short window. In this window, if a consumer fetches on an
// offset between new leader's high watermark and the log end offset, we want to return an empty response.
if (offset < startOffset)
return FetchDataInfo(offsetMetadata, MemoryRecords.EMPTY, firstEntryIncomplete = false)
val mapping = translateOffset(offset, startPosition)
val endPosition =
if (mapping == null)
logSize // the max offset is off the end of the log, use the end of the file
else
mapping.position
min(min(maxPosition, endPosition) - startPosition, adjustedMaxSize).toInt
}

FetchDataInfo(offsetMetadata, log.read(startPosition, fetchSize),
firstEntryIncomplete = adjustedMaxSize < startOffsetAndSize.size)
}
The source of log.read(startPosition, fetchSize) (FileRecords.read) is as follows:
/**
* Return a slice of records from this instance, which is a view into this set starting from the given position
* and with the given size limit.
*
* If the size is beyond the end of the file, the end will be based on the size of the file at the time of the read.
*
* If this message set is already sliced, the position will be taken relative to that slicing.
*
* @param position The start position to begin the read from
* @param size The number of bytes after the start position to include
* @return A sliced wrapper on this message set limited based on the given position and size
*/
public FileRecords read(int position, int size) throws IOException {
if (position < 0)
throw new IllegalArgumentException("Invalid position: " + position);
if (size < 0)
throw new IllegalArgumentException("Invalid size: " + size); final int end;
// handle integer overflow
if (this.start + position + size < 0)
end = sizeInBytes();
else
end = Math.min(this.start + position + size, sizeInBytes());
return new FileRecords(file, channel, this.start + position, end, true);
}
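A FileRecords instance is essentially a (file, channel, start, end) view; serving those bytes means positional reads against the underlying FileChannel (in practice Kafka also hands the slice to the socket via zero-copy transfer where possible). The following is a minimal sketch of reading such a slice with standard NIO calls, not Kafka's code:

import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.{Paths, StandardOpenOption}

// Read `size` bytes starting at byte `position` of a segment file.
def readSlice(path: String, position: Long, size: Int): ByteBuffer = {
  val channel = FileChannel.open(Paths.get(path), StandardOpenOption.READ)
  try {
    val end = math.min(position + size, channel.size()) // clamp to the file size, like FileRecords.read does
    val buffer = ByteBuffer.allocate(math.max(0, (end - position).toInt))
    var pos = position
    var eof = false
    while (buffer.hasRemaining && pos < end && !eof) {
      val read = channel.read(buffer, pos) // positional read; does not move the channel's position
      if (read < 0) eof = true else pos += read
    }
    buffer.flip()
    buffer
  } finally channel.close()
}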
The source of processResponseCallback (defined in kafka.server.KafkaApis#handleFetchRequest) is as follows:
// fetch response callback invoked after any throttling
def fetchResponseCallback(bandwidthThrottleTimeMs: Int) {
def createResponse(requestThrottleTimeMs: Int): RequestChannel.Response = {
val convertedData = new util.LinkedHashMap[TopicPartition, FetchResponse.PartitionData]
fetchedPartitionData.asScala.foreach { case (tp, partitionData) =>
convertedData.put(tp, convertedPartitionData(tp, partitionData))
}
val response = new FetchResponse(convertedData, 0)
val responseStruct = response.toStruct(versionId)

trace(s"Sending fetch response to client $clientId of ${responseStruct.sizeOf} bytes.")
response.responseData.asScala.foreach { case (topicPartition, data) =>
// record the bytes out metrics only when the response is being sent
brokerTopicStats.updateBytesOut(topicPartition.topic, fetchRequest.isFromFollower, data.records.sizeInBytes)
}

val responseSend = response.toSend(responseStruct, bandwidthThrottleTimeMs + requestThrottleTimeMs,
request.connectionId, request.header)
RequestChannel.Response(request, responseSend)
}

if (fetchRequest.isFromFollower)
sendResponseExemptThrottle(createResponse(0))
else
sendResponseMaybeThrottle(request, request.header.clientId, requestThrottleMs =>
requestChannel.sendResponse(createResponse(requestThrottleMs)))
}

// When this callback is triggered, the remote API call has completed.
// Record time before any byte-rate throttling.
request.apiRemoteCompleteTimeNanos = time.nanoseconds

if (fetchRequest.isFromFollower) {
// We've already evaluated against the quota and are good to go. Just need to record it now.
val responseSize = sizeOfThrottledPartitions(versionId, fetchRequest, mergedPartitionData, quotas.leader)
quotas.leader.record(responseSize)
fetchResponseCallback(bandwidthThrottleTimeMs = 0)
} else {
// Fetch size used to determine throttle time is calculated before any down conversions.
// This may be slightly different from the actual response size. But since down conversions
// result in data being loaded into memory, it is better to do this after throttling to avoid OOM.
val response = new FetchResponse(fetchedPartitionData, 0)
val responseStruct = response.toStruct(versionId)
quotas.fetch.recordAndMaybeThrottle(request.session.sanitizedUser, clientId, responseStruct.sizeOf,
fetchResponseCallback)
}
}
Conclusion: a fetch request is ultimately resolved to a concrete LogSegment, and the records are read from that segment by start position and size; the maximum fetch size per partition defaults to 1 MB and is configurable. The broker also provides an isolation mechanism supporting read-committed and read-uncommitted semantics (read-uncommitted is the default). If there is no data to return (and the request does not want to wait), the broker responds immediately.
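Note that the isolation level is chosen by the client, not the broker: the consumer config isolation.level accepts read_uncommitted (the default) or read_committed. Reusing the props object from the earlier consumer sketch, switching to read-committed is a single property (illustrative):

// read_committed: fetch only up to the last stable offset and drop records from aborted transactions.
// read_uncommitted (the default): fetch up to the high watermark.
props.put("isolation.level", "read_committed")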