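1. Overview

Some readers studying Kafka's network communication have recently run into questions about its network model and communication flow, so this post walks through that part of the system.

2. Content

As a message queue, Kafka involves network communication in two main directions:

Pull: a Consumer pulls message data from the message queue;
Push: a Producer pushes message data to the message queue.

High-performance network communication can be built directly on a lower-level transport protocol such as TCP or UDP. Kafka defines its own communication protocol on top of TCP for traffic between Producer, Broker, and Consumer; the protocol is custom-built for Kafka's own requirements.

Note:

Because UDP is an unreliable transport protocol, Kafka uses TCP as the communication protocol between its services.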

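2.1 Basic data types

The basic data types in the communication protocol fall into the following groups:

Fixed-length types: for example int8, int16, int32, and int64, which correspond to byte, short, int, and long in Java
Variable-length types: values whose size is not fixed in advance, for example Map and List in Java
Arrays: for example int[] and String[] in Java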
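
To make these groups concrete, here is a minimal Java sketch of how such values can be laid out in a buffer. It assumes the classic (non-compact) length-prefixed layout described in Kafka's protocol guide: a string is an int16 byte length followed by UTF-8 bytes, and an array is an int32 element count followed by the elements. Null values are ignored here, and the WirePrimitives class and its method names are hypothetical, not part of Kafka.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not Kafka code) illustrating length-prefixed wire encoding. */
final class WirePrimitives {
    private WirePrimitives() {}

    /** Writes a string as an int16 byte length followed by its UTF-8 bytes. */
    static void putString(ByteBuffer buf, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > Short.MAX_VALUE) {
            throw new IllegalArgumentException("string too long for an int16 length");
        }
        buf.putShort((short) bytes.length);
        buf.put(bytes);
    }

    /** Reads a string written by putString. */
    static String getString(ByteBuffer buf) {
        short length = buf.getShort();
        byte[] bytes = new byte[length];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Writes an int array as an int32 element count followed by each int32 element. */
    static void putIntArray(ByteBuffer buf, int[] values) {
        buf.putInt(values.length);
        for (int v : values) {
            buf.putInt(v);
        }
    }

    /** Reads an int array written by putIntArray. */
    static int[] getIntArray(ByteBuffer buf) {
        int count = buf.getInt();
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = buf.getInt();
        }
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64); // ByteBuffer is big-endian by default
        buf.put((byte) 1);       // int8  -> Java byte
        buf.putShort((short) 2); // int16 -> Java short
        buf.putInt(3);           // int32 -> Java int
        buf.putLong(4L);         // int64 -> Java long
        putString(buf, "topic");
        putIntArray(buf, new int[] {0, 1, 2});
        buf.flip();
        System.out.println("encoded bytes: " + buf.remaining());
    }
}
```

The fixed-length types map directly onto ByteBuffer's put and get methods, while the variable-length values need a length prefix so the reader knows where each value ends.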

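2.2 Communication model

Kafka uses the multi-threaded Reactor model: a single Acceptor thread handles all new connections, and multiple Processor threads handle requests (for example parsing the protocol, wrapping requests, and forwarding them).

Note:

Reactor is an event-driven model in which incoming requests are dispatched to one or more service handlers.

When a request arrives from a Client, the Server uses a multiplexed dispatch strategy: one non-blocking thread receives all requests and forwards each one to the appropriate worker thread for processing.

Later Kafka versions added a Handler module, which processes requests with a configurable number of threads. Handler and Processor are connected through a blocking queue, as shown in the figure below:

Acceptor is a thread class that extends AbstractServerThread. Its main job is to listen for and accept Client connections, set up the data channel (SocketChannel) for each one, and hand it to a Processor in round-robin order. The core logic is in Acceptor's run method: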

def run() {
  serverChannel.register(nioSelector, SelectionKey.OP_ACCEPT)
  startupComplete()
  try {
    var currentProcessor = 0
    while (isRunning) {
      try {
        val ready = nioSelector.select(500)
        if (ready > 0) {
          val keys = nioSelector.selectedKeys()
          val iter = keys.iterator()
          while (iter.hasNext && isRunning) {
            try {
              val key = iter.next
              iter.remove()
              if (key.isAcceptable)
                accept(key, processors(currentProcessor))
              else
                throw new IllegalStateException("Unrecognized key state for acceptor thread.")
              // round robin to the next processor thread
              currentProcessor = (currentProcessor + 1) % processors.length
            } catch {
              case e: Throwable => error("Error while accepting connection", e)
            }
          }
        }
      }
      catch {
        // We catch all the throwables to prevent the acceptor thread from exiting on exceptions due
        // to a select operation on a specific channel or a bad request. We don't want
        // the broker to stop responding to requests from other clients in these scenarios.
        case e: ControlThrowable => throw e
        case e: Throwable => error("Error occurred", e)
      }
    }
  } finally {
    debug("Closing server socket and selector.")
    swallowError(serverChannel.close())
    swallowError(nioSelector.close())
    shutdownComplete()
  }
}
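
The round-robin hand-off in run can be isolated into a few lines. The following Java sketch is only an illustration of that selection rule: RoundRobinDispatcher and its inbox lists are hypothetical stand-ins for the Acceptor and its Processor threads, and no real sockets are involved.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the acceptor's processor selection (not Kafka code). */
final class RoundRobinDispatcher<T> {
    // One inbox per processor; in a real server each would belong to a Processor thread.
    private final List<List<T>> processorInboxes = new ArrayList<>();
    private int current = 0;

    RoundRobinDispatcher(int numProcessors) {
        if (numProcessors <= 0) {
            throw new IllegalArgumentException("need at least one processor");
        }
        for (int i = 0; i < numProcessors; i++) {
            processorInboxes.add(new ArrayList<>());
        }
    }

    /** Assigns the connection to the current processor, then advances; returns the chosen index. */
    int dispatch(T connection) {
        int chosen = current;
        processorInboxes.get(chosen).add(connection);
        current = (current + 1) % processorInboxes.size();
        return chosen;
    }

    /** Returns the connections assigned so far to the given processor. */
    List<T> inbox(int processor) {
        return processorInboxes.get(processor);
    }
}
```

With three processors, four consecutive connections go to processors 0, 1, 2, and then 0 again, which spreads new connections evenly without any shared load statistics.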

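There is also a blocking channel, BlockingChannel. As its code shows, it wraps a blocking SocketChannel connection to a given host and port, with configurable buffer sizes and a read timeout; the hand-off between Processor and Handler itself goes through RequestChannel, covered in section 3. The code is as follows: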

class BlockingChannel( val host: String,
                       val port: Int,
                       val readBufferSize: Int,
                       val writeBufferSize: Int,
                       val readTimeoutMs: Int ) extends Logging {
  private var connected = false
  private var channel: SocketChannel = null
  private var readChannel: ReadableByteChannel = null
  private var writeChannel: GatheringByteChannel = null
  private val lock = new Object()
  private val connectTimeoutMs = readTimeoutMs
  private var connectionId: String = ""

  def connect() = lock synchronized  {
    if(!connected) {
      try {
        channel = SocketChannel.open()
        if(readBufferSize > 0)
          channel.socket.setReceiveBufferSize(readBufferSize)
        if(writeBufferSize > 0)
          channel.socket.setSendBufferSize(writeBufferSize)
        channel.configureBlocking(true)
        channel.socket.setSoTimeout(readTimeoutMs)
        channel.socket.setKeepAlive(true)
        channel.socket.setTcpNoDelay(true)
        channel.socket.connect(new InetSocketAddress(host, port), connectTimeoutMs)

        writeChannel = channel
        // Need to create a new ReadableByteChannel from input stream because SocketChannel doesn't implement read with timeout
        // See: http://stackoverflow.com/questions/2866557/timeout-for-socketchannel-doesnt-work
        readChannel = Channels.newChannel(channel.socket().getInputStream)
        connected = true
        val localHost = channel.socket.getLocalAddress.getHostAddress
        val localPort = channel.socket.getLocalPort
        val remoteHost = channel.socket.getInetAddress.getHostAddress
        val remotePort = channel.socket.getPort
        connectionId = localHost + ":" + localPort + "-" + remoteHost + ":" + remotePort
        // settings may not match what we requested above
        val msg = "Created socket with SO_TIMEOUT = %d (requested %d), SO_RCVBUF = %d (requested %d), SO_SNDBUF = %d (requested %d), connectTimeoutMs = %d."
        debug(msg.format(channel.socket.getSoTimeout,
                         readTimeoutMs,
                         channel.socket.getReceiveBufferSize,
                         readBufferSize,
                         channel.socket.getSendBufferSize,
                         writeBufferSize,
                         connectTimeoutMs))
      } catch {
        case _: Throwable => disconnect()
      }
    }
  }

  def disconnect() = lock synchronized {
    if(channel != null) {
      swallow(channel.close())
      swallow(channel.socket.close())
      channel = null
      writeChannel = null
    }
    // closing the main socket channel *should* close the read channel
    // but let's do it to be sure.
    if(readChannel != null) {
      swallow(readChannel.close())
      readChannel = null
    }
    connected = false
  }

  def isConnected = connected

  def send(request: RequestOrResponse): Long = {
    if(!connected)
      throw new ClosedChannelException()
    val send = new RequestOrResponseSend(connectionId, request)
    send.writeCompletely(writeChannel)
  }

  def receive(): NetworkReceive = {
    if(!connected)
      throw new ClosedChannelException()
    val response = readCompletely(readChannel)
    response.payload().rewind()
    response
  }

  private def readCompletely(channel: ReadableByteChannel): NetworkReceive = {
    val response = new NetworkReceive
    while (!response.complete())
      response.readFromReadableChannel(channel)
    response
  }
}
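
readCompletely keeps reading until the receive reports that it is complete. The idea behind that loop can be sketched in Java as below, assuming a frame is a 4-byte big-endian size followed by that many payload bytes. FrameReader, readFrame, and decode are hypothetical names for illustration, and an in-memory stream stands in for the socket.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

/** Hypothetical sketch (not Kafka code) of reading one size-delimited frame completely. */
final class FrameReader {
    private FrameReader() {}

    /** Reads until the buffer is full; a single read may return fewer bytes than requested. */
    private static void readFully(ReadableByteChannel ch, ByteBuffer buf) throws IOException {
        while (buf.hasRemaining()) {
            if (ch.read(buf) < 0) {
                throw new EOFException("channel closed in the middle of a frame");
            }
        }
    }

    /** Reads one frame: a 4-byte big-endian size, then exactly that many payload bytes. */
    static ByteBuffer readFrame(ReadableByteChannel ch) throws IOException {
        ByteBuffer sizeBuf = ByteBuffer.allocate(4);
        readFully(ch, sizeBuf);
        sizeBuf.flip();
        int size = sizeBuf.getInt();
        if (size < 0) {
            throw new IOException("invalid frame size " + size);
        }
        ByteBuffer payload = ByteBuffer.allocate(size);
        readFully(ch, payload);
        payload.flip(); // position 0, ready for the caller to read
        return payload;
    }

    /** Convenience wrapper for the demo: decodes the first frame found in a byte array. */
    static byte[] decode(byte[] wire) {
        try (ReadableByteChannel ch = Channels.newChannel(new ByteArrayInputStream(wire))) {
            ByteBuffer payload = readFrame(ch);
            byte[] out = new byte[payload.remaining()];
            payload.get(out);
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The loop in readFully is the important part: on a real socket a read can return only part of a frame, so the reader must keep going until the declared size has been consumed.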

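3. Communication process

Kafka's communication framework has evolved across versions. Older versions already used NIO as the basis for network communication: multiple Socket connections are registered on one Selector, so a single thread can manage many connections, which greatly reduces the overhead of running many threads.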
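
The following Java sketch shows one thread watching several channels through a single Selector. To stay self-contained it uses java.nio.channels.Pipe instead of real sockets, which is an assumption made for the demo; SingleThreadSelectorDemo and the "conn-N" labels are hypothetical names, not Kafka code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical demo: one thread and one Selector serving several channels. */
final class SingleThreadSelectorDemo {
    private SingleThreadSelectorDemo() {}

    /** Registers n pipes on one selector, writes one message into each, and reads them all on the calling thread. */
    static List<String> run(int n) {
        List<String> received = new ArrayList<>();
        List<Pipe> pipes = new ArrayList<>();
        try (Selector selector = Selector.open()) {
            for (int i = 0; i < n; i++) {
                Pipe pipe = Pipe.open();
                pipe.source().configureBlocking(false); // required before registering
                pipe.source().register(selector, SelectionKey.OP_READ, "conn-" + i);
                byte[] message = ("hello-" + i).getBytes(StandardCharsets.UTF_8);
                pipe.sink().write(ByteBuffer.wrap(message));
                pipes.add(pipe);
            }
            while (received.size() < n) {
                if (selector.select(500) == 0) {
                    break; // nothing became readable within the timeout
                }
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    ByteBuffer buf = ByteBuffer.allocate(64);
                    int read = ((Pipe.SourceChannel) key.channel()).read(buf);
                    if (read > 0) {
                        buf.flip();
                        received.add(key.attachment() + ":" + StandardCharsets.UTF_8.decode(buf));
                    }
                    key.cancel(); // the demo expects one message per channel
                }
            }
            for (Pipe pipe : pipes) {
                pipe.sink().close();
                pipe.source().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return received;
    }
}
```

The order in which the channels become readable is not guaranteed, so callers should not rely on the order of the returned list.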

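Newer Kafka versions still build on NIO and the multi-threaded Reactor model. The difference is that the business-logic module (the Handler module) has been split out and is run by its own thread pool, as shown in the figure below:

From the figure above, the Kafka communication flow can be summarized as follows:

When a Client sends a request to the Server, the Acceptor accepts the TCP connection and, once it is established, passes it to a Processor thread;
The Processor thread registers the new connection on its own Selector and listens for READ events;
When the Client writes data on that connection, a READ event fires; the Processor reads the request and a Handler is invoked to process it;
When the Handler finishes, it may have a result to return to the Client; that result is wrapped in a Response and sent back through the Processor.

From this, the advantages of a separate Handler module in newer Kafka versions are:

The number of Handler threads can be configured independently, which makes tuning and management easier
One oversized request cannot block a Processor thread
Request, Handler, and Response are connected only through queues, so they are not coupled to one another, which helps Kafka's performance

Note that RequestChannel provides the buffer through which Processor threads and Handler threads exchange data; it is where Requests and Responses are queued during communication. A Processor thread adds each request it reads to RequestChannel's global request queue (requestQueue). A Handler thread takes requests from that queue and processes them, then adds the Response to the response queue of the originating Processor (responseQueues) and wakes that Processor through responseListeners. Finally, the Processor thread takes the Response from its queue and sends it to the Client. The implementation is as follows: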

class RequestChannel(val numProcessors: Int, val queueSize: Int) extends KafkaMetricsGroup {
  private var responseListeners: List[(Int) => Unit] = Nil
  private val requestQueue = new ArrayBlockingQueue[RequestChannel.Request](queueSize)
  private val responseQueues = new Array[BlockingQueue[RequestChannel.Response]](numProcessors)
  for(i <- 0 until numProcessors)
    responseQueues(i) = new LinkedBlockingQueue[RequestChannel.Response]()

  newGauge(
    "RequestQueueSize",
    new Gauge[Int] {
      def value = requestQueue.size
    }
  )

  newGauge("ResponseQueueSize", new Gauge[Int]{
    def value = responseQueues.foldLeft(0) {(total, q) => total + q.size()}
  })

  for (i <- 0 until numProcessors) {
    newGauge("ResponseQueueSize",
      new Gauge[Int] {
        def value = responseQueues(i).size()
      },
      Map("processor" -> i.toString)
    )
  }

  /** Send a request to be handled, potentially blocking until there is room in the queue for the request */
  def sendRequest(request: RequestChannel.Request) {
    requestQueue.put(request)
  }

  /** Send a response back to the socket server to be sent over the network */
  def sendResponse(response: RequestChannel.Response) {
    responseQueues(response.processor).put(response)
    for(onResponse <- responseListeners)
      onResponse(response.processor)
  }

  /** No operation to take for the request, need to read more over the network */
  def noOperation(processor: Int, request: RequestChannel.Request) {
    responseQueues(processor).put(RequestChannel.Response(processor, request, null, RequestChannel.NoOpAction))
    for(onResponse <- responseListeners)
      onResponse(processor)
  }

  /** Close the connection for the request */
  def closeConnection(processor: Int, request: RequestChannel.Request) {
    responseQueues(processor).put(RequestChannel.Response(processor, request, null, RequestChannel.CloseConnectionAction))
    for(onResponse <- responseListeners)
      onResponse(processor)
  }

  /** Get the next request or block until specified time has elapsed */
  def receiveRequest(timeout: Long): RequestChannel.Request =
    requestQueue.poll(timeout, TimeUnit.MILLISECONDS)

  /** Get the next request or block until there is one */
  def receiveRequest(): RequestChannel.Request =
    requestQueue.take()

  /** Get a response for the given processor if there is one */
  def receiveResponse(processor: Int): RequestChannel.Response = {
    val response = responseQueues(processor).poll()
    if (response != null)
      response.request.responseDequeueTimeMs = Time.SYSTEM.milliseconds
    response
  }

  def addResponseListener(onResponse: Int => Unit) {
    responseListeners ::= onResponse
  }

  def shutdown() {
    requestQueue.clear()
  }
}
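
To see the queue layout in motion, here is a much smaller Java analogue: one bounded request queue shared by all handlers, and one response queue per processor. It is a sketch under simplifying assumptions. MiniRequestChannel, its nested types, and the upper-casing "handler" are hypothetical; the responseListeners wake-up and the metrics are left out; and receiveResponse waits with a timeout, whereas the original polls without blocking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Simplified, hypothetical analogue of RequestChannel (not Kafka code). */
final class MiniRequestChannel {
    static final class Request {
        final int processor;
        final String body;
        Request(int processor, String body) { this.processor = processor; this.body = body; }
    }

    static final class Response {
        final int processor;
        final String body;
        Response(int processor, String body) { this.processor = processor; this.body = body; }
    }

    private final BlockingQueue<Request> requestQueue;               // shared by all handlers
    private final List<BlockingQueue<Response>> responseQueues = new ArrayList<>(); // one per processor

    MiniRequestChannel(int numProcessors, int queueSize) {
        requestQueue = new ArrayBlockingQueue<>(queueSize);
        for (int i = 0; i < numProcessors; i++) {
            responseQueues.add(new LinkedBlockingQueue<>());
        }
    }

    /** Called by a processor; blocks while the bounded request queue is full. */
    void sendRequest(Request request) throws InterruptedException {
        requestQueue.put(request);
    }

    /** Called by a handler; returns null if no request arrives within the timeout. */
    Request receiveRequest(long timeoutMs) throws InterruptedException {
        return requestQueue.poll(timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Called by a handler; routes the response to the queue of the originating processor. */
    void sendResponse(Response response) throws InterruptedException {
        responseQueues.get(response.processor).put(response);
    }

    /** Called by a processor; returns null if no response arrives within the timeout. */
    Response receiveResponse(int processor, long timeoutMs) throws InterruptedException {
        return responseQueues.get(processor).poll(timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Starts the given number of daemon handler threads; each upper-cases request bodies. */
    List<Thread> startHandlers(int numHandlers) {
        List<Thread> handlers = new ArrayList<>();
        for (int i = 0; i < numHandlers; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (!Thread.currentThread().isInterrupted()) {
                        Request req = receiveRequest(100);
                        if (req != null) {
                            sendResponse(new Response(req.processor, req.body.toUpperCase(Locale.ROOT)));
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // stop quietly when asked to shut down
                }
            }, "handler-" + i);
            t.setDaemon(true);
            t.start();
            handlers.add(t);
        }
        return handlers;
    }

    /** Demo: sends one request as the given processor and waits for its response body. */
    static String roundTrip(int processor, String body, int numHandlers) {
        MiniRequestChannel channel = new MiniRequestChannel(2, 10);
        List<Thread> handlers = channel.startHandlers(numHandlers);
        try {
            channel.sendRequest(new Request(processor, body));
            Response response = channel.receiveResponse(processor, 2000);
            return response == null ? null : response.body;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            for (Thread t : handlers) {
                t.interrupt();
            }
        }
    }
}
```

Because the handler count is just a constructor-time argument to startHandlers, it can be tuned independently of the number of processors, which is the decoupling the queues are meant to provide.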

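4. Summary

Reading and analyzing Kafka's network communication code carefully teaches a good deal about network programming with NIO. Studying the Kafka source in this way is also very helpful for performance tuning and troubleshooting on large Kafka clusters.

5. Closing remarks

That is all for this post. If you run into questions while studying this material, you are welcome to discuss them in the group or send me an email; I will do my best to answer. Let's keep learning together!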

