When we are talking about performance of Kafka Producer, we are really talking about two different things:

  • latency: how much time passes from the time KafkaProducer.send() was called until the message shows up in a Kafka broker.
  • throughput: how many messages can the producer send to Kafka each second.

Many years ago, I was in a storage class taught by scalability expert James Morle. One of the students asked why we need to worry about both latency and throughput – after all, if processing a message takes 10ms (latency), then clearly throughput is limited to 100 messages per second. When looking at things this way, it may look like higher latency == higher throughput. However, the relation between latency and throughput is not this trivial.

Lets start our discussion with agreeing that we are only talking about the new Kafka Producer (the one in org.apache.kafka.clients package). It makes things simpler and there’s no reason to use the old producer at this point.

Kafka Producer allows to send message batches. Suppose that due to network roundtrip times, it takes 2ms to send a single Kafka message. By sending one message at a time, we have latency of 2ms and throughput of 500 messages per second. But suppose that we are in no big hurry, and are willing to wait few milliseconds and send a larger batch – lets say we decided to wait 8ms and managed to accumulate 1000 messages. Our latency is now 10ms, but our throughput is up to 100,000 messages per second! Thats the main reason I love microbatches so much. By adding a tiny delay, and 10ms is usually acceptable even for financial applications, our throughput is 200 times greater. This type of trade-off is not unique to Kafka, btw. Network and storage subsystem use this kind of “micro batching”  all the time.

Sometimes latency and throughput interact in even funnier ways. One day Ted Malaskacomplained that with Flafka, he can get 20ms latency when sending 100,000 messages per second, but huge 1-3s latency when sending just 100 messages a second. This made no sense at all, until we remembered that to save CPU, if Flafka doesn’t find messages to read from Kafka it will back off and retry later. Backoff times started at 0.5s and steadily increased. Ted kindly improved Flume to avoid this issue in FLUME-2729.

Anyway, back to the Kafka Producer. There are few settings you can modify to improve latency or throughput in Kafka Producer:

  • batch.size – This is an upper limit of how many messages Kafka Producer will attempt to batch before sending – specified in bytes (Default is 16K bytes – so 16 messages if each message is 1K in size). Kafka may send batches before this limit is reached (so latency doesn’t change by modifying this parameter), but will always send when this limit is reached. Therefore setting this limit too low will hurt throughput without improving latency. The main reason to set this low is lack of memory – Kafka will always allocate enough memory for the entire batch size, even if latency requirements cause it to send half-empty batches.
  • linger.ms – How long will the producer wait before sending in order to allow more messages to get accumulated in the same batch. Normally the producer will not wait at all, and simply send all the messages that accumulated while the previous send was in progress (2 ms in the example above), but as we’ve discussed, sometimes we are willing to wait a bit longer in order to improve the overall throughput at the expense of a little higher latency. In this case tuning linger.ms to a higher value will make sense. Note that if batch.size is low and the batch if full before linger.ms time passes, the batch will send early, so it makes sense to tune batch.size and linger.ms together.

Other than tuning these parameters, you will  want to avoid waiting on the future of the send method (i.e. the result from Kafka brokers), and instead send data continuously to Kafka. You can simply ignore the result (if success of sending messages is not critical), but its probably better to use a callback. You can find an example of how to do this in my github (look at produceAsync method).

If sending is still slow and you are trying to understand what is going on, you will want to check if the send thread is fully utilized through jvisualsm (it is called kafka-producer-network-thread) or keep an eye on average batch size metric. If you find that you can’t fill the buffer fast enough and the sender is idle, you can try adding application threads that share the same producer and increase throughput this way.

Another concern can be that the Producer will send all the batches that go to the same broker together when at least one of them is full – if you have one very busy topic and others that are less busy, you may see some skew in throughput this way.

Sometimes you will notice that the producer performance doesn’t scale as you add more partitions to a topic. This can happen because, as we mentioned, there is a send buffer for each partition. When you add more partitions, you have more send buffers, so perhaps the configuration you set to keep the buffers full before (# of threads, linger.ms) is no longer sufficient and buffers are sent half-empty (check the batch sizes). In this case you will need to add threads or increase linger.ms to improve utilization and scale your throughput.

Got more tips on ingesting data into Kafka? comments are welcome!

Kafka生产者性能优化之吞吐量VS延迟的更多相关文章

  1. 一文告诉你,Kafka在性能优化方面做了哪些举措!

    很多粉丝私信问我Kafka在性能优化方面做了哪些举措,对于相关问题的答案其实我早就写过了,就是没有系统的整理一篇,最近思考着花点时间来整理一下,下次再有粉丝问我相关的问题我就可以潇洒的甩个链接了.这个 ...

  2. 1002-谈谈ELK日志分析平台的性能优化理念

    在生产环境中,我们为了更好的服务于业务,通常会通过优化的手段来实现服务对外的性能最大化,节省系统性能开支:关注我的朋友们都知道,前段时间一直在搞ELK,同时也记录在了个人的博客篇章中,从部署到各个服务 ...

  3. 5种kafka消费端性能优化方法

    摘要:带你了解基于FusionInsight HD&MRS的5种kafka消费端性能优化方法. 本文分享自华为云社区<FusionInsight HD&MRSkafka消费端性能 ...

  4. 漫游Kafka设计篇之性能优化

    Kafka在提高效率方面做了很大努力.Kafka的一个主要使用场景是处理网站活动日志,吞吐量是非常大的,每个页面都会产生好多次写操作.读方面,假设每个消息只被消费一次,读的量的也是很大的,Kafka也 ...

  5. 漫游Kafka设计篇之性能优化(7)

    Kafka在提高效率方面做了很大努力.Kafka的一个主要使用场景是处理网站活动日志,吞吐量是非常大的,每个页面都会产生好多次写操作.读方面,假设每个消息只被消费一次,读的量的也是很大的,Kafka也 ...

  6. apache kafka系列之性能优化架构分析

    apache kafka中国社区QQ群:162272557 Apache kafka性能优化架构分析 应用程序优化:数据压缩 watermark/2/text/aHR0cDovL2Jsb2cuY3Nk ...

  7. kafka的并行度与JStorm性能优化

    kafka的并行度与JStorm性能优化 > Consumers Messaging traditionally has two models: queuing and publish-subs ...

  8. Kafka权威指南 读书笔记之(三)Kafka 生产者一一向 Kafka 写入数据

    不管是把 Kafka 作为消息队列.消息总线还是数据存储平台来使用 ,总是需要有一个可以往 Kafka 写入数据的生产者和一个从 Kafka 读取数据的消费者,或者一个兼具两种角色的应用程序. 开发者 ...

  9. kafka 基础知识梳理-kafka是一种高吞吐量的分布式发布订阅消息系统

    一.kafka 简介 今社会各种应用系统诸如商业.社交.搜索.浏览等像信息工厂一样不断的生产出各种信息,在大数据时代,我们面临如下几个挑战: 如何收集这些巨大的信息 如何分析它 如何及时做到如上两点 ...

随机推荐

  1. Eclipse经常出现未响应问题

    修改eclipse.ini文件 -startupplugins/org.eclipse.equinox.launcher_1.5.0.v20180512-1130.jar--launcher.libr ...

  2. svn 没有killall命令的解决方法 -bash: killall: command not found

    debian.ubuntu系统下:   apt-get install psmisc centos 下:   yum install psmisc

  3. 弹性盒模型:flex多行多列两端对齐,列不满左对齐

    [1]需求: [2]解决方案: 最近遇到布局上要求item两端对齐,且最后一行在列不满的情况下要求左对齐,使用flex的justify-content: space-between;实现时发现最后一行 ...

  4. 51nod1712 区间求和

    http://www.51nod.com/Challenge/Problem.html#problemId=1712 先考虑题面中的简化问题. 对于\(i\in [1,n]\),\(a_i\)的贡献为 ...

  5. Linux——安装并配置Kafka

    前言 Kafka是由Apache软件基金会开发的一个开源流处理平台,由Scala和Java编写.Kafka是一种高吞吐量的分布式发布订阅消息系统,它可以处理消费者规模的网站中的所有动作流数据. 这种动 ...

  6. 浏览器报400-Bad Request异常

    今天在使用ie浏览器在测试程序的时候,报这个错误,后台日志打印出来显示的是:连接一个远程主机失败 解决Invalid character found in the request target. Th ...

  7. 题解 UVa11461

    题目大意 多组数据,每组数据给出两个正整数 \(a,b\),请求出 \(a,b\) 之间的完全平方数的个数. 分析 前缀和即可. #include<bits/stdc++.h> using ...

  8. LeetCode 1061. Lexicographically Smallest Equivalent String

    原题链接在这里:https://leetcode.com/problems/lexicographically-smallest-equivalent-string/ 题目: Given string ...

  9. PHP截取字符串函数substr()函数实例用法详解

    在PHP中有一项非常重要的技术,就是截取指定字符串中指定长度的字符.PHP对于字符串截取可以使用PHP预定义函数substr()函数来实现.下面就来介绍一下substr()函数的语法及其应用. sub ...

  10. LOJ P10018 数的划分 题解

    每日一题 day52 打卡 Analysis 这道题直接搜索会TLE到**,但我们发现有很多没有用的状态可以删去,比如 1,1,5; 1,5,1; 5,1,1; 所以很容易想到一个优化:按不下降的顺序 ...