Apache Kafka - KIP-42: Add Producer and Consumer Interceptors
Released in Kafka 0.10.0.0.
The interceptor concept probably comes from Flume.
Reference: http://blog.csdn.net/xiao_jun_0820/article/details/38111305
For example, Flume provides:
Timestamp Interceptor
Host Interceptor
Static Interceptor
Regex Filtering Interceptor
Regex Extractor Interceptor
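These can wrap the messages flowing through, for example by inserting a timestamp or host name, or by doing ETL-style work such as filtering.
Kafka therefore provides similar interceptor interfaces on both the producer and the consumer side.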
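Interceptors are registered through client configuration, using the `interceptor.classes` property on the producer or consumer. The following is a minimal sketch that only builds the properties with `java.util.Properties`; the two interceptor class names and the broker address are hypothetical placeholders, not real classes.

```java
import java.util.Properties;

final class InterceptorConfigSketch {
    // Builds producer properties that register two interceptor classes.
    // Both class names are hypothetical and exist only for illustration.
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        // Comma-separated list; interceptors are invoked in the listed order.
        props.put("interceptor.classes",
                "com.example.TimestampInterceptor,com.example.AuditInterceptor");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerProps().getProperty("interceptor.classes"));
    }
}
```

The same property name is used on the consumer side, with classes that implement the consumer interface instead.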
ProducerInterceptor
/**
 * A plugin interface to allow things to intercept events happening to a producer record,
 * such as sending producer record or getting an acknowledgement when a record gets published
 */
public interface ProducerInterceptor<K, V> extends Configurable {
    /**
     * This is called when client sends record to KafkaProducer, before key and value get serialized.
     * @param record the record from client
     * @return record that is either original record passed to this method or new record with modified key and value.
     */
    public ProducerRecord<K, V> onSend(ProducerRecord<K, V> record);

    /**
     * This is called when the send has been acknowledged
     * @param metadata The metadata for the record that was sent (i.e. the partition and offset). The metadata may be only partially filled if an error occurred. Topic will always be set, and if partition is not -1, it is the partition set/assigned to this record.
     * @param exception The exception thrown during processing of this record. Null if no error occurred.
     */
    public void onAcknowledgement(RecordMetadata metadata, Exception exception);

    /**
     * This is called when interceptor is closed
     */
    public void close();
}
onSend() will be called in KafkaProducer.send(), before the key and value are serialized and before the partition is assigned.
If the implementation modifies the key and/or value, it must return the modified key and value in a new ProducerRecord object.
onAcknowledgement() will be called when the send is acknowledged. It has the same API as Callback.onCompletion(), and is called just before Callback.onCompletion() is called.
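To make the "return a new record" rule concrete, here is a minimal sketch of an onSend()-style method that prefixes the value with a host name, in the spirit of Flume's Host Interceptor. So that it runs without the kafka-clients jar, it uses a simplified stand-in type, `SketchRecord`, instead of `ProducerRecord`; `SketchRecord`, `HostTaggingInterceptor`, and the `host|value` format are all assumptions made for illustration.

```java
// Stand-in for ProducerRecord<String, String>; NOT the Kafka class.
final class SketchRecord {
    final String topic;
    final String key;
    final String value;

    SketchRecord(String topic, String key, String value) {
        this.topic = topic;
        this.key = key;
        this.value = value;
    }
}

// Hypothetical interceptor that tags every value with the sending host.
final class HostTaggingInterceptor {
    private final String host;

    HostTaggingInterceptor(String host) {
        this.host = host;
    }

    // Mirrors onSend(): the input is never mutated; a modified value
    // is returned in a new record object.
    SketchRecord onSend(SketchRecord record) {
        if (record.value == null) {
            return record; // nothing to decorate, pass the original through
        }
        return new SketchRecord(record.topic, record.key, host + "|" + record.value);
    }
}

final class HostTaggingDemo {
    public static void main(String[] args) {
        HostTaggingInterceptor interceptor = new HostTaggingInterceptor("web-01");
        SketchRecord out = interceptor.onSend(new SketchRecord("events", "k1", "login"));
        System.out.println(out.value); // prints "web-01|login"
    }
}
```

With the real API, the class would implement ProducerInterceptor<K, V> and also provide onAcknowledgement(), close(), and configure().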
Multiple interceptors can be chained together.
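Chaining can be pictured as feeding each interceptor's output into the next one. The sketch below models a record as a plain `String` and an interceptor as a `UnaryOperator<String>`; `InterceptorChainSketch` is a hypothetical name and this is not Kafka's internal implementation. It assumes, as I understand the interceptor contract, that an exception thrown by one interceptor is logged rather than propagated, and that the next interceptor receives the last successfully produced record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

final class InterceptorChainSketch {
    private final List<UnaryOperator<String>> interceptors = new ArrayList<>();

    // Appends an interceptor; call order equals registration order.
    InterceptorChainSketch add(UnaryOperator<String> interceptor) {
        interceptors.add(interceptor);
        return this;
    }

    // Passes the record through every interceptor in turn.
    String onSend(String record) {
        String current = record;
        for (UnaryOperator<String> interceptor : interceptors) {
            try {
                current = interceptor.apply(current);
            } catch (RuntimeException e) {
                // A failing interceptor is skipped; the chain keeps the
                // record produced by the last successful interceptor.
                System.err.println("interceptor failed, skipping: " + e);
            }
        }
        return current;
    }

    public static void main(String[] args) {
        InterceptorChainSketch chain = new InterceptorChainSketch()
                .add(s -> s + "-a")
                .add(s -> { throw new IllegalStateException("boom"); })
                .add(s -> s + "-b");
        System.out.println(chain.onSend("r")); // prints "r-a-b"
    }
}
```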
ProducerInterceptor APIs will be called from multiple threads: onSend() will be called on the submitting thread and onAcknowledgement() will be called on the producer I/O thread.
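Because the two callbacks run on different threads, any state they share has to be thread-safe. A minimal sketch using `AtomicLong` counters follows; `SendAckCounter` is a hypothetical class whose methods only mirror the shape of onSend() and onAcknowledgement(), without depending on the Kafka types.

```java
import java.util.concurrent.atomic.AtomicLong;

final class SendAckCounter {
    private final AtomicLong sent = new AtomicLong();
    private final AtomicLong acked = new AtomicLong();
    private final AtomicLong failed = new AtomicLong();

    // Would run on the application's submitting thread.
    void onSend() {
        sent.incrementAndGet();
    }

    // Would run on the producer I/O thread; a null exception means success.
    void onAcknowledgement(Exception exception) {
        if (exception == null) {
            acked.incrementAndGet();
        } else {
            failed.incrementAndGet();
        }
    }

    // Records sent but not yet acknowledged (successfully or not).
    long inFlight() {
        return sent.get() - acked.get() - failed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        SendAckCounter counter = new SendAckCounter();
        Thread submitter = new Thread(() -> {
            for (int i = 0; i < 1000; i++) counter.onSend();
        });
        Thread ioThread = new Thread(() -> {
            for (int i = 0; i < 400; i++) counter.onAcknowledgement(null);
        });
        submitter.start();
        ioThread.start();
        submitter.join();
        ioThread.join();
        System.out.println("in flight: " + counter.inFlight()); // prints "in flight: 600"
    }
}
```

Atomic counters keep both callbacks cheap, which matters because onSend() sits directly on the send path.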
ConsumerInterceptor
/**
 * A plugin interface to allow things to intercept Consumer events such as receiving a record or record being consumed
 * by a client.
 */
public interface ConsumerInterceptor<K, V> extends Configurable {
    /**
     * This is called when the records are about to be returned to the client.
     * @param records records to be consumed by the client. Null if record dropped/ignored/discarded (non consumable)
     * @return records that is either original 'records' passed to this method or modified set of records
     */
    public ConsumerRecords<K, V> onConsume(ConsumerRecords<K, V> records);

    /**
     * This is called when offsets get committed
     * This method will be called when the commit request sent to the server has been acknowledged.
     * @param offsets A map of the offsets and associated metadata that this callback applies to
     */
    public void onCommit(Map<TopicPartition, OffsetAndMetadata> offsets);

    /**
     * This is called when interceptor is closed
     */
    public void close();
}
onConsume() will be called in KafkaConsumer.poll(), just before poll() returns ConsumerRecords.
onCommit() will be called when offsets get committed: just before OffsetCommitCallback.onCompletion() is called, and in ConsumerCoordinator.commitOffsetsSync() on a successful commit.
Since the new consumer is single-threaded, the ConsumerInterceptor API will be called from a single thread.
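A consumer-side interceptor can be sketched the same way. The example below filters out empty records in an onConsume()-style method and remembers the offsets passed to an onCommit()-style method. `SketchConsumerRecord` and `DropEmptyInterceptor` are hypothetical stand-ins: a `List` replaces ConsumerRecords, and a `"topic-partition"` string key replaces TopicPartition, so the code runs without the Kafka jar. A plain `HashMap` is used because the callbacks are assumed to run on a single thread, as described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for ConsumerRecord<String, String>; NOT the Kafka class.
final class SketchConsumerRecord {
    final String topic;
    final int partition;
    final long offset;
    final String value;

    SketchConsumerRecord(String topic, int partition, long offset, String value) {
        this.topic = topic;
        this.partition = partition;
        this.offset = offset;
        this.value = value;
    }
}

// Hypothetical interceptor: drops empty records and tracks committed offsets.
final class DropEmptyInterceptor {
    private final Map<String, Long> lastCommitted = new HashMap<>();

    // Mirrors onConsume(): returns a new, filtered collection.
    List<SketchConsumerRecord> onConsume(List<SketchConsumerRecord> records) {
        List<SketchConsumerRecord> kept = new ArrayList<>();
        for (SketchConsumerRecord record : records) {
            if (record.value != null && !record.value.isEmpty()) {
                kept.add(record);
            }
        }
        return kept;
    }

    // Mirrors onCommit(): remembers the latest committed offset per partition.
    void onCommit(Map<String, Long> offsets) {
        lastCommitted.putAll(offsets);
    }

    // Returns null if nothing was committed for the given key.
    Long lastCommitted(String topicPartition) {
        return lastCommitted.get(topicPartition);
    }
}
```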
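Summary:
An interceptor is a plugin that can do lightweight work such as decorating, cleaning, or filtering messages; its main use is still monitoring and message tracing.
Interceptors can be chained.
Interceptors must be lightweight, because a slow interceptor reduces the throughput of the whole pipeline.
Confluent also provides interceptor-based products for monitoring data streams: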
http://docs.confluent.io/3.0.0/control-center/docs/clients.html
In addition, to support better monitoring and auditing, the KIP extends the record metadata:
Currently, RecordMetadata contains topic/partition, offset, and timestamp (KIP-32).
We propose to add remaining record's metadata in RecordMetadata: checksum and record size. Both checksum and record size are useful for monitoring and audit.
For symmetry, we also propose to expose the same metadata on consumer side and make available to interceptors.
We will add checksum and record size fields to RecordMetadata and ConsumerRecord.
public final class RecordMetadata {
    private final long offset;
    private final TopicPartition topicPartition;
    private final long checksum;   // NEW: checksum of the record
    private final int size;        // NEW: record size in bytes (before compression)
}

public final class ConsumerRecord<K, V> {
    .......
    private final long checksum;   // NEW: checksum of the record
    private final int size;        // NEW: record size in bytes (after decompression)
}
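One way these two fields could feed an audit is to keep a running tally on each side and compare them. The sketch below is only an illustration of that idea, not something the KIP specifies: `AuditTally` is a hypothetical class, summing checksums is an assumed order-independent way to combine them, and the class is not thread-safe as written.

```java
// Hypothetical audit tally built from per-record checksum and size values.
final class AuditTally {
    private long records;
    private long bytes;
    private long checksumSum;

    // Called once per record, e.g. from onAcknowledgement() or onConsume().
    void add(long checksum, int size) {
        records++;
        bytes += size;
        checksumSum += checksum; // order-independent combination (assumption)
    }

    // True when both sides saw the same count, byte total, and checksum sum.
    boolean matches(AuditTally other) {
        return records == other.records
                && bytes == other.bytes
                && checksumSum == other.checksumSum;
    }

    public static void main(String[] args) {
        AuditTally produced = new AuditTally();
        AuditTally consumed = new AuditTally();
        produced.add(111L, 10);
        produced.add(222L, 20);
        consumed.add(222L, 20);
        System.out.println(produced.matches(consumed)); // prints "false"
        consumed.add(111L, 10);
        System.out.println(produced.matches(consumed)); // prints "true"
    }
}
```

A matching tally does not prove that the two record sets are identical, but a mismatch is a cheap signal that records were lost or duplicated somewhere in the pipeline.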