背景

上一篇我们介绍了Kafka Streams中的消息转换操作map,今天我们给出另一个经典的转换操作filter的用法。依然是结合一个具体的实例展开介绍。

演示功能说明

本篇演示filter用法,即根据给定的过滤条件或逻辑实时对每条消息进行过滤处理。今天使用的输入topic消息格式如下:

{"name": "George R. R. Martin", "title": "A Song of Ice and Fire"}

{"name": "C.S. Lewis", "title": "The Silver Chair"}

我们打算过滤出name是“George R. R. Martin”的所有消息并发送到输出topic上。

初始化项目

创建项目目录:

mkdir filter-streams
cd filter-streams/

配置项目

在filter-streams目录下创建build.gradle文件,内容如下:

buildscript {

    repositories {
jcenter()
}
dependencies {
classpath 'com.github.jengelman.gradle.plugins:shadow:4.0.2'
}
} plugins {
id 'java'
id "com.google.protob" version "0.8.10"
}
apply plugin: 'com.github.johnrengelman.shadow' repositories {
mavenCentral()
jcenter() maven {
url 'http://packages.confluent.io/maven'
}
} group 'huxihx.kafkastreams' sourceCompatibility = 1.8
targetCompatibility = '1.8'
version = '0.0.1' dependencies {
implementation 'com.google.protobuf:protobuf-java:3.0.0'
implementation 'org.slf4j:slf4j-simple:1.7.26'
implementation 'org.apache.kafka:kafka-streams:2.3.0'
implementation 'com.google.protobuf:protobuf-java:3.9.1' testCompile group: 'junit', name: 'junit', version: '4.12'
} protobuf {
generatedFilesBaseDir = "$projectDir/src/"
protoc {
artifact = 'com.google.protobuf:protoc:3.0.0'
}
} jar {
manifest {
attributes(
'Class-Path': configurations.compile.collect { it.getName() }.join(' '),
'Main-Class': 'huxihx.kafkastreams.FilteredStreamsApp'
)
}
} shadowJar {
archiveName = "kstreams-transform-standalone-${version}.${extension}"
}

然后执行下列命令下载Gradle的wrapper套件:

gradle wrapper

之后在filter-streams目录下创建一个名为configuration的文件夹用于保存我们的参数配置文件:

mkdir configuration

创建一个名为dev.properties的文件:

application.id=filtering-app
bootstrap.servers=localhost:9092

input.topic.name=publications
input.topic.partitions=1
input.topic.replication.factor=1

output.topic.name=filtered-publications
output.topic.partitions=1output.topic.replication.factor=1

创建消息Schema

下一步是创建输入消息和输出消息的schema。由于我们今天只是做filter,所以输入和输出的格式一样的,只需要创建一份schema即可。首先,在filter-streams下执行命令创建保存schema的文件夹:

mkdir -p src/main/proto

之后创建publication.proto文件,内容如下:

syntax = "proto3";

package huxihx.kafkastreams.proto;

message Publication {
string name = 1;
string title = 2;
}

保存文件之后运行下列命令去编译对应的Java类:

./gradlew build

此时,你应该可以在src/main/java/huxihx/kafkastreams/proto下看到生成的Java类:PublicationOuterClass。

创建Serdes

这一步的Serdes和上一篇中的一样,因此不再赘述,直接上代码:

mkdir -p src/main/java/huxihx/kafkastreams/serdes

在新创建的serdes文件夹下创建ProtobufSerializer.java:

package huxihx.kafkastreams.serdes;

import com.google.protobuf.MessageLite;
import org.apache.kafka.common.serialization.Serializer; public class ProtobufSerializer<T extends MessageLite> implements Serializer<T> {
@Override
public byte[] serialize(String topic, T data) {
return data == null ? new byte[0] : data.toByteArray();
}
}

然后创建ProtobufDeserializer.java:

package huxihx.kafkastreams.serdes;

import com.google.protobuf.InvalidProtocolBufferException;
import com.google.protobuf.MessageLite;
import com.google.protobuf.Parser;
import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Deserializer; import java.util.Map; public class ProtobufDeserializer<T extends MessageLite> implements Deserializer<T> { private Parser<T> parser; @Override
public void configure(Map<String, ?> configs, boolean isKey) {
parser = (Parser<T>) configs.get("parser");
} @Override
public T deserialize(String topic, byte[] data) {
try {
return parser.parseFrom(data);
} catch (InvalidProtocolBufferException e) {
throw new SerializationException("Failed to deserialize from a protobuf byte array.", e);
}
}
}

最后创建ProtobufSerdes.java:

package huxihx.kafkastreams.serdes;

import com.google.protobuf.MessageLite;
import com.google.protobuf.Parser;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serializer; import java.util.HashMap;
import java.util.Map; public class ProtobufSerdes<T extends MessageLite> implements Serde<T> { private final Serializer<T> serializer;
private final Deserializer<T> deserializer; public ProtobufSerdes(Parser<T> parser) {
serializer = new ProtobufSerializer<>();
deserializer = new ProtobufDeserializer<>();
Map<String, Parser<T>> config = new HashMap<>();
config.put("parser", parser);
deserializer.configure(config, false);
} @Override
public Serializer<T> serializer() {
return serializer;
} @Override
public Deserializer<T> deserializer() {
return deserializer;
}
}

开发主流程

在src/main/java/huxihx/kafkastreams下创建FilteredStreamsApp.java文件:

package huxihx.kafkastreams;

import huxihx.kafkastreams.proto.PublicationOuterClass;
import huxihx.kafkastreams.serdes.ProtobufSerdes;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced; import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.concurrent.CountDownLatch; public class FilteredStreamsApp { private Properties buildStreamsProperties(Properties envProps) {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, envProps.getProperty("application.id"));
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, envProps.getProperty("bootstrap.servers"));
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
return props;
} private void preCreateTopics(Properties envProps) throws Exception {
Map<String, Object> config = new HashMap<>();
config.put("bootstrap.servers", envProps.getProperty("bootstrap.servers"));
try (AdminClient client = AdminClient.create(config)) {
Set<String> existingTopics = client.listTopics().names().get(); List<NewTopic> topics = new ArrayList<>();
String inputTopic = envProps.getProperty("input.topic.name");
if (!existingTopics.contains(inputTopic)) {
topics.add(new NewTopic(inputTopic,
Integer.parseInt(envProps.getProperty("input.topic.partitions")),
Short.parseShort(envProps.getProperty("input.topic.replication.factor")))); } String outputTopic = envProps.getProperty("output.topic.name");
if (!existingTopics.contains(outputTopic)) {
topics.add(new NewTopic(outputTopic,
Integer.parseInt(envProps.getProperty("output.topic.partitions")),
Short.parseShort(envProps.getProperty("output.topic.replication.factor"))));
} client.createTopics(topics);
}
} private Properties loadEnvProperties(String filePath) throws IOException {
Properties envProps = new Properties();
try (FileInputStream input = new FileInputStream(filePath)) {
envProps.load(input);
}
return envProps;
} private Topology buildTopology(Properties envProps, final Serde<PublicationOuterClass.Publication> publicationSerde) {
final StreamsBuilder builder = new StreamsBuilder(); final String inputTopic = envProps.getProperty("input.topic.name");
final String outputTopic = envProps.getProperty("output.topic.name"); builder.stream(inputTopic, Consumed.with(Serdes.String(), publicationSerde))
.filter((key, publication) -> "George R. R. Martin".equals(publication.getName()))
.to(outputTopic, Produced.with(Serdes.String(), publicationSerde));
return builder.build();
} public static void main(String[] args) throws Exception {
if (args.length < 1) {
throw new IllegalArgumentException("Environment configuration file must be specified.");
} FilteredStreamsApp app = new FilteredStreamsApp();
Properties envProps = app.loadEnvProperties(args[0]);
Properties streamProps = app.buildStreamsProperties(envProps); app.preCreateTopics(envProps); Topology topology = app.buildTopology(envProps, new ProtobufSerdes<>(PublicationOuterClass.Publication.parser())); final KafkaStreams streams = new KafkaStreams(topology, streamProps);
final CountDownLatch latch = new CountDownLatch(1); Runtime.getRuntime().addShutdownHook(new Thread("streams-jvm-shutdown-hook") {
@Override
public void run() {
streams.close();
latch.countDown();
}
}); try {
streams.start();
latch.await();
} catch (Exception e) {
System.exit(1);
}
System.exit(0);
}
}

编写测试Producer和Consumer

在src/main/java/huxihx/kafkastreams/tests/TestProducer.java和TestConsumer.java,内容分别如下:

package huxihx.kafkastreams.tests;

import huxihx.kafkastreams.proto.PublicationOuterClass;
import huxihx.kafkastreams.serdes.ProtobufSerializer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord; import java.util.Arrays;
import java.util.List;
import java.util.Properties; public class TestProducer { // 测试输入事件
private static final List<PublicationOuterClass.Publication> TEST_PUBLICATIONS = Arrays.asList(
PublicationOuterClass.Publication.newBuilder()
.setName("George R. R. Martin").setTitle("A Song of Ice and Fire").build(),
PublicationOuterClass.Publication.newBuilder()
.setName("C.S. Lewis").setTitle("The Silver Chair").build(),
PublicationOuterClass.Publication.newBuilder()
.setName("C.S. Lewis").setTitle("Perelandra").build(),
PublicationOuterClass.Publication.newBuilder()
.setName("George R. R. Martin").setTitle("Fire & Blood").build(),
PublicationOuterClass.Publication.newBuilder()
.setName("J. R. R. Tolkien").setTitle("The Hobbit").build(),
PublicationOuterClass.Publication.newBuilder()
.setName("J. R. R. Tolkien").setTitle("The Lord of the Rings").build(),
PublicationOuterClass.Publication.newBuilder()
.setName("George R. R. Martin").setTitle("A Dream of Spring").build(),
PublicationOuterClass.Publication.newBuilder()
.setName("J. R. R. Tolkien").setTitle("The Fellowship of the Ring").build(),
PublicationOuterClass.Publication.newBuilder()
.setName("George R. R. Martin").setTitle("The Ice Dragon").build()); public static void main(String[] args) {
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", new ProtobufSerializer<PublicationOuterClass.Publication>().getClass()); try (final Producer<String, PublicationOuterClass.Publication> producer = new KafkaProducer<>(props)) {
TEST_PUBLICATIONS.stream()
.map(publication -> new ProducerRecord<String, PublicationOuterClass.Publication>("publications", publication))
.forEach(producer::send);
}
}
}
package huxihx.kafkastreams.tests;

import com.google.protobuf.Parser;
import huxihx.kafkastreams.proto.PublicationOuterClass;
import huxihx.kafkastreams.serdes.ProtobufDeserializer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.StringDeserializer; import java.time.Duration;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties; public class TestConsumer { public static void main(String[] args) {
// 为输出事件构造protobuf deserializer
Deserializer<PublicationOuterClass.Publication> deserializer = new ProtobufDeserializer<>();
Map<String, Parser<PublicationOuterClass.Publication>> config = new HashMap<>();
config.put("parser", PublicationOuterClass.Publication.parser());
deserializer.configure(config, false); Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test-group");
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("auto.offset.reset", "earliest");
KafkaConsumer<String, PublicationOuterClass.Publication> consumer = new KafkaConsumer<>(props, new StringDeserializer(), deserializer);
consumer.subscribe(Arrays.asList("filtered-publications"));
while (true) {
ConsumerRecords<String, PublicationOuterClass.Publication> records = consumer.poll(Duration.ofSeconds(1));
for (ConsumerRecord<String, PublicationOuterClass.Publication> record : records)
System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
}
}
}

测试

首先我们运行下列命令构建项目:

./gradlew shadowJar

然后启动Kafka集群,之后运行Kafka Streams应用:

java -jar build/libs/kstreams-transform-standalone-0.0.1.jar configuration/dev.properties

然后启动TestProducer发送测试事件:

java -cp build/libs/kstreams-transform-standalone-0.0.1.jar huxihx.kafkastreams.tests.TestProducer

最后启动TestConsumer验证Kafka Streams过滤出了指定的Publication消息:

java -cp build/libs/kstreams-transform-standalone-0.0.1.jar huxihx.kafkastreams.tests.TestConsumer

.......

offset = 0, key = null, value = name: "George R. R. Martin"
title: "A Song of Ice and Fire"

offset = 1, key = null, value = name: "George R. R. Martin"
title: "Fire & Blood"

offset = 2, key = null, value = name: "George R. R. Martin"
title: "A Dream of Spring"

offset = 3, key = null, value = name: "George R. R. Martin"
title: "The Ice Dragon"

总结

下一篇介绍rekey的用法,即实时修改消息的Key值~~

Kafka Streams开发入门(2)的更多相关文章

  1. Kafka Streams开发入门(5)

    1. 背景 上一篇演示了split操作算子的用法.今天展示一下split的逆操作:merge.Merge算子的作用是把多股实时消息流合并到一个单一的流中. 2. 功能演示说明 假设我们有多个Kafka ...

  2. Kafka Streams开发入门(4)

    背景 上一篇演示了filter操作算子的用法.今天展示一下如何根据不同的条件谓词(Predicate)将一个消息流实时地进行分流,划分成多个新的消息流,即所谓的流split.有的时候我们想要对消息流中 ...

  3. Kafka Streams开发入门(3)

    背景 上一篇我们介绍了Kafka Streams中的消息过滤操作filter,今天我们展示一个对消息进行转换Key的操作,依然是结合一个具体的实例展开介绍.所谓转换Key是指对流处理中每条消息的Key ...

  4. Kafka Streams开发入门(1)

    背景 最近发现Confluent公司在官网上发布了Kafka Streams教程,共有10节课,每节课给出了Kafka Streams的一个功能介绍.这个系列教程对于我们了解Kafka Streams ...

  5. Kafka .net 开发入门

    Kafka安装 首先我们需要在windows服务器上安装kafka以及zookeeper,有关zookeeper的介绍将会在后续进行讲解. 在网上可以找到相应的安装方式,我采用的是腾讯云服务器,借鉴的 ...

  6. 大全Kafka Streams

    本文将从以下三个方面全面介绍Kafka Streams 一. Kafka Streams 概念 二. Kafka Streams 使用 三. Kafka Streams WordCount   一. ...

  7. Kafka Streams | 流,实时处理和功能

    1.目标 在我们之前的Kafka教程中,我们讨论了Kafka中的ZooKeeper.今天,在这个Kafka Streams教程中,我们将学习Kafka中Streams的实际含义.此外,我们将看到Kaf ...

  8. 七 Kafka Streams VS Consumer API

    1 kafka Streams:   概念: 处理和分析储存在Kafka中的数据,并把处理结果写回Kafka或发送到外部系统的最终输出点,它建立在一些很重要的概念上,比如事件时间和消息时间的准确区分, ...

  9. Kafka入门实战教程(7):Kafka Streams

    1 关于流处理 流处理平台(Streaming Systems)是处理无限数据集(Unbounded Dataset)的数据处理引擎,而流处理是与批处理(Batch Processing)相对应的.所 ...

随机推荐

  1. IComparable<T>.CompareTo(T) 方法

    IComparable<T>.CompareTo(T) 方法 定义 命名空间: System 程序集: System.Runtime.dll, mscorlib.dll, netstand ...

  2. ES6异步操作Promise

    什么是Promise Promise是异步编程的一种解决方案,说白了就是一个构造函数,带有all,reject,resolve这几个方法,圆形上有then,catch等方法 Promise的特点 对象 ...

  3. 转载-mysql中文编码问题

    具体原理见:MySQL:windows中困扰着我们的中文乱码问题 分割线: 我的电脑win7 64位,这个问题可能是所有win系统出现的问题 我出现的问题: 是正确的 出现了中文的张三,则错误,编码错 ...

  4. Django 1.11 shell中模块导入问题

    django报错:django.core.exceptions.ImproperlyConfigured: 处理办法 import os os.environ['DJANGO_SETTINGS_MOD ...

  5. 【CF573E】Bear and Bowling

    [CF573E]Bear and Bowling 题面 洛谷 题解 首先有一个贪心的结论: 我们一次加入每个数,对于\(\forall i\),位置\(i\)的贡献为\(V_i = k_i\times ...

  6. Redis Zrevrank 命令

    Redis Zrevrank 命令返回有序集中成员的排名.其中有序集成员按分数值递减(从大到小)排序. 排名以 0 为底,也就是说, 分数值最大的成员排名为 0 . 使用 ZRANK 命令可以获得成员 ...

  7. C++判断计算式是大端存储模式,还是小端存储模式

    小端存储:数据的低字节存储在地址空间的低字节位,数据的高字节存储在地址空间的高字节位. 大端存储:数据的低字节存储在地址空间的高字节位,数据的高字节存储在地址空间的低字节位. 判断计算机是小端还是大端 ...

  8. 20165313-bof进阶

    实践基础知识 1.ALSR 1.定义: ASLR,全称为 Address Space Layout Randomization,地址空间布局随机化,它将进程的某些内存空间地址进行随机化来增大入侵者预测 ...

  9. 【技术博客】Postman接口测试教程 - 环境、附加验证、文件上传测试

    Postman接口测试教程 - 环境.附加验证.文件上传测试 v1.0 作者:ZBW 前言 继利用Postman和Jmeter进行接口性能测试之后,我们发现Postman作为一款入门容易的工具,其内置 ...

  10. 使用ps将短视频制作成gif以进行展示遇到的问题

    前言 为了将我们完成的微信小程序小小易校园展示给没用过的用户进行宣传,我们选择将小程序的页面使用过程录屏并制作成GIF来进行展示.制作选择的工具经过初步尝试及对比,最终选择了Adobe Photosh ...