Flume Channel Selectors + Kafka

http://flume.apache.org/FlumeUserGuide.html#custom-channel-selector

The official documentation describes two types of channel selectors:

Replicating Channel Selector (default)

Multiplexing Channel Selector

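The difference between the two selectors is this: Replicating sends every event coming from the source to all channels, whereas Multiplexing can choose which channels each event is sent to. Suppose the requirement is that logs from demo go only to channel1 and logs from demo2 go only to channel2. With Replicating, the logs of both demo and demo2 would be sent to both channel1 and channel2, which clearly does not meet that requirement.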
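The contrast between the two selectors can be sketched in plain Java, without any Flume dependency. This is only an illustration of the routing idea, not Flume's implementation: the class name SelectorSketch, the method names, and the header name "app" are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SelectorSketch {

    // Replicating: every event goes to every configured channel.
    static List<String> replicating(List<String> channels) {
        return new ArrayList<>(channels);
    }

    // Multiplexing: the value of one event header picks the channels;
    // header values without a mapping fall back to the default list.
    static List<String> multiplexing(Map<String, String> headers, String headerName,
                                     Map<String, List<String>> mapping, List<String> defaults) {
        String value = headers.get(headerName);
        return mapping.getOrDefault(value, defaults);
    }

    public static void main(String[] args) {
        List<String> channels = List.of("channel1", "channel2");

        // Hypothetical mapping on an "app" header: demo -> channel1, demo2 -> channel2.
        Map<String, List<String>> mapping = new LinkedHashMap<>();
        mapping.put("demo", List.of("channel1"));
        mapping.put("demo2", List.of("channel2"));

        for (String app : List.of("demo", "demo2")) {
            Map<String, String> headers = Map.of("app", app);
            System.out.println(app + " replicating  -> " + replicating(channels));
            System.out.println(app + " multiplexing -> "
                    + multiplexing(headers, "app", mapping, List.of()));
        }
    }
}
```

With replicating, both demo and demo2 end up on channel1 and channel2; with the multiplexing mapping, each application reaches only its own channel.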

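Verifying replicating. The idea is to define two Kafka channels, so that the data Flume collects passes through Kafka, and then use a Kafka consumer program to check whether each event was delivered to both Kafka channels.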

# Test the channel selector

# Method: switch the channels to Kafka and use consumers on both topics to check how events are delivered

#

a1.sources = r1

a1.sinks = k1

a1.channels = c1 c2 c3

a1.sources.r1.selector.type = replicating

a1.sources.r1.channels = c1 c2

#a1.sources.r1.selector.optional = c3

# For each one of the sources, the type is defined

#agent.sources.seqGenSrc.type = seq

#a1.sources.r1.type = netcat

#a1.sources.r1.bind=mini1

#a1.sources.r1.port=

a1.sources.r1.type = exec

a1.sources.r1.command = tail -F /home/hadoop/flume/test/logs/flume2.dat

# The channel can be defined as follows.

#agent.sources.seqGenSrc.channels = memoryChannel

#a1.channels.c1.type=memory

#a1.channels.c1.capacity=

#a1.channels.c1.transactionCapacity =

a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel

a1.channels.c1.kafka.bootstrap.servers = mini1:,mini2:,mini3:

#channel selector replicating

a1.channels.c1.kafka.topic = csr1

a1.channels.c1.kafka.consumer.group.id = csr01

a1.channels.c2.type = org.apache.flume.channel.kafka.KafkaChannel

a1.channels.c2.kafka.bootstrap.servers = mini1:,mini2:,mini3:

#channel selector replicating

a1.channels.c2.kafka.topic = csr2

a1.channels.c2.kafka.consumer.group.id = csr02

# Each sink's type must be defined

#agent.sinks.loggerSink.type = logger

a1.sinks.k1.type = logger

#Specify the channel the sink should use

#agent.sinks.loggerSink.channel = memoryChannel

a1.sources.r1.channels = c1 c2

a1.sinks.k1.channel = c1

# Each channel's type is defined.

#agent.channels.memoryChannel.type = memory

# Other config values specific to each type of channel(sink or source)

# can be defined as well

# In this case, it specifies the capacity of the memory channel

#agent.channels.memoryChannel.capacity =

Kafka consumer program

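import java.io.IOException;
import java.io.InputStream;
import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class TestConsumer {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // Load the consumer settings from the classpath. Use TestConsumer.class directly:
        // TestConsumer.class.getClass() would resolve the resource against java.lang.Class.
        try (InputStream in = TestConsumer.class.getResourceAsStream("/kfkConsumer.properties")) {
            props.load(in);
        }
        KafkaConsumer<Integer, String> consumer = new KafkaConsumer<>(props);
        // Subscribe to both topics that back the two Kafka channels.
        consumer.subscribe(Arrays.asList("csr2", "csr1"));
        while (true) {
            // The timeout in the original listing was lost; 100 ms is an assumed value.
            // poll(Duration) needs kafka-clients 2.0+; older clients take poll(100).
            ConsumerRecords<Integer, String> records = consumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<Integer, String> record : records) {
                System.out.print("Thread : " + Thread.currentThread().getName());
                System.out.printf("topic = %s,  offset = %d, key = %s, value = %s, partition = %d %n",
                        record.topic(), record.offset(), record.key(), record.value(), record.partition());
            }
            // Commit the offsets of the records just printed.
            consumer.commitSync();
        }
    }
}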

Consumer output

Thread : maintopic = csr1,  offset = , key = null, value =  from haishang, partition =

Thread : maintopic = csr2,  offset = , key = null, value =  from haishang, partition =

Conclusion: with the replicating strategy, the Flume channel selector sends each event to every configured, available channel.
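When more than a handful of messages are sent, checking the consumer output by eye gets tedious. A minimal sketch of automating the check follows, assuming the consumed records have been collected as (topic, value) pairs; the class ReplicationCheck and the method missingSomewhere are hypothetical helpers, and the sample value is shortened from the output above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ReplicationCheck {

    // Returns the message bodies that did NOT show up on every expected topic.
    // Each String[] is a {topic, value} pair taken from a consumed record.
    static Set<String> missingSomewhere(List<String[]> topicAndValue, Set<String> expectedTopics) {
        Map<String, Set<String>> topicsByValue = new HashMap<>();
        for (String[] rec : topicAndValue) {
            topicsByValue.computeIfAbsent(rec[1], v -> new HashSet<>()).add(rec[0]);
        }
        Set<String> missing = new HashSet<>();
        for (Map.Entry<String, Set<String>> e : topicsByValue.entrySet()) {
            if (!e.getValue().containsAll(expectedTopics)) {
                missing.add(e.getKey());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Sample pairs modelled on the consumer output: the same body on csr1 and csr2.
        List<String[]> seen = List.of(
                new String[] {"csr1", "from haishang"},
                new String[] {"csr2", "from haishang"});
        System.out.println("missing: " + missingSomewhere(seen, Set.of("csr1", "csr2")));
    }
}
```

An empty result means every message body was seen on both topics, which is what replicating should produce.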

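Second verification method. This time three nodes have to be started; pay attention to the names of the sources and sinks in each configuration.

On the first Flume agent (the source node):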

#channelSelector_replicationg_avro.conf 

# Name the components on this agent

a1.sources = r1

a1.sinks = k1 k2

a1.channels = c1 c2  

# Describe/configure the source

a1.sources.r1.type = syslogtcp

a1.sources.r1.port =

#a1.sources.r1.host = 192.168.233.128

a1.sources.r1.host = 192.168.10.201

a1.sources.r1.selector.type = replicating

a1.sources.r1.channels = c1 c2  

# Describe the sink

a1.sinks.k1.type = avro

a1.sinks.k1.channel = c1

#a1.sinks.k1.hostname = 192.168.233.129

a1.sinks.k1.hostname = 192.168.10.202

a1.sinks.k1.port =   

a1.sinks.k2.type = avro

a1.sinks.k2.channel = c2

#a1.sinks.k2.hostname = 192.168.233.130

a1.sinks.k2.hostname = 192.168.10.203

a1.sinks.k2.port =

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity =

a1.channels.c1.transactionCapacity =   

a1.channels.c2.type = memory

a1.channels.c2.capacity =

a1.channels.c2.transactionCapacity =

sink1 (agent a2)

#channelSelector_replicating_sink.conf
# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Describe/configure the source
a2.sources.r1.type = avro
a2.sources.r1.channels = c1
#a2.sources.r1.bind = 192.168.233.129
a2.sources.r1.bind = 192.168.10.202
a2.sources.r1.port = 50000

# Describe the sink
a2.sinks.k1.type = logger
a2.sinks.k1.channel = c1

# Use a channel which buffers events in memory
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

sink2 (agent a3)

#channelSelector_replicating_sink.conf
# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c1

# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.channels = c1
#a3.sources.r1.bind = 192.168.233.130
a3.sources.r1.bind = 192.168.10.203
a3.sources.r1.port = 50000

# Describe the sink
a3.sinks.k1.type = logger
a3.sinks.k1.channel = c1

# Use a channel which buffers events in memory
a3.channels.c1.type = memory
a3.channels.c1.capacity = 1000
a3.channels.c1.transactionCapacity = 100

Startup commands

Start the sinks

bin/flume-ng agent -c conf -f conf/channelSelector_replicating_sink.conf -n a3 -Dflume.root.logger=INFO,console

flume-ng agent -c conf -f conf/channelSelector_replicating_sink.conf -n a2 -Dflume.root.logger=INFO,console

Start the source

flume-ng agent -c conf -f conf/channelSelector_replicationg_avro.conf -n a1 -Dflume.root.logger=INFO,console

Send a message: echo "you are the best " | nc 192.168.10.201 50000

Verifying multiplexing

source

# Configuration file

a1.sources= r1

a1.sinks= k1 k2

a1.channels= c1 c2  

#Describe/configure the source

a1.sources.r1.type=http

a1.sources.r1.port= 

#a1.sources.r1.bind= 192.168.233.128

# The HTTP source takes its listen address from "bind", not "host"
a1.sources.r1.bind=mini1

a1.sources.r1.selector.type= multiplexing

a1.sources.r1.channels= c1 c2  

a1.sources.r1.selector.header= state

a1.sources.r1.selector.mapping.CZ= c1

a1.sources.r1.selector.mapping.US= c2

a1.sources.r1.selector.default= c1  

#Describe the sink

a1.sinks.k1.type= avro

a1.sinks.k1.channel= c1

#a1.sinks.k1.hostname= 192.168.233.129

a1.sinks.k1.hostname=mini2

a1.sinks.k1.port=   

a1.sinks.k2.type= avro

a1.sinks.k2.channel= c2

#a1.sinks.k2.hostname= 192.168.233.130

a1.sinks.k2.hostname=mini3

a1.sinks.k2.port=

# Use a channel which buffers events in memory

a1.channels.c1.type= memory

a1.channels.c1.capacity=

a1.channels.c1.transactionCapacity=   

a1.channels.c2.type= memory

a1.channels.c2.capacity=

a1.channels.c2.transactionCapacity=

sink1

# Name the components on this agent

a2.sources = r1

a2.sinks = k1

a2.channels = c1  

# Describe/configure the source

a2.sources.r1.type = avro

a2.sources.r1.channels = c1

#a2.sources.r1.bind = 192.168.233.129

a2.sources.r1.bind = mini2

a2.sources.r1.port =   

# Describe the sink

a2.sinks.k1.type = logger

a2.sinks.k1.channel = c1  

# Use a channel which buffers events in memory

a2.channels.c1.type = memory

a2.channels.c1.capacity =

a2.channels.c1.transactionCapacity =

sink2

# Name the components on this agent

a3.sources = r1

a3.sinks = k1

a3.channels = c1

# Describe/configure the source

a3.sources.r1.type = avro

a3.sources.r1.channels = c1

#a3.sources.r1.bind = 192.168.233.129

a3.sources.r1.bind = mini3

a3.sources.r1.port = 

# Describe the sink

a3.sinks.k1.type = logger

a3.sinks.k1.channel = c1

# Use a channel which buffers events in memory

a3.channels.c1.type = memory

a3.channels.c1.capacity =

a3.channels.c1.transactionCapacity =

Start the sinks, then the source

bin/flume-ng agent -c conf -f conf/channelSelector_mul_sink.conf -n a3 -Dflume.root.logger=INFO,console

bin/flume-ng agent -c conf -f conf/channelSelector_mul_sink.conf -n a2 -Dflume.root.logger=INFO,console

bin/flume-ng agent -c conf -f conf/channelSelector_multi.conf -n a1 -Dflume.root.logger=INFO,console

The configuration file names can be inferred from the commands above.

Run the following commands

curl -X POST -d '[{"headers" :{"state" : "CZ"},"body" :"CZ"}]' http://mini1:50000

curl -X POST -d '[{"headers" :{"state" : "US"},"body" :"US"}]' http://mini1:50000

curl -X POST -d '[{"headers" :{"state" : "NO"},"body" :"no"}]' http://mini1:50000
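The same three requests can also be sent from Java. The following is a minimal sketch using the JDK's java.net.http client (Java 11+), assuming the HTTP source is reachable at the URL used in the curl commands. The class PostEvent and the helper payload are hypothetical names, and the helper does no JSON escaping, so it is only suitable for simple values like the ones here.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class PostEvent {

    // Builds the JSON array that the curl commands send: one event with a "state" header.
    // No escaping is done, so state and body must not contain quotes or backslashes.
    static String payload(String state, String body) {
        return "[{\"headers\":{\"state\":\"" + state + "\"},\"body\":\"" + body + "\"}]";
    }

    public static void main(String[] args) throws InterruptedException {
        // URL taken from the curl commands; pass another one as the first argument if needed.
        String url = args.length > 0 ? args[0] : "http://mini1:50000";
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        String[][] events = {{"CZ", "CZ"}, {"US", "US"}, {"NO", "no"}};
        for (String[] event : events) {
            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(payload(event[0], event[1])))
                    .build();
            try {
                HttpResponse<String> response =
                        client.send(request, HttpResponse.BodyHandlers.ofString());
                System.out.println(event[0] + " -> HTTP " + response.statusCode());
            } catch (IOException e) {
                // Reported instead of rethrown so one failed request does not stop the others.
                System.out.println(event[0] + " -> failed: " + e);
            }
        }
    }
}
```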

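Results

Events with state CZ are delivered to the sink1 node.

Events with state US are delivered to the sink2 node.

Events with state NO are also delivered to the sink1 node.

CZ and US are mapped in the source configuration above; NO is not.

NO events still always reach sink1 because a1.sources.r1.selector.default is set to c1: any header value without a mapping goes to the default channel, and c1 is drained by sink k1, which forwards to the sink1 node (mini2).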
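The routing of CZ, US, and NO follows directly from the selector lines in the source configuration: a header value with a mapping uses that mapping, and anything else falls back to selector.default. A minimal sketch of that lookup in plain Java, reading the same property lines with java.util.Properties, is shown below. The class MultiplexingLookup and its methods are hypothetical and only mimic the lookup; this is not Flume's code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class MultiplexingLookup {

    static final String PREFIX = "a1.sources.r1.selector.";

    // Returns the channel list for a header value, falling back to selector.default.
    static String channelsFor(Properties conf, String headerValue) {
        String mapped = conf.getProperty(PREFIX + "mapping." + headerValue);
        return mapped != null ? mapped : conf.getProperty(PREFIX + "default", "");
    }

    // The selector lines copied from the source agent's configuration.
    static Properties sampleConf() {
        Properties conf = new Properties();
        try {
            conf.load(new StringReader(String.join("\n",
                    "a1.sources.r1.selector.type= multiplexing",
                    "a1.sources.r1.selector.header= state",
                    "a1.sources.r1.selector.mapping.CZ= c1",
                    "a1.sources.r1.selector.mapping.US= c2",
                    "a1.sources.r1.selector.default= c1")));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return conf;
    }

    public static void main(String[] args) {
        Properties conf = sampleConf();
        for (String state : new String[] {"CZ", "US", "NO"}) {
            System.out.println(state + " -> " + channelsFor(conf, state));
        }
    }
}
```

Running it prints c1 for CZ, c2 for US, and c1 for NO, matching what the two sink nodes log. Changing or removing the selector.default line in sampleConf shows how unmapped values would be routed differently.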

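The source configurations above use two new source types: syslogtcp (listens on a TCP port and uses the syslog data received there as its data source) and http (accepts events sent by HTTP POST).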
