A Summary of Flume Sources and Sinks, and Common Usage Scenarios
Data Sources (Source)
RPC-based exchange of heterogeneous stream data
- Avro Source
- Thrift Source
Watching files or directories for changes
- Exec Source
- Spooling Directory Source
- Taildir Source
Continuous consumption from MQ / message-queue subscriptions
- JMS Source
- SSL and JMS Source
- Kafka Source
Network-based data exchange
- NetCat TCP Source
- NetCat UDP Source
- HTTP Source
- Syslog Sources
- Syslog TCP Source
- Multiport Syslog TCP Source
- Syslog UDP Source
Custom sources
- Custom Source
Sink
- HDFS Sink
- Hive Sink
- Logger Sink
- Avro Sink
- Thrift Sink
- IRC Sink
- File Roll Sink
- HBaseSinks
- HBaseSink
- HBase2Sink
- AsyncHBaseSink
- MorphlineSolrSink
- ElasticSearchSink
- Kite Dataset Sink
- Kafka Sink
- HTTP Sink
- Custom Sink
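Whichever source and sink are chosen, an agent configuration always follows the same skeleton: declare the components, configure each one, and bind sources and sinks to a channel. The examples below all build on this pattern (component names are arbitrary, and the type placeholders stand for any of the types listed above):

# declare the agent's components
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# configure each component
a1.sources.s1.type = <source type>
a1.channels.c1.type = memory
a1.sinks.k1.type = <sink type>

# bind the source and sink to the channel
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1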
Examples
1. Monitoring file changes
exec-memory-logger.properties
# Name the agent's sources, sinks, and channels
a1.sources = s1
a1.sinks = k1
a1.channels = c1

# Configure the source: tail the target file
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /tmp/log.txt
a1.sources.s1.shell = /bin/bash -c
a1.sources.s1.channels = c1

# Configure the sink: print events to the log (matching the file name exec-memory-logger.properties)
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
Start the agent:
flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/exec-memory-logger.properties --name a1 -Dflume.root.logger=INFO,console
Test by appending to the tailed file:
echo "asfsafsf" >> /tmp/log.txt
2. NetCat TCP listener
netcat.properties
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start the agent:
flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/netcat.properties --name a1 -Dflume.root.logger=INFO,console
Test by connecting and typing a few lines:
telnet localhost 44444
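If telnet is not installed, nc (assuming the netcat package is available) works the same way; every line typed into the session should appear as a logged event in the agent's console:

nc localhost 44444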
3. Reading from and writing to Kafka (read: Kafka to logger; write: file to Kafka)
read-kafka.properties and write-kafka.properties
read-kafka.properties (Kafka to logger):

# Name the agent's sources, sinks, and channels
a1.sources = s1
a1.sinks = k1
a1.channels = c1

# Configure the source: consume from Kafka
a1.sources.s1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.s1.batchSize = 5000
a1.sources.s1.batchDurationMillis = 2000
a1.sources.s1.kafka.bootstrap.servers = 192.168.1.103:9092
a1.sources.s1.kafka.topics = test1
a1.sources.s1.kafka.consumer.group.id = custom.g.id

# Bind the source to the channel
a1.sources.s1.channels = c1

# Configure the sink and bind it to the channel
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

# Configure the channel
a1.channels.c1.type = memory

write-kafka.properties (file to Kafka):

a1.sources = s1
a1.channels = c1
a1.sinks = k1

# Configure the source: tail the log file
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /tmp/kafka.log
a1.sources.s1.channels = c1

# Configure the Kafka sink
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
# Kafka broker address
a1.sinks.k1.brokerList = 192.168.1.103:9092
# Topic to publish to
a1.sinks.k1.topic = test1
# Serializer for the event body
a1.sinks.k1.serializer.class = kafka.serializer.StringEncoder
a1.sinks.k1.channel = c1

# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 100
Start both agents (in separate terminals):
flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/read-kafka.properties --name a1 -Dflume.root.logger=INFO,console
flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/write-kafka.properties --name a1 -Dflume.root.logger=INFO,console
Test:
# Create the test topic
bin/kafka-topics.sh --create \
--bootstrap-server 192.168.1.103:9092 \
--replication-factor 1 \
--partitions 1 \
--topic test1
# Start a console producer to send test data:
bin/kafka-console-producer.sh --broker-list 192.168.1.103:9092 --topic test1
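To verify the full round trip, append a line to the file tailed by write-kafka.properties and check that it shows up both in read-kafka's logger output and in a console consumer (host, topic, and path as configured above):

echo "hello kafka" >> /tmp/kafka.log
bin/kafka-console-consumer.sh --bootstrap-server 192.168.1.103:9092 --topic test1 --from-beginning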
4. Custom source
A custom source is wired in by its fully-qualified class name (the implementation is shown in section 9):
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = org.example.MySource
a1.sources.r1.channels = c1
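Before the agent starts, the jar containing the custom class must be visible to Flume. A minimal sketch using Flume's plugins.d layout (the plugin directory and jar names are illustrative):

mkdir -p $FLUME_HOME/plugins.d/my-source/lib
cp my-source-1.0.jar $FLUME_HOME/plugins.d/my-source/lib/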
5. HDFS Sink
spooling-memory-hdfs.properties — watch a directory and upload newly created files to HDFS
# Name the agent's sources, sinks, and channels
a1.sources = s1
a1.sinks = k1
a1.channels = c1

# Configure the source: spooling directory
a1.sources.s1.type = spooldir
a1.sources.s1.spoolDir = /tmp/log2
a1.sources.s1.basenameHeader = true
a1.sources.s1.basenameHeaderKey = fileName

# Bind the source to the channel
a1.sources.s1.channels = c1

# Configure the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H/
a1.sinks.k1.hdfs.filePrefix = %{fileName}
# Output file type; the default is SequenceFile, DataStream writes plain text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Bind the sink to the channel
a1.sinks.k1.channel = c1

# Configure the channel
a1.channels.c1.type = memory
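Start the agent following the same pattern as the earlier examples, then drop a file into the spooling directory to trigger an upload (the test file name is arbitrary):

flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/spooling-memory-hdfs.properties --name a1 -Dflume.root.logger=INFO,console
mkdir -p /tmp/log2
echo "hello hdfs" > /tmp/log2/demo.txt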
Test by listing the HDFS output directory:
hdfs dfs -ls /flume/events/19-11-21/15
6. Hive Sink
a1.channels = c1
a1.channels.c1.type = memory
a1.sinks = k1
a1.sinks.k1.type = hive
a1.sinks.k1.channel = c1
a1.sinks.k1.hive.metastore = thrift://127.0.0.1:9083
a1.sinks.k1.hive.database = logsdb
a1.sinks.k1.hive.table = weblogs
a1.sinks.k1.hive.partition = asia,%{country},%y-%m-%d-%H-%M
a1.sinks.k1.useLocalTimeStamp = false
a1.sinks.k1.round = true
a1.sinks.k1.roundValue = 10
a1.sinks.k1.roundUnit = minute
a1.sinks.k1.serializer = DELIMITED
a1.sinks.k1.serializer.delimiter = "\t"
a1.sinks.k1.serializer.serdeSeparator = '\t'
a1.sinks.k1.serializer.fieldnames =id,,msg
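The Hive sink streams into a transactional, bucketed ORC table, so a matching table must exist before the agent starts (and Hive transactions must be enabled on the metastore side). A sketch of the DDL implied by the configuration above, run through the Hive CLI:

hive -e "
CREATE TABLE logsdb.weblogs (id INT, msg STRING)
  PARTITIONED BY (continent STRING, country STRING, time STRING)
  CLUSTERED BY (id) INTO 5 BUCKETS
  STORED AS ORC
  TBLPROPERTIES ('transactional' = 'true');
"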
7. Avro Source and Avro Sink
exec-memory-avro.properties and avro-memory-log.properties
exec-memory-avro.properties (tail a file and forward over Avro):

# Name the agent's sources, sinks, and channels
a1.sources = s1
a1.sinks = k1
a1.channels = c1

# Configure the source
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /tmp/log.txt
a1.sources.s1.shell = /bin/bash -c
a1.sources.s1.channels = c1

# Configure the Avro sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 192.168.1.103
a1.sinks.k1.port = 8888
a1.sinks.k1.batch-size = 1
a1.sinks.k1.channel = c1

# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

avro-memory-log.properties (receive over Avro and print to the log):

# Name the agent's sources, sinks, and channels
a2.sources = s2
a2.sinks = k2
a2.channels = c2

# Configure the Avro source
a2.sources.s2.type = avro
a2.sources.s2.bind = 192.168.1.103
a2.sources.s2.port = 8888

# Bind the source to the channel
a2.sources.s2.channels = c2

# Configure the sink and bind it to the channel
a2.sinks.k2.type = logger
a2.sinks.k2.channel = c2

# Configure the channel
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100
Start the agents. Bring up the Avro receiver (a2) first:
flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/avro-memory-log.properties --name a2 -Dflume.root.logger=INFO,console
Then start the sender (a1):
flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/exec-memory-avro.properties --name a1 -Dflume.root.logger=INFO,console
Test by sending data with a Flume RPC client. The sample below comes from the Flume developer guide and shows the secure (Kerberos) Thrift RPC client; a simpler client that matches the plain Avro Source configured above follows after it.
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.api.SecureRpcClientFactory;
import org.apache.flume.api.RpcClientConfigurationConstants;
import org.apache.flume.api.RpcClient;
import java.nio.charset.Charset;
import java.util.Properties;

public class MyApp {
  public static void main(String[] args) {
    MySecureRpcClientFacade client = new MySecureRpcClientFacade();
    // Initialize client with the remote Flume agent's host and port
    Properties props = new Properties();
    props.setProperty(RpcClientConfigurationConstants.CONFIG_CLIENT_TYPE, "thrift");
    props.setProperty("hosts", "h1");
    props.setProperty("hosts.h1", "client.example.org" + ":" + String.valueOf(8888));

    // Initialize client with the Kerberos authentication related properties
    props.setProperty("kerberos", "true");
    props.setProperty("client-principal", "flumeclient/client.example.org@EXAMPLE.ORG");
    props.setProperty("client-keytab", "/tmp/flumeclient.keytab");
    props.setProperty("server-principal", "flume/server.example.org@EXAMPLE.ORG");
    client.init(props);

    // Send 10 events to the remote Flume agent. That agent should be
    // configured to listen with a Kerberos-enabled Thrift source.
    String sampleData = "Hello Flume!";
    for (int i = 0; i < 10; i++) {
      client.sendDataToFlume(sampleData);
    }

    client.cleanUp();
  }
}

class MySecureRpcClientFacade {
  private RpcClient client;
  private Properties properties;

  public void init(Properties properties) {
    // Setup the RPC connection
    this.properties = properties;
    // Create the ThriftSecureRpcClient instance by using SecureRpcClientFactory
    this.client = SecureRpcClientFactory.getThriftInstance(properties);
  }

  public void sendDataToFlume(String data) {
    // Create a Flume Event object that encapsulates the sample data
    Event event = EventBuilder.withBody(data, Charset.forName("UTF-8"));

    // Send the event
    try {
      client.append(event);
    } catch (EventDeliveryException e) {
      // clean up and recreate the client
      client.close();
      client = null;
      client = SecureRpcClientFactory.getThriftInstance(properties);
    }
  }

  public void cleanUp() {
    // Close the RPC connection
    client.close();
  }
}
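For the unsecured Avro Source started above (192.168.1.103:8888), a minimal sketch using the Flume SDK's default Avro RPC client is enough; the class name here is illustrative:

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;
import java.nio.charset.StandardCharsets;

public class SimpleAvroClient {
  public static void main(String[] args) throws EventDeliveryException {
    // Connect to the Avro Source declared in avro-memory-log.properties
    RpcClient client = RpcClientFactory.getDefaultInstance("192.168.1.103", 8888);
    try {
      // Build and send one event; it should show up in agent a2's logger output
      Event event = EventBuilder.withBody("Hello Flume!", StandardCharsets.UTF_8);
      client.append(event);
    } finally {
      client.close();
    }
  }
}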
8. Elasticsearch Sink
a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = elasticsearch
a1.sinks.k1.hostNames = 127.0.0.1:9200,127.0.0.2:9300
a1.sinks.k1.indexName = foo_index
a1.sinks.k1.indexType = bar_type
a1.sinks.k1.clusterName = foobar_cluster
a1.sinks.k1.batchSize = 500
a1.sinks.k1.ttl = 5d
a1.sinks.k1.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchDynamicSerializer
a1.sinks.k1.channel = c1
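The Elasticsearch client and matching lucene-core jars must be on the Flume classpath for this sink to load. Assuming the cluster is reachable on 127.0.0.1:9200, the indexed events can be spot-checked against the date-suffixed indices the sink creates:

curl 'http://127.0.0.1:9200/foo_index-*/_search?pretty'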
9. Developing a custom Source and Sink
A custom sink extends AbstractSink and a custom source typically implements PollableSource; the skeletons below follow the Flume developer guide.
public class MySink extends AbstractSink implements Configurable {
  private String myProp;

  @Override
  public void configure(Context context) {
    String myProp = context.getString("myProp", "defaultValue");
    // Process the myProp value (e.g. validation)
    // Store myProp for later retrieval by process() method
    this.myProp = myProp;
  }

  @Override
  public void start() {
    // Initialize the connection to the external repository (e.g. HDFS) that
    // this Sink will forward Events to ..
  }

  @Override
  public void stop () {
    // Disconnect from the external repository and do any
    // additional cleanup (e.g. releasing resources or nulling-out
    // field values) ..
  }

  @Override
  public Status process() throws EventDeliveryException {
    Status status = null;

    // Start transaction
    Channel ch = getChannel();
    Transaction txn = ch.getTransaction();
    txn.begin();
    try {
      // This try clause includes whatever Channel operations you want to do
      Event event = ch.take();

      // Send the Event to the external repository.
      // storeSomeData(e);

      txn.commit();
      status = Status.READY;
    } catch (Throwable t) {
      txn.rollback();
      // Log exception, handle individual exceptions as needed
      status = Status.BACKOFF;

      // re-throw all Errors
      if (t instanceof Error) {
        throw (Error)t;
      }
    }
    return status;
  }
}
public class MySource extends AbstractSource implements Configurable, PollableSource {
  private String myProp;

  @Override
  public void configure(Context context) {
    String myProp = context.getString("myProp", "defaultValue");
    // Process the myProp value (e.g. validation, convert to another type, ...)
    // Store myProp for later retrieval by process() method
    this.myProp = myProp;
  }

  @Override
  public void start() {
    // Initialize the connection to the external client
  }

  @Override
  public void stop () {
    // Disconnect from external client and do any additional cleanup
    // (e.g. releasing resources or nulling-out field values) ..
  }

  @Override
  public Status process() throws EventDeliveryException {
    Status status = null;

    try {
      // This try clause includes whatever Channel/Event operations you want to do

      // Receive new data
      Event e = getSomeData();

      // Store the Event into this Source's associated Channel(s)
      getChannelProcessor().processEvent(e);

      status = Status.READY;
    } catch (Throwable t) {
      // Log exception, handle individual exceptions as needed
      status = Status.BACKOFF;

      // re-throw all Errors
      if (t instanceof Error) {
        throw (Error)t;
      }
    }
    return status;
  }
}
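Once compiled and placed on the agent's classpath (for example under plugins.d as shown in section 4), the two classes can be wired into an agent like any built-in component. A sketch, assuming the classes are packaged as org.example.MySource and org.example.MySink:

a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = org.example.MySource
a1.sources.r1.channels = c1

a1.sinks.k1.type = org.example.MySink
a1.sinks.k1.channel = c1

a1.channels.c1.type = memory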