Data sources (Source)

RPC-based heterogeneous stream exchange

  • Avro Source
  • Thrift Source

File or directory change monitoring

  • Exec Source
  • Spooling Directory Source
  • Taildir Source

Continuous consumption from MQ / queue subscriptions

  • JMS Source
  • SSL and JMS Source
  • Kafka Source

Network-based data exchange

  • NetCat TCP Source
  • NetCat UDP Source
  • HTTP Source
  • Syslog Sources
  • Syslog TCP Source
  • Multiport Syslog TCP Source
  • Syslog UDP Source

Custom sources

  • Custom Source
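
Whichever source type is chosen from the lists above, it is wired into an agent the same way; a minimal sketch (agent and component names are placeholders, the type value is one of the types listed):

a1.sources = s1
a1.channels = c1
a1.sources.s1.type = netcat
a1.sources.s1.channels = c1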

Sink

  • HDFS Sink
  • Hive Sink
  • Logger Sink
  • Avro Sink
  • Thrift Sink
  • IRC Sink
  • File Roll Sink
  • HBaseSinks
  • HBaseSink
  • HBase2Sink
  • AsyncHBaseSink
  • MorphlineSolrSink
  • ElasticSearchSink
  • Kite Dataset Sink
  • Kafka Sink
  • HTTP Sink
  • Custom Sink

Examples

1. Watching a file for changes

exec-memory-logger.properties

# Name the agent's sources, sinks and channels
a1.sources = s1
a1.sinks = k1
a1.channels = c1

# Configure the source
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /tmp/log.txt
a1.sources.s1.shell = /bin/bash -c
a1.sources.s1.channels = c1

# Configure the sink (logger prints events to the agent console, matching the file name)
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

Start

flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/exec-memory-logger.properties --name a1 -Dflume.root.logger=INFO,console

Test

echo "asfsafsf" >> /tmp/log.txt

2. NetCat TCP listener

netcat.properties

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start

flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/netcat.properties --name a1 -Dflume.root.logger=INFO,console

Test

telnet localhost 44444
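
Alternatively, pipe a test line in with nc (assuming netcat is installed on the host); the logger sink prints the received event on the agent console:

echo "hello flume" | nc localhost 44444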

3. Kafka read and write (read: Kafka to logger; write: file to Kafka)

read-kafka.properties

# Name the agent's sources, sinks and channels
a1.sources = s1
a1.sinks = k1
a1.channels = c1

# Configure the source
a1.sources.s1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.s1.batchSize = 5000
a1.sources.s1.batchDurationMillis = 2000
a1.sources.s1.kafka.bootstrap.servers = 192.168.1.103:9092
a1.sources.s1.kafka.topics = test1
a1.sources.s1.kafka.consumer.group.id = custom.g.id

# Bind the source to the channel
a1.sources.s1.channels = c1

# Configure the sink and bind it to the channel
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

# Configure the channel
a1.channels.c1.type = memory

write-kafka.properties

# Name the agent's sources, sinks and channels
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# Configure the source
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /tmp/kafka.log
a1.sources.s1.channels = c1

# Configure the Kafka sink
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
# Kafka broker address
a1.sinks.k1.brokerList = 192.168.1.103:9092
# Topic to publish to
a1.sinks.k1.topic = test1
# Serialization
a1.sinks.k1.serializer.class = kafka.serializer.StringEncoder
a1.sinks.k1.channel = c1

# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 100

Start

flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/read-kafka.properties --name a1 -Dflume.root.logger=INFO,console

flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/write-kafka.properties --name a1 -Dflume.root.logger=INFO,console

Test

# Create the topic used for testing
bin/kafka-topics.sh --create \
--bootstrap-server 192.168.1.103:9092 \
--replication-factor 1 \
--partitions 1 \
--topic test1
# Start a console producer to send test data:
bin/kafka-console-producer.sh --broker-list 192.168.1.103:9092 --topic test1
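
The console producer above exercises the read path (Kafka → logger). To check the write path (file → Kafka), append a line to the tailed file and read the topic back with a console consumer:

echo "hello kafka sink" >> /tmp/kafka.log
bin/kafka-console-consumer.sh --bootstrap-server 192.168.1.103:9092 --topic test1 --from-beginning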

4. Custom source (the MySource implementation is shown in case 9)

a1.sources = r1
a1.channels = c1
a1.sources.r1.type = org.example.MySource
a1.sources.r1.channels = c1
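
The snippet above only wires the custom source. A minimal runnable sketch, assuming the compiled org.example.MySource class (see case 9) is on the Flume classpath, with a memory channel and logger sink added so the output is visible:

a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = org.example.MySource
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1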

5. HDFS Sink

spooling-memory-hdfs.properties: watch a directory for new files and upload them to HDFS

# Name the agent's sources, sinks and channels
a1.sources = s1
a1.sinks = k1
a1.channels = c1

# Configure the source
a1.sources.s1.type = spooldir
a1.sources.s1.spoolDir = /tmp/log2
a1.sources.s1.basenameHeader = true
a1.sources.s1.basenameHeaderKey = fileName

# Bind the source to the channel
a1.sources.s1.channels = c1

# Configure the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H/
a1.sinks.k1.hdfs.filePrefix = %{fileName}
# Output file type; the default is SequenceFile, DataStream writes plain text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Bind the sink to the channel
a1.sinks.k1.channel = c1

# Configure the channel
a1.channels.c1.type = memory
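
Starting the agent and feeding it a file follows the same pattern as the other cases (the install path and the sample file name are assumptions):

flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/spooling-memory-hdfs.properties --name a1 -Dflume.root.logger=INFO,console

cp /tmp/access.log /tmp/log2/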

Test

hdfs dfs -ls /flume/events/19-11-21/15

6. Hive Sink

a1.channels = c1
a1.channels.c1.type = memory
a1.sinks = k1
a1.sinks.k1.type = hive
a1.sinks.k1.channel = c1
a1.sinks.k1.hive.metastore = thrift://127.0.0.1:9083
a1.sinks.k1.hive.database = logsdb
a1.sinks.k1.hive.table = weblogs
a1.sinks.k1.hive.partition = asia,%{country},%y-%m-%d-%H-%M
a1.sinks.k1.useLocalTimeStamp = false
a1.sinks.k1.round = true
a1.sinks.k1.roundValue = 10
a1.sinks.k1.roundUnit = minute
a1.sinks.k1.serializer = DELIMITED
a1.sinks.k1.serializer.delimiter = "\t"
a1.sinks.k1.serializer.serdeSeparator = '\t'
a1.sinks.k1.serializer.fieldnames =id,,msg
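
The Hive sink writes through Hive streaming transactions, so the target table must be bucketed, stored as ORC, and Hive transactions must be enabled on the metastore (hive.txn.manager and compactor settings). A DDL sketch matching the config above, modeled on the Flume user guide example (bucket count is arbitrary):

hive -e "
create database if not exists logsdb;
create table logsdb.weblogs (id int, msg string)
  partitioned by (continent string, country string, time string)
  clustered by (id) into 5 buckets
  stored as orc;
"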

7. Avro Source and Avro Sink

exec-memory-avro.properties

# Name the agent's sources, sinks and channels
a1.sources = s1
a1.sinks = k1
a1.channels = c1

# Configure the source
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /tmp/log.txt
a1.sources.s1.shell = /bin/bash -c
a1.sources.s1.channels = c1

# Configure the Avro sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 192.168.1.103
a1.sinks.k1.port = 8888
a1.sinks.k1.batch-size = 1
a1.sinks.k1.channel = c1

# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

avro-memory-log.properties

# Name the agent's sources, sinks and channels
a2.sources = s2
a2.sinks = k2
a2.channels = c2

# Configure the Avro source
a2.sources.s2.type = avro
a2.sources.s2.bind = 192.168.1.103
a2.sources.s2.port = 8888

# Bind the source to the channel
a2.sources.s2.channels = c2

# Configure the sink and bind it to the channel
a2.sinks.k2.type = logger
a2.sinks.k2.channel = c2

# Configure the channel
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100

Start


flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/avro-memory-log.properties --name a2 -Dflume.root.logger=INFO,console

flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/exec-memory-avro.properties --name a1 -Dflume.root.logger=INFO,console

Test: send data to the agent from a client application. The code below is the Kerberos-secured Thrift RPC client example from the Flume SDK; a simpler plain Avro client sketch follows it.

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.api.SecureRpcClientFactory;
import org.apache.flume.api.RpcClientConfigurationConstants;
import org.apache.flume.api.RpcClient;
import java.nio.charset.Charset;
import java.util.Properties;

public class MyApp {
  public static void main(String[] args) {
    MySecureRpcClientFacade client = new MySecureRpcClientFacade();
    // Initialize client with the remote Flume agent's host and port
    Properties props = new Properties();
    props.setProperty(RpcClientConfigurationConstants.CONFIG_CLIENT_TYPE, "thrift");
    props.setProperty("hosts", "h1");
    props.setProperty("hosts.h1", "client.example.org" + ":" + String.valueOf(8888));

    // Initialize client with the Kerberos authentication related properties
    props.setProperty("kerberos", "true");
    props.setProperty("client-principal", "flumeclient/client.example.org@EXAMPLE.ORG");
    props.setProperty("client-keytab", "/tmp/flumeclient.keytab");
    props.setProperty("server-principal", "flume/server.example.org@EXAMPLE.ORG");
    client.init(props);

    // Send 10 events to the remote Flume agent. That agent should be
    // configured to listen with a Kerberos-enabled ThriftSource.
    String sampleData = "Hello Flume!";
    for (int i = 0; i < 10; i++) {
      client.sendDataToFlume(sampleData);
    }

    client.cleanUp();
  }
}

class MySecureRpcClientFacade {
  private RpcClient client;
  private Properties properties;

  public void init(Properties properties) {
    // Set up the RPC connection
    this.properties = properties;
    // Create the ThriftSecureRpcClient instance by using SecureRpcClientFactory
    this.client = SecureRpcClientFactory.getThriftInstance(properties);
  }

  public void sendDataToFlume(String data) {
    // Create a Flume Event object that encapsulates the sample data
    Event event = EventBuilder.withBody(data, Charset.forName("UTF-8"));

    // Send the event
    try {
      client.append(event);
    } catch (EventDeliveryException e) {
      // Clean up and recreate the client
      client.close();
      client = null;
      client = SecureRpcClientFactory.getThriftInstance(properties);
    }
  }

  public void cleanUp() {
    // Close the RPC connection
    client.close();
  }
}
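
For the plain (non-Kerberos) Avro source configured above, a much smaller client is enough. A minimal sketch using the default Avro RPC client from the Flume SDK (class name is only for illustration; host and port match avro-memory-log.properties):

import java.nio.charset.Charset;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class SimpleAvroClient {
  public static void main(String[] args) throws EventDeliveryException {
    // Connects to the Avro source listening on 192.168.1.103:8888
    RpcClient client = RpcClientFactory.getDefaultInstance("192.168.1.103", 8888);
    try {
      client.append(EventBuilder.withBody("Hello Flume!", Charset.forName("UTF-8")));
    } finally {
      client.close();
    }
  }
}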

8. Elasticsearch Sink

a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = elasticsearch
a1.sinks.k1.hostNames = 127.0.0.1:9200,127.0.0.2:9300
a1.sinks.k1.indexName = foo_index
a1.sinks.k1.indexType = bar_type
a1.sinks.k1.clusterName = foobar_cluster
a1.sinks.k1.batchSize = 500
a1.sinks.k1.ttl = 5d
a1.sinks.k1.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchDynamicSerializer
a1.sinks.k1.channel = c1

9. Developing a custom Source and Sink

import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Transaction;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;

public class MySink extends AbstractSink implements Configurable {
  private String myProp;

  @Override
  public void configure(Context context) {
    String myProp = context.getString("myProp", "defaultValue");
    // Process the myProp value (e.g. validation)
    // Store myProp for later retrieval by process() method
    this.myProp = myProp;
  }

  @Override
  public void start() {
    // Initialize the connection to the external repository (e.g. HDFS) that
    // this Sink will forward Events to ..
  }

  @Override
  public void stop() {
    // Disconnect from the external repository and do any additional cleanup
    // (e.g. releasing resources or nulling-out field values) ..
  }

  @Override
  public Status process() throws EventDeliveryException {
    Status status = null;

    // Start transaction
    Channel ch = getChannel();
    Transaction txn = ch.getTransaction();
    txn.begin();
    try {
      // This try clause includes whatever Channel operations you want to do
      Event event = ch.take();

      // Send the Event to the external repository.
      // storeSomeData(event);

      txn.commit();
      status = Status.READY;
    } catch (Throwable t) {
      txn.rollback();
      // Log exception, handle individual exceptions as needed
      status = Status.BACKOFF;
      // re-throw all Errors
      if (t instanceof Error) {
        throw (Error) t;
      }
    } finally {
      // Always close the transaction
      txn.close();
    }
    return status;
  }
}
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.PollableSource;
import org.apache.flume.conf.Configurable;
import org.apache.flume.source.AbstractSource;

public class MySource extends AbstractSource implements Configurable, PollableSource {
  private String myProp;

  @Override
  public void configure(Context context) {
    String myProp = context.getString("myProp", "defaultValue");
    // Process the myProp value (e.g. validation, convert to another type, ...)
    // Store myProp for later retrieval by process() method
    this.myProp = myProp;
  }

  @Override
  public void start() {
    // Initialize the connection to the external client
  }

  @Override
  public void stop() {
    // Disconnect from external client and do any additional cleanup
    // (e.g. releasing resources or nulling-out field values) ..
  }

  @Override
  public Status process() throws EventDeliveryException {
    Status status = null;
    try {
      // This try clause includes whatever Channel/Event operations you want to do
      // Receive new data (getSomeData() is a placeholder for your own client code)
      Event e = getSomeData();

      // Store the Event into this Source's associated Channel(s)
      getChannelProcessor().processEvent(e);
      status = Status.READY;
    } catch (Throwable t) {
      // Log exception, handle individual exceptions as needed
      status = Status.BACKOFF;
      // re-throw all Errors
      if (t instanceof Error) {
        throw (Error) t;
      }
    }
    return status;
  }

  // Required by the PollableSource interface since Flume 1.7
  @Override
  public long getBackOffSleepIncrement() { return 1000; }
  @Override
  public long getMaxBackOffSleepInterval() { return 5000; }
}
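
Wiring the custom sink into an agent mirrors the custom source config in case 4; package the compiled classes as a jar and place it in Flume's lib or plugins.d directory before starting the agent (the org.example names are the same placeholders used above):

a1.sinks = k1
a1.sinks.k1.type = org.example.MySink
a1.sinks.k1.channel = c1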
