Sources

RPC-based streaming data exchange between heterogeneous systems

  • Avro Source
  • Thrift Source

Monitoring file or directory changes

  • Exec Source
  • Spooling Directory Source
  • Taildir Source

Continuous consumption of MQ / queue subscriptions

  • JMS Source
  • SSL and JMS Source
  • Kafka Source

Network-based data exchange

  • NetCat TCP Source
  • NetCat UDP Source
  • HTTP Source
  • Syslog Sources
  • Syslog TCP Source
  • Multiport Syslog TCP Source
  • Syslog UDP Source

Custom sources

  • Custom Source

Sinks

  • HDFS Sink
  • Hive Sink
  • Logger Sink
  • Avro Sink
  • Thrift Sink
  • IRC Sink
  • File Roll Sink
  • HBaseSinks
  • HBaseSink
  • HBase2Sink
  • AsyncHBaseSink
  • MorphlineSolrSink
  • ElasticSearchSink
  • Kite Dataset Sink
  • Kafka Sink
  • HTTP Sink
  • Custom Sink

Examples

1. Monitoring file changes

exec-memory-logger.properties

# Name the agent's sources, sinks, and channels
a1.sources = s1
a1.sinks = k1
a1.channels = c1

# Configure the source
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /tmp/log.txt
a1.sources.s1.shell = /bin/bash -c

# Bind the source to the channel
a1.sources.s1.channels = c1

# Configure the sink (a Logger Sink, matching the file name exec-memory-logger;
# the Avro Sink variant of this flow appears in example 7)
a1.sinks.k1.type = logger

# Bind the sink to the channel
a1.sinks.k1.channel = c1

# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

Start

flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/exec-memory-logger.properties --name a1 -Dflume.root.logger=INFO,console

Test

echo "asfsafsf" >> /tmp/log.txt

2. NetCat TCP listener

netcat.properties

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start

flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/netcat.properties --name a1 -Dflume.root.logger=INFO,console

Test

telnet localhost 44444
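If telnet is not installed, netcat itself can push a test line to the source (assuming the nc utility is available on the host):

echo "hello flume" | nc localhost 44444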

3. Kafka read and write (read: Kafka to logger; write: file to Kafka)

read-kafka.properties and write-kafka.properties

# read-kafka.properties
# Name the agent's sources, sinks, and channels
a1.sources = s1
a1.sinks = k1
a1.channels = c1

# Configure the source (Kafka Source)
a1.sources.s1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.s1.batchSize = 5000
a1.sources.s1.batchDurationMillis = 2000
a1.sources.s1.kafka.bootstrap.servers = 192.168.1.103:9092
a1.sources.s1.kafka.topics = test1
a1.sources.s1.kafka.consumer.group.id = custom.g.id

# Bind the source to the channel
a1.sources.s1.channels = c1

# Configure the sink
a1.sinks.k1.type = logger

# Bind the sink to the channel
a1.sinks.k1.channel = c1

# Configure the channel
a1.channels.c1.type = memory

# write-kafka.properties
# Name the agent's sources, sinks, and channels
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# Configure the source (Exec Source)
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /tmp/kafka.log
a1.sources.s1.channels = c1

# Configure the Kafka sink
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
# Kafka broker address
a1.sinks.k1.brokerList = 192.168.1.103:9092
# Topic to send events to
a1.sinks.k1.topic = test1
# Serialization
a1.sinks.k1.serializer.class = kafka.serializer.StringEncoder
a1.sinks.k1.channel = c1

# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 100

启动

flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/read-kafka.properties --name a1 -Dflume.root.logger=INFO,console

flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/write-kafka.properties --name a1 -Dflume.root.logger=INFO,console

Test

# Create the topic used for testing
bin/kafka-topics.sh --create \
--bootstrap-server 192.168.1.103:9092 \
--replication-factor 1 \
--partitions 1 \
--topic test1
# Start a producer to send test data:
bin/kafka-console-producer.sh --broker-list 192.168.1.103:9092 --topic test1
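If the write-kafka agent is also running, the reverse path (file to Kafka) can be checked by appending a line to the tailed file and reading it back with a console consumer. Broker address and topic are as configured above; the sample line is arbitrary:

# Append a line to the file tailed by write-kafka.properties:
echo "hello kafka sink" >> /tmp/kafka.log
# Read it back from the topic:
bin/kafka-console-consumer.sh --bootstrap-server 192.168.1.103:9092 --topic test1 --from-beginning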

4. Custom source

The configuration below simply points the source type at a user-supplied class; a skeleton implementation of org.example.MySource is given in example 9.

a1.sources = r1
a1.channels = c1
a1.sources.r1.type = org.example.MySource
a1.sources.r1.channels = c1

5. HDFS Sink

spooling-memory-hdfs.properties: watch a directory for new files and upload them to HDFS

# Name the agent's sources, sinks, and channels
a1.sources = s1
a1.sinks = k1
a1.channels = c1

# Configure the source
a1.sources.s1.type = spooldir
a1.sources.s1.spoolDir = /tmp/log2
a1.sources.s1.basenameHeader = true
a1.sources.s1.basenameHeaderKey = fileName

# Bind the source to the channel
a1.sources.s1.channels = c1

# Configure the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H/
a1.sinks.k1.hdfs.filePrefix = %{fileName}
# Output file type; the default is SequenceFile, DataStream writes plain text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Bind the sink to the channel
a1.sinks.k1.channel = c1

# Configure the channel
a1.channels.c1.type = memory

Test
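Create the spool directory and drop a file into it to trigger the flow (the path matches spoolDir above; the file name and contents below are arbitrary), then check the result on HDFS with the listing command that follows:

mkdir -p /tmp/log2
echo "spool test line" > /tmp/log2/test.log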

hdfs dfs -ls /flume/events/19-11-21/15

6. Hive Sink

a1.channels = c1
a1.channels.c1.type = memory
a1.sinks = k1
a1.sinks.k1.type = hive
a1.sinks.k1.channel = c1
a1.sinks.k1.hive.metastore = thrift://127.0.0.1:9083
a1.sinks.k1.hive.database = logsdb
a1.sinks.k1.hive.table = weblogs
a1.sinks.k1.hive.partition = asia,%{country},%y-%m-%d-%H-%M
a1.sinks.k1.useLocalTimeStamp = false
a1.sinks.k1.round = true
a1.sinks.k1.roundValue = 10
a1.sinks.k1.roundUnit = minute
a1.sinks.k1.serializer = DELIMITED
a1.sinks.k1.serializer.delimiter = "\t"
a1.sinks.k1.serializer.serdeSeparator = '\t'
a1.sinks.k1.serializer.fieldnames =id,,msg
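The Hive Sink streams into Hive transactions, so the target table must be bucketed, stored as ORC, and transactional, with columns matching serializer.fieldnames and partition columns matching hive.partition. A minimal sketch adapted from the Flume user guide's example table; the database name, bucket count, and table properties here are assumptions to adjust for your environment:

hive -e "create database if not exists logsdb;
create table logsdb.weblogs (id int, msg string)
  partitioned by (continent string, country string, time string)
  clustered by (id) into 5 buckets
  stored as orc
  tblproperties ('transactional' = 'true');"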

7. Avro Source and Avro Sink

exec-memory-avro.properties and avro-memory-log.properties

# exec-memory-avro.properties
# Name the agent's sources, sinks, and channels
a1.sources = s1
a1.sinks = k1
a1.channels = c1

# Configure the source
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /tmp/log.txt
a1.sources.s1.shell = /bin/bash -c
a1.sources.s1.channels = c1

# Configure the sink (Avro Sink)
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 192.168.1.103
a1.sinks.k1.port = 8888
a1.sinks.k1.batch-size = 1
a1.sinks.k1.channel = c1

# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# avro-memory-log.properties
# Name the agent's sources, sinks, and channels
a2.sources = s2
a2.sinks = k2
a2.channels = c2

# Configure the source (Avro Source)
a2.sources.s2.type = avro
a2.sources.s2.bind = 192.168.1.103
a2.sources.s2.port = 8888

# Bind the source to the channel
a2.sources.s2.channels = c2

# Configure the sink
a2.sinks.k2.type = logger

# Bind the sink to the channel
a2.sinks.k2.channel = c2

# Configure the channel
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100

Start


flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/avro-memory-log.properties --name a2 -Dflume.root.logger=INFO,console

flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/exec-memory-avro.properties --name a1 -Dflume.root.logger=INFO,console

Test: send data to the Avro Source with a Flume RPC client. The listing below is the secure Thrift RPC client example from the Flume developer guide; for the plain (non-Kerberos) Avro Source configured above, the flume-ng avro-client command shown after the listing is the quickest test.

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.api.SecureRpcClientFactory;
import org.apache.flume.api.RpcClientConfigurationConstants;
import org.apache.flume.api.RpcClient;
import java.nio.charset.Charset;
import java.util.Properties;

public class MyApp {
  public static void main(String[] args) {
    MySecureRpcClientFacade client = new MySecureRpcClientFacade();
    // Initialize client with the remote Flume agent's host and port
    Properties props = new Properties();
    props.setProperty(RpcClientConfigurationConstants.CONFIG_CLIENT_TYPE, "thrift");
    props.setProperty("hosts", "h1");
    props.setProperty("hosts.h1", "client.example.org" + ":" + String.valueOf(8888));

    // Initialize client with the Kerberos authentication related properties
    props.setProperty("kerberos", "true");
    props.setProperty("client-principal", "flumeclient/client.example.org@EXAMPLE.ORG");
    props.setProperty("client-keytab", "/tmp/flumeclient.keytab");
    props.setProperty("server-principal", "flume/server.example.org@EXAMPLE.ORG");
    client.init(props);

    // Send 10 events to the remote Flume agent. Since this client uses Thrift,
    // that agent should be configured to listen with a ThriftSource (in secure mode).
    String sampleData = "Hello Flume!";
    for (int i = 0; i < 10; i++) {
      client.sendDataToFlume(sampleData);
    }

    client.cleanUp();
  }
}

class MySecureRpcClientFacade {
  private RpcClient client;
  private Properties properties;

  public void init(Properties properties) {
    // Setup the RPC connection
    this.properties = properties;
    // Create the ThriftSecureRpcClient instance by using SecureRpcClientFactory
    this.client = SecureRpcClientFactory.getThriftInstance(properties);
  }

  public void sendDataToFlume(String data) {
    // Create a Flume Event object that encapsulates the sample data
    Event event = EventBuilder.withBody(data, Charset.forName("UTF-8"));

    // Send the event
    try {
      client.append(event);
    } catch (EventDeliveryException e) {
      // clean up and recreate the client
      client.close();
      client = null;
      client = SecureRpcClientFactory.getThriftInstance(properties);
    }
  }

  public void cleanUp() {
    // Close the RPC connection
    client.close();
  }
}
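For a quick test without Kerberos or any custom client code, the avro-client mode bundled with flume-ng can send a file's contents directly to the Avro Source (host and port as configured above):

flume-ng avro-client --conf conf -H 192.168.1.103 -p 8888 -F /tmp/log.txt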

8. Elasticsearch Sink

a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = elasticsearch
a1.sinks.k1.hostNames = 127.0.0.1:9200,127.0.0.2:9300
a1.sinks.k1.indexName = foo_index
a1.sinks.k1.indexType = bar_type
a1.sinks.k1.clusterName = foobar_cluster
a1.sinks.k1.batchSize = 500
a1.sinks.k1.ttl = 5d
a1.sinks.k1.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchDynamicSerializer
a1.sinks.k1.channel = c1
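Note that this sink is not usable out of the box: per the Flume user guide, the elasticsearch client jar and lucene-core jar matching the target cluster's version must be placed on the agent's classpath. A sketch, where the jar names and installation path are illustrative:

cp elasticsearch-<version>.jar lucene-core-<version>.jar /usr/app/apache-flume-1.8.0-bin/lib/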

9. Developing a custom Source and Sink

public class MySink extends AbstractSink implements Configurable {
  private String myProp;

  @Override
  public void configure(Context context) {
    String myProp = context.getString("myProp", "defaultValue");
    // Process the myProp value (e.g. validation)
    // Store myProp for later retrieval by process() method
    this.myProp = myProp;
  }

  @Override
  public void start() {
    // Initialize the connection to the external repository (e.g. HDFS) that
    // this Sink will forward Events to ..
  }

  @Override
  public void stop () {
    // Disconnect from the external repository and do any
    // additional cleanup (e.g. releasing resources or nulling-out
    // field values) ..
  }

  @Override
  public Status process() throws EventDeliveryException {
    Status status = null;

    // Start transaction
    Channel ch = getChannel();
    Transaction txn = ch.getTransaction();
    txn.begin();
    try {
      // This try clause includes whatever Channel operations you want to do
      Event event = ch.take();

      // Send the Event to the external repository.
      // storeSomeData(event);

      txn.commit();
      status = Status.READY;
    } catch (Throwable t) {
      txn.rollback();
      // Log exception, handle individual exceptions as needed
      status = Status.BACKOFF;
      // re-throw all Errors
      if (t instanceof Error) {
        throw (Error) t;
      }
    } finally {
      // A transaction must always be closed after commit or rollback
      txn.close();
    }
    return status;
  }
}
public class MySource extends AbstractSource implements Configurable, PollableSource {
  private String myProp;

  @Override
  public void configure(Context context) {
    String myProp = context.getString("myProp", "defaultValue");
    // Process the myProp value (e.g. validation, convert to another type, ...)
    // Store myProp for later retrieval by process() method
    this.myProp = myProp;
  }

  @Override
  public void start() {
    // Initialize the connection to the external client
  }

  @Override
  public void stop () {
    // Disconnect from external client and do any additional cleanup
    // (e.g. releasing resources or nulling-out field values) ..
  }

  @Override
  public Status process() throws EventDeliveryException {
    Status status = null;
    try {
      // This try clause includes whatever Channel/Event operations you want to do
      // Receive new data
      Event e = getSomeData();

      // Store the Event into this Source's associated Channel(s);
      // the ChannelProcessor manages the channel transaction internally,
      // so no explicit Transaction handling is needed here
      getChannelProcessor().processEvent(e);
      status = Status.READY;
    } catch (Throwable t) {
      // Log exception, handle individual exceptions as needed
      status = Status.BACKOFF;
      // re-throw all Errors
      if (t instanceof Error) {
        throw (Error) t;
      }
    }
    return status;
  }
}
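To deploy these classes, package them into a jar and place it where the agent can load it. The plugins.d layout below follows the Flume user guide; the plugin directory and jar names are illustrative:

mkdir -p /usr/app/apache-flume-1.8.0-bin/plugins.d/my-custom/lib
cp my-custom-flume.jar /usr/app/apache-flume-1.8.0-bin/plugins.d/my-custom/lib/

The agent configuration then refers to the classes by fully qualified name, for example a1.sinks.k1.type = org.example.MySink, mirroring the a1.sources.r1.type = org.example.MySource line in example 4.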
