A Summary of Flume Sources and Sinks, and Common Usage Scenarios
Data Sources (Source)
RPC-based exchange of heterogeneous stream data
- Avro Source
- Thrift Source
Watching files or directories for changes
- Exec Source
- Spooling Directory Source
- Taildir Source
Continuous consumption from MQ / message-queue subscriptions
- JMS Source
- SSL and JMS Source
- Kafka Source
Network-based data exchange
- NetCat TCP Source
- NetCat UDP Source
- HTTP Source
- Syslog Sources
- Syslog TCP Source
- Multiport Syslog TCP Source
- Syslog UDP Source
Custom sources
- Custom Source
Sink
- HDFS Sink
- Hive Sink
- Logger Sink
- Avro Sink
- Thrift Sink
- IRC Sink
- File Roll Sink
- HBaseSinks
- HBaseSink
- HBase2Sink
- AsyncHBaseSink
- MorphlineSolrSink
- ElasticSearchSink
- Kite Dataset Sink
- Kafka Sink
- HTTP Sink
- Custom Sink
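Whichever source and sink are chosen, an agent configuration always follows the same skeleton: declare the components, configure each one, and bind sources and sinks to a channel. The examples below all build on this pattern (component names are arbitrary, and the type placeholders stand for any of the types listed above):

# declare the agent's components
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# configure each component
a1.sources.s1.type = <source type>
a1.channels.c1.type = memory
a1.sinks.k1.type = <sink type>

# bind the source and sink to the channel
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1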
Examples
1. Monitoring file changes
exec-memory-logger.properties
# Name the agent's sources, sinks, and channels
a1.sources = s1
a1.sinks = k1
a1.channels = c1

# Configure the source: tail the target file
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /tmp/log.txt
a1.sources.s1.shell = /bin/bash -c
a1.sources.s1.channels = c1

# Configure the sink: print events to the log (matching the file name exec-memory-logger.properties)
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
Start the agent:
flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/exec-memory-logger.properties --name a1 -Dflume.root.logger=INFO,console
Test by appending to the tailed file:
echo "asfsafsf" >> /tmp/log.txt
2. NetCat TCP listener
netcat.properties
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start the agent:
flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/netcat.properties --name a1 -Dflume.root.logger=INFO,console
Test by connecting and typing a few lines:
telnet localhost 44444
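If telnet is not installed, nc (assuming the netcat package is available) works the same way; every line typed into the session should appear as a logged event in the agent's console:

nc localhost 44444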
3. Reading from and writing to Kafka (read: Kafka to logger; write: file to Kafka)
read-kafka.properties and write-kafka.properties
read-kafka.properties (Kafka to logger):

# Name the agent's sources, sinks, and channels
a1.sources = s1
a1.sinks = k1
a1.channels = c1

# Configure the source: consume from Kafka
a1.sources.s1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.s1.batchSize = 5000
a1.sources.s1.batchDurationMillis = 2000
a1.sources.s1.kafka.bootstrap.servers = 192.168.1.103:9092
a1.sources.s1.kafka.topics = test1
a1.sources.s1.kafka.consumer.group.id = custom.g.id

# Bind the source to the channel
a1.sources.s1.channels = c1

# Configure the sink and bind it to the channel
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

# Configure the channel
a1.channels.c1.type = memory

write-kafka.properties (file to Kafka):

a1.sources = s1
a1.channels = c1
a1.sinks = k1

# Configure the source: tail the log file
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /tmp/kafka.log
a1.sources.s1.channels = c1

# Configure the Kafka sink
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
# Kafka broker address
a1.sinks.k1.brokerList = 192.168.1.103:9092
# Topic to publish to
a1.sinks.k1.topic = test1
# Serializer for the event body
a1.sinks.k1.serializer.class = kafka.serializer.StringEncoder
a1.sinks.k1.channel = c1

# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 100
Start both agents (in separate terminals):
flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/read-kafka.properties --name a1 -Dflume.root.logger=INFO,console
flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/write-kafka.properties --name a1 -Dflume.root.logger=INFO,console
Test:
# Create the test topic
bin/kafka-topics.sh --create \
--bootstrap-server 192.168.1.103:9092 \
--replication-factor 1 \
--partitions 1 \
--topic test1
# Start a console producer to send test data:
bin/kafka-console-producer.sh --broker-list 192.168.1.103:9092 --topic test1
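To verify the full round trip, append a line to the file tailed by write-kafka.properties and check that it shows up both in read-kafka's logger output and in a console consumer (host, topic, and path as configured above):

echo "hello kafka" >> /tmp/kafka.log
bin/kafka-console-consumer.sh --bootstrap-server 192.168.1.103:9092 --topic test1 --from-beginning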
4. Custom source
A custom source is wired in by its fully-qualified class name (the implementation is shown in section 9):
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = org.example.MySource
a1.sources.r1.channels = c1
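Before the agent starts, the jar containing the custom class must be visible to Flume. A minimal sketch using Flume's plugins.d layout (the plugin directory and jar names are illustrative):

mkdir -p $FLUME_HOME/plugins.d/my-source/lib
cp my-source-1.0.jar $FLUME_HOME/plugins.d/my-source/lib/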
5. HDFS Sink
spooling-memory-hdfs.properties — watch a directory and upload newly created files to HDFS
# Name the agent's sources, sinks, and channels
a1.sources = s1
a1.sinks = k1
a1.channels = c1

# Configure the source: spooling directory
a1.sources.s1.type = spooldir
a1.sources.s1.spoolDir = /tmp/log2
a1.sources.s1.basenameHeader = true
a1.sources.s1.basenameHeaderKey = fileName

# Bind the source to the channel
a1.sources.s1.channels = c1

# Configure the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H/
a1.sinks.k1.hdfs.filePrefix = %{fileName}
# Output file type; the default is SequenceFile, DataStream writes plain text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Bind the sink to the channel
a1.sinks.k1.channel = c1

# Configure the channel
a1.channels.c1.type = memory
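Start the agent following the same pattern as the earlier examples, then drop a file into the spooling directory to trigger an upload (the test file name is arbitrary):

flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/spooling-memory-hdfs.properties --name a1 -Dflume.root.logger=INFO,console
mkdir -p /tmp/log2
echo "hello hdfs" > /tmp/log2/demo.txt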
Test by listing the HDFS output directory:
hdfs dfs -ls /flume/events/19-11-21/15
6. Hive Sink
a1.channels = c1
a1.channels.c1.type = memory
a1.sinks = k1
a1.sinks.k1.type = hive
a1.sinks.k1.channel = c1
a1.sinks.k1.hive.metastore = thrift://127.0.0.1:9083
a1.sinks.k1.hive.database = logsdb
a1.sinks.k1.hive.table = weblogs
a1.sinks.k1.hive.partition = asia,%{country},%y-%m-%d-%H-%M
a1.sinks.k1.useLocalTimeStamp = false
a1.sinks.k1.round = true
a1.sinks.k1.roundValue = 10
a1.sinks.k1.roundUnit = minute
a1.sinks.k1.serializer = DELIMITED
a1.sinks.k1.serializer.delimiter = "\t"
a1.sinks.k1.serializer.serdeSeparator = '\t'
a1.sinks.k1.serializer.fieldnames =id,,msg
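The Hive sink streams into a transactional, bucketed ORC table, so a matching table must exist before the agent starts (and Hive transactions must be enabled on the metastore side). A sketch of the DDL implied by the configuration above, run through the Hive CLI:

hive -e "
CREATE TABLE logsdb.weblogs (id INT, msg STRING)
  PARTITIONED BY (continent STRING, country STRING, time STRING)
  CLUSTERED BY (id) INTO 5 BUCKETS
  STORED AS ORC
  TBLPROPERTIES ('transactional' = 'true');
"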
7. Avro Source and Avro Sink
exec-memory-avro.properties and avro-memory-log.properties
exec-memory-avro.properties (tail a file and forward over Avro):

# Name the agent's sources, sinks, and channels
a1.sources = s1
a1.sinks = k1
a1.channels = c1

# Configure the source
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /tmp/log.txt
a1.sources.s1.shell = /bin/bash -c
a1.sources.s1.channels = c1

# Configure the Avro sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 192.168.1.103
a1.sinks.k1.port = 8888
a1.sinks.k1.batch-size = 1
a1.sinks.k1.channel = c1

# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

avro-memory-log.properties (receive over Avro and print to the log):

# Name the agent's sources, sinks, and channels
a2.sources = s2
a2.sinks = k2
a2.channels = c2

# Configure the Avro source
a2.sources.s2.type = avro
a2.sources.s2.bind = 192.168.1.103
a2.sources.s2.port = 8888

# Bind the source to the channel
a2.sources.s2.channels = c2

# Configure the sink and bind it to the channel
a2.sinks.k2.type = logger
a2.sinks.k2.channel = c2

# Configure the channel
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100
Start the agents. Bring up the Avro receiver (a2) first:
flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/avro-memory-log.properties --name a2 -Dflume.root.logger=INFO,console
Then start the sender (a1):
flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/exec-memory-avro.properties --name a1 -Dflume.root.logger=INFO,console
Test by sending data with a Flume RPC client. The sample below comes from the Flume developer guide and shows the secure (Kerberos) Thrift RPC client; a simpler client that matches the plain Avro Source configured above follows after it.
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.api.SecureRpcClientFactory;
import org.apache.flume.api.RpcClientConfigurationConstants;
import org.apache.flume.api.RpcClient;
import java.nio.charset.Charset;
import java.util.Properties;

public class MyApp {
  public static void main(String[] args) {
    MySecureRpcClientFacade client = new MySecureRpcClientFacade();
    // Initialize client with the remote Flume agent's host and port
    Properties props = new Properties();
    props.setProperty(RpcClientConfigurationConstants.CONFIG_CLIENT_TYPE, "thrift");
    props.setProperty("hosts", "h1");
    props.setProperty("hosts.h1", "client.example.org" + ":" + String.valueOf(8888));

    // Initialize client with the Kerberos authentication related properties
    props.setProperty("kerberos", "true");
    props.setProperty("client-principal", "flumeclient/client.example.org@EXAMPLE.ORG");
    props.setProperty("client-keytab", "/tmp/flumeclient.keytab");
    props.setProperty("server-principal", "flume/server.example.org@EXAMPLE.ORG");
    client.init(props);

    // Send 10 events to the remote Flume agent. That agent should be
    // configured to listen with a Kerberos-enabled Thrift source.
    String sampleData = "Hello Flume!";
    for (int i = 0; i < 10; i++) {
      client.sendDataToFlume(sampleData);
    }

    client.cleanUp();
  }
}

class MySecureRpcClientFacade {
  private RpcClient client;
  private Properties properties;

  public void init(Properties properties) {
    // Setup the RPC connection
    this.properties = properties;
    // Create the ThriftSecureRpcClient instance by using SecureRpcClientFactory
    this.client = SecureRpcClientFactory.getThriftInstance(properties);
  }

  public void sendDataToFlume(String data) {
    // Create a Flume Event object that encapsulates the sample data
    Event event = EventBuilder.withBody(data, Charset.forName("UTF-8"));

    // Send the event
    try {
      client.append(event);
    } catch (EventDeliveryException e) {
      // clean up and recreate the client
      client.close();
      client = null;
      client = SecureRpcClientFactory.getThriftInstance(properties);
    }
  }

  public void cleanUp() {
    // Close the RPC connection
    client.close();
  }
}
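For the unsecured Avro Source started above (192.168.1.103:8888), a minimal sketch using the Flume SDK's default Avro RPC client is enough; the class name here is illustrative:

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;
import java.nio.charset.StandardCharsets;

public class SimpleAvroClient {
  public static void main(String[] args) throws EventDeliveryException {
    // Connect to the Avro Source declared in avro-memory-log.properties
    RpcClient client = RpcClientFactory.getDefaultInstance("192.168.1.103", 8888);
    try {
      // Build and send one event; it should show up in agent a2's logger output
      Event event = EventBuilder.withBody("Hello Flume!", StandardCharsets.UTF_8);
      client.append(event);
    } finally {
      client.close();
    }
  }
}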
8. Elasticsearch Sink
a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = elasticsearch
a1.sinks.k1.hostNames = 127.0.0.1:9200,127.0.0.2:9300
a1.sinks.k1.indexName = foo_index
a1.sinks.k1.indexType = bar_type
a1.sinks.k1.clusterName = foobar_cluster
a1.sinks.k1.batchSize = 500
a1.sinks.k1.ttl = 5d
a1.sinks.k1.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchDynamicSerializer
a1.sinks.k1.channel = c1
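The Elasticsearch client and matching lucene-core jars must be on the Flume classpath for this sink to load. Assuming the cluster is reachable on 127.0.0.1:9200, the indexed events can be spot-checked against the date-suffixed indices the sink creates:

curl 'http://127.0.0.1:9200/foo_index-*/_search?pretty'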
9. Developing a custom Source and Sink
A custom sink extends AbstractSink and a custom source typically implements PollableSource; the skeletons below follow the Flume developer guide.
public class MySink extends AbstractSink implements Configurable {
  private String myProp;

  @Override
  public void configure(Context context) {
    String myProp = context.getString("myProp", "defaultValue");
    // Process the myProp value (e.g. validation)
    // Store myProp for later retrieval by process() method
    this.myProp = myProp;
  }

  @Override
  public void start() {
    // Initialize the connection to the external repository (e.g. HDFS) that
    // this Sink will forward Events to ..
  }

  @Override
  public void stop () {
    // Disconnect from the external repository and do any
    // additional cleanup (e.g. releasing resources or nulling-out
    // field values) ..
  }

  @Override
  public Status process() throws EventDeliveryException {
    Status status = null;

    // Start transaction
    Channel ch = getChannel();
    Transaction txn = ch.getTransaction();
    txn.begin();
    try {
      // This try clause includes whatever Channel operations you want to do
      Event event = ch.take();

      // Send the Event to the external repository.
      // storeSomeData(e);

      txn.commit();
      status = Status.READY;
    } catch (Throwable t) {
      txn.rollback();
      // Log exception, handle individual exceptions as needed
      status = Status.BACKOFF;

      // re-throw all Errors
      if (t instanceof Error) {
        throw (Error)t;
      }
    }
    return status;
  }
}
public class MySource extends AbstractSource implements Configurable, PollableSource {
  private String myProp;

  @Override
  public void configure(Context context) {
    String myProp = context.getString("myProp", "defaultValue");
    // Process the myProp value (e.g. validation, convert to another type, ...)
    // Store myProp for later retrieval by process() method
    this.myProp = myProp;
  }

  @Override
  public void start() {
    // Initialize the connection to the external client
  }

  @Override
  public void stop () {
    // Disconnect from external client and do any additional cleanup
    // (e.g. releasing resources or nulling-out field values) ..
  }

  @Override
  public Status process() throws EventDeliveryException {
    Status status = null;

    try {
      // This try clause includes whatever Channel/Event operations you want to do

      // Receive new data
      Event e = getSomeData();

      // Store the Event into this Source's associated Channel(s)
      getChannelProcessor().processEvent(e);

      status = Status.READY;
    } catch (Throwable t) {
      // Log exception, handle individual exceptions as needed
      status = Status.BACKOFF;

      // re-throw all Errors
      if (t instanceof Error) {
        throw (Error)t;
      }
    }
    return status;
  }
}
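Once compiled and placed on the agent's classpath (for example under plugins.d as shown in section 4), the two classes can be wired into an agent like any built-in component. A sketch, assuming the classes are packaged as org.example.MySource and org.example.MySink:

a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = org.example.MySource
a1.sources.r1.channels = c1

a1.sinks.k1.type = org.example.MySink
a1.sinks.k1.channel = c1

a1.channels.c1.type = memory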