Introduction to Flume Sources

!!!1.Avro Source

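Listens on an Avro port and receives event streams from external Avro clients.

With the Avro Source you can build multi-hop flows, fan-out flows, fan-in flows, and similar topologies.

It can also receive log data sent by the Avro client that ships with Flume.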
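As a rough sketch of the multi-hop case: a second, hypothetical agent named a0 runs on another machine and forwards its events through an Avro sink to the Avro source of agent a1. The address 192.168.1.10 for the host running a1 is an assumption; port 33333 matches the example later in this section.

```properties
# Upstream agent a0 (hypothetical): send events to a1 over Avro RPC
a0.sinks.k1.type = avro
# Assumed address of the machine running agent a1
a0.sinks.k1.hostname = 192.168.1.10
a0.sinks.k1.port = 33333
a0.sinks.k1.channel = c1

# Downstream agent a1: receive them with an Avro source on the same port
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 33333
a1.sources.r1.channels = c1
```

Only the sink and source lines are shown. Each agent still has to declare its own channel and remaining components. Fan-in works the same way: several upstream agents point their Avro sinks at the same host and port.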

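Supported properties (each entry gives the property name, its default value or "–" if there is none, and a description; properties marked with ! are required):

!channels –

!type – The component type name, "avro"

!bind – Hostname or IP address to listen on

!port – Port to listen on

threads – Maximum number of worker threads

selector.type

selector.*

interceptors – Space-separated list of interceptors

interceptors.*

compression-type none Compression type, either "none" or "deflate". It must match the compression type used by the sending side (for example, the upstream Avro sink).

ssl false Set to true to enable SSL encryption. If enabled, "keystore" and "keystore-password" must also be configured.

keystore – Path to the Java keystore file used for SSL

keystore-password – Password of the Java keystore used for SSL

keystore-type JKS Keystore type, either "JKS" or "PKCS12".

exclude-protocols SSLv3 Space-separated list of SSL/TLS protocols to exclude. SSLv3 is always excluded in addition to the protocols specified.

ipFilter false Set to true to enable IP filtering in Netty

ipFilterRules – IP filter rule expressions for Netty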
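Here is a sketch of how the optional properties above fit together. The keystore path, password, and thread count are placeholders. The IP filter rule uses the allow/deny syntax shown in the Flume user guide, and this assumes your Flume version supports it.

```properties
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 33333
a1.sources.r1.threads = 4
# Must match the compression type used by the sender
a1.sources.r1.compression-type = deflate
# SSL: placeholder keystore path and password
a1.sources.r1.ssl = true
a1.sources.r1.keystore = /path/to/keystore.jks
a1.sources.r1.keystore-password = changeit
a1.sources.r1.keystore-type = JKS
# Allow only local clients and deny everyone else
a1.sources.r1.ipFilter = true
a1.sources.r1.ipFilterRules = allow:ip:127.*,allow:name:localhost,deny:ip:*
```

Rules are checked in order and the first matching rule applies. The trailing deny:ip:* therefore only catches clients that none of the allow rules matched.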

Example:

Write the configuration file:

# Name the components of agent a1
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 33333

# Describe the sink
a1.sinks.k1.type = logger

# Describe an in-memory channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start Flume:

./flume-ng agent --conf ../conf --conf-file ../conf/template2.conf --name a1 -Dflume.root.logger=INFO,console

Use the Avro client that ships with Flume to send a log file to the given host and port:

./flume-ng avro-client --conf ../conf --host 0.0.0.0 --port 33333 --filename ../mydata/log1.txt

The agent's console output shows that the log data was collected.

2.Exec Source

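Runs a command and uses its output as the event source: each line the command writes to standard output becomes an event.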

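Properties:

!channels –

!type – The component type name, must be "exec"

!command – The command to execute

shell – A shell invocation used to run the command, e.g. /bin/sh -c. Required only for commands that rely on shell features such as wildcards, backticks, or pipes.

restartThrottle 10000 Time in milliseconds to wait before trying to restart the command

restart false Whether to restart the command if it dies

logStdErr false Whether the command's stderr should be logged

batchSize 20 Maximum number of lines to read and send to the channel at a time

batchTimeout 3000 Time in milliseconds to wait before pushing data downstream if the buffer has not filled up

selector.type replicating replicating or multiplexing

selector.* Depends on the selector.type value

interceptors – Space-separated list of interceptors

interceptors.*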
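The shell property matters as soon as the command uses pipes or wildcards. The following sketch uses an assumed log path, and the grep filter is there only for illustration.

```properties
a1.sources.r1.type = exec
# Pipes need a shell to interpret them
a1.sources.r1.shell = /bin/sh -c
# Assumed path; only lines containing ERROR are forwarded
a1.sources.r1.command = tail -F /var/log/app.log | grep --line-buffered ERROR
# Restart the command 10 s after it exits, and log its stderr
a1.sources.r1.restart = true
a1.sources.r1.restartThrottle = 10000
a1.sources.r1.logStdErr = true
a1.sources.r1.channels = c1
```

Without --line-buffered (a GNU grep option), grep buffers its output when writing to a pipe. In that case lines may reach the source late and in large bursts.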

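Example:

Write the configuration file:

# Name the components of agent a1
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source: run tail -F and turn each output line into an event
a1.sources.r1.type = exec
# Placeholder path: replace it with the log file you want to follow
a1.sources.r1.command = tail -F /path/to/your.log

# Describe the sink
a1.sinks.k1.type = logger

# Describe an in-memory channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1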

Start Flume:

./flume-ng agent --conf ../conf --conf-file ../conf/template2.conf --name a1 -Dflume.root.logger=INFO,console

With the tail command, log lines that are appended to the file later on are collected as well.

!!!3.Spooling Directory Source

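This source lets you ingest data by placing the files to be collected into a "spooling" directory. The source watches that directory and parses new files as they appear. The event parsing logic is pluggable. Once a file has been fully read into the channel, it is renamed or, optionally, deleted.

Note that files placed in the spooling directory must not be modified afterwards. If a file is modified, Flume reports an error.

File names must not be reused either. If a file with the same name as an earlier one is placed in the directory, Flume reports an error.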

Properties:

!channels –

!type – The component type name, must be "spooldir"

!spoolDir – The directory to read files from, i.e. the spooling directory

fileSuffix .COMPLETED Suffix appended to files that have been fully processed

deletePolicy never Whether to delete files once they have been processed, either "never" or "immediate"

fileHeader false Whether to add a header storing the absolute path filename.

fileHeaderKey file Header key to use when appending absolute path filename to event header.

basenameHeader false Whether to add a header storing the basename of the file.

basenameHeaderKey basename Header key to use when appending basename of file to event header.

ignorePattern ^$ Regular expression specifying which files to ignore

trackerDir .flumespool Directory to store metadata related to processing of files. If this path is not an absolute path, then it is interpreted as relative to the spoolDir.

consumeOrder oldest Order in which files in the spooling directory are consumed: oldest, youngest, or random.

maxBackoff 4000 The maximum time (in millis) to wait between consecutive attempts to write to the channel(s) if the channel is full. The source will start at a low backoff and increase it exponentially each time the channel throws a ChannelException, up to the value specified by this parameter.

batchSize 100 Granularity at which to batch transfer to the channel

inputCharset UTF-8 Character set used when reading the files.

decodeErrorPolicy FAIL What to do when a character that cannot be decoded is found in an input file. FAIL: throw an exception and fail to parse the file. REPLACE: replace the unparseable character with the "replacement character", typically Unicode U+FFFD. IGNORE: drop the unparseable character sequence.

deserializer LINE The deserializer used to parse the file into events. By default each line becomes one event. A custom deserializer class must implement the EventDeserializer.Builder interface.

deserializer.* Varies per event deserializer.

bufferMaxLines – (Obsolete) This option is now ignored.

bufferMaxLineLength 5000 (Deprecated) Maximum length of a line in the commit buffer. Use deserializer.maxLineLength instead.

selector.type replicating replicating or multiplexing

selector.* Depends on the selector.type value

interceptors – Space-separated list of interceptors

interceptors.*
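The following sketch combines several of the optional properties. The directory is the one used in this section's example. The .tmp ignore pattern and the header choice are assumptions for illustration. The ignore pattern lets a writer create a file under a temporary name and rename it once complete, which respects the rule that files must not change after they appear in the directory.

```properties
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/park/work/apache-flume-1.6.0-bin/mydata
# Delete files once ingested instead of renaming them to *.COMPLETED
a1.sources.r1.deletePolicy = immediate
# Skip files still being written under a .tmp name
# (backslash doubled because Flume loads this file as Java properties)
a1.sources.r1.ignorePattern = ^.*\\.tmp$
# Store the absolute path of the source file in the "file" header
a1.sources.r1.fileHeader = true
a1.sources.r1.consumeOrder = oldest
a1.sources.r1.inputCharset = UTF-8
a1.sources.r1.decodeErrorPolicy = REPLACE
a1.sources.r1.channels = c1
```

The rename into the final name should happen within the same filesystem, so the file appears in the directory atomically.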

Example:

Write the configuration file:

# Name the components of agent a1
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/park/work/apache-flume-1.6.0-bin/mydata

# Describe the sink
a1.sinks.k1.type = logger

# Describe an in-memory channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start Flume:

./flume-ng agent --conf ../conf --conf-file ../conf/template4.conf --name a1 -Dflume.root.logger=INFO,console

Copy a file into the spooling directory. Flume picks it up and treats each line of the file as one log event.

!!!4.NetCat Source

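A NetCat source listens on a given port and turns each line of text it receives into an event.

Properties:

!channels –

!type – The component type name, must be "netcat"

!bind – Hostname or IP address to bind to

!port – Port number to bind to

max-line-length 512 Maximum line length per event body, in bytes

ack-every-event true Whether to respond with "OK" for every event received

selector.type

selector.*

interceptors –

interceptors.*

Example:

See the quick-start example.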
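The quick-start example is not reproduced here, so this is a minimal sketch of the source definition. The port 44444 is an arbitrary choice.

```properties
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Maximum line length, in bytes, for one event body
a1.sources.r1.max-line-length = 512
# Reply "OK" to the client for every received line
a1.sources.r1.ack-every-event = true
a1.sources.r1.channels = c1
```

With the agent running, typing lines into a session opened with `telnet localhost 44444` should produce one event per line. Each line should be acknowledged with OK.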

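5.Sequence Generator Source

A simple sequence generator that continuously produces events. The value starts at 0 and increases by 1 each time.

It is mainly used for testing.

Properties:

!channels –

!type – The component type name, must be "seq"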

selector.type

selector.*

interceptors –

interceptors.*

batchSize 1
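All sources share the selector.* and interceptors.* entries. As an illustration, the seq source can use Flume's built-in timestamp interceptor, which adds a timestamp header to each event. This sketch also sets a larger batch size; the value 100 is arbitrary.

```properties
a1.sources.r1.type = seq
a1.sources.r1.batchSize = 100
# Built-in interceptor: adds a "timestamp" header (epoch milliseconds)
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp
a1.sources.r1.channels = c1
```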

Example:

Write the configuration file:

# Name the components of agent a1
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = seq

# Describe the sink
a1.sinks.k1.type = logger

# Describe an in-memory channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start Flume:

./flume-ng agent --conf ../conf --conf-file ../conf/template4.conf --name a1 -Dflume.root.logger=INFO,console

The logger sink prints the generated events to the console.

6.HTTP Source

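This source accepts Flume events via HTTP GET and POST requests.

GET should be used for experimentation only.

HTTP requests are converted into events by a pluggable "handler", which must implement the HTTPSourceHandler interface.

The handler takes an HttpServletRequest and returns a list of Flume events.

All events obtained from one HTTP request are committed to the channel in a single transaction. This allows for increased efficiency on channels such as the file channel.

If the handler throws an exception, the source returns HTTP status 400 (Bad Request).

If the channel is full and no more events can be added to it, the source returns HTTP status 503 (Service Unavailable) to signal that it is temporarily unavailable.

Properties:

!channels –

!type – The component type name, must be "http"

!port – Port to listen on

bind 0.0.0.0 Hostname or IP address to listen on

handler org.apache.flume.source.http.JSONHandler Fully qualified class name of the handler; it must implement the HTTPSourceHandler interface

handler.* – Configuration parameters for the handler

selector.type

selector.*

interceptors –

interceptors.*

enableSSL false Set to true to enable SSL. Note that the HTTP source does not support SSLv3.

excludeProtocols SSLv3 Space-separated list of SSL/TLS protocols to exclude. SSLv3 is always excluded.

keystore – Location of the keystore file

keystorePassword – Keystore password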
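The following sketches an HTTPS-enabled HTTP source using the default JSONHandler. The port, keystore path, and password are placeholders.

```properties
a1.sources.r1.type = http
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 5140
# The default handler, spelled out explicitly
a1.sources.r1.handler = org.apache.flume.source.http.JSONHandler
# SSL: placeholder keystore path and password
a1.sources.r1.enableSSL = true
a1.sources.r1.keystore = /path/to/keystore.jks
a1.sources.r1.keystorePassword = changeit
a1.sources.r1.channels = c1
```

Clients then have to use https:// URLs. With a self-signed certificate, curl needs its -k option.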

Example:

Write the configuration file:

# Name the components of agent a1
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source (port numbers must be at most 65535)
a1.sources.r1.type = http
a1.sources.r1.port = 6666

# Describe the sink
a1.sinks.k1.type = logger

# Describe an in-memory channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start Flume:

./flume-ng agent --conf ../conf --conf-file ../conf/template6.conf --name a1 -Dflume.root.logger=INFO,console

Send an HTTP request to that port:

curl -X POST -d '[{ "headers" :{"a" : "a1","b" : "b1"},"body" : "hello~http~flume~"}]' http://0.0.0.0:6666

Flume collects the event and the logger sink prints it.
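The headers sent in the JSON body can drive routing. The following sketch uses a multiplexing selector keyed on the header "a" from the curl command above. The second channel c2 is an assumption and would need its own sink to drain it.

```properties
a1.channels = c1 c2
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = multiplexing
# Route on the value of the "a" header
a1.sources.r1.selector.header = a
# Events with a=a1 go to c1, everything else goes to c2
a1.sources.r1.selector.mapping.a1 = c1
a1.sources.r1.selector.default = c2
a1.channels.c2.type = memory
```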

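Common handlers:

JSONHandler

Handles data in JSON format and supports the UTF-8, UTF-16, and UTF-32 character sets.

The handler accepts an array of events in JSON form and converts them into Flume events, using the encoding specified in the request header.

If no encoding is specified, UTF-8 is assumed.

The JSON format is as follows: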

[{
  "headers" : {
             "timestamp" : "434324343",
             "host" : "random_host.example.com"
             },
  "body" : "random_body"
  },
  {
  "headers" : {
             "namenode" : "namenode.example.com",
             "datanode" : "random_datanode.example.com"
             },
  "body" : "really_random_body"
  }]

To set the charset, the request must have content type specified as application/json;charset=UTF-8 (replace UTF-8 with UTF-16 or UTF-32 as required).

One way to create events in the format expected by this handler is to use the JSONEvent class provided in the Flume SDK, and to create the JSON string with Google Gson's Gson#toJson(Object, Type) method. The Type for a list of events can be obtained like this:

Type type = new TypeToken<List<JSONEvent>>() {}.getType();

BlobHandler

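BlobHandler is a handler for the HTTP source that turns a request into an event. The event contains the request parameters together with the binary large object (BLOB), such as an uploaded file, sent with that request. Because the entire BLOB is buffered in memory, this approach is not suitable for very large objects.

Properties (those marked with ! are required):

!handler – The FQCN of this class: org.apache.flume.sink.solr.morphline.BlobHandler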

handler.maxBlobLength 100000000 The maximum number of bytes to read and buffer for a given request
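The following sketches how BlobHandler is wired into the HTTP source. The port is arbitrary. The class lives in the morphline Solr sink module, so that module's jar (and its dependencies) presumably has to be on the agent's classpath.

```properties
a1.sources.r1.type = http
a1.sources.r1.port = 5140
a1.sources.r1.handler = org.apache.flume.sink.solr.morphline.BlobHandler
# Read and buffer at most 100,000,000 bytes per request
a1.sources.r1.handler.maxBlobLength = 100000000
a1.sources.r1.channels = c1
```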

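7.Custom Source

A custom source is your own implementation of the Source interface.

The custom source class and its dependencies must be on the agent's classpath when the Flume agent starts.

Properties (those marked with ! are required):

!channels –

!type – The component type; must be set to the fully qualified class name of your custom source class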

selector.type

selector.*

interceptors –

interceptors.*
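The following is a minimal sketch of the agent side. org.example.MySource is a hypothetical class name standing in for your own implementation. Its jar would go, for example, into Flume's lib directory or a plugins.d directory.

```properties
a1.sources = r1
a1.channels = c1
# Fully qualified name of the custom Source implementation (hypothetical)
a1.sources.r1.type = org.example.MySource
a1.sources.r1.channels = c1
```

Further a1.sources.r1.* keys are handed to the source through its configure(Context) method. This assumes the class implements Configurable, as sources built on AbstractSource usually do.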
