Log Collection

Flume's internals are easy enough to understand; what matters more is mastering how to use it. Flume ships with a large number of built-in Source, Channel and Sink types, and different types can be combined freely through a user-supplied configuration file, which makes it very flexible. For example, a Channel can buffer events in memory or persist them to local disk, and a Sink can write events to HDFS or HBase, or even hand them to another agent's Source. The cases below walk through Flume's usage in detail.
Using Flume is straightforward: write a configuration file that describes the concrete source, channel and sink, then run an agent instance. The agent reads the configuration file at startup, and from that point on Flume collects data.
Principles for writing the configuration file:
1> At the top level, name the sources, sinks and channels used by the agent
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
2> Describe each source, sink and channel of the agent in detail. For a source, specify its type: does it read files, accept HTTP, or accept Thrift? For a sink, specify where the results go: HDFS, HBase, and so on. For a channel, specify whether it is backed by memory, a database, a file, etc.
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

3> Connect the source and the sink through the channel
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Shell command to start the agent:
flume-ng agent -n a1 -c ../conf -f ../conf/example.file -Dflume.root.logger=DEBUG,console

Parameter description:
-n  agent name (must match the agent name used in the configuration file)
-c  directory containing the Flume configuration files
-f  the configuration file to use
-Dflume.root.logger=DEBUG,console  sets the log level and sends the log to the console

Case 1

NetCat Source: listens on a given network port; whenever an application writes data to that port, the source picks it up. Sink: logger, Channel: memory.
NetCat Source description from the Flume user guide:

Property Name  Default  Description
channels       –
type           –        The component type name, needs to be netcat
bind           –        Host name or IP address to bind to; the netcat source listens here for incoming data
port           –        Port number to bind to; the netcat source listens on this port

Write the configuration file

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent

flume-ng agent -n a1 -c ../conf -f netcat.conf -Dflume.root.logger=DEBUG,console

Send data with telnet

[root@node-247 ~]# telnet 192.168.1.246 44444
Trying 192.168.1.246...
Connected to 192.168.1.246.
Escape character is '^]'.
111111
OK

Check the output on the agent node

18/08/01 17:32:21 INFO sink.LoggerSink: Event: { headers:{} body: 31 31 31 31 31 31 0D 111111. }

Case 2

NetCat Source: listens on a given network port; whenever an application writes data to that port, the source picks it up. Sink: hdfs, Channel: file (the two changes relative to Case 1).
The HDFS Sink and its properties are described in the Flume user guide.

Write the configuration file

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node-231:8020/user/hdfs/flume/netcat
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in file
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /usr/flume/checkpoint
a1.channels.c1.dataDirs = /usr/flume/data

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent

flume-ng agent -n a1 -c ../conf -f netcat2hdfs.conf -Dflume.root.logger=DEBUG,console

Send data with telnet

[root@node-247 ~]# telnet 192.168.1.246 44444
Trying 192.168.1.246...
Connected to 192.168.1.246.
Escape character is '^]'.
write to hdfs
OK

Agent log output

18/08/01 17:39:28 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
18/08/01 17:39:28 INFO hdfs.BucketWriter: Creating hdfs://node-231:8020/user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959.tmp
18/08/01 17:39:39 INFO hdfs.BucketWriter: Closing hdfs://node-231:8020/user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959.tmp
18/08/01 17:39:39 INFO hdfs.BucketWriter: Renaming hdfs://node-231:8020/user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959.tmp to hdfs://node-231:8020/user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959
18/08/01 17:39:39 INFO hdfs.HDFSEventSink: Writer callback called.
18/08/01 17:39:53 INFO file.EventQueueBackingStoreFile: Start checkpoint for /usr/flume/checkpoint/checkpoint_1533116333069, elements to sync = 1
18/08/01 17:39:53 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1533116333542, queueSize: 0, queueHead: 0
18/08/01 17:39:53 INFO file.Log: Updated checkpoint for file: /usr/flume/data/log-1 position: 171 logWriteOrderID: 1533116333542

Verify that the write succeeded

[root@node-235 bin]# hadoop fs -ls /user/hdfs/flume/netcat/
Found 1 items
-rw-r--r-- 3 root hdfs 15 2018-08-01 17:39 /user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959

[root@node-235 bin]# hadoop fs -cat /user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959
write to hdfs

Sending data via telnet again produces a second data file in the HDFS directory.
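
Any client that writes newline-terminated lines to that TCP port has the same effect as telnet. A minimal Java sketch (the class name and message text are only illustrative):

import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class NetcatSourceClient {
    public static void main(String[] args) throws Exception {
        // The netcat source turns every newline-terminated line read from the
        // TCP connection into one Flume event.
        try (Socket socket = new Socket("192.168.1.246", 44444);
             OutputStream out = socket.getOutputStream()) {
            out.write("second message\n".getBytes(StandardCharsets.UTF_8));
            out.flush();
        }
    }
}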

Case 3

Spooling Directory Source: watches a given directory; whenever an application drops a new file into that directory, the source picks it up, parses the file's contents and writes them to the channel. When it is done, the file is either marked as completed or deleted. Sink: logger, Channel: memory.
Spooling Directory Source description from the Flume user guide:
Property Name  Default     Description
channels       –
type           –           The component type name, needs to be spooldir.
spoolDir       –           The directory watched by the Spooling Directory Source
fileSuffix     .COMPLETED  Suffix used to mark a file once its contents have been written to the channel
deletePolicy   never       When to delete completed files: never or immediate
fileHeader     false       Whether to add a header storing the absolute path filename.
ignorePattern  ^$          Regular expression specifying which files to ignore (skip)
interceptors   –           Interceptors that add event headers; timestamp is the most commonly used

Two caveats about the Spooling Directory Source:
① If a file is written to after being placed into the spooling directory, Flume will print an error to its log file and stop processing. In other words, a file copied into the spool directory must not be opened and edited again.
② If a file name is reused at a later time, Flume will print an error to its log file and stop processing. In other words, never copy a file with a previously used name into this directory.
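
One way for an application to respect both constraints is to build the file somewhere else and move it into the spooling directory with a single rename, so the source never sees a half-written file. A minimal Java sketch (the staging path and file name are only illustrative; both directories are assumed to be on the same filesystem):

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class SpoolDirWriter {
    public static void main(String[] args) throws Exception {
        // Write the complete file outside the watched directory first
        Path staging = Paths.get("/usr/local/test/staging/test2.txt");
        Files.createDirectories(staging.getParent());
        Files.write(staging, "hello spool again\n".getBytes(StandardCharsets.UTF_8));

        // Then move it into the spooling directory in one atomic step,
        // under a name that has never been used there before
        Path target = Paths.get("/usr/local/test/datainput/test2.txt");
        Files.move(staging, target, StandardCopyOption.ATOMIC_MOVE);
    }
}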

Write the configuration file

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /usr/local/test/datainput
a1.sources.r1.fileHeader = true
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent

flume-ng agent -n a1 -c ../conf -f spool.conf -Dflume.root.logger=DEBUG,console

Drop a test file with the content "hello spool" into the watched directory

cp test.txt datainput/

Agent log

18/08/01 17:52:48 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /usr/local/test/datainput/test.txt to /usr/local/test/datainput/test.txt.COMPLETED
18/08/01 17:52:48 INFO sink.LoggerSink: Event: { headers:{file=/usr/local/test/datainput/test.txt, timestamp=1533117168275} body: 68 65 6C 6C 6F 20 73 70 6F 6F 6C hello spool }

The console output shows that the event headers now contain a timestamp.
Looking at the spooling directory itself, the file was marked once its contents had been written to the channel:

[root@node-246 test]# ls datainput/
test.txt.COMPLETED

Case 4

Spooling Directory Source: watches a given directory; whenever an application drops a new file into that directory, the source picks it up, parses the file's contents and writes them to the channel. When it is done, the file is either marked as completed or deleted. Sink: hdfs, Channel: file (the two changes relative to Case 3).
Write the configuration file

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /usr/local/test/datainput
a1.sources.r1.fileHeader = true
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node-231:8020/user/hdfs/flume/spool/dataoutput
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in file
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /usr/flume/checkpoint
a1.channels.c1.dataDirs = /usr/flume/data

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent

flume-ng agent -n a1 -c ../conf -f spool2hdfs.conf -Dflume.root.logger=DEBUG,console

Drop a new file, test1.txt, into datainput

cp test1.txt datainput

Agent log

18/08/01 17:58:41 INFO file.EventQueueBackingStoreFile: Start checkpoint for /usr/flume/checkpoint/checkpoint_1533116333069, elements to sync = 1
18/08/01 17:58:41 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1533117491901, queueSize: 0, queueHead: 0
18/08/01 17:58:42 INFO file.Log: Updated checkpoint for file: /usr/flume/data/log-3 position: 241 logWriteOrderID: 1533117491901
18/08/01 17:58:42 INFO file.LogFile: Closing RandomReader /usr/flume/data/log-1
18/08/01 17:58:42 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
18/08/01 17:58:42 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
18/08/01 17:58:43 INFO hdfs.BucketWriter: Closing hdfs://node-231:8020/user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457.tmp
18/08/01 17:58:43 INFO hdfs.BucketWriter: Renaming hdfs://node-231:8020/user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457.tmp to hdfs://node-231:8020/user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457
18/08/01 17:58:43 INFO hdfs.HDFSEventSink: Writer callback called.
18/08/01 17:58:32 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /usr/local/test/datainput/test1.txt to /usr/local/test/datainput/test1.txt.COMPLETED
18/08/01 17:58:32 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
18/08/01 17:58:32 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
18/08/01 17:58:32 INFO hdfs.BucketWriter: Creating hdfs://node-231:8020/user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457.tmp

Check HDFS

[root@node-235 bin]# hadoop fs -ls /user/hdfs/flume/spool/dataoutput
Found 1 items
-rw-r--r-- 3 root hdfs 12 2018-08-01 17:58 /user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457
[root@node-235 bin]# hadoop fs -cat /user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457
hello spool

Check the file status under datainput

[root@node-246 test]# ls datainput/
test1.txt.COMPLETED test.txt.COMPLETED

Case 5

Exec Source: runs a given command and uses the command's output as its data source.
The most common command is tail -F file: whenever an application appends to the log file, the source picks up the newly written content. Sink: hdfs, Channel: file.
To make the effect of the Exec Source easier to see, this case combines it with a Hive external table.
Write the configuration file

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /usr/local/test/log.file

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node-231:8020/user/hdfs/flume/exec/dataoutput
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in file
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /usr/flume/checkpoint
a1.channels.c1.dataDirs = /usr/flume/data

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Create the Hive external table

create external table flume_exec_table
(info String)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
location '/user/hdfs/flume/exec/dataoutput';

Start the agent

flume-ng agent -n a1 -c ../conf -f exec.conf -Dflume.root.logger=DEBUG,console

Append data to /usr/local/test/log.file with echo

echo firstline=1 >> /usr/local/test/log.file

Check the data in Hive by querying flume_exec_table.

Summary of the Exec Source: Exec Source and Spooling Directory Source are two commonly used ways of collecting logs. The Exec Source can collect a log in real time, while the Spooling Directory Source falls somewhat short on real-time collection. However, even though the Exec Source is real-time, it collects nothing while the Flume agent is down or while the command fails, so log data can be lost and completeness cannot be guaranteed.

Case 6

Avro Source: listens on a given Avro port and receives the files sent by an Avro client; any application that sends a file to this port over Avro will have its contents picked up by the source. Sink: hdfs, Channel: file.
(Note: Avro and Thrift are serialization frameworks with network endpoints through which data can be sent and received. An Avro client can send a given file to Flume; the Avro source uses the Avro RPC mechanism.)

Avro Source description from the Flume user guide:
Property Name  Default  Description
channels       –
type           –        The component type name, needs to be avro
bind           –        Host name or IP address to bind to; the Avro source listens on this host
port           –        Port number to bind to; the Avro source listens on this port

Write the configuration file

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 4141

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in file
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /usr/flume/checkpoint
a1.channels.c1.dataDirs = /usr/flume/data

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent

flume-ng agent -n a1 -c ../conf -f avro.conf -Dflume.root.logger=DEBUG,console

Send a file with avro-client

flume-ng avro-client -c ../conf -H 192.168.1.246 -p 4141 -F /usr/local/test/log.file

Agent log

18/08/02 09:56:14 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 => /192.168.1.246:4141] OPEN
18/08/02 09:56:14 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 => /192.168.1.246:4141] BOUND: /192.168.1.246:4141
18/08/02 09:56:14 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 => /192.168.1.246:4141] CONNECTED: /192.168.1.246:43750
18/08/02 09:56:15 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 :> /192.168.1.246:4141] DISCONNECTED
18/08/02 09:56:15 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 :> /192.168.1.246:4141] UNBOUND
18/08/02 09:56:15 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 :> /192.168.1.246:4141] CLOSED
18/08/02 09:56:15 INFO ipc.NettyServer: Connection to /192.168.1.246:43750 disconnected.
18/08/02 09:56:18 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
18/08/02 09:56:18 INFO hdfs.BucketWriter: Creating hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-18.1533174978446.tmp
18/08/02 09:56:19 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
18/08/02 09:56:19 INFO hdfs.BucketWriter: Creating hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-19.1533174979087.tmp
18/08/02 09:56:22 INFO file.EventQueueBackingStoreFile: Start checkpoint for /usr/flume/checkpoint/checkpoint_1533116333069, elements to sync = 3
18/08/02 09:56:22 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1533174892909, queueSize: 0, queueHead: 1
18/08/02 09:56:23 INFO file.Log: Updated checkpoint for file: /usr/flume/data/log-5 position: 361 logWriteOrderID: 1533174892909
18/08/02 09:56:23 INFO file.LogFile: Closing RandomReader /usr/flume/data/log-3
18/08/02 09:56:29 INFO hdfs.BucketWriter: Closing hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-18.1533174978446.tmp
18/08/02 09:56:29 INFO hdfs.BucketWriter: Renaming hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-18.1533174978446.tmp to hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-18.1533174978446
18/08/02 09:56:29 INFO hdfs.HDFSEventSink: Writer callback called.
18/08/02 09:56:29 INFO hdfs.BucketWriter: Closing hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-19.1533174979087.tmp
18/08/02 09:56:29 INFO hdfs.BucketWriter: Renaming hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-19.1533174979087.tmp to hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-19.1533174979087
18/08/02 09:56:29 INFO hdfs.HDFSEventSink: Writer callback called.

Check the files on HDFS

[root@node-231 ~]# hadoop fs -ls /user/hdfs/flume/avro/dataoutput/
Found 2 items
-rw-r--r-- 3 root hdfs 12 2018-08-02 09:56 /user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-18.1533174978446
-rw-r--r-- 3 root hdfs 25 2018-08-02 09:56 /user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-19.1533174979087
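
Besides the avro-client command line tool, an application can push events to the same Avro source with the Flume SDK's RPC client. A minimal sketch, assuming flume-ng-sdk (matching the agent version) is on the classpath; the class name and message are only illustrative:

import java.nio.charset.StandardCharsets;

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class AvroSourceClient {
    public static void main(String[] args) throws EventDeliveryException {
        // Connect to the Avro source configured above (bind address and port from avro.conf)
        RpcClient client = RpcClientFactory.getDefaultInstance("192.168.1.246", 4141);
        try {
            // Each append() sends one event; appendBatch() can send a list in one call
            Event event = EventBuilder.withBody("hello from the flume sdk", StandardCharsets.UTF_8);
            client.append(event);
        } finally {
            client.close();
        }
    }
}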

Case 7

Syslog TCP Source
The syslogtcp source listens on a TCP port and uses the incoming data as its data source.

The agent configuration file is as follows

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent

flume-ng agent -n a1 -c ../conf -f syslogtcp.conf -Dflume.root.logger=DEBUG,console

Generate a test syslog message

echo "test syslogtcp" | nc 192.168.1.246 5140

Agent log

18/08/02 11:13:48 WARN source.SyslogUtils: Event created from Invalid Syslog data.
18/08/02 11:13:49 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 74 65 73 74 20 73 79 73 6C 6F 67 74 63 70 test syslogtcp }

Case 8

JSONHandler (HTTP Source)
Create the agent configuration file

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = org.apache.flume.source.http.HTTPSource
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.port = 8888
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent

flume-ng agent -n a1 -c ../conf -f httpsource.conf -Dflume.root.logger=DEBUG,console

Send a JSON-formatted POST request

curl -X POST -d '[{ "headers" :{"a" : "a1","b" : "b1"},"body" : "idoall.org_body"}]' http://192.168.1.246:8888

Agent log

18/08/02 11:28:34 INFO sink.LoggerSink: Event: { headers:{a=a1, b=b1} body: 69 64 6F 61 6C 6C 2E 6F 72 67 5F 62 6F 64 79 idoall.org_body }
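
The same request can be produced from application code instead of curl. A minimal Java sketch using java.net.HttpURLConnection against the host and port configured above (the payload must be a JSON array of events, as expected by the default JSONHandler; the class name is only illustrative):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class HttpSourceClient {
    public static void main(String[] args) throws Exception {
        // The JSONHandler expects a JSON array of events, each with "headers" and "body"
        String json = "[{\"headers\":{\"a\":\"a1\",\"b\":\"b1\"},\"body\":\"idoall.org_body\"}]";

        HttpURLConnection conn = (HttpURLConnection) new URL("http://192.168.1.246:8888").openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json; charset=utf-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP response code: " + conn.getResponseCode());
        conn.disconnect();
    }
}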

Case 9

File Roll Sink
The file_roll sink writes events to the local file system.

Create the configuration file

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5555
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /usr/local/test/fileroll

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent

flume-ng agent -n a1 -c ../conf -f fileroll.conf -Dflume.root.logger=DEBUG,console

Generate a test log entry

echo "hello idoall.org syslog" | nc 192.168.1.246 5555

Agent log

18/08/02 11:53:34 WARN source.SyslogUtils: Event created from Invalid Syslog data.

Check the files under /usr/local/test/fileroll

[root@node-246 fileroll]# ls
1533181857932-1 1533181857932-2 1533181857932-3 1533181857932-4 1533181857932-5 1533181857932-6 1533181857932-7
[root@node-246 fileroll]# cat 1533181857932-6
hello idoall.org syslog

Case 10

Replicating Channel Selector
Flume can fan out the flow from one source to multiple channels. There are two fan-out modes, replicating and multiplexing. In replicating mode, the event is sent to all configured channels; in multiplexing mode, the event is sent only to a subset of the channels. Fanning out requires rules that map the source onto its fan-out channels.
Create the replicating_Channel_Selector configuration file

a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = replicating

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 192.168.1.246
a1.sinks.k1.port = 5555

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c2
a1.sinks.k2.hostname = 192.168.1.247
a1.sinks.k2.port = 5555

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

Create the replicating_Channel_Selector_avro configuration file

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 5555

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Copy both configuration files to the other machine (node 247) and adjust the IP addresses in them

scp replicating_Channel_Selector* root@node-247:/usr/local/test/flume/

Open four terminals and start the two agents on each node

flume-ng agent -n a1 -c ../conf -f replicating_Channel_Selector_avro.conf -Dflume.root.logger=DEBUG,console
flume-ng agent -n a1 -c ../conf -f replicating_Channel_Selector.conf -Dflume.root.logger=DEBUG,console

Generate a test syslog message

echo "hello idoall.org syslog" | nc 192.168.1.246 5140

Agent log

18/08/02 14:09:53 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 68 65 6C 6C 6F 20 69 64 6F 61 6C 6C 2E 6F 72 67 hello idoall.org }

Case 11

Multiplexing Channel Selector
Create the Multiplexing_Channel_Selector configuration file

a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# Describe/configure the source
a1.sources.r1.type = org.apache.flume.source.http.HTTPSource
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = multiplexing

a1.sources.r1.selector.header = type
# The channel sets mapped to different header values may overlap; the default may contain any number of channels.
a1.sources.r1.selector.mapping.baidu = c1
a1.sources.r1.selector.mapping.ali = c2
a1.sources.r1.selector.default = c1

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 192.168.1.246
a1.sinks.k1.port = 5555

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c2
a1.sinks.k2.hostname = 192.168.1.247
a1.sinks.k2.port = 5555

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

Create the Multiplexing_Channel_Selector_avro configuration file

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 5555

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Copy the configuration files to the other node and change the IPs accordingly

scp Multiplexing_Channel_Selector* root@192.168.1.247:/usr/local/test/flume/

Open four terminals (two on 246 and two on 247) and start the agents

flume-ng agent -n a1 -c ../conf -f Multiplexing_Channel_Selector_avro.conf -Dflume.root.logger=DEBUG,console
flume-ng agent -n a1 -c ../conf -f Multiplexing_Channel_Selector.conf -Dflume.root.logger=DEBUG,console

From either node, send test events

curl -X POST -d '[{ "headers" :{"type" : "baidu"},"body" : "idoall_TEST1"}]' http://192.168.1.247:5140 && curl -X POST -d '[{ "headers" :{"type" : "ali"},"body" : "idoall_TEST2"}]' http://192.168.1.247:5140 && curl -X POST -d '[{ "headers" :{"type" : "qq"},"body" : "idoall_TEST3"}]' http://192.168.1.246:5140

Agent logs
On 246:

18/08/02 14:36:04 INFO sink.LoggerSink: Event: { headers:{type=qq} body: 69 64 6F 61 6C 6C 5F 54 45 53 54 33 idoall_TEST3 }
18/08/02 14:36:04 INFO sink.LoggerSink: Event: { headers:{type=baidu} body: 69 64 6F 61 6C 6C 5F 54 45 53 54 31 idoall_TEST1 }

On 247:

18/08/02 14:36:06 INFO sink.LoggerSink: Event: { headers:{type=ali} body: 69 64 6F 61 6C 6C 5F 54 45 53 54 32 idoall_TEST2 }

As shown above, events are routed to different channels according to the value of the type header.

Case 12

Flume Sink Processors (failover)
With a failover processor, events always go to one sink; when that sink becomes unavailable, they are automatically sent to the next sink.

Create the Flume_Sink_Processors configuration file

a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# The key to configuring failover: a sink group is required
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
# The processor type is failover
a1.sinkgroups.g1.processor.type = failover
# Priority: the larger the number, the higher the priority; every sink must have a distinct priority
a1.sinkgroups.g1.processor.priority.k1 = 5
a1.sinkgroups.g1.processor.priority.k2 = 10
# Set to 10 seconds here; tune it up or down for your environment
a1.sinkgroups.g1.processor.maxpenalty = 10000

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = replicating

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 192.168.1.246
a1.sinks.k1.port = 5555

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c2
a1.sinks.k2.hostname = 192.168.1.247
a1.sinks.k2.port = 5555

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

  

Create the Flume_Sink_Processors_avro configuration file

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 5555

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Copy both files to node 247 and change the host accordingly

scp Flume_Sink_Processors* root@192.168.1.247:/usr/local/test/flume/

Open four terminals and start the two agents on each node

flume-ng agent -n a1 -c ../conf -f Flume_Sink_Processors_avro.conf -Dflume.root.logger=DEBUG,console
flume-ng agent -n a1 -c ../conf -f Flume_Sink_Processors.conf -Dflume.root.logger=DEBUG,console

Generate a test log entry

echo "idoall.org test1 failover" | nc 192.168.1.246 5140

Because 247 has the higher priority, the event shows up in the sink window on 247:

18/08/02 15:47:44 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 31 idoall.org test1 }

Now stop the sink agent on 247 (Ctrl+C) and send test data again

echo "idoall.org test1 failover" | nc 192.168.1.246 5140

This time the event shows up in the sink log on 246:

18/08/02 15:51:23 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 31 idoall.org test1 }

Case 13

Load balancing Sink Processor
Unlike failover, the load_balance processor offers two selection policies, round_robin and random. In either case, if the selected sink is unavailable, the processor automatically tries the next available sink.
Create the Load_balancing_Sink_Processors configuration file

a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1

# The key to configuring load balancing: a sink group is required
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = round_robin

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 192.168.1.246
a1.sinks.k1.port = 5555

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c1
a1.sinks.k2.hostname = 192.168.1.247
a1.sinks.k2.port = 5555

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

Create the Load_balancing_Sink_Processors_avro configuration file

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 5555

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Copy both files to node 247 and change the IPs

scp Load_balancing_Sink_Processors* root@192.168.1.247:/usr/local/test/flume/

Open four terminals and start the four agents

flume-ng agent -n a1 -c ../conf -f Load_balancing_Sink_Processors_avro.conf -Dflume.root.logger=DEBUG,console

Generate test log entries

[root@node-246 ~]# echo "idoall.org test1" | nc 192.168.1.246 5140
[root@node-246 ~]# echo "idoall.org test2" | nc 192.168.1.246 5140
[root@node-246 ~]# echo "idoall.org test3" | nc 192.168.1.246 5140
[root@node-246 ~]# echo "idoall.org test4" | nc 192.168.1.246 5140

Log on 247

18/08/02 18:36:20 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 31 idoall.org test1 }
18/08/02 18:36:35 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 32 idoall.org test2 }
18/08/02 18:36:58 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 34 idoall.org test4 }

Log on 246

18/08/02 18:36:47 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 33 idoall.org test3 }

Case 14

HBase Sink

Copy the following jars from the HBase lib directory into the Flume lib directory

protobuf-java-2.5.0.jar
hbase-client-0.96.2-hadoop2.jar
hbase-common-0.96.2-hadoop2.jar
hbase-protocol-0.96.2-hadoop2.jar
hbase-server-0.96.2-hadoop2.jar
hbase-hadoop2-compat-0.96.2-hadoop2.jar
hbase-hadoop-compat-0.96.2-hadoop2.jar
htrace-core-2.04.jar

  

cp protobuf-java-2.5.0.jar hbase-client-1.1.2.2.6.1.0-129.jar hbase-common-1.1.2.2.6.1.0-129.jar hbase-protocol-1.1.2.2.6.1.0-129.jar hbase-server-1.1.2.2.6.1.0-129.jar hbase-hadoop2-compat-1.1.2.2.6.1.0-129.jar hbase-hadoop-compat-1.1.2.2.6.1.0-129.jar htrace-core-3.1.0-incubating.jar /usr/hdp/2.6.1.0-129/flume/lib/

Create a table flume_test with the column family name in HBase

hbase(main):003:0> create 'flume_test', 'name'
0 row(s) in 2.3900 seconds
=> Hbase::Table - flume_test

Create the agent configuration file hbase_simple

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = hbase
a1.sinks.k1.table = flume_test
a1.sinks.k1.columnFamily = name
a1.sinks.k1.column = message
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent

flume-ng agent -n a1 -c ../conf -f hbase_simple.conf -Dflume.root.logger=DEBUG,console

Generate a test log entry

echo "hello zzz.org from flume" | nc 192.168.1.246 5140

Agent log

18/08/03 10:01:07 WARN source.SyslogUtils: Event created from Invalid Syslog data.

Check HBase

hbase(main):006:0> scan 'flume_test'
ROW COLUMN+CELL
1533261667472-IamY4IbgS7-0 column=name:payload, timestamp=1533261670851, value=hello zzz.org from flume
1 row(s) in 0.1130 seconds

Case 15

Collecting application logs with the Flume Avro source
The agent configuration is shown below; collected events are written directly to HDFS.

[root@node-246 flume]# cat avro_tag.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node-231:8020/user/hdfs/flume/avro_tag/dataoutput
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in file
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /usr/flume/checkpoint
a1.channels.c1.dataDirs = /usr/flume/data

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Dependencies the application needs to pull in

<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-flume-ng</artifactId>
    <version>${log4j.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.flume.flume-ng-clients</groupId>
    <artifactId>flume-ng-log4jappender</artifactId>
    <version>1.8.0</version>
</dependency>
<!-- log4j-core -->
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
    <version>${log4j.version}</version>
</dependency>
<!-- log4j-api -->
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-api</artifactId>
    <version>${log4j.version}</version>
</dependency>
<!-- log4j-web -->
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-web</artifactId>
    <version>${log4j.version}</version>
</dependency>

log4j2.xml is as follows

<?xml version="1.0" encoding="UTF-8"?>
<!-- Log levels in priority order: OFF > FATAL > ERROR > WARN > INFO > DEBUG > TRACE > ALL -->
<!-- The status attribute controls log4j2's own internal logging; it can be omitted. Set it to trace to see log4j2's detailed internal output. -->
<!-- monitorInterval: log4j2 can detect changes to this file and reconfigure itself; the value is the check interval in seconds -->
<configuration status="INFO" monitorInterval="30">
    <properties>
        <property name="LOG_HOME">../logs</property>
        <property name="TMP_LOG_FILE_NAME">tmp</property>
        <property name="INFO_LOG_FILE_NAME">info</property>
        <property name="WARN_LOG_FILE_NAME">warn</property>
        <property name="ERROR_LOG_FILE_NAME">error</property>
    </properties>
    <!-- First define all appenders -->
    <appenders>
        <!-- Console output -->
        <console name="Console" target="SYSTEM_OUT">
            <!-- Log output pattern -->
            <PatternLayout pattern="[%d{HH:mm:ss:SSS}] [%p] - %l - %m%n"/>
        </console>
        <!-- This file receives everything and is truncated on every run (append="false"); handy for ad-hoc testing -->
        <File name="log" fileName="${LOG_HOME}/${TMP_LOG_FILE_NAME}.log" append="false">
            <PatternLayout pattern="%d{HH:mm:ss.SSS} %-5level %class{36} %L %M - %msg%xEx%n"/>
        </File>
        <!-- Logs info level and above; when the file exceeds the configured size it is rolled over into a dated archive file -->
        <RollingFile name="RollingFileInfo" fileName="${LOG_HOME}/${INFO_LOG_FILE_NAME}.log"
                     filePattern="${LOG_HOME}/${INFO_LOG_FILE_NAME}-%d{yyyy-MM-dd}-%i.log">
            <!-- Accept only events at this level and above (onMatch); reject everything else (onMismatch) -->
            <ThresholdFilter level="info" onMatch="ACCEPT" onMismatch="DENY"/>
            <PatternLayout pattern="[%d{HH:mm:ss:SSS}] [%p] - %l - %m%n"/>
            <Policies>
                <TimeBasedTriggeringPolicy/>
                <SizeBasedTriggeringPolicy size="100 MB"/>
            </Policies>
        </RollingFile>
        <RollingFile name="RollingFileWarn" fileName="${LOG_HOME}/${WARN_LOG_FILE_NAME}.log"
                     filePattern="${LOG_HOME}/${WARN_LOG_FILE_NAME}-%d{yyyy-MM-dd}-%i.log">
            <ThresholdFilter level="warn" onMatch="ACCEPT" onMismatch="DENY"/>
            <PatternLayout pattern="[%d{HH:mm:ss:SSS}] [%p] - %l - %m%n"/>
            <Policies>
                <TimeBasedTriggeringPolicy/>
                <SizeBasedTriggeringPolicy size="100 MB"/>
            </Policies>
            <!-- If DefaultRolloverStrategy is not set, at most 7 files are kept per folder; here it is raised to 20 -->
            <DefaultRolloverStrategy max="20"/>
        </RollingFile>
        <RollingFile name="RollingFileError" fileName="${LOG_HOME}/${ERROR_LOG_FILE_NAME}.log"
                     filePattern="${LOG_HOME}/${ERROR_LOG_FILE_NAME}-%d{yyyy-MM-dd}-%i.log">
            <ThresholdFilter level="error" onMatch="ACCEPT" onMismatch="DENY"/>
            <PatternLayout pattern="[%d{HH:mm:ss:SSS}] [%p] - %l - %m%n"/>
            <Policies>
                <TimeBasedTriggeringPolicy/>
                <SizeBasedTriggeringPolicy size="100 MB"/>
            </Policies>
        </RollingFile>
        <!-- Flume appender -->
        <Flume name="FlumeAppender" compress="true">
            <Agent host="192.168.1.246" port="44444"/>
            <!-- <RFC5424Layout charset="UTF-8" enterpriseNumber="18060" includeMDC="true" appName="myapp"/> -->
            <PatternLayout charset="GBK" pattern="[%d{HH:mm:ss:SSS}] [%p] - %l - %m%n" />
        </Flume>
    </appenders>
    <!-- Then define the loggers; an appender only takes effect once a logger references it -->
    <loggers>
        <!-- Filter out noisy DEBUG output from Spring and MyBatis -->
        <logger name="org.springframework" level="INFO"></logger>
        <logger name="org.mybatis" level="INFO"></logger>
        <!-- <Logger name="sysLog" level="trace">
            <AppenderRef ref="FlumeAppender"/>
        </Logger> -->
        <root level="info">
            <appender-ref ref="Console"/>
            <appender-ref ref="RollingFileInfo"/>
            <appender-ref ref="RollingFileWarn"/>
            <appender-ref ref="RollingFileError"/>
            <!-- Send log events to the Flume source -->
            <appenderRef ref="FlumeAppender"/>
        </root>
    </loggers>
</configuration>

Start the agent

flume-ng agent -n a1 -c ../conf -f avro_tag.conf -Dflume.root.logger=DEBUG,console

Once the application is started, its log output is shipped through Flume and written to HDFS.
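
For reference, any class that logs through log4j2 will feed this pipeline once the configuration above is on the classpath; nothing Flume-specific is needed in the application code. A minimal sketch (the class name and messages are only illustrative):

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class FlumeLoggingDemo {
    private static final Logger LOGGER = LogManager.getLogger(FlumeLoggingDemo.class);

    public static void main(String[] args) {
        // Events reaching the root logger are also handed to the FlumeAppender,
        // which forwards them over Avro to 192.168.1.246:44444 (see log4j2.xml above).
        LOGGER.info("application started, this line should end up in HDFS");
        LOGGER.error("simulated error for testing the flume pipeline");
    }
}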
