Log Collection

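Flume's principles are easy to understand; what matters more is knowing how to use it. Flume ships with many built-in Source, Channel, and Sink types, and the different types can be combined freely. The combination is defined in a user-supplied configuration file, which makes it very flexible. For example, a Channel can buffer events in memory or persist them to local disk, and a Sink can write logs to HDFS, to HBase, or even to another Source. The cases below walk through concrete usage.
Using Flume is simple: write a configuration file that describes the concrete source, channel, and sink, then run an agent instance. The agent reads the configuration file when it runs and starts collecting data.
Rules for writing the configuration file:
1> Declare, at the top level, the components that make up the agent's sources, sinks, and channels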
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
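2> Describe the concrete implementation of each source, sink, and channel in the agent. For a source, specify its type: whether it accepts files, HTTP, Thrift, and so on. Likewise, for a sink, specify whether the output goes to HDFS, to HBase, and so on. For a channel, specify whether it is backed by memory, a database, a file, and so on.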
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

3> Connect the source and the sink through the channel
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
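
The three rules above can be checked mechanically before an agent is started. The following is a minimal sketch in Python, not part of Flume itself: the helper names `parse_properties` and `check_wiring` are hypothetical, and the parser only handles the simple `key = value` layout used in this article.

```python
def parse_properties(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comment lines."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent="a1"):
    """Return a list of wiring problems; an empty list means the wiring is consistent."""
    declared = set(props.get(f"{agent}.channels", "").split())
    problems = []
    for source in props.get(f"{agent}.sources", "").split():
        used = props.get(f"{agent}.sources.{source}.channels", "").split()
        if not used:
            problems.append(f"source {source} is not bound to any channel")
        for channel in used:
            if channel not in declared:
                problems.append(f"source {source} uses undeclared channel {channel}")
    for sink in props.get(f"{agent}.sinks", "").split():
        channel = props.get(f"{agent}.sinks.{sink}.channel")
        if channel is None:
            problems.append(f"sink {sink} is not bound to a channel")
        elif channel not in declared:
            problems.append(f"sink {sink} uses undeclared channel {channel}")
    return problems


# The example configuration from the three steps above.
EXAMPLE = """
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
"""

print(check_wiring(parse_properties(EXAMPLE)))  # prints []
```

Note that a source takes a list under `channels` (plural) while a sink takes exactly one `channel` (singular); the sketch mirrors that asymmetry.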

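Shell command to start the agent:
flume-ng agent -n a1 -c ../conf -f ../conf/example.file -Dflume.root.logger=DEBUG,console

Parameters:
-n  name of the agent (must match the agent name used in the configuration file)
-c  directory that holds Flume's configuration files
-f  the configuration file to load
-Dflume.root.logger=DEBUG,console  sets the log level to DEBUG and sends the log to the console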

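Case 1

NetCat Source: listens on a given network port. Whenever an application writes data to that port, the source picks it up. This case uses Sink: logger and Channel: memory.
NetCat Source properties from the official Flume documentation: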

Property Name   Default   Description
channels        –
type            –         The component type name, needs to be netcat
bind            –         Host name or IP address the logs are sent to; the netcat source listens on this host
port            –         Port the logs are sent to; the netcat source listens on this port

Write the configuration file

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent

flume-ng agent -n a1 -c ../conf -f netcat.conf -Dflume.root.logger=DEBUG,console

Send data with telnet

[root@node-247 ~]# telnet 192.168.1.246 44444
Trying 192.168.1.246...
Connected to 192.168.1.246.
Escape character is '^]'.
111111
OK

Check the output on the agent node

18/08/01 17:32:21 INFO sink.LoggerSink: Event: { headers:{} body: 31 31 31 31 31 31 0D 111111. }
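
The logger sink prints the event body as hexadecimal bytes followed by a printable rendering. As a small illustration (the helper name `decode_logger_body` is hypothetical), the hex part can be turned back into bytes. The trailing 0D is a carriage return: telnet ends each line with CR LF, and the event here appears to have been split at the line feed, leaving the CR in the body.

```python
def decode_logger_body(hex_dump):
    """Turn a space-separated hex dump such as '31 31 0D' into bytes."""
    return bytes(int(pair, 16) for pair in hex_dump.split())


body = decode_logger_body("31 31 31 31 31 31 0D")
print(body)  # b'111111\r'
```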

Case 2:

NetCat Source: listens on a given network port. Whenever an application writes data to that port, the source picks it up. This case uses Sink: hdfs and Channel: file (the two changes compared with Case 1).
See the HDFS Sink description in the official Flume documentation.

Write the configuration file

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node-231:8020/user/hdfs/flume/netcat
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in file
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /usr/flume/checkpoint
a1.channels.c1.dataDirs = /usr/flume/data

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent

flume-ng agent -n a1 -c ../conf -f netcat2hdfs.conf -Dflume.root.logger=DEBUG,console

Send data with telnet

[root@node-247 ~]# telnet 192.168.1.246 44444
Trying 192.168.1.246...
Connected to 192.168.1.246.
Escape character is '^]'.
write to hdfs
OK

Log output on the agent node

18/08/01 17:39:28 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
18/08/01 17:39:28 INFO hdfs.BucketWriter: Creating hdfs://node-231:8020/user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959.tmp
18/08/01 17:39:39 INFO hdfs.BucketWriter: Closing hdfs://node-231:8020/user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959.tmp
18/08/01 17:39:39 INFO hdfs.BucketWriter: Renaming hdfs://node-231:8020/user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959.tmp to hdfs://node-231:8020/user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959
18/08/01 17:39:39 INFO hdfs.HDFSEventSink: Writer callback called.
18/08/01 17:39:53 INFO file.EventQueueBackingStoreFile: Start checkpoint for /usr/flume/checkpoint/checkpoint_1533116333069, elements to sync = 1
18/08/01 17:39:53 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1533116333542, queueSize: 0, queueHead: 0
18/08/01 17:39:53 INFO file.Log: Updated checkpoint for file: /usr/flume/data/log-1 position: 171 logWriteOrderID: 1533116333542

The write succeeded. Verify:

[root@node-235 bin]# hadoop fs -ls /user/hdfs/flume/netcat/
Found 1 items
-rw-r--r-- 3 root hdfs 15 2018-08-01 17:39 /user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959
[root@node-235 bin]# hadoop fs -cat /user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959
write to hdfs

Sending data over telnet again produces a second data file in the HDFS directory, because hdfs.rollInterval = 10 closes the current file after 10 seconds.
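
The file name in the log above starts with the expanded hdfs.filePrefix. The escape sequences used here (%Y, %m, %d, %H, %M, %S) have the same meaning as in strftime, so the expansion can be reproduced with a short sketch. The function name `expand_prefix` is hypothetical, and the UTC+8 offset is an assumption inferred from the timestamps in the log; the numeric suffix of the file name is used as the input only because it falls within the same second.

```python
from datetime import datetime, timedelta, timezone


def expand_prefix(pattern, epoch_millis, utc_offset_hours):
    """Expand strftime-style escapes for a timestamp given in epoch milliseconds."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    moment = datetime.fromtimestamp(epoch_millis / 1000, tz=zone)
    return moment.strftime(pattern)


# 1533116368959 is the numeric suffix of the file created in the log above.
print(expand_prefix("%Y-%m-%d-%H-%M-%S", 1533116368959, 8))  # 2018-08-01-17-39-28
```

Because the prefix has one-second resolution and the sink rolls every 10 seconds, each rolled file gets its own prefix in this setup.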

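Case 3:

Spooling Directory Source: watches a given directory. Whenever an application adds a new file to that directory, the source detects it, parses the file's contents, and writes them to the channel. When the write finishes, the file is either marked as completed or deleted. This case uses Sink: logger and Channel: memory.
Spooling Directory Source properties from the official Flume documentation:
Property Name   Default      Description
channels        –
type            –            The component type name, needs to be spooldir.
spoolDir        –            The directory that the Spooling Directory Source watches
fileSuffix      .COMPLETED   Suffix appended to a file once its contents have been written to the channel
deletePolicy    never        When to delete a file after its contents have been written to the channel: never or immediate
fileHeader      false        Whether to add a header storing the absolute path filename.
ignorePattern   ^$           Regular expression specifying which files to ignore (skip)
interceptors    –            Interceptors that set header information on events in transit; timestamp is a common choice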

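Two caveats for the Spooling Directory Source:
① If a file is written to after being placed into the spooling directory, Flume will print an error to its log file and stop processing.
In other words, a file must not be opened and edited again once it has been copied into the spool directory.
② If a file name is reused at a later time, Flume will print an error to its log file and stop processing.
In other words, do not copy a file into this directory under a name that has already been used there.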
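
One way to respect both caveats is to finish writing the file somewhere else and then rename it into the spool directory under a name that is never repeated. The sketch below is an illustration under stated assumptions rather than anything Flume provides: the function name `deliver_to_spool` and the `.staging` sibling directory are hypothetical, and the rename is only atomic when the staging directory sits on the same filesystem as the spool directory.

```python
import os
import shutil
import uuid


def deliver_to_spool(src_path, spool_dir):
    """Copy src_path into a staging directory, then rename it into spool_dir.

    The source only ever sees a complete file (caveat 1), and the random
    suffix keeps file names from being reused (caveat 2).
    """
    parent = os.path.dirname(os.path.abspath(spool_dir))
    staging = os.path.join(parent, ".staging")  # hypothetical staging location
    os.makedirs(staging, exist_ok=True)

    unique_name = f"{os.path.basename(src_path)}.{uuid.uuid4().hex}"
    staged = os.path.join(staging, unique_name)
    shutil.copyfile(src_path, staged)  # the slow part happens outside the spool dir

    destination = os.path.join(spool_dir, unique_name)
    os.rename(staged, destination)  # atomic on POSIX within one filesystem
    return destination
```

A call such as `deliver_to_spool("test.txt", "/usr/local/test/datainput")` would replace the plain `cp` used in the cases below; the path is the one from this article's configuration.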

Write the configuration file

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /usr/local/test/datainput
a1.sources.r1.fileHeader = true
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent

flume-ng agent -n a1 -c ../conf -f spool.conf -Dflume.root.logger=DEBUG,console

Put a test file containing "hello spool" into the watched directory

cp test.txt datainput/

Agent log

18/08/01 17:52:48 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /usr/local/test/datainput/test.txt to /usr/local/test/datainput/test.txt.COMPLETED
18/08/01 17:52:48 INFO sink.LoggerSink: Event: { headers:{file=/usr/local/test/datainput/test.txt, timestamp=1533117168275} body: 68 65 6C 6C 6F 20 73 70 6F 6F 6C hello spool }

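The console output shows that the event headers include the timestamp.
Now look at the files in the spooling directory: once a file's contents have been written to the channel, the file is marked as completed.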

[root@node-246 test]# ls datainput/
test.txt.COMPLETED

Case 4:

Spooling Directory Source: watches a given directory. Whenever an application adds a new file to that directory, the source detects it, parses the file's contents, and writes them to the channel. When the write finishes, the file is either marked as completed or deleted. This case uses Sink: hdfs and Channel: file (the two changes compared with Case 3).
Write the configuration file

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /usr/local/test/datainput
a1.sources.r1.fileHeader = true
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node-231:8020/user/hdfs/flume/spool/dataoutput
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in file
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /usr/flume/checkpoint
a1.channels.c1.dataDirs = /usr/flume/data

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent

flume-ng agent -n a1 -c ../conf -f spool2hdfs.conf -Dflume.root.logger=DEBUG,console

Put a new file, test1.txt, into datainput

cp test1.txt datainput

Agent log

18/08/01 17:58:41 INFO file.EventQueueBackingStoreFile: Start checkpoint for /usr/flume/checkpoint/checkpoint_1533116333069, elements to sync = 1
18/08/01 17:58:41 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1533117491901, queueSize: 0, queueHead: 0
18/08/01 17:58:42 INFO file.Log: Updated checkpoint for file: /usr/flume/data/log-3 position: 241 logWriteOrderID: 1533117491901
18/08/01 17:58:42 INFO file.LogFile: Closing RandomReader /usr/flume/data/log-1
18/08/01 17:58:42 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
18/08/01 17:58:42 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
18/08/01 17:58:43 INFO hdfs.BucketWriter: Closing hdfs://node-231:8020/user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457.tmp
18/08/01 17:58:43 INFO hdfs.BucketWriter: Renaming hdfs://node-231:8020/user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457.tmp to hdfs://node-231:8020/user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457
18/08/01 17:58:43 INFO hdfs.HDFSEventSink: Writer callback called.
18/08/01 17:58:32 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /usr/local/test/datainput/test1.txt to /usr/local/test/datainput/test1.txt.COMPLETED
18/08/01 17:58:32 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
18/08/01 17:58:32 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
18/08/01 17:58:32 INFO hdfs.BucketWriter: Creating hdfs://node-231:8020/user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457.tmp

Check HDFS

[root@node-235 bin]# hadoop fs -ls /user/hdfs/flume/spool/dataoutput
Found 1 items
-rw-r--r-- 3 root hdfs 12 2018-08-01 17:58 /user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457
[root@node-235 bin]# hadoop fs -cat /user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457
hello spool

Check the state of the files under datainput

[root@node-246 test]# ls datainput/
test1.txt.COMPLETED test.txt.COMPLETED

Case 5:

Exec Source: runs a given command and uses the command's output as its data source.
The most common command is tail -F file: whenever an application appends to the log file, the source picks up the newest content. This case uses Sink: hdfs and Channel: file.
To make the effect of the Exec Source easy to see, this case uses a Hive external table.
Write the configuration file

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /usr/local/test/log.file

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node-231:8020/user/hdfs/flume/exec/dataoutput
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in file
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /usr/flume/checkpoint
a1.channels.c1.dataDirs = /usr/flume/data

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Create the Hive external table

create external table flume_exec_table
(info String)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
location '/user/hdfs/flume/exec/dataoutput'

Start the agent

flume-ng agent -n a1 -c ../conf -f exec.conf -Dflume.root.logger=DEBUG,console

Use echo to append data to /usr/local/test/log.file

echo firstline=1 >> /usr/local/test/log.file

Check the data in Hive

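Summary of the Exec source: the Exec source and the Spooling Directory Source are two common ways to collect logs. The Exec source collects logs in real time, whereas the Spooling Directory Source falls somewhat short on real-time collection. However, when Flume is not running or the command fails, the Exec source cannot collect log data and logs are lost, so the completeness of the collected logs cannot be guaranteed.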

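Case 6:

Avro Source: listens on a given Avro port and receives the files that an Avro client sends to it. Whenever an application sends a file through the Avro port, the source picks up the file's contents. This case uses Sink: hdfs and Channel: file.
(Note: Avro and Thrift are both serialization frameworks with RPC support; a source of either type receives data on a network port. An Avro client can send a given file to Flume, and the Avro source uses the Avro RPC mechanism.)
[Figure: how the Avro Source works]

Avro Source properties from the official Flume documentation:
Property Name   Default   Description
channels        –
type            –         The component type name, needs to be avro
bind            –         Host name or IP address the logs are sent to; the Avro source runs on this host
port            –         Port the logs are sent to; the Avro source listens on this port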

Write the configuration file

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 4141

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in file
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /usr/flume/checkpoint
a1.channels.c1.dataDirs = /usr/flume/data

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent

flume-ng agent -n a1 -c ../conf -f avro.conf -Dflume.root.logger=DEBUG,console

Send a file with avro-client

flume-ng avro-client -c ../conf -H 192.168.1.246 -p 4141 -F /usr/local/test/log.file

The agent log is as follows

18/08/02 09:56:14 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 => /192.168.1.246:4141] OPEN
18/08/02 09:56:14 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 => /192.168.1.246:4141] BOUND: /192.168.1.246:4141
18/08/02 09:56:14 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 => /192.168.1.246:4141] CONNECTED: /192.168.1.246:43750
18/08/02 09:56:15 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 :> /192.168.1.246:4141] DISCONNECTED
18/08/02 09:56:15 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 :> /192.168.1.246:4141] UNBOUND
18/08/02 09:56:15 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 :> /192.168.1.246:4141] CLOSED
18/08/02 09:56:15 INFO ipc.NettyServer: Connection to /192.168.1.246:43750 disconnected.
18/08/02 09:56:18 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
18/08/02 09:56:18 INFO hdfs.BucketWriter: Creating hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-18.1533174978446.tmp
18/08/02 09:56:19 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
18/08/02 09:56:19 INFO hdfs.BucketWriter: Creating hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-19.1533174979087.tmp
18/08/02 09:56:22 INFO file.EventQueueBackingStoreFile: Start checkpoint for /usr/flume/checkpoint/checkpoint_1533116333069, elements to sync = 3
18/08/02 09:56:22 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1533174892909, queueSize: 0, queueHead: 1
18/08/02 09:56:23 INFO file.Log: Updated checkpoint for file: /usr/flume/data/log-5 position: 361 logWriteOrderID: 1533174892909
18/08/02 09:56:23 INFO file.LogFile: Closing RandomReader /usr/flume/data/log-3
18/08/02 09:56:29 INFO hdfs.BucketWriter: Closing hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-18.1533174978446.tmp
18/08/02 09:56:29 INFO hdfs.BucketWriter: Renaming hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-18.1533174978446.tmp to hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-18.1533174978446
18/08/02 09:56:29 INFO hdfs.HDFSEventSink: Writer callback called.
18/08/02 09:56:29 INFO hdfs.BucketWriter: Closing hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-19.1533174979087.tmp
18/08/02 09:56:29 INFO hdfs.BucketWriter: Renaming hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-19.1533174979087.tmp to hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-19.1533174979087
18/08/02 09:56:29 INFO hdfs.HDFSEventSink: Writer callback called.

Check the files in HDFS

[root@node-231 ~]# hadoop fs -ls /user/hdfs/flume/avro/dataoutput/
Found 2 items
-rw-r--r-- 3 root hdfs 12 2018-08-02 09:56 /user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-18.1533174978446
-rw-r--r-- 3 root hdfs 25 2018-08-02 09:56 /user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-19.1533174979087

Case 7:

syslogtcp
The Syslog TCP source listens on a TCP port and uses it as the data source.

The agent configuration file is as follows

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent

flume-ng agent -n a1 -c ../conf -f syslogtcp.conf -Dflume.root.logger=DEBUG,console

Generate a test syslog message

echo "test syslogtcp" | nc 192.168.1.246 5140

Agent log

18/08/02 11:13:48 WARN source.SyslogUtils: Event created from Invalid Syslog data.
18/08/02 11:13:49 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 74 65 73 74 20 73 79 73 6C 6F 67 74 63 70 test syslogtcp }
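
The "Invalid Syslog data" warning and the flume.syslog.status=Invalid header most likely appear because the test string is plain text with no syslog header. A syslog message starts with a priority field `<PRI>`, where PRI = facility * 8 + severity. The sketch below builds a line in the traditional BSD (RFC 3164) layout; the function names are hypothetical, and whether a particular Flume version parses the result without warnings is an assumption that has not been verified here.

```python
from datetime import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def syslog_priority(facility, severity):
    """Combine facility (0-23) and severity (0-7) into the PRI value."""
    if not (0 <= facility <= 23 and 0 <= severity <= 7):
        raise ValueError("facility must be 0-23 and severity 0-7")
    return facility * 8 + severity


def build_syslog_line(message, facility, severity, host, when):
    """Format '<PRI>Mmm dd hh:mm:ss host message' with a trailing newline."""
    stamp = f"{MONTHS[when.month - 1]} {when.day:2d} {when:%H:%M:%S}"
    return f"<{syslog_priority(facility, severity)}>{stamp} {host} {message}\n"


line = build_syslog_line("test syslogtcp", facility=1, severity=6,
                         host="node-247", when=datetime(2018, 8, 2, 11, 13, 48))
print(repr(line))  # '<14>Aug  2 11:13:48 node-247 test syslogtcp\n'
```

Piping such a line to `nc 192.168.1.246 5140` instead of the bare string would be the natural follow-up experiment.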

Case 8:

JSONHandler
Create the agent configuration file

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = org.apache.flume.source.http.HTTPSource
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.port = 8888
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent

flume-ng agent -n a1 -c ../conf -f httpsource.conf -Dflume.root.logger=DEBUG,console

Send a POST request with a JSON body

curl -X POST -d '[{ "headers" :{"a" : "a1","b" : "b1"},"body" : "idoall.org_body"}]' http://192.168.1.246:8888

Agent log

18/08/02 11:28:34 INFO sink.LoggerSink: Event: { headers:{a=a1, b=b1} body: 69 64 6F 61 6C 6C 2E 6F 72 67 5F 62 6F 64 79 idoall.org_body }
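
The curl request above sends a JSON array in which each element has a "headers" map and a "body" string, and the logger sink output shows those two parts arriving as the event's headers and body. The same payload can be built programmatically; in this sketch the helper names `build_payload` and `post_events` are hypothetical, and `post_events` is only meaningful while the agent from this case is listening.

```python
import json
import urllib.request


def build_payload(events):
    """Build the JSON array for a list of (headers_dict, body_string) pairs."""
    return json.dumps([{"headers": dict(headers), "body": body}
                       for headers, body in events])


def post_events(url, payload):
    """POST the payload to the HTTP source and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


payload = build_payload([({"a": "a1", "b": "b1"}, "idoall.org_body")])
print(payload)
# To send it to the agent of this case (requires the agent to be running):
# post_events("http://192.168.1.246:8888", payload)
```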

Case 9:

File Roll Sink
The "file_roll" sink type stores the data on the local file system.

Create the configuration file

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5555
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /usr/local/test/fileroll

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent

flume-ng agent -n a1 -c ../conf -f fileroll.conf -Dflume.root.logger=DEBUG,console

Generate a test log message

echo "hello idoall.org syslog" | nc 192.168.1.246 5555

Agent log

18/08/02 11:53:34 WARN source.SyslogUtils: Event created from Invalid Syslog data.

Check the files under /usr/local/test/fileroll

[root@node-246 fileroll]# ls
1533181857932-1 1533181857932-2 1533181857932-3 1533181857932-4 1533181857932-5 1533181857932-6 1533181857932-7
[root@node-246 fileroll]# cat 1533181857932-6
hello idoall.org syslog

Case 10

Replicating Channel Selector
Flume supports fanning out the flow from one source to multiple channels. There are two fan-out modes: replicating and multiplexing. In replicating mode, each event is sent to all configured channels. In multiplexing mode, an event is sent to only a subset of the eligible channels. A fan-out flow requires specifying the source and the rules for fanning out to the channels.
Create the replicating_Channel_Selector configuration file

a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = replicating

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 192.168.1.246
a1.sinks.k1.port = 5555

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c2
a1.sinks.k2.hostname = 192.168.1.247
a1.sinks.k2.port = 5555

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

Create the replicating_Channel_Selector_avro configuration file

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 5555

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Copy the two configuration files to the other machine (247) and change the IP addresses in them

scp replicating_Channel_Selector* root@node-247:/usr/local/test/flume/

Open four terminal windows, two on each machine, and start both agents on each machine

flume-ng agent -n a1 -c ../conf -f replicating_Channel_Selector_avro.conf -Dflume.root.logger=DEBUG,console
flume-ng agent -n a1 -c ../conf -f replicating_Channel_Selector.conf -Dflume.root.logger=DEBUG,console

Generate a test syslog message

echo "hello idoall.org syslog" | nc 192.168.1.246 5140

Agent log

18/08/02 14:09:53 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 68 65 6C 6C 6F 20 69 64 6F 61 6C 6C 2E 6F 72 67 hello idoall.org }

Case 11

Multiplexing Channel Selector
Create the Multiplexing_Channel_Selector configuration file

a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# Describe/configure the source
a1.sources.r1.type = org.apache.flume.source.http.HTTPSource
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = type
# The channel sets mapped to each value may overlap. The default may contain any number of channels.
a1.sources.r1.selector.mapping.baidu = c1
a1.sources.r1.selector.mapping.ali = c2
a1.sources.r1.selector.default = c1

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 192.168.1.246
a1.sinks.k1.port = 5555

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c2
a1.sinks.k2.hostname = 192.168.1.247
a1.sinks.k2.port = 5555

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

Create the Multiplexing_Channel_Selector_avro configuration file

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 5555

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Copy the configuration files to the other node and change the IP addresses accordingly

scp Multiplexing_Channel_Selector* root@192.168.1.247:/usr/local/test/flume/

Open four windows, two each on 246 and 247, and start the agents

flume-ng agent -n a1 -c ../conf -f Multiplexing_Channel_Selector_avro.conf -Dflume.root.logger=DEBUG,console
flume-ng agent -n a1 -c ../conf -f Multiplexing_Channel_Selector.conf -Dflume.root.logger=DEBUG,console

On either node, send test HTTP requests

curl -X POST -d '[{ "headers" :{"type" : "baidu"},"body" : "idoall_TEST1"}]' http://192.168.1.247:5140 && curl -X POST -d '[{ "headers" :{"type" : "ali"},"body" : "idoall_TEST2"}]' http://192.168.1.247:5140 && curl -X POST -d '[{ "headers" :{"type" : "qq"},"body" : "idoall_TEST3"}]' http://192.168.1.246:5140

Agent logs
On 246

18/08/02 14:36:04 INFO sink.LoggerSink: Event: { headers:{type=qq} body: 69 64 6F 61 6C 6C 5F 54 45 53 54 33 idoall_TEST3 }
18/08/02 14:36:04 INFO sink.LoggerSink: Event: { headers:{type=baidu} body: 69 64 6F 61 6C 6C 5F 54 45 53 54 31 idoall_TEST1 }

On 247

18/08/02 14:36:06 INFO sink.LoggerSink: Event: { headers:{type=ali} body: 69 64 6F 61 6C 6C 5F 54 45 53 54 32 idoall_TEST2 }

As the logs show, events are routed to different channels according to the value of the type header.
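
The routing rule configured above (selector.header, selector.mapping.*, selector.default) can be modelled in a few lines. This is a simplified sketch of the behaviour, not Flume's implementation, and the function name `select_channels` is hypothetical.

```python
def select_channels(headers, header_name, mapping, default):
    """Return the channels an event goes to, based on one header value."""
    value = headers.get(header_name)
    return mapping.get(value, default)


# Mirrors selector.mapping.baidu = c1, selector.mapping.ali = c2, selector.default = c1
MAPPING = {"baidu": ["c1"], "ali": ["c2"]}
DEFAULT = ["c1"]

for event_type in ("baidu", "ali", "qq"):
    print(event_type, select_channels({"type": event_type}, "type", MAPPING, DEFAULT))
```

This matches the logs: baidu and qq (no mapping, so the default) travel through c1 to 246, while ali travels through c2 to 247.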

Case 12

Flume Sink Processors
The failover mechanism keeps sending events to one sink; when that sink becomes unavailable, events are automatically sent to the next sink.

Create the Flume_Sink_Processors configuration file

a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# The key to configuring failover: a sink group is required
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
# The processor type is failover
a1.sinkgroups.g1.processor.type = failover
# Priorities: a larger number means a higher priority; each sink must have a distinct priority
a1.sinkgroups.g1.processor.priority.k1 = 5
a1.sinkgroups.g1.processor.priority.k2 = 10
# Maximum backoff penalty for a failed sink, in milliseconds (10 seconds here); adjust it to suit your environment
a1.sinkgroups.g1.processor.maxpenalty = 10000

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = replicating

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 192.168.1.246
a1.sinks.k1.port = 5555

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c2
a1.sinks.k2.hostname = 192.168.1.247
a1.sinks.k2.port = 5555

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

  

Create the Flume_Sink_Processors_avro configuration file

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 5555

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Copy the two files to node 247 and change the host settings accordingly

scp Flume_Sink_Processors* root@192.168.1.247:/usr/local/test/flume/

Open four windows and start both agents on each node

flume-ng agent -n a1 -c ../conf -f Flume_Sink_Processors_avro.conf -Dflume.root.logger=DEBUG,console
flume-ng agent -n a1 -c ../conf -f Flume_Sink_Processors.conf -Dflume.root.logger=DEBUG,console

Generate a test log message

echo "idoall.org test1 failover" | nc 192.168.1.246 5140

Because the sink pointing at 247 (k2) has the higher priority, the log shows up in the sink window on 247

18/08/02 15:47:44 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 31 idoall.org test1 }

Now stop the sink on 247 (Ctrl+C) and send the test data again

echo "idoall.org test1 failover" | nc 192.168.1.246 5140

The log now shows up in the sink window on 246

18/08/02 15:51:23 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 31 idoall.org test1 }
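
The behaviour observed in this case, where the highest-priority sink is used while it is alive and the next one takes over when it is not, can be summarised with a simplified model. The function name `pick_sink` is hypothetical, and the sketch deliberately ignores the backoff penalty (maxpenalty) that Flume applies to failed sinks.

```python
def pick_sink(priorities, available):
    """Return the available sink with the highest priority, or None if none is up."""
    live = [sink for sink in priorities if sink in available]
    if not live:
        return None
    return max(live, key=priorities.get)


# Mirrors processor.priority.k1 = 5 and processor.priority.k2 = 10
PRIORITIES = {"k1": 5, "k2": 10}

print(pick_sink(PRIORITIES, {"k1", "k2"}))  # k2 (the sink pointing at 247)
print(pick_sink(PRIORITIES, {"k1"}))        # k1 (247 stopped, 246 takes over)
```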

Case 13

Load balancing Sink Processor
Unlike failover, the load_balance type spreads events across the sinks, and it offers two selection mechanisms: round robin and random. In both, if the selected sink is unavailable, the processor automatically tries the next available sink.
Create the Load_balancing_Sink_Processors configuration file

a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1

# The key to configuring load balancing: a sink group is required
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = round_robin

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 192.168.1.246
a1.sinks.k1.port = 5555

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c1
a1.sinks.k2.hostname = 192.168.1.247
a1.sinks.k2.port = 5555

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

Create the Load_balancing_Sink_Processors_avro configuration file

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 5555

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Copy the two files to node 247 and change the IP addresses

scp Load_balancing_Sink_Processors* root@192.168.1.247:/usr/local/test/flume/

Open four windows and start the four agents (both agents on each node)

flume-ng agent -n a1 -c ../conf -f Load_balancing_Sink_Processors_avro.conf -Dflume.root.logger=DEBUG,console
flume-ng agent -n a1 -c ../conf -f Load_balancing_Sink_Processors.conf -Dflume.root.logger=DEBUG,console

Generate test log messages

[root@node-246 ~]# echo "idoall.org test1" | nc 192.168.1.246 5140
[root@node-246 ~]# echo "idoall.org test2" | nc 192.168.1.246 5140
[root@node-246 ~]# echo "idoall.org test3" | nc 192.168.1.246 5140
[root@node-246 ~]# echo "idoall.org test4" | nc 192.168.1.246 5140

Log on 247

18/08/02 18:36:20 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 31 idoall.org test1 }
18/08/02 18:36:35 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 32 idoall.org test2 }
18/08/02 18:36:58 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 34 idoall.org test4 }

Log on 246

18/08/02 18:36:47 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 33 idoall.org test3 }

案例14

HBase sink

Copy the following files from HBase's lib directory into Flume's lib directory

protobuf-java-2.5.0.jar
hbase-client-0.96.2-hadoop2.jar
hbase-common-0.96.2-hadoop2.jar
hbase-protocol-0.96.2-hadoop2.jar
hbase-server-0.96.2-hadoop2.jar
hbase-hadoop2-compat-0.96.2-hadoop2.jar
hbase-hadoop-compat-0.96.2-hadoop2.jar
htrace-core-2.04.jar

  

cp protobuf-java-2.5.0.jar hbase-client-1.1.2.2.6.1.0-129.jar hbase-common-1.1.2.2.6.1.0-129.jar hbase-protocol-1.1.2.2.6.1.0-129.jar hbase-server-1.1.2.2.6.1.0-129.jar hbase-hadoop2-compat-1.1.2.2.6.1.0-129.jar hbase-hadoop-compat-1.1.2.2.6.1.0-129.jar htrace-core-3.1.0-incubating.jar /usr/hdp/2.6.1.0-129/flume/lib/

Create the HBase table flume_test with the column family name

hbase(main):003:0> create 'flume_test', 'name'
0 row(s) in 2.3900 seconds
=> Hbase::Table - flume_test

Create the agent configuration file hbase_simple

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = hbase
a1.sinks.k1.table = flume_test
a1.sinks.k1.columnFamily = name
a1.sinks.k1.column = message
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent

flume-ng agent -n a1 -c ../conf -f hbase_simple.conf -Dflume.root.logger=DEBUG,console

Generate a test log message

echo "hello zzz.org from flume" | nc 192.168.1.246 5140

Agent log

18/08/03 10:01:07 WARN source.SyslogUtils: Event created from Invalid Syslog data.

Check HBase

hbase(main):006:0> scan 'flume_test'
ROW COLUMN+CELL
1533261667472-IamY4IbgS7-0 column=name:payload, timestamp=1533261670851, value=hello zzz.org from flume
1 row(s) in 0.1130 seconds

案例15

使用flume avro采集平台日志
Agent文件如下,采集完成直接写入HDFS

[root@node-246 flume]# cat avro_tag.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node-231:8020/user/hdfs/flume/avro_tag/dataoutput
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in file
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /usr/flume/checkpoint
a1.channels.c1.dataDirs = /usr/flume/data

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
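
The hdfs.filePrefix value uses time escape sequences, and with hdfs.useLocalTimeStamp = true they are filled in from the agent host's local clock. As a rough preview of what such a prefix looks like, the sketch below formats the current time with the same pattern; it assumes that date(1) expands %Y, %m, %d, %H, %M and %S the same way the sink does, and hdfs_file_prefix is a hypothetical helper name.

```shell
#!/bin/sh
# Preview the prefix produced by hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
# using the local clock, as hdfs.useLocalTimeStamp = true does on the agent.
# hdfs_file_prefix is a hypothetical helper, not part of Flume.
hdfs_file_prefix() {
  date '+%Y-%m-%d-%H-%M-%S'
}

hdfs_file_prefix
```

Since rollInterval is 10 while rollSize and rollCount are 0, the sink rolls to a new file every 10 seconds regardless of size or event count, so a busy platform produces many small files under the output path.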

Dependencies the platform needs to add:

<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-flume-ng</artifactId>
<version>${log4j.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flume.flume-ng-clients</groupId>
<artifactId>flume-ng-log4jappender</artifactId>
<version>1.8.0</version>
</dependency>
<!-- log4j-core -->
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>${log4j.version}</version>
</dependency>
<!-- log4j-api -->
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
<version>${log4j.version}</version>
</dependency>
<!-- log4j-web -->
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-web</artifactId>
<version>${log4j.version}</version>
</dependency>

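log4j2.xml is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Log levels in priority order: OFF > FATAL > ERROR > WARN > INFO > DEBUG > TRACE > ALL -->
<!-- The status attribute on Configuration controls log4j2's own internal logging; it is optional, and setting it to trace prints detailed log4j2 internals -->
<!-- monitorInterval: log4j2 can detect changes to this file and reconfigure itself; the value is the check interval in seconds -->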
<configuration status="INFO" monitorInterval="30">
<properties>
<property name="LOG_HOME">../logs</property>
<property name="TMP_LOG_FILE_NAME">tmp</property>
<property name="INFO_LOG_FILE_NAME">info</property>
<property name="WARN_LOG_FILE_NAME">warn</property>
<property name="ERROR_LOG_FILE_NAME">error</property>
</properties>
<!-- Define all the appenders first -->
<appenders>
<!-- Console output -->
<console name="Console" target="SYSTEM_OUT">
<!-- Log output format -->
<PatternLayout pattern="[%d{HH:mm:ss:SSS}] [%p] - %l - %m%n"/>
</console>
<!-- This file receives every message; because append="false" it is emptied on each run, which is handy for temporary testing -->
<File name="log" fileName="${LOG_HOME}/${TMP_LOG_FILE_NAME}.log" append="false">
<PatternLayout pattern="%d{HH:mm:ss.SSS} %-5level %class{36} %L %M - %msg%xEx%n"/>
</File>
<!-- Receives messages at info level and above; the file rolls over daily or once it exceeds the configured size, and rolled files are kept under the name given by filePattern -->
<RollingFile name="RollingFileInfo" fileName="${LOG_HOME}/${INFO_LOG_FILE_NAME}.log"
filePattern="${LOG_HOME}/${INFO_LOG_FILE_NAME}-%d{yyyy-MM-dd}-%i.log">
<!-- Accept only messages at this level and above (onMatch); reject everything else (onMismatch) -->
<ThresholdFilter level="info" onMatch="ACCEPT" onMismatch="DENY"/>
<PatternLayout pattern="[%d{HH:mm:ss:SSS}] [%p] - %l - %m%n"/>
<Policies>
<TimeBasedTriggeringPolicy/>
<SizeBasedTriggeringPolicy size="100 MB"/>
</Policies>
</RollingFile>
<RollingFile name="RollingFileWarn" fileName="${LOG_HOME}/${WARN_LOG_FILE_NAME}.log"
filePattern="${LOG_HOME}/${WARN_LOG_FILE_NAME}-%d{yyyy-MM-dd}-%i.log">
<ThresholdFilter level="warn" onMatch="ACCEPT" onMismatch="DENY"/>
<PatternLayout pattern="[%d{HH:mm:ss:SSS}] [%p] - %l - %m%n"/>
<Policies>
<TimeBasedTriggeringPolicy/>
<SizeBasedTriggeringPolicy size="100 MB"/>
</Policies>
<!-- If DefaultRolloverStrategy is not set, at most 7 files are kept in the same folder by default; here the limit is raised to 20 -->
<DefaultRolloverStrategy max="20"/>
</RollingFile>
<RollingFile name="RollingFileError" fileName="${LOG_HOME}/${ERROR_LOG_FILE_NAME}.log"
filePattern="${LOG_HOME}/${ERROR_LOG_FILE_NAME}-%d{yyyy-MM-dd}-%i.log">
<ThresholdFilter level="error" onMatch="ACCEPT" onMismatch="DENY"/>
<PatternLayout pattern="[%d{HH:mm:ss:SSS}] [%p] - %l - %m%n"/>
<Policies>
<TimeBasedTriggeringPolicy/>
<SizeBasedTriggeringPolicy size="100 MB"/>
</Policies>
</RollingFile>
<!-- Flume appender configuration; the Agent host and port must match the Avro source in avro_tag.conf -->
<Flume name="FlumeAppender" compress="true">
<Agent host="192.168.1.246" port="44444"/>
<!-- <RFC5424Layout charset="UTF-8" enterpriseNumber="18060" includeMDC="true" appName="myapp"/> -->
<PatternLayout charset="GBK" pattern="[%d{HH:mm:ss:SSS}] [%p] - %l - %m%n" />
</Flume>
</appenders>
<!-- Then define the loggers; an appender only takes effect once a logger references it -->
<loggers>
<!-- Filter out noisy DEBUG output from Spring and MyBatis -->
<logger name="org.springframework" level="INFO"></logger>
<logger name="org.mybatis" level="INFO"></logger>
<!-- <Logger name="sysLog" level="trace">
<AppenderRef ref="FlumeAppender"/>
</Logger> -->
<root level="info">
<appender-ref ref="Console"/>
<appender-ref ref="RollingFileInfo"/>
<appender-ref ref="RollingFileWarn"/>
<appender-ref ref="RollingFileError"/>
<!-- Send log events to the Flume source -->
<appenderRef ref="FlumeAppender"/>
</root>
</loggers>
</configuration>

Start the agent:

flume-ng agent -n a1 -c ../conf -f avro_tag.conf -Dflume.root.logger=DEBUG,console

Once the project is started, its log output is written to HDFS through Flume.
