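Flume in Practice (Multiple Cases)
Log Collection
Flume's principles are easy to understand; what matters more is knowing how to use it. Flume ships with a large number of built-in Source, Channel, and Sink types, and the different types can be combined freely. The combination is defined in a user-supplied configuration file, which makes it very flexible. For example, a Channel can buffer events in memory or persist them to local disk, and a Sink can write logs to HDFS, HBase, or even to another Source. The cases below walk through concrete usage.
Using Flume is simple: write a configuration file that describes the concrete source, channel, and sink implementations, then run an agent instance. The agent reads the configuration file at startup and collects data accordingly.
Principles for writing the configuration file:
1> Declare, at the top level, the components that make up the agent's sources, sinks, and channels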
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
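2> Describe the concrete implementation of each source, sink, and channel in the agent. For a source, specify its type: whether it reads files, accepts HTTP, accepts Thrift, and so on. Likewise for a sink, specify whether the output goes to HDFS, HBase, and so on. For a channel, specify whether it is backed by memory, a database, a file, and so on.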
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
3> Connect the source and the sink through the channel
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
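The three rules above lend themselves to a quick sanity check before an agent is started. The Python sketch below is not part of Flume; `parse_properties` and `check_wiring` are hypothetical helper names, and the parser only handles the simple `key = value` lines used in this article. It reports sources or sinks that reference a channel the agent does not declare.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent="a1"):
    """Return a list of wiring problems; an empty list means none were found."""
    declared = set(props.get(f"{agent}.channels", "").split())
    problems = []
    for src in props.get(f"{agent}.sources", "").split():
        wired = props.get(f"{agent}.sources.{src}.channels", "").split()
        if not wired:
            problems.append(f"source {src} has no channels")
        for ch in wired:
            if ch not in declared:
                problems.append(f"source {src} uses undeclared channel {ch}")
    for sink in props.get(f"{agent}.sinks", "").split():
        ch = props.get(f"{agent}.sinks.{sink}.channel")
        if ch is None:
            problems.append(f"sink {sink} has no channel")
        elif ch not in declared:
            problems.append(f"sink {sink} uses undeclared channel {ch}")
    return problems


# A trimmed copy of the example configuration from this section.
EXAMPLE = """
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.channels = c1
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
"""

if __name__ == "__main__":
    print(check_wiring(parse_properties(EXAMPLE)))
```

Note the asymmetry the check relies on: a source takes a list under `channels`, while a sink takes exactly one name under `channel`.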
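Shell command to start the agent:
flume-ng agent -n a1 -c ../conf -f ../conf/example.file -Dflume.root.logger=DEBUG,console
Parameters:
-n  the agent name (must match the agent name used in the configuration file)
-c  the directory containing Flume's configuration files
-f  the configuration file to load
-Dflume.root.logger=DEBUG,console  sets the log level to DEBUG and sends log output to the console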
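Case 1
NetCat Source: listens on a given network port; whenever an application writes data to that port, the source receives it. Here Sink: logger, Channel: memory
NetCat Source properties from the official Flume documentation
Property Name    Default    Description
channels         –
type             –          The component type name, needs to be netcat
bind             –          Host name or IP address the netcat source binds to; logs are sent to this address
port             –          Port the netcat source listens on; logs are sent to this port
Write the configuration file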
- # Name the components on this agent
- a1.sources = r1
- a1.sinks = k1
- a1.channels = c1
- # Describe/configure the source
- a1.sources.r1.type = netcat
- a1.sources.r1.bind = 192.168.1.246
- a1.sources.r1.port = 44444
- # Describe the sink
- a1.sinks.k1.type = logger
- # Use a channel which buffers events in memory
- a1.channels.c1.type = memory
- a1.channels.c1.capacity = 1000
- a1.channels.c1.transactionCapacity = 100
- # Bind the source and sink to the channel
- a1.sources.r1.channels = c1
- a1.sinks.k1.channel = c1
Start the agent
- flume-ng agent -n a1 -c ../conf -f netcat.conf -Dflume.root.logger=DEBUG,console
Send data with telnet
- [root@node-247 ~]# telnet 192.168.1.246 44444
- Trying 192.168.1.246...
- Connected to 192.168.1.246.
- Escape character is '^]'.
- 111111
- OK
Check the output on the agent node
- 18/08/01 17:32:21 INFO sink.LoggerSink: Event: { headers:{} body: 31 31 31 31 31 31 0D 111111. }
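The logger sink prints the event body as a hex dump followed by a printable rendering. The hex dump above ends with `0D`, which is a carriage return: telnet most likely sent the line with a CR LF ending, and the source split the stream on the line feed, leaving the CR in the body. The small Python sketch below decodes such a dump; `decode_logger_body` is a hypothetical helper name, not a Flume API.

```python
def decode_logger_body(hex_dump):
    """Convert the space-separated hex bytes printed by the logger sink into bytes."""
    return bytes.fromhex(hex_dump)


if __name__ == "__main__":
    body = decode_logger_body("31 31 31 31 31 31 0D")
    # Strip the trailing carriage return before treating the body as text.
    print(body.rstrip(b"\r").decode("ascii"))
```

If the stray carriage return matters downstream, sending the data with a client that emits bare LF line endings avoids it.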
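Case 2:
NetCat Source: listens on a given network port; whenever an application writes data to that port, the source receives it. Here Sink: hdfs, Channel: file (the two changes relative to Case 1)
HDFS Sink description from the official Flume documentation:
Write the configuration file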
- # Name the components on this agent
- a1.sources = r1
- a1.sinks = k1
- a1.channels = c1
- # Describe/configure the source
- a1.sources.r1.type = netcat
- a1.sources.r1.bind = 192.168.1.246
- a1.sources.r1.port = 44444
- # Describe the sink
- a1.sinks.k1.type = hdfs
- a1.sinks.k1.hdfs.path = hdfs://node-231:8020/user/hdfs/flume/netcat
- a1.sinks.k1.hdfs.writeFormat = Text
- a1.sinks.k1.hdfs.fileType = DataStream
- a1.sinks.k1.hdfs.rollInterval = 10
- a1.sinks.k1.hdfs.rollSize = 0
- a1.sinks.k1.hdfs.rollCount = 0
- a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
- a1.sinks.k1.hdfs.useLocalTimeStamp = true
- # Use a channel which buffers events in file
- a1.channels.c1.type = file
- a1.channels.c1.checkpointDir = /usr/flume/checkpoint
- a1.channels.c1.dataDirs = /usr/flume/data
- # Bind the source and sink to the channel
- a1.sources.r1.channels = c1
- a1.sinks.k1.channel = c1
Start the agent
- flume-ng agent -n a1 -c ../conf -f netcat2hdfs.conf -Dflume.root.logger=DEBUG,console
Send data with telnet
- [root@node-247 ~]# telnet 192.168.1.246 44444
- Trying 192.168.1.246...
- Connected to 192.168.1.246.
- Escape character is '^]'.
- write to hdfs
- OK
Log output on the agent node
- 18/08/01 17:39:28 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
- 18/08/01 17:39:28 INFO hdfs.BucketWriter: Creating hdfs://node-231:8020/user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959.tmp
- 18/08/01 17:39:39 INFO hdfs.BucketWriter: Closing hdfs://node-231:8020/user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959.tmp
- 18/08/01 17:39:39 INFO hdfs.BucketWriter: Renaming hdfs://node-231:8020/user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959.tmp to hdfs://node-231:8020/user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959
- 18/08/01 17:39:39 INFO hdfs.HDFSEventSink: Writer callback called.
- 18/08/01 17:39:53 INFO file.EventQueueBackingStoreFile: Start checkpoint for /usr/flume/checkpoint/checkpoint_1533116333069, elements to sync = 1
- 18/08/01 17:39:53 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1533116333542, queueSize: 0, queueHead: 0
- 18/08/01 17:39:53 INFO file.Log: Updated checkpoint for file: /usr/flume/data/log-1 position: 171 logWriteOrderID: 1533116333542
The write succeeded; verify it
- [root@node-235 bin]# hadoop fs -ls /user/hdfs/flume/netcat/
- Found 1 items
- -rw-r--r-- 3 root hdfs 15 2018-08-01 17:39 /user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959
- [root@node-235 bin]# hadoop fs -cat /user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959
- write to hdfs
Sending data through telnet again produces a second data file in the HDFS directory.
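The HDFS file names in this case combine two parts: the `hdfs.filePrefix` escape sequence rendered from the timestamp, and a numeric suffix that looks like a millisecond clock value. The Python sketch below reproduces the name seen in the listing under two stated assumptions: the agent host runs in UTC+8 (inferred from the log timestamps, not stated in the article), and the prefix and suffix come from the same instant. `render_prefix` and `hdfs_file_name` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def render_prefix(epoch_ms, utc_offset_hours=8):
    """Render %Y-%m-%d-%H-%M-%S for a millisecond timestamp in a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    moment = datetime.fromtimestamp(epoch_ms // 1000, tz)
    return moment.strftime("%Y-%m-%d-%H-%M-%S")


def hdfs_file_name(epoch_ms, utc_offset_hours=8):
    """Join the rendered prefix and the millisecond suffix with a dot."""
    return f"{render_prefix(epoch_ms, utc_offset_hours)}.{epoch_ms}"


if __name__ == "__main__":
    print(hdfs_file_name(1533116368959))
```

Because `rollInterval = 10` closes the current file every 10 seconds while `rollSize = 0` and `rollCount = 0` disable size-based and count-based rolling, data sent more than 10 seconds apart lands in separate files, each with its own timestamped name.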
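Case 3:
Spooling Directory Source: watches a given directory; whenever an application adds a new file to it, the source picks the file up, parses its content, and writes it to the channel. Once the write completes, the file is either marked as completed or deleted. Here Sink: logger, Channel: memory
Spooling Directory Source properties from the official Flume documentation:
Property Name    Default       Description
channels         –
type             –             The component type name, needs to be spooldir.
spoolDir         –             The directory that Spooling Directory Source watches
fileSuffix       .COMPLETED    Suffix appended to a file once its content has been written to the channel
deletePolicy     never         When to delete a file after its content has been written to the channel: never or immediate
fileHeader       false         Whether to add a header storing the absolute path filename.
ignorePattern    ^$            Regular expression specifying which files to ignore (skip)
interceptors     –             Interceptors that set event headers in flight; timestamp is commonly used
Two caveats for Spooling Directory Source:
①If a file is written to after being placed into the spooling directory, Flume will print an error to its log file and stop processing.
That is: a file copied into the spool directory must not be opened and edited afterwards
②If a file name is reused at a later time, Flume will print an error to its log file and stop processing.
That is: a file with a name that has already been used must not be copied into this directory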
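One way to respect both caveats is to finish writing a file outside the spool directory and then move it in under a name that is never reused. The Python sketch below illustrates that idea; `deliver_to_spool` and the staging directory are hypothetical and not part of Flume. It assumes the staging and spool directories sit on the same filesystem, so that the rename is atomic.

```python
import os
import tempfile
import time
import uuid


def deliver_to_spool(data, staging_dir, spool_dir, base_name):
    """Write data in a staging directory, then move it into the spool directory."""
    # A millisecond timestamp plus a random tag keeps names from being reused.
    unique_name = f"{base_name}.{int(time.time() * 1000)}.{uuid.uuid4().hex[:8]}"
    staged_path = os.path.join(staging_dir, unique_name)
    with open(staged_path, "wb") as handle:
        handle.write(data)
        handle.flush()
        os.fsync(handle.fileno())
    final_path = os.path.join(spool_dir, unique_name)
    # The file is complete before it becomes visible in the spool directory.
    os.rename(staged_path, final_path)
    return final_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        staging = os.path.join(root, "staging")
        spool = os.path.join(root, "datainput")
        os.mkdir(staging)
        os.mkdir(spool)
        print(deliver_to_spool(b"hello spool\n", staging, spool, "test.txt"))
```

A plain `cp` into the directory, as used in the cases here, works for small test files but leaves a window in which the source may see a partially written file.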
Write the configuration file
- # Name the components on this agent
- a1.sources = r1
- a1.sinks = k1
- a1.channels = c1
- # Describe/configure the source
- a1.sources.r1.type = spooldir
- a1.sources.r1.spoolDir = /usr/local/test/datainput
- a1.sources.r1.fileHeader = true
- a1.sources.r1.interceptors = i1
- a1.sources.r1.interceptors.i1.type = timestamp
- # Describe the sink
- a1.sinks.k1.type = logger
- # Use a channel which buffers events in memory
- a1.channels.c1.type = memory
- a1.channels.c1.capacity = 1000
- a1.channels.c1.transactionCapacity = 100
- # Bind the source and sink to the channel
- a1.sources.r1.channels = c1
- a1.sinks.k1.channel = c1
Start the agent
- flume-ng agent -n a1 -c ../conf -f spool.conf -Dflume.root.logger=DEBUG,console
Put a test file whose content is hello spool into the watched directory
- cp test.txt datainput/
Agent log
- 18/08/01 17:52:48 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /usr/local/test/datainput/test.txt to /usr/local/test/datainput/test.txt.COMPLETED
- 18/08/01 17:52:48 INFO sink.LoggerSink: Event: { headers:{file=/usr/local/test/datainput/test.txt, timestamp=1533117168275} body: 68 65 6C 6C 6F 20 73 70 6F 6F 6C hello spool }
The console output shows that the event headers include the timestamp.
Listing the spooling directory shows that the file was marked once its content had been written to the channel
- [root@node-246 test]# ls datainput/
- test.txt.COMPLETED
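Case 4:
Spooling Directory Source: watches a given directory; whenever an application adds a new file to it, the source picks the file up, parses its content, and writes it to the channel. Once the write completes, the file is either marked as completed or deleted. Here Sink: hdfs, Channel: file (the two changes relative to Case 3)
Write the configuration file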
- #name the components on this agent
- a1.sources = r1
- a1.sinks = k1
- a1.channels = c1
- # Describe/configure the source
- a1.sources.r1.type = spooldir
- a1.sources.r1.spoolDir = /usr/local/test/datainput
- a1.sources.r1.fileHeader = true
- a1.sources.r1.interceptors = i1
- a1.sources.r1.interceptors.i1.type = timestamp
- # Describe the sink
- a1.sinks.k1.type = hdfs
- a1.sinks.k1.hdfs.path = hdfs://node-231:8020/user/hdfs/flume/spool/dataoutput
- a1.sinks.k1.hdfs.writeFormat = Text
- a1.sinks.k1.hdfs.fileType = DataStream
- a1.sinks.k1.hdfs.rollInterval = 10
- a1.sinks.k1.hdfs.rollSize = 0
- a1.sinks.k1.hdfs.rollCount = 0
- a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
- a1.sinks.k1.hdfs.useLocalTimeStamp = true
- # Use a channel which buffers events in file
- a1.channels.c1.type = file
- a1.channels.c1.checkpointDir = /usr/flume/checkpoint
- a1.channels.c1.dataDirs = /usr/flume/data
- # Bind the source and sink to the channel
- a1.sources.r1.channels = c1
- a1.sinks.k1.channel = c1
Start the agent
- flume-ng agent -n a1 -c ../conf -f spool2hdfs.conf -Dflume.root.logger=DEBUG,console
Put a new file, test1.txt, into datainput
- cp test1.txt datainput
Agent log
- 18/08/01 17:58:41 INFO file.EventQueueBackingStoreFile: Start checkpoint for /usr/flume/checkpoint/checkpoint_1533116333069, elements to sync = 1
- 18/08/01 17:58:41 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1533117491901, queueSize: 0, queueHead: 0
- 18/08/01 17:58:42 INFO file.Log: Updated checkpoint for file: /usr/flume/data/log-3 position: 241 logWriteOrderID: 1533117491901
- 18/08/01 17:58:42 INFO file.LogFile: Closing RandomReader /usr/flume/data/log-1
- 18/08/01 17:58:42 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
- 18/08/01 17:58:42 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
- 18/08/01 17:58:43 INFO hdfs.BucketWriter: Closing hdfs://node-231:8020/user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457.tmp
- 18/08/01 17:58:43 INFO hdfs.BucketWriter: Renaming hdfs://node-231:8020/user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457.tmp to hdfs://node-231:8020/user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457
- 18/08/01 17:58:43 INFO hdfs.HDFSEventSink: Writer callback called.
- 18/08/01 17:58:32 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /usr/local/test/datainput/test1.txt to /usr/local/test/datainput/test1.txt.COMPLETED
- 18/08/01 17:58:32 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
- 18/08/01 17:58:32 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
- 18/08/01 17:58:32 INFO hdfs.BucketWriter: Creating hdfs://node-231:8020/user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457.tmp
Check HDFS
- [root@node-235 bin]# hadoop fs -ls /user/hdfs/flume/spool/dataoutput
- Found 1 items
- -rw-r--r-- 3 root hdfs 12 2018-08-01 17:58 /user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457
- [root@node-235 bin]# hadoop fs -cat /user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457
- hello spool
Check the state of the files under datainput
- [root@node-246 test]# ls datainput/
- test1.txt.COMPLETED test.txt.COMPLETED
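Case 5:
Exec Source: runs a given command and uses the command's output as its data source
The usual command is tail -F file: whenever an application appends to the log file, the source picks up the newest content. Here Sink: hdfs, Channel: file
To make the effect of Exec Source easy to see, this case uses a Hive external table.
Write the configuration file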
- # Name the components on this agent
- a1.sources = r1
- a1.sinks = k1
- a1.channels = c1
- # Describe/configure the source
- a1.sources.r1.type = exec
- a1.sources.r1.command = tail -F /usr/local/test/log.file
- # Describe the sink
- a1.sinks.k1.type = hdfs
- a1.sinks.k1.hdfs.path = hdfs://node-231:8020/user/hdfs/flume/exec/dataoutput
- a1.sinks.k1.hdfs.writeFormat = Text
- a1.sinks.k1.hdfs.fileType = DataStream
- a1.sinks.k1.hdfs.rollInterval = 10
- a1.sinks.k1.hdfs.rollSize = 0
- a1.sinks.k1.hdfs.rollCount = 0
- a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
- a1.sinks.k1.hdfs.useLocalTimeStamp = true
- # Use a channel which buffers events in file
- a1.channels.c1.type = file
- a1.channels.c1.checkpointDir = /usr/flume/checkpoint
- a1.channels.c1.dataDirs = /usr/flume/data
- # Bind the source and sink to the channel
- a1.sources.r1.channels = c1
- a1.sinks.k1.channel = c1
Create the Hive external table
- create external table flume_exec_table
- (info String)
- ROW FORMAT DELIMITED
- FIELDS TERMINATED BY '\t'
- STORED AS TEXTFILE
- location '/user/hdfs/flume/exec/dataoutput'
Start the agent
- flume-ng agent -n a1 -c ../conf -f exec.conf -Dflume.root.logger=DEBUG,console
Use echo to append data to /usr/local/test/log.file
- echo firstline=1 >> /usr/local/test/log.file
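Check the data in Hive
Summary of Exec source: Exec source and Spooling Directory Source are the two common ways to collect logs. Exec source collects logs in real time, whereas Spooling Directory Source is somewhat weaker on real-time collection. However, when Flume is not running or the command fails, Exec source cannot collect log data and logs are lost, so it cannot guarantee that the collected logs are complete.
Case 6:
Avro Source: listens on a given Avro port and receives the files that an Avro client sends to it. Whenever an application sends a file through the Avro port, the source receives the content of that file. Here Sink: hdfs, Channel: file
(Note: Avro and Thrift are serialization-based RPC mechanisms whose network ports can receive or send messages. An Avro client can send a given file to Flume; the Avro source uses the Avro RPC mechanism.)
The following figure shows how Avro Source works:
Avro Source properties from the official Flume documentation:
Property Name    Default    Description
channels         –
type             –          The component type name, needs to be avro
bind             –          Host name or IP address the Avro source listens on; logs are sent to this address
port             –          Port the Avro source listens on; logs are sent to this port
Write the configuration file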
- # Name the components on this agent
- a1.sources = r1
- a1.sinks = k1
- a1.channels = c1
- # Describe/configure the source
- a1.sources.r1.type = avro
- a1.sources.r1.bind = 192.168.1.246
- a1.sources.r1.port = 4141
- # Describe the sink
- a1.sinks.k1.type = hdfs
- a1.sinks.k1.hdfs.path = hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput
- a1.sinks.k1.hdfs.writeFormat = Text
- a1.sinks.k1.hdfs.fileType = DataStream
- a1.sinks.k1.hdfs.rollInterval = 10
- a1.sinks.k1.hdfs.rollSize = 0
- a1.sinks.k1.hdfs.rollCount = 0
- a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
- a1.sinks.k1.hdfs.useLocalTimeStamp = true
- # Use a channel which buffers events in file
- a1.channels.c1.type = file
- a1.channels.c1.checkpointDir = /usr/flume/checkpoint
- a1.channels.c1.dataDirs = /usr/flume/data
- # Bind the source and sink to the channel
- a1.sources.r1.channels = c1
- a1.sinks.k1.channel = c1
Start the agent
- flume-ng agent -n a1 -c ../conf -f avro.conf -Dflume.root.logger=DEBUG,console
Send a file with avro-client
- flume-ng avro-client -c ../conf -H 192.168.1.246 -p 4141 -F /usr/local/test/log.file
The agent log is as follows
- 18/08/02 09:56:14 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 => /192.168.1.246:4141] OPEN
- 18/08/02 09:56:14 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 => /192.168.1.246:4141] BOUND: /192.168.1.246:4141
- 18/08/02 09:56:14 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 => /192.168.1.246:4141] CONNECTED: /192.168.1.246:43750
- 18/08/02 09:56:15 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 :> /192.168.1.246:4141] DISCONNECTED
- 18/08/02 09:56:15 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 :> /192.168.1.246:4141] UNBOUND
- 18/08/02 09:56:15 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 :> /192.168.1.246:4141] CLOSED
- 18/08/02 09:56:15 INFO ipc.NettyServer: Connection to /192.168.1.246:43750 disconnected.
- 18/08/02 09:56:18 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
- 18/08/02 09:56:18 INFO hdfs.BucketWriter: Creating hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-18.1533174978446.tmp
- 18/08/02 09:56:19 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
- 18/08/02 09:56:19 INFO hdfs.BucketWriter: Creating hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-19.1533174979087.tmp
- 18/08/02 09:56:22 INFO file.EventQueueBackingStoreFile: Start checkpoint for /usr/flume/checkpoint/checkpoint_1533116333069, elements to sync = 3
- 18/08/02 09:56:22 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1533174892909, queueSize: 0, queueHead: 1
- 18/08/02 09:56:23 INFO file.Log: Updated checkpoint for file: /usr/flume/data/log-5 position: 361 logWriteOrderID: 1533174892909
- 18/08/02 09:56:23 INFO file.LogFile: Closing RandomReader /usr/flume/data/log-3
- 18/08/02 09:56:29 INFO hdfs.BucketWriter: Closing hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-18.1533174978446.tmp
- 18/08/02 09:56:29 INFO hdfs.BucketWriter: Renaming hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-18.1533174978446.tmp to hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-18.1533174978446
- 18/08/02 09:56:29 INFO hdfs.HDFSEventSink: Writer callback called.
- 18/08/02 09:56:29 INFO hdfs.BucketWriter: Closing hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-19.1533174979087.tmp
- 18/08/02 09:56:29 INFO hdfs.BucketWriter: Renaming hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-19.1533174979087.tmp to hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-19.1533174979087
- 18/08/02 09:56:29 INFO hdfs.HDFSEventSink: Writer callback called.
Check the files in HDFS
- [root@node-231 ~]# hadoop fs -ls /user/hdfs/flume/avro/dataoutput/
- Found 2 items
- -rw-r--r-- 3 root hdfs 12 2018-08-02 09:56 /user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-18.1533174978446
- -rw-r--r-- 3 root hdfs 25 2018-08-02 09:56 /user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-19.1533174979087
Case 7:
syslogtcp
The syslogtcp source listens on a TCP port and uses it as the data source
The agent configuration file is as follows
- a1.sources = r1
- a1.sinks = k1
- a1.channels = c1
- # Describe/configure the source
- a1.sources.r1.type = syslogtcp
- a1.sources.r1.port = 5140
- a1.sources.r1.host = 192.168.1.246
- a1.sources.r1.channels = c1
- # Describe the sink
- a1.sinks.k1.type = logger
- # Use a channel which buffers events in memory
- a1.channels.c1.type = memory
- a1.channels.c1.capacity = 1000
- a1.channels.c1.transactionCapacity = 100
- # Bind the source and sink to the channel
- a1.sources.r1.channels = c1
- a1.sinks.k1.channel = c1
Start the agent
- flume-ng agent -n a1 -c ../conf -f syslogtcp.conf -Dflume.root.logger=DEBUG,console
Generate a test syslog message
- echo "test syslogtcp" | nc 192.168.1.246 5140
Agent log
- 18/08/02 11:13:48 WARN source.SyslogUtils: Event created from Invalid Syslog data.
- 18/08/02 11:13:49 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 74 65 73 74 20 73 79 73 6C 6F 67 74 63 70 test syslogtcp }
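The warning "Event created from Invalid Syslog data" and the headers `Severity=0, Facility=0, flume.syslog.status=Invalid` most likely appear because the test line is plain text rather than a syslog message: it has no leading `<PRI>` field. The Python sketch below builds a line in the traditional BSD syslog layout (RFC 3164), where the priority is `facility * 8 + severity`. `syslog_line` is a hypothetical helper, and the defaults (facility 1 "user", severity 5 "notice") are illustrative choices, not values from the article.

```python
from datetime import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def syslog_line(message, facility=1, severity=5, host="localhost", when=None):
    """Format one RFC 3164 style line: <PRI>Mmm dd hh:mm:ss host message."""
    when = when or datetime.now()
    priority = facility * 8 + severity
    # RFC 3164 pads single-digit days with a space rather than a zero.
    stamp = f"{MONTHS[when.month - 1]} {when.day:2d} {when:%H:%M:%S}"
    return f"<{priority}>{stamp} {host} {message}\n"


if __name__ == "__main__":
    print(syslog_line("test syslogtcp", host="node-246"), end="")
```

The resulting line can be piped to `nc` exactly like the plain-text test message in this case.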
Case 8:
JSONHandler
Create the agent configuration file
- a1.sources = r1
- a1.sinks = k1
- a1.channels = c1
- # Describe/configure the source
- a1.sources.r1.type = org.apache.flume.source.http.HTTPSource
- a1.sources.r1.bind = 192.168.1.246
- a1.sources.r1.port = 8888
- a1.sources.r1.channels = c1
- # Describe the sink
- a1.sinks.k1.type = logger
- # Use a channel which buffers events in memory
- a1.channels.c1.type = memory
- a1.channels.c1.capacity = 1000
- a1.channels.c1.transactionCapacity = 100
- # Bind the source and sink to the channel
- a1.sources.r1.channels = c1
- a1.sinks.k1.channel = c1
Start the agent
- flume-ng agent -n a1 -c ../conf -f httpsource.conf -Dflume.root.logger=DEBUG,console
Send a POST request with a JSON body
- curl -X POST -d '[{ "headers" :{"a" : "a1","b" : "b1"},"body" : "idoall.org_body"}]' http://192.168.1.246:8888
Agent log
- 18/08/02 11:28:34 INFO sink.LoggerSink: Event: { headers:{a=a1, b=b1} body: 69 64 6F 61 6C 6C 2E 6F 72 67 5F 62 6F 64 79 idoall.org_body }
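The request body accepted by the HTTP source's default JSON handler is a JSON array of events, each with a `headers` object of string values and a string `body`, as the curl command above shows. The Python sketch below builds that payload and posts it using only the standard library; `build_payload` and `post_events` are hypothetical names, and the URL in the usage note is the one from this case.

```python
import json
import urllib.request


def build_payload(events):
    """Serialize (headers, body) pairs into the JSON array the handler expects."""
    return json.dumps([{"headers": dict(headers), "body": body}
                       for headers, body in events])


def post_events(url, events, timeout=5):
    """POST the events to a Flume HTTP source and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=build_payload(events).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    print(build_payload([({"a": "a1", "b": "b1"}, "idoall.org_body")]))
```

With the agent from this case running, `post_events("http://192.168.1.246:8888", [({"a": "a1", "b": "b1"}, "idoall.org_body")])` would send the same event as the curl command.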
Case 9:
File Roll Sink
The "file_roll" sink type stores data on the local file system
Create the configuration file
- a1.sources = r1
- a1.sinks = k1
- a1.channels = c1
- # Describe/configure the source
- a1.sources.r1.type = syslogtcp
- a1.sources.r1.port = 5555
- a1.sources.r1.host = 192.168.1.246
- a1.sources.r1.channels = c1
- # Describe the sink
- a1.sinks.k1.type = file_roll
- a1.sinks.k1.sink.directory = /usr/local/test/fileroll
- # Use a channel which buffers events in memory
- a1.channels.c1.type = memory
- a1.channels.c1.capacity = 1000
- a1.channels.c1.transactionCapacity = 100
- # Bind the source and sink to the channel
- a1.sources.r1.channels = c1
- a1.sinks.k1.channel = c1
Start the agent
- flume-ng agent -n a1 -c ../conf -f fileroll.conf -Dflume.root.logger=DEBUG,console
Generate a test log entry
- echo "hello idoall.org syslog" | nc 192.168.1.246 5555
Agent log
- 18/08/02 11:53:34 WARN source.SyslogUtils: Event created from Invalid Syslog data.
List the files under /usr/local/test/fileroll
- [root@node-246 fileroll]# ls
- 1533181857932-1 1533181857932-2 1533181857932-3 1533181857932-4 1533181857932-5 1533181857932-6 1533181857932-7
- [root@node-246 fileroll]# cat 1533181857932-6
- hello idoall.org syslog
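Case 10
Replicating Channel Selector
Flume supports fanning out the flow from one source to multiple channels. There are two fan-out modes: replicating and multiplexing. In replicating mode, each event is sent to all configured channels. In multiplexing mode, an event is sent only to a subset of the eligible channels. A fan-out flow requires specifying the source's channels and the fan-out rule.
Create the replicating_Channel_Selector configuration file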
- a1.sources = r1
- a1.sinks = k1 k2
- a1.channels = c1 c2
- # Describe/configure the source
- a1.sources.r1.type = syslogtcp
- a1.sources.r1.port = 5140
- a1.sources.r1.host = 192.168.1.246
- a1.sources.r1.channels = c1 c2
- a1.sources.r1.selector.type = replicating
- # Describe the sink
- a1.sinks.k1.type = avro
- a1.sinks.k1.channel = c1
- a1.sinks.k1.hostname = 192.168.1.246
- a1.sinks.k1.port = 5555
- a1.sinks.k2.type = avro
- a1.sinks.k2.channel = c2
- a1.sinks.k2.hostname = 192.168.1.247
- a1.sinks.k2.port = 5555
- # Use a channel which buffers events in memory
- a1.channels.c1.type = memory
- a1.channels.c1.capacity = 1000
- a1.channels.c1.transactionCapacity = 100
- a1.channels.c2.type = memory
- a1.channels.c2.capacity = 1000
- a1.channels.c2.transactionCapacity = 100
Create the replicating_Channel_Selector_avro configuration file
- a1.sources = r1
- a1.sinks = k1
- a1.channels = c1
- # Describe/configure the source
- a1.sources.r1.type = avro
- a1.sources.r1.channels = c1
- a1.sources.r1.bind = 192.168.1.246
- a1.sources.r1.port = 5555
- # Describe the sink
- a1.sinks.k1.type = logger
- # Use a channel which buffers events in memory
- a1.channels.c1.type = memory
- a1.channels.c1.capacity = 1000
- a1.channels.c1.transactionCapacity = 100
- # Bind the source and sink to the channel
- a1.sources.r1.channels = c1
- a1.sinks.k1.channel = c1
Copy both configuration files to the other machine (247) and change the IP addresses in them
- scp replicating_Channel_Selector* root@node-247:/usr/local/test/flume/
Open four terminal windows (two per node) and start both agents on each node
- flume-ng agent -n a1 -c ../conf -f replicating_Channel_Selector_avro.conf -Dflume.root.logger=DEBUG,console
- flume-ng agent -n a1 -c ../conf -f replicating_Channel_Selector.conf -Dflume.root.logger=DEBUG,console
Generate a test syslog message
- echo "hello idoall.org syslog" | nc 192.168.1.246 5140
Agent log
- 18/08/02 14:09:53 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 68 65 6C 6C 6F 20 69 64 6F 61 6C 6C 2E 6F 72 67 hello idoall.org }
Case 11
Multiplexing Channel Selector
Create the Multiplexing_Channel_Selector configuration file
- a1.sources = r1
- a1.sinks = k1 k2
- a1.channels = c1 c2
- # Describe/configure the source
- a1.sources.r1.type = org.apache.flume.source.http.HTTPSource
- a1.sources.r1.bind = 192.168.1.246
- a1.sources.r1.port = 5140
- a1.sources.r1.channels = c1 c2
- a1.sources.r1.selector.type = multiplexing
- a1.sources.r1.selector.header = type
- # The channels in different mappings may overlap; the default may list any number of channels.
- a1.sources.r1.selector.mapping.baidu = c1
- a1.sources.r1.selector.mapping.ali = c2
- a1.sources.r1.selector.default = c1
- # Describe the sink
- a1.sinks.k1.type = avro
- a1.sinks.k1.channel = c1
- a1.sinks.k1.hostname = 192.168.1.246
- a1.sinks.k1.port = 5555
- a1.sinks.k2.type = avro
- a1.sinks.k2.channel = c2
- a1.sinks.k2.hostname = 192.168.1.247
- a1.sinks.k2.port = 5555
- # Use a channel which buffers events in memory
- a1.channels.c1.type = memory
- a1.channels.c1.capacity = 1000
- a1.channels.c1.transactionCapacity = 100
- a1.channels.c2.type = memory
- a1.channels.c2.capacity = 1000
- a1.channels.c2.transactionCapacity = 100
Create the Multiplexing_Channel_Selector_avro configuration file
- a1.sources = r1
- a1.sinks = k1
- a1.channels = c1
- # Describe/configure the source
- a1.sources.r1.type = avro
- a1.sources.r1.channels = c1
- a1.sources.r1.bind = 192.168.1.246
- a1.sources.r1.port = 5555
- # Describe the sink
- a1.sinks.k1.type = logger
- # Use a channel which buffers events in memory
- a1.channels.c1.type = memory
- a1.channels.c1.capacity = 1000
- a1.channels.c1.transactionCapacity = 100
- # Bind the source and sink to the channel
- a1.sources.r1.channels = c1
- a1.sinks.k1.channel = c1
Copy the configuration files to the other node and change the IP addresses accordingly
- scp Multiplexing_Channel_Selector* root@192.168.1.247:/usr/local/test/flume/
Open four terminal windows, two each on 246 and 247, and start the agents
- flume-ng agent -n a1 -c ../conf -f Multiplexing_Channel_Selector_avro.conf -Dflume.root.logger=DEBUG,console
- flume-ng agent -n a1 -c ../conf -f Multiplexing_Channel_Selector.conf -Dflume.root.logger=DEBUG,console
From any node, send test HTTP requests
- curl -X POST -d '[{ "headers" :{"type" : "baidu"},"body" : "idoall_TEST1"}]' http://192.168.1.247:5140 && curl -X POST -d '[{ "headers" :{"type" : "ali"},"body" : "idoall_TEST2"}]' http://192.168.1.247:5140 && curl -X POST -d '[{ "headers" :{"type" : "qq"},"body" : "idoall_TEST3"}]' http://192.168.1.246:5140
Agent logs
On 246
- 18/08/02 14:36:04 INFO sink.LoggerSink: Event: { headers:{type=qq} body: 69 64 6F 61 6C 6C 5F 54 45 53 54 33 idoall_TEST3 }
- 18/08/02 14:36:04 INFO sink.LoggerSink: Event: { headers:{type=baidu} body: 69 64 6F 61 6C 6C 5F 54 45 53 54 31 idoall_TEST1 }
On 247
- 18/08/02 14:36:06 INFO sink.LoggerSink: Event: { headers:{type=ali} body: 69 64 6F 61 6C 6C 5F 54 45 53 54 32 idoall_TEST2 }
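As shown, events are routed to different channels according to the value of the type header; the unmapped value qq falls back to the default channel c1.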
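The routing rule in this case can be modeled in a few lines. The Python sketch below is a simplified model of the multiplexing decision, not Flume's implementation; `select_channels` is a hypothetical name, and the mapping mirrors the `selector.mapping.*` and `selector.default` lines in the configuration.

```python
# Mirrors selector.mapping.baidu = c1 and selector.mapping.ali = c2.
MAPPING = {"baidu": ["c1"], "ali": ["c2"]}
# Mirrors selector.default = c1.
DEFAULT = ["c1"]


def select_channels(headers, header_name="type", mapping=MAPPING, default=DEFAULT):
    """Return the channels for an event based on one header's value."""
    return mapping.get(headers.get(header_name), default)


if __name__ == "__main__":
    for value in ("baidu", "ali", "qq"):
        print(value, select_channels({"type": value}))
```

This matches the logs: `baidu` and `qq` events travel through c1 to the agent on 246, and `ali` events travel through c2 to the agent on 247.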
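Case 12
Flume Sink Processors
The failover mechanism always sends events to one sink; when that sink becomes unavailable, it automatically switches to the next sink
Create the Flume_Sink_Processors configuration file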
- a1.sources = r1
- a1.sinks = k1 k2
- a1.channels = c1 c2
- # The key to configuring failover: a sink group is required
- a1.sinkgroups = g1
- a1.sinkgroups.g1.sinks = k1 k2
- # The processor type is failover
- a1.sinkgroups.g1.processor.type = failover
- # Priorities: a larger number means a higher priority; each sink must have a distinct priority
- a1.sinkgroups.g1.processor.priority.k1 = 5
- a1.sinkgroups.g1.processor.priority.k2 = 10
- # Maximum penalty for a failed sink, in milliseconds (10 seconds here); adjust it to suit your environment
- a1.sinkgroups.g1.processor.maxpenalty = 10000
- # Describe/configure the source
- a1.sources.r1.type = syslogtcp
- a1.sources.r1.host = 192.168.1.246
- a1.sources.r1.port = 5140
- a1.sources.r1.channels = c1 c2
- a1.sources.r1.selector.type = replicating
- # Describe the sink
- a1.sinks.k1.type = avro
- a1.sinks.k1.channel = c1
- a1.sinks.k1.hostname = 192.168.1.246
- a1.sinks.k1.port = 5555
- a1.sinks.k2.type = avro
- a1.sinks.k2.channel = c2
- a1.sinks.k2.hostname = 192.168.1.247
- a1.sinks.k2.port = 5555
- # Use a channel which buffers events in memory
- a1.channels.c1.type = memory
- a1.channels.c1.capacity = 1000
- a1.channels.c1.transactionCapacity = 100
- a1.channels.c2.type = memory
- a1.channels.c2.capacity = 1000
- a1.channels.c2.transactionCapacity = 100
Create the Flume_Sink_Processors_avro configuration file
- a1.sources = r1
- a1.sinks = k1
- a1.channels = c1
- # Describe/configure the source
- a1.sources.r1.type = avro
- a1.sources.r1.channels = c1
- a1.sources.r1.bind = 192.168.1.246
- a1.sources.r1.port = 5555
- # Describe the sink
- a1.sinks.k1.type = logger
- # Use a channel which buffers events in memory
- a1.channels.c1.type = memory
- a1.channels.c1.capacity = 1000
- a1.channels.c1.transactionCapacity = 100
- # Bind the source and sink to the channel
- a1.sources.r1.channels = c1
- a1.sinks.k1.channel = c1
Copy both files to node 247 and change the host settings accordingly
- scp Flume_Sink_Processors* root@192.168.1.247:/usr/local/test/flume/
Open four terminal windows (two per node) and start both agents on each node
- flume-ng agent -n a1 -c ../conf -f Flume_Sink_Processors_avro.conf -Dflume.root.logger=DEBUG,console
- flume-ng agent -n a1 -c ../conf -f Flume_Sink_Processors.conf -Dflume.root.logger=DEBUG,console
Generate a test log entry
- echo "idoall.org test1 failover" | nc 192.168.1.246 5140
Because the sink that points at 247 (k2) has the higher priority, the log entry appears in the sink window on 247
- 18/08/02 15:47:44 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 31 idoall.org test1 }
Now stop the sink agent on 247 (Ctrl+C) and send the test data again
- echo "idoall.org test1 failover" | nc 192.168.1.246 5140
The log entry now appears in the sink window on 246
- 18/08/02 15:51:23 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 31 idoall.org test1 }
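The behaviour observed in this case follows from a simple rule: among the sinks that are currently healthy, use the one with the highest priority. The Python sketch below models only that rule; `pick_sink` is a hypothetical name, and the model deliberately leaves out the penalty timer (`maxpenalty`) that controls how long a failed sink is kept out of rotation.

```python
def pick_sink(priorities, failed=()):
    """Return the healthy sink with the highest priority, or None if all failed."""
    alive = {name: prio for name, prio in priorities.items() if name not in failed}
    if not alive:
        return None
    return max(alive, key=alive.get)


if __name__ == "__main__":
    priorities = {"k1": 5, "k2": 10}  # same values as in the configuration
    print(pick_sink(priorities))
    print(pick_sink(priorities, failed={"k2"}))
```

With both agents up the choice is k2 (node 247); once the agent on 247 is stopped, the choice falls back to k1 (node 246), which is the sequence seen in the two log excerpts.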
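Case 13
Load balancing Sink Processor
Unlike failover, the load_balance processor type offers two selection modes: round robin and random. In both modes, if the selected sink is unavailable, the processor automatically tries the next available sink.
Create the Load_balancing_Sink_Processors configuration file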
- a1.sources = r1
- a1.sinks = k1 k2
- a1.channels = c1
- # The key to configuring load balancing: a sink group is required
- a1.sinkgroups = g1
- a1.sinkgroups.g1.sinks = k1 k2
- a1.sinkgroups.g1.processor.type = load_balance
- a1.sinkgroups.g1.processor.backoff = true
- a1.sinkgroups.g1.processor.selector = round_robin
- # Describe/configure the source
- a1.sources.r1.type = syslogtcp
- a1.sources.r1.host = 192.168.1.246
- a1.sources.r1.port = 5140
- a1.sources.r1.channels = c1
- # Describe the sink
- a1.sinks.k1.type = avro
- a1.sinks.k1.channel = c1
- a1.sinks.k1.hostname = 192.168.1.246
- a1.sinks.k1.port = 5555
- a1.sinks.k2.type = avro
- a1.sinks.k2.channel = c1
- a1.sinks.k2.hostname = 192.168.1.247
- a1.sinks.k2.port = 5555
- # Use a channel which buffers events in memory
- a1.channels.c1.type = memory
- a1.channels.c1.capacity = 1000
- a1.channels.c1.transactionCapacity = 100
Create the Load_balancing_Sink_Processors_avro configuration file
- a1.sources = r1
- a1.sinks = k1
- a1.channels = c1
- # Describe/configure the source
- a1.sources.r1.type = avro
- a1.sources.r1.channels = c1
- a1.sources.r1.bind = 192.168.1.246
- a1.sources.r1.port = 5555
- # Describe the sink
- a1.sinks.k1.type = logger
- # Use a channel which buffers events in memory
- a1.channels.c1.type = memory
- a1.channels.c1.capacity = 1000
- a1.channels.c1.transactionCapacity = 100
- # Bind the source and sink to the channel
- a1.sources.r1.channels = c1
- a1.sinks.k1.channel = c1
Copy both files to node 247 and change the IP addresses
- scp Load_balancing_Sink_Processors* root@192.168.1.247:/usr/local/test/flume/
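Open four terminal windows and start the four agents (only the command for the avro agent is shown; start the Load_balancing_Sink_Processors agent the same way)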
- flume-ng agent -n a1 -c ../conf -f Load_balancing_Sink_Processors_avro.conf -Dflume.root.logger=DEBUG,console
Generate test log entries
- [root@node-246 ~]# echo "idoall.org test1" | nc 192.168.1.246 5140
- [root@node-246 ~]# echo "idoall.org test2" | nc 192.168.1.246 5140
- [root@node-246 ~]# echo "idoall.org test3" | nc 192.168.1.246 5140
- [root@node-246 ~]# echo "idoall.org test4" | nc 192.168.1.246 5140
Log on 247
- 18/08/02 18:36:20 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 31 idoall.org test1 }
- 18/08/02 18:36:35 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 32 idoall.org test2 }
- 18/08/02 18:36:58 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 34 idoall.org test4 }
Log on 246
- 18/08/02 18:36:47 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 33 idoall.org test3 }
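The `round_robin` selector can be pictured as a rotating pointer that skips sinks known to be down. The Python sketch below is a simplified model, not Flume's code; `RoundRobinSelector` is a hypothetical class, and passing the failed sinks in by hand stands in for the back-off bookkeeping enabled by `processor.backoff = true`.

```python
class RoundRobinSelector:
    """Hand out sinks in rotation, skipping any that are marked as failed."""

    def __init__(self, sinks):
        self.sinks = list(sinks)
        self.next_index = 0

    def pick(self, failed=()):
        # Try each sink at most once per call, starting from the pointer.
        for _ in range(len(self.sinks)):
            sink = self.sinks[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.sinks)
            if sink not in failed:
                return sink
        return None


if __name__ == "__main__":
    selector = RoundRobinSelector(["k1", "k2"])
    print([selector.pick() for _ in range(4)])
```

The logs in this case do not alternate strictly between 246 and 247. A plausible explanation, stated here as an assumption, is that the rotation advances on every processing turn of the sink group rather than on every event, so turns that find the channel empty still move the pointer.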
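Case 14
HBase sink
Copy the following jars from the HBase lib directory into the Flume lib directory (the version numbers depend on your installation: the list shows the 0.96.2 names, while the cp command below uses the jars shipped with HDP 2.6.1.0-129)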
- protobuf-java-2.5.0.jar
- hbase-client-0.96.2-hadoop2.jar
- hbase-common-0.96.2-hadoop2.jar
- hbase-protocol-0.96.2-hadoop2.jar
- hbase-server-0.96.2-hadoop2.jar
- hbase-hadoop2-compat-0.96.2-hadoop2.jar
- hbase-hadoop-compat-0.96.2-hadoop2.jar
- htrace-core-2.04.jar
- cp protobuf-java-2.5.0.jar hbase-client-1.1.2.2.6.1.0-129.jar hbase-common-1.1.2.2.6.1.0-129.jar hbase-protocol-1.1.2.2.6.1.0-129.jar hbase-server-1.1.2.2.6.1.0-129.jar hbase-hadoop2-compat-1.1.2.2.6.1.0-129.jar hbase-hadoop-compat-1.1.2.2.6.1.0-129.jar htrace-core-3.1.0-incubating.jar /usr/hdp/2.6.1.0-129/flume/lib/
Create the HBase table flume_test with the column family name
- hbase(main):003:0> create 'flume_test', 'name'
- 0 row(s) in 2.3900 seconds
- => Hbase::Table - flume_test
Create the agent configuration file hbase_simple
- a1.sources = r1
- a1.sinks = k1
- a1.channels = c1
- # Describe/configure the source
- a1.sources.r1.type = syslogtcp
- a1.sources.r1.port = 5140
- a1.sources.r1.host = 192.168.1.246
- a1.sources.r1.channels = c1
- # Describe the sink
- a1.sinks.k1.type = hbase
- a1.sinks.k1.table = flume_test
- a1.sinks.k1.columnFamily = name
- a1.sinks.k1.column = message
- a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
- # Use a channel which buffers events in memory
- a1.channels.c1.type = memory
- a1.channels.c1.capacity = 1000
- a1.channels.c1.transactionCapacity = 100
- # Bind the source and sink to the channel
- a1.sources.r1.channels = c1
- a1.sinks.k1.channel = c1
Start the agent
- flume-ng agent -n a1 -c ../conf -f hbase_simple.conf -Dflume.root.logger=DEBUG,console
Generate a test log entry
- echo "hello zzz.org from flume" | nc 192.168.1.246 5140
Agent log
- 18/08/03 10:01:07 WARN source.SyslogUtils: Event created from Invalid Syslog data.
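The warning above is most likely caused by the test message being plain text without a syslog priority header. A syslog line normally starts with `<PRI>`, where PRI = facility * 8 + severity. The sketch below shows how such a header could be built; the function name `syslog_line` and the default facility/severity values are illustrative assumptions, and whether the warning disappears depends on the parser in the Flume version in use.

```python
def syslog_line(message: str, facility: int = 1, severity: int = 6) -> str:
    """Prefix a message with a syslog priority header <PRI>.

    PRI = facility * 8 + severity; facility 1 is "user", severity 6 is "info".
    """
    if not 0 <= facility <= 23:
        raise ValueError("facility must be in 0..23")
    if not 0 <= severity <= 7:
        raise ValueError("severity must be in 0..7")
    return f"<{facility * 8 + severity}>{message}\n"

line = syslog_line("hello zzz.org from flume")
print(line, end="")
```

The resulting line can be sent to the source with the same nc command used for the test message.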
Check HBase
- hbase(main):006:0> scan 'flume_test'
- ROW COLUMN+CELL
- 1533261667472-IamY4IbgS7-0 column=name:payload, timestamp=1533261670851, value=hello zzz.org from flume
- 1 row(s) in 0.1130 seconds
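The row key in the scan output appears to have the form `<epoch milliseconds>-<random string>-<counter>`. This format is inferred from the output above, not taken from the Flume documentation. As a quick sanity check, the sketch below splits such a key and converts the leading number into a timestamp. The helper name `parse_row_key` is hypothetical, and the UTC+8 offset is an assumption about the agent host's time zone.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the agent host runs in UTC+8.
LOCAL_TZ = timezone(timedelta(hours=8))
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def parse_row_key(row_key: str):
    """Split a row key of the assumed form '<millis>-<random>-<counter>'."""
    millis, rest = row_key.split("-", 1)
    random_part, counter = rest.rsplit("-", 1)
    # Integer millisecond arithmetic avoids float rounding of the timestamp.
    written_at = (EPOCH + timedelta(milliseconds=int(millis))).astimezone(LOCAL_TZ)
    return written_at, random_part, int(counter)

written_at, random_part, counter = parse_row_key("1533261667472-IamY4IbgS7-0")
print(written_at.strftime("%Y-%m-%d %H:%M:%S"), random_part, counter)
```

Under the UTC+8 assumption, the decoded time matches the 10:01:07 timestamp in the agent log above.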
Case 15
Collect platform logs with the Flume Avro source
The agent file is shown below; collected events are written directly to HDFS
- [root@node-246 flume]# cat avro_tag.conf
- # Name the components on this agent
- a1.sources = r1
- a1.sinks = k1
- a1.channels = c1
- # Describe/configure the source
- a1.sources.r1.type = avro
- a1.sources.r1.bind = 192.168.1.246
- a1.sources.r1.port = 44444
- # Describe the sink
- a1.sinks.k1.type = hdfs
- a1.sinks.k1.hdfs.path = hdfs://node-231:8020/user/hdfs/flume/avro_tag/dataoutput
- a1.sinks.k1.hdfs.writeFormat = Text
- a1.sinks.k1.hdfs.fileType = DataStream
- a1.sinks.k1.hdfs.rollInterval = 10
- a1.sinks.k1.hdfs.rollSize = 0
- a1.sinks.k1.hdfs.rollCount = 0
- a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
- a1.sinks.k1.hdfs.useLocalTimeStamp = true
- # Use a channel which buffers events in file
- a1.channels.c1.type = file
- a1.channels.c1.checkpointDir = /usr/flume/checkpoint
- a1.channels.c1.dataDirs = /usr/flume/data
- # Bind the source and sink to the channel
- a1.sources.r1.channels = c1
- a1.sinks.k1.channel = c1
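In the configuration above, hdfs.filePrefix is a time pattern, and hdfs.useLocalTimeStamp = true makes the sink fill it in from the agent's local clock instead of a timestamp header on the event. With rollSize and rollCount set to 0, files are rolled by time only, every 10 seconds. The sketch below previews what the prefix expands to for a given moment. It assumes that the six escapes used here (%Y, %m, %d, %H, %M, %S) behave like their strftime counterparts, and the function name `expand_prefix` is hypothetical.

```python
from datetime import datetime

# Same pattern as a1.sinks.k1.hdfs.filePrefix in the agent file.
FILE_PREFIX = "%Y-%m-%d-%H-%M-%S"

def expand_prefix(when: datetime, pattern: str = FILE_PREFIX) -> str:
    """Expand the time escapes in the prefix for the given local time."""
    return when.strftime(pattern)

print(expand_prefix(datetime(2018, 8, 3, 10, 1, 7)))
```

The sink adds its own suffix to this prefix when naming files, so the actual file names in HDFS are longer than the expanded prefix.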
Dependencies the platform needs to include
- <dependency>
- <groupId>org.apache.logging.log4j</groupId>
- <artifactId>log4j-flume-ng</artifactId>
- <version>${log4j.version}</version>
- </dependency>
- <dependency>
- <groupId>org.apache.flume.flume-ng-clients</groupId>
- <artifactId>flume-ng-log4jappender</artifactId>
- <version>1.8.0</version>
- </dependency>
- <!-- log4j-core -->
- <dependency>
- <groupId>org.apache.logging.log4j</groupId>
- <artifactId>log4j-core</artifactId>
- <version>${log4j.version}</version>
- </dependency>
- <!-- log4j-api -->
- <dependency>
- <groupId>org.apache.logging.log4j</groupId>
- <artifactId>log4j-api</artifactId>
- <version>${log4j.version}</version>
- </dependency>
- <!-- log4j-web -->
- <dependency>
- <groupId>org.apache.logging.log4j</groupId>
- <artifactId>log4j-web</artifactId>
- <version>${log4j.version}</version>
- </dependency>
log4j2.xml is as follows
- <?xml version="1.0" encoding="UTF-8"?>
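- <!-- Log levels in priority order: OFF > FATAL > ERROR > WARN > INFO > DEBUG > TRACE > ALL -->
- <!-- The status attribute on configuration controls log4j2's own internal logging; it is optional, and setting it to trace shows detailed internal output -->
- <!-- monitorInterval: log4j2 can detect changes to the configuration file and reconfigure itself; this sets the check interval in seconds -->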
- <configuration status="INFO" monitorInterval="30">
- <properties>
- <property name="LOG_HOME">../logs</property>
- <property name="TMP_LOG_FILE_NAME">tmp</property>
- <property name="INFO_LOG_FILE_NAME">info</property>
- <property name="WARN_LOG_FILE_NAME">warn</property>
- <property name="ERROR_LOG_FILE_NAME">error</property>
- </properties>
- <!-- Define all appenders first -->
- <appenders>
- <!-- Console output configuration -->
- <console name="Console" target="SYSTEM_OUT">
- <!-- Log output format -->
- <PatternLayout pattern="[%d{HH:mm:ss:SSS}] [%p] - %l - %m%n"/>
- </console>
- <!-- This file receives all messages; it is cleared on every run because append is false, which is handy for temporary testing -->
- <File name="log" fileName="${LOG_HOME}/${TMP_LOG_FILE_NAME}.log" append="false">
- <PatternLayout pattern="%d{HH:mm:ss.SSS} %-5level %class{36} %L %M - %msg%xEx%n"/>
- </File>
- <!-- Records messages at INFO level and above; when the file exceeds size or the date changes, it is rolled over to an archive file named by filePattern -->
- <RollingFile name="RollingFileInfo" fileName="${LOG_HOME}/${INFO_LOG_FILE_NAME}.log"
- filePattern="${LOG_HOME}/${INFO_LOG_FILE_NAME}-%d{yyyy-MM-dd}-%i.log">
- <!-- Accept only messages at this level and above (onMatch); reject everything else (onMismatch) -->
- <ThresholdFilter level="info" onMatch="ACCEPT" onMismatch="DENY"/>
- <PatternLayout pattern="[%d{HH:mm:ss:SSS}] [%p] - %l - %m%n"/>
- <Policies>
- <TimeBasedTriggeringPolicy/>
- <SizeBasedTriggeringPolicy size="100 MB"/>
- </Policies>
- </RollingFile>
- <RollingFile name="RollingFileWarn" fileName="${LOG_HOME}/${WARN_LOG_FILE_NAME}.log"
- filePattern="${LOG_HOME}/${WARN_LOG_FILE_NAME}-%d{yyyy-MM-dd}-%i.log">
- <ThresholdFilter level="warn" onMatch="ACCEPT" onMismatch="DENY"/>
- <PatternLayout pattern="[%d{HH:mm:ss:SSS}] [%p] - %l - %m%n"/>
- <Policies>
- <TimeBasedTriggeringPolicy/>
- <SizeBasedTriggeringPolicy size="100 MB"/>
- </Policies>
- <!-- If DefaultRolloverStrategy is not set, at most 7 files are kept in the same folder by default; here it is set to 20 -->
- <DefaultRolloverStrategy max="20"/>
- </RollingFile>
- <RollingFile name="RollingFileError" fileName="${LOG_HOME}/${ERROR_LOG_FILE_NAME}.log"
- filePattern="${LOG_HOME}/${ERROR_LOG_FILE_NAME}-%d{yyyy-MM-dd}-%i.log">
- <ThresholdFilter level="error" onMatch="ACCEPT" onMismatch="DENY"/>
- <PatternLayout pattern="[%d{HH:mm:ss:SSS}] [%p] - %l - %m%n"/>
- <Policies>
- <TimeBasedTriggeringPolicy/>
- <SizeBasedTriggeringPolicy size="100 MB"/>
- </Policies>
- </RollingFile>
- <!-- Flume configuration -->
- <Flume name="FlumeAppender" compress="true">
- <Agent host="192.168.1.246" port="44444"/>
- <!-- <RFC5424Layout charset="UTF-8" enterpriseNumber="18060" includeMDC="true" appName="myapp"/> -->
- <PatternLayout charset="GBK" pattern="[%d{HH:mm:ss:SSS}] [%p] - %l - %m%n" />
- </Flume>
- </appenders>
- <!-- Then define the loggers; an appender only takes effect once a logger references it -->
- <loggers>
- <!-- Filter out unneeded DEBUG messages from Spring and MyBatis -->
- <logger name="org.springframework" level="INFO"></logger>
- <logger name="org.mybatis" level="INFO"></logger>
- <!-- <Logger name="sysLog" level="trace">
- <AppenderRef ref="FlumeAppender"/>
- </Logger> -->
- <root level="info">
- <appender-ref ref="Console"/>
- <appender-ref ref="RollingFileInfo"/>
- <appender-ref ref="RollingFileWarn"/>
- <appender-ref ref="RollingFileError"/>
- <!-- Send logs to the Flume source -->
- <appenderRef ref="FlumeAppender"/>
- </root>
- </loggers>
- </configuration>
Start the agent
- flume-ng agent -n a1 -c ../conf -f avro_tag.conf -Dflume.root.logger=DEBUG,console
Once the project is started, its log messages are written to HDFS through Flume.