Log Collection

Flume's internals are easy enough to understand; what matters more is mastering how to use it. Flume ships with a large number of built-in Source, Channel and Sink types, and different types can be combined freely through a user-supplied configuration file, which makes it very flexible. For example, a Channel can buffer events in memory or persist them to local disk, and a Sink can write events to HDFS or HBase, or even hand them to another agent's Source. The cases below walk through Flume's usage in detail.
Using Flume is straightforward: write a configuration file that describes the concrete source, channel and sink, then run an agent instance. The agent reads the configuration file at startup, and from that point on Flume collects data.
Principles for writing the configuration file:
1> At the top level, name the sources, sinks and channels used by the agent
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
2> Describe each source, sink and channel of the agent in detail. For a source, specify its type: does it read files, accept HTTP, or accept Thrift? For a sink, specify where the results go: HDFS, HBase, and so on. For a channel, specify whether it is backed by memory, a database, a file, etc.
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

3> Connect the source and the sink through the channel
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Shell command to start the agent:
flume-ng agent -n a1 -c ../conf -f ../conf/example.file -Dflume.root.logger=DEBUG,console

Parameter description:
-n  agent name (must match the agent name used in the configuration file)
-c  directory containing the Flume configuration files
-f  the configuration file to use
-Dflume.root.logger=DEBUG,console  sets the log level and sends the log to the console

Case 1

NetCat Source: listens on a given network port; whenever an application writes data to that port, the source picks it up. Sink: logger, Channel: memory.
NetCat Source description from the Flume user guide:

Property Name  Default  Description
channels       –
type           –        The component type name, needs to be netcat
bind           –        Host name or IP address to bind to; the netcat source listens here for incoming data
port           –        Port number to bind to; the netcat source listens on this port

Write the configuration file

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent

flume-ng agent -n a1 -c ../conf -f netcat.conf -Dflume.root.logger=DEBUG,console

Send data with telnet

[root@node-247 ~]# telnet 192.168.1.246 44444
Trying 192.168.1.246...
Connected to 192.168.1.246.
Escape character is '^]'.
111111
OK

Check the output on the agent node

18/08/01 17:32:21 INFO sink.LoggerSink: Event: { headers:{} body: 31 31 31 31 31 31 0D 111111. }

Case 2

NetCat Source: listens on a given network port; whenever an application writes data to that port, the source picks it up. Sink: hdfs, Channel: file (the two changes relative to Case 1).
The HDFS Sink and its properties are described in the Flume user guide.

Write the configuration file

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node-231:8020/user/hdfs/flume/netcat
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in file
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /usr/flume/checkpoint
a1.channels.c1.dataDirs = /usr/flume/data

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent

flume-ng agent -n a1 -c ../conf -f netcat2hdfs.conf -Dflume.root.logger=DEBUG,console

Send data with telnet

[root@node-247 ~]# telnet 192.168.1.246 44444
Trying 192.168.1.246...
Connected to 192.168.1.246.
Escape character is '^]'.
write to hdfs
OK

Agent log output

18/08/01 17:39:28 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
18/08/01 17:39:28 INFO hdfs.BucketWriter: Creating hdfs://node-231:8020/user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959.tmp
18/08/01 17:39:39 INFO hdfs.BucketWriter: Closing hdfs://node-231:8020/user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959.tmp
18/08/01 17:39:39 INFO hdfs.BucketWriter: Renaming hdfs://node-231:8020/user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959.tmp to hdfs://node-231:8020/user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959
18/08/01 17:39:39 INFO hdfs.HDFSEventSink: Writer callback called.
18/08/01 17:39:53 INFO file.EventQueueBackingStoreFile: Start checkpoint for /usr/flume/checkpoint/checkpoint_1533116333069, elements to sync = 1
18/08/01 17:39:53 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1533116333542, queueSize: 0, queueHead: 0
18/08/01 17:39:53 INFO file.Log: Updated checkpoint for file: /usr/flume/data/log-1 position: 171 logWriteOrderID: 1533116333542

Verify that the write succeeded

[root@node-235 bin]# hadoop fs -ls /user/hdfs/flume/netcat/
Found 1 items
-rw-r--r-- 3 root hdfs 15 2018-08-01 17:39 /user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959

[root@node-235 bin]# hadoop fs -cat /user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959
write to hdfs

Sending data via telnet again produces a second data file in the HDFS directory.
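
Any client that writes newline-terminated lines to that TCP port has the same effect as telnet. A minimal Java sketch (the class name and message text are only illustrative):

import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class NetcatSourceClient {
    public static void main(String[] args) throws Exception {
        // The netcat source turns every newline-terminated line read from the
        // TCP connection into one Flume event.
        try (Socket socket = new Socket("192.168.1.246", 44444);
             OutputStream out = socket.getOutputStream()) {
            out.write("second message\n".getBytes(StandardCharsets.UTF_8));
            out.flush();
        }
    }
}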

Case 3

Spooling Directory Source: watches a given directory; whenever an application drops a new file into that directory, the source picks it up, parses the file's contents and writes them to the channel. When it is done, the file is either marked as completed or deleted. Sink: logger, Channel: memory.
Spooling Directory Source description from the Flume user guide:
Property Name  Default     Description
channels       –
type           –           The component type name, needs to be spooldir.
spoolDir       –           The directory watched by the Spooling Directory Source
fileSuffix     .COMPLETED  Suffix used to mark a file once its contents have been written to the channel
deletePolicy   never       When to delete completed files: never or immediate
fileHeader     false       Whether to add a header storing the absolute path filename.
ignorePattern  ^$          Regular expression specifying which files to ignore (skip)
interceptors   –           Interceptors that add event headers; timestamp is the most commonly used

Two caveats about the Spooling Directory Source:
① If a file is written to after being placed into the spooling directory, Flume will print an error to its log file and stop processing. In other words, a file copied into the spool directory must not be opened and edited again.
② If a file name is reused at a later time, Flume will print an error to its log file and stop processing. In other words, never copy a file with a previously used name into this directory.
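
One way for an application to respect both constraints is to build the file somewhere else and move it into the spooling directory with a single rename, so the source never sees a half-written file. A minimal Java sketch (the staging path and file name are only illustrative; both directories are assumed to be on the same filesystem):

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class SpoolDirWriter {
    public static void main(String[] args) throws Exception {
        // Write the complete file outside the watched directory first
        Path staging = Paths.get("/usr/local/test/staging/test2.txt");
        Files.createDirectories(staging.getParent());
        Files.write(staging, "hello spool again\n".getBytes(StandardCharsets.UTF_8));

        // Then move it into the spooling directory in one atomic step,
        // under a name that has never been used there before
        Path target = Paths.get("/usr/local/test/datainput/test2.txt");
        Files.move(staging, target, StandardCopyOption.ATOMIC_MOVE);
    }
}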

Write the configuration file

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /usr/local/test/datainput
a1.sources.r1.fileHeader = true
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent

flume-ng agent -n a1 -c ../conf -f spool.conf -Dflume.root.logger=DEBUG,console

Drop a test file with the content "hello spool" into the watched directory

cp test.txt datainput/

Agent log

18/08/01 17:52:48 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /usr/local/test/datainput/test.txt to /usr/local/test/datainput/test.txt.COMPLETED
18/08/01 17:52:48 INFO sink.LoggerSink: Event: { headers:{file=/usr/local/test/datainput/test.txt, timestamp=1533117168275} body: 68 65 6C 6C 6F 20 73 70 6F 6F 6C hello spool }

The console output shows that the event headers now contain a timestamp.
Looking at the spooling directory itself, the file was marked once its contents had been written to the channel:

[root@node-246 test]# ls datainput/
test.txt.COMPLETED

Case 4

Spooling Directory Source: watches a given directory; whenever an application drops a new file into that directory, the source picks it up, parses the file's contents and writes them to the channel. When it is done, the file is either marked as completed or deleted. Sink: hdfs, Channel: file (the two changes relative to Case 3).
Write the configuration file

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /usr/local/test/datainput
a1.sources.r1.fileHeader = true
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node-231:8020/user/hdfs/flume/spool/dataoutput
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in file
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /usr/flume/checkpoint
a1.channels.c1.dataDirs = /usr/flume/data

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent

flume-ng agent -n a1 -c ../conf -f spool2hdfs.conf -Dflume.root.logger=DEBUG,console

Drop a new file, test1.txt, into datainput

cp test1.txt datainput

Agent log

18/08/01 17:58:41 INFO file.EventQueueBackingStoreFile: Start checkpoint for /usr/flume/checkpoint/checkpoint_1533116333069, elements to sync = 1
18/08/01 17:58:41 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1533117491901, queueSize: 0, queueHead: 0
18/08/01 17:58:42 INFO file.Log: Updated checkpoint for file: /usr/flume/data/log-3 position: 241 logWriteOrderID: 1533117491901
18/08/01 17:58:42 INFO file.LogFile: Closing RandomReader /usr/flume/data/log-1
18/08/01 17:58:42 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
18/08/01 17:58:42 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
18/08/01 17:58:43 INFO hdfs.BucketWriter: Closing hdfs://node-231:8020/user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457.tmp
18/08/01 17:58:43 INFO hdfs.BucketWriter: Renaming hdfs://node-231:8020/user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457.tmp to hdfs://node-231:8020/user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457
18/08/01 17:58:43 INFO hdfs.HDFSEventSink: Writer callback called.
18/08/01 17:58:32 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /usr/local/test/datainput/test1.txt to /usr/local/test/datainput/test1.txt.COMPLETED
18/08/01 17:58:32 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
18/08/01 17:58:32 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
18/08/01 17:58:32 INFO hdfs.BucketWriter: Creating hdfs://node-231:8020/user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457.tmp

Check HDFS

[root@node-235 bin]# hadoop fs -ls /user/hdfs/flume/spool/dataoutput
Found 1 items
-rw-r--r-- 3 root hdfs 12 2018-08-01 17:58 /user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457
[root@node-235 bin]# hadoop fs -cat /user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457
hello spool

Check the file status under datainput

[root@node-246 test]# ls datainput/
test1.txt.COMPLETED test.txt.COMPLETED

Case 5

Exec Source: runs a given command and uses the command's output as its data source.
The most common command is tail -F file: whenever an application appends to the log file, the source picks up the newly written content. Sink: hdfs, Channel: file.
To make the effect of the Exec Source easier to see, this case combines it with a Hive external table.
Write the configuration file

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /usr/local/test/log.file

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node-231:8020/user/hdfs/flume/exec/dataoutput
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in file
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /usr/flume/checkpoint
a1.channels.c1.dataDirs = /usr/flume/data

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Create the Hive external table

create external table flume_exec_table
(info String)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
location '/user/hdfs/flume/exec/dataoutput';

Start the agent

flume-ng agent -n a1 -c ../conf -f exec.conf -Dflume.root.logger=DEBUG,console

Append data to /usr/local/test/log.file with echo

echo firstline=1 >> /usr/local/test/log.file

Check the data in Hive by querying flume_exec_table.

Summary of the Exec Source: Exec Source and Spooling Directory Source are two commonly used ways of collecting logs. The Exec Source can collect a log in real time, while the Spooling Directory Source falls somewhat short on real-time collection. However, even though the Exec Source is real-time, it collects nothing while the Flume agent is down or while the command fails, so log data can be lost and completeness cannot be guaranteed.

Case 6

Avro Source: listens on a given Avro port and receives the files sent by an Avro client; any application that sends a file to this port over Avro will have its contents picked up by the source. Sink: hdfs, Channel: file.
(Note: Avro and Thrift are serialization frameworks with network endpoints through which data can be sent and received. An Avro client can send a given file to Flume; the Avro source uses the Avro RPC mechanism.)

Avro Source description from the Flume user guide:
Property Name  Default  Description
channels       –
type           –        The component type name, needs to be avro
bind           –        Host name or IP address to bind to; the Avro source listens on this host
port           –        Port number to bind to; the Avro source listens on this port

Write the configuration file

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 4141

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in file
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /usr/flume/checkpoint
a1.channels.c1.dataDirs = /usr/flume/data

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent

flume-ng agent -n a1 -c ../conf -f avro.conf -Dflume.root.logger=DEBUG,console

Send a file with avro-client

flume-ng avro-client -c ../conf -H 192.168.1.246 -p 4141 -F /usr/local/test/log.file

Agent log

18/08/02 09:56:14 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 => /192.168.1.246:4141] OPEN
18/08/02 09:56:14 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 => /192.168.1.246:4141] BOUND: /192.168.1.246:4141
18/08/02 09:56:14 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 => /192.168.1.246:4141] CONNECTED: /192.168.1.246:43750
18/08/02 09:56:15 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 :> /192.168.1.246:4141] DISCONNECTED
18/08/02 09:56:15 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 :> /192.168.1.246:4141] UNBOUND
18/08/02 09:56:15 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 :> /192.168.1.246:4141] CLOSED
18/08/02 09:56:15 INFO ipc.NettyServer: Connection to /192.168.1.246:43750 disconnected.
18/08/02 09:56:18 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
18/08/02 09:56:18 INFO hdfs.BucketWriter: Creating hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-18.1533174978446.tmp
18/08/02 09:56:19 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
18/08/02 09:56:19 INFO hdfs.BucketWriter: Creating hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-19.1533174979087.tmp
18/08/02 09:56:22 INFO file.EventQueueBackingStoreFile: Start checkpoint for /usr/flume/checkpoint/checkpoint_1533116333069, elements to sync = 3
18/08/02 09:56:22 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1533174892909, queueSize: 0, queueHead: 1
18/08/02 09:56:23 INFO file.Log: Updated checkpoint for file: /usr/flume/data/log-5 position: 361 logWriteOrderID: 1533174892909
18/08/02 09:56:23 INFO file.LogFile: Closing RandomReader /usr/flume/data/log-3
18/08/02 09:56:29 INFO hdfs.BucketWriter: Closing hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-18.1533174978446.tmp
18/08/02 09:56:29 INFO hdfs.BucketWriter: Renaming hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-18.1533174978446.tmp to hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-18.1533174978446
18/08/02 09:56:29 INFO hdfs.HDFSEventSink: Writer callback called.
18/08/02 09:56:29 INFO hdfs.BucketWriter: Closing hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-19.1533174979087.tmp
18/08/02 09:56:29 INFO hdfs.BucketWriter: Renaming hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-19.1533174979087.tmp to hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-19.1533174979087
18/08/02 09:56:29 INFO hdfs.HDFSEventSink: Writer callback called.

Check the files on HDFS

[root@node-231 ~]# hadoop fs -ls /user/hdfs/flume/avro/dataoutput/
Found 2 items
-rw-r--r-- 3 root hdfs 12 2018-08-02 09:56 /user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-18.1533174978446
-rw-r--r-- 3 root hdfs 25 2018-08-02 09:56 /user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-19.1533174979087
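
Besides the avro-client command line tool, an application can push events to the same Avro source with the Flume SDK's RPC client. A minimal sketch, assuming flume-ng-sdk (matching the agent version) is on the classpath; the class name and message are only illustrative:

import java.nio.charset.StandardCharsets;

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class AvroSourceClient {
    public static void main(String[] args) throws EventDeliveryException {
        // Connect to the Avro source configured above (bind address and port from avro.conf)
        RpcClient client = RpcClientFactory.getDefaultInstance("192.168.1.246", 4141);
        try {
            // Each append() sends one event; appendBatch() can send a list in one call
            Event event = EventBuilder.withBody("hello from the flume sdk", StandardCharsets.UTF_8);
            client.append(event);
        } finally {
            client.close();
        }
    }
}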

Case 7

Syslog TCP Source
The syslogtcp source listens on a TCP port and uses the incoming data as its data source.

The agent configuration file is as follows

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent

flume-ng agent -n a1 -c ../conf -f syslogtcp.conf -Dflume.root.logger=DEBUG,console

Generate a test syslog message

echo "test syslogtcp" | nc 192.168.1.246 5140

Agent log

18/08/02 11:13:48 WARN source.SyslogUtils: Event created from Invalid Syslog data.
18/08/02 11:13:49 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 74 65 73 74 20 73 79 73 6C 6F 67 74 63 70 test syslogtcp }

Case 8

JSONHandler (HTTP Source)
Create the agent configuration file

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = org.apache.flume.source.http.HTTPSource
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.port = 8888
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent

flume-ng agent -n a1 -c ../conf -f httpsource.conf -Dflume.root.logger=DEBUG,console

Send a JSON-formatted POST request

curl -X POST -d '[{ "headers" :{"a" : "a1","b" : "b1"},"body" : "idoall.org_body"}]' http://192.168.1.246:8888

Agent log

18/08/02 11:28:34 INFO sink.LoggerSink: Event: { headers:{a=a1, b=b1} body: 69 64 6F 61 6C 6C 2E 6F 72 67 5F 62 6F 64 79 idoall.org_body }
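
The same request can be produced from application code instead of curl. A minimal Java sketch using java.net.HttpURLConnection against the host and port configured above (the payload must be a JSON array of events, as expected by the default JSONHandler; the class name is only illustrative):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class HttpSourceClient {
    public static void main(String[] args) throws Exception {
        // The JSONHandler expects a JSON array of events, each with "headers" and "body"
        String json = "[{\"headers\":{\"a\":\"a1\",\"b\":\"b1\"},\"body\":\"idoall.org_body\"}]";

        HttpURLConnection conn = (HttpURLConnection) new URL("http://192.168.1.246:8888").openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json; charset=utf-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP response code: " + conn.getResponseCode());
        conn.disconnect();
    }
}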

Case 9

File Roll Sink
The file_roll sink writes events to the local file system.

Create the configuration file

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5555
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /usr/local/test/fileroll

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent

flume-ng agent -n a1 -c ../conf -f fileroll.conf -Dflume.root.logger=DEBUG,console

Generate a test log entry

echo "hello idoall.org syslog" | nc 192.168.1.246 5555

Agent log

18/08/02 11:53:34 WARN source.SyslogUtils: Event created from Invalid Syslog data.

Check the files under /usr/local/test/fileroll

[root@node-246 fileroll]# ls
1533181857932-1 1533181857932-2 1533181857932-3 1533181857932-4 1533181857932-5 1533181857932-6 1533181857932-7
[root@node-246 fileroll]# cat 1533181857932-6
hello idoall.org syslog

Case 10

Replicating Channel Selector
Flume can fan out the flow from one source to multiple channels. There are two fan-out modes, replicating and multiplexing. In replicating mode, the event is sent to all configured channels; in multiplexing mode, the event is sent only to a subset of the channels. Fanning out requires rules that map the source onto its fan-out channels.
Create the replicating_Channel_Selector configuration file

a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = replicating

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 192.168.1.246
a1.sinks.k1.port = 5555

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c2
a1.sinks.k2.hostname = 192.168.1.247
a1.sinks.k2.port = 5555

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

Create the replicating_Channel_Selector_avro configuration file

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 5555

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Copy both configuration files to the other machine (node 247) and adjust the IP addresses in them

scp replicating_Channel_Selector* root@node-247:/usr/local/test/flume/

Open four terminals and start the two agents on each node

flume-ng agent -n a1 -c ../conf -f replicating_Channel_Selector_avro.conf -Dflume.root.logger=DEBUG,console
flume-ng agent -n a1 -c ../conf -f replicating_Channel_Selector.conf -Dflume.root.logger=DEBUG,console

Generate a test syslog message

echo "hello idoall.org syslog" | nc 192.168.1.246 5140

Agent log

18/08/02 14:09:53 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 68 65 6C 6C 6F 20 69 64 6F 61 6C 6C 2E 6F 72 67 hello idoall.org }

Case 11

Multiplexing Channel Selector
Create the Multiplexing_Channel_Selector configuration file

a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# Describe/configure the source
a1.sources.r1.type = org.apache.flume.source.http.HTTPSource
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = multiplexing

a1.sources.r1.selector.header = type
# The channel sets mapped to different header values may overlap; the default may contain any number of channels.
a1.sources.r1.selector.mapping.baidu = c1
a1.sources.r1.selector.mapping.ali = c2
a1.sources.r1.selector.default = c1

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 192.168.1.246
a1.sinks.k1.port = 5555

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c2
a1.sinks.k2.hostname = 192.168.1.247
a1.sinks.k2.port = 5555

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

Create the Multiplexing_Channel_Selector_avro configuration file

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 5555

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Copy the configuration files to the other node and change the IPs accordingly

scp Multiplexing_Channel_Selector* root@192.168.1.247:/usr/local/test/flume/

Open four terminals (two on 246 and two on 247) and start the agents

flume-ng agent -n a1 -c ../conf -f Multiplexing_Channel_Selector_avro.conf -Dflume.root.logger=DEBUG,console
flume-ng agent -n a1 -c ../conf -f Multiplexing_Channel_Selector.conf -Dflume.root.logger=DEBUG,console

From either node, send test events

curl -X POST -d '[{ "headers" :{"type" : "baidu"},"body" : "idoall_TEST1"}]' http://192.168.1.247:5140 && curl -X POST -d '[{ "headers" :{"type" : "ali"},"body" : "idoall_TEST2"}]' http://192.168.1.247:5140 && curl -X POST -d '[{ "headers" :{"type" : "qq"},"body" : "idoall_TEST3"}]' http://192.168.1.246:5140

Agent logs
On 246:

18/08/02 14:36:04 INFO sink.LoggerSink: Event: { headers:{type=qq} body: 69 64 6F 61 6C 6C 5F 54 45 53 54 33 idoall_TEST3 }
18/08/02 14:36:04 INFO sink.LoggerSink: Event: { headers:{type=baidu} body: 69 64 6F 61 6C 6C 5F 54 45 53 54 31 idoall_TEST1 }

On 247:

18/08/02 14:36:06 INFO sink.LoggerSink: Event: { headers:{type=ali} body: 69 64 6F 61 6C 6C 5F 54 45 53 54 32 idoall_TEST2 }

As shown above, events are routed to different channels according to the value of the type header.

Case 12

Flume Sink Processors (failover)
With a failover processor, events always go to one sink; when that sink becomes unavailable, they are automatically sent to the next sink.

Create the Flume_Sink_Processors configuration file

a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# The key to configuring failover: a sink group is required
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
# The processor type is failover
a1.sinkgroups.g1.processor.type = failover
# Priority: the larger the number, the higher the priority; every sink must have a distinct priority
a1.sinkgroups.g1.processor.priority.k1 = 5
a1.sinkgroups.g1.processor.priority.k2 = 10
# Set to 10 seconds here; tune it up or down for your environment
a1.sinkgroups.g1.processor.maxpenalty = 10000

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = replicating

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 192.168.1.246
a1.sinks.k1.port = 5555

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c2
a1.sinks.k2.hostname = 192.168.1.247
a1.sinks.k2.port = 5555

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

  

Create the Flume_Sink_Processors_avro configuration file

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 5555

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Copy both files to node 247 and change the host accordingly

scp Flume_Sink_Processors* root@192.168.1.247:/usr/local/test/flume/

Open four terminals and start the two agents on each node

flume-ng agent -n a1 -c ../conf -f Flume_Sink_Processors_avro.conf -Dflume.root.logger=DEBUG,console
flume-ng agent -n a1 -c ../conf -f Flume_Sink_Processors.conf -Dflume.root.logger=DEBUG,console

Generate a test log entry

echo "idoall.org test1 failover" | nc 192.168.1.246 5140

Because 247 has the higher priority, the event shows up in the sink window on 247:

18/08/02 15:47:44 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 31 idoall.org test1 }

Now stop the sink agent on 247 (Ctrl+C) and send test data again

echo "idoall.org test1 failover" | nc 192.168.1.246 5140

This time the event shows up in the sink log on 246:

18/08/02 15:51:23 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 31 idoall.org test1 }

Case 13

Load balancing Sink Processor
Unlike failover, the load_balance processor offers two selection policies, round_robin and random. In either case, if the selected sink is unavailable, the processor automatically tries the next available sink.
Create the Load_balancing_Sink_Processors configuration file

a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1

# The key to configuring load balancing: a sink group is required
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = round_robin

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 192.168.1.246
a1.sinks.k1.port = 5555

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c1
a1.sinks.k2.hostname = 192.168.1.247
a1.sinks.k2.port = 5555

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

Create the Load_balancing_Sink_Processors_avro configuration file

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 5555

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Copy both files to node 247 and change the IPs

scp Load_balancing_Sink_Processors* root@192.168.1.247:/usr/local/test/flume/

Open four terminals and start the four agents

flume-ng agent -n a1 -c ../conf -f Load_balancing_Sink_Processors_avro.conf -Dflume.root.logger=DEBUG,console

Generate test log entries

[root@node-246 ~]# echo "idoall.org test1" | nc 192.168.1.246 5140
[root@node-246 ~]# echo "idoall.org test2" | nc 192.168.1.246 5140
[root@node-246 ~]# echo "idoall.org test3" | nc 192.168.1.246 5140
[root@node-246 ~]# echo "idoall.org test4" | nc 192.168.1.246 5140

Log on 247

18/08/02 18:36:20 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 31 idoall.org test1 }
18/08/02 18:36:35 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 32 idoall.org test2 }
18/08/02 18:36:58 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 34 idoall.org test4 }

Log on 246

18/08/02 18:36:47 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 33 idoall.org test3 }

Case 14

HBase Sink

Copy the following jars from the HBase lib directory into the Flume lib directory

protobuf-java-2.5.0.jar
hbase-client-0.96.2-hadoop2.jar
hbase-common-0.96.2-hadoop2.jar
hbase-protocol-0.96.2-hadoop2.jar
hbase-server-0.96.2-hadoop2.jar
hbase-hadoop2-compat-0.96.2-hadoop2.jar
hbase-hadoop-compat-0.96.2-hadoop2.jar
htrace-core-2.04.jar

  

cp protobuf-java-2.5.0.jar hbase-client-1.1.2.2.6.1.0-129.jar hbase-common-1.1.2.2.6.1.0-129.jar hbase-protocol-1.1.2.2.6.1.0-129.jar hbase-server-1.1.2.2.6.1.0-129.jar hbase-hadoop2-compat-1.1.2.2.6.1.0-129.jar hbase-hadoop-compat-1.1.2.2.6.1.0-129.jar htrace-core-3.1.0-incubating.jar /usr/hdp/2.6.1.0-129/flume/lib/

Create a table flume_test with the column family name in HBase

hbase(main):003:0> create 'flume_test', 'name'
0 row(s) in 2.3900 seconds
=> Hbase::Table - flume_test

Create the agent configuration file hbase_simple

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = hbase
a1.sinks.k1.table = flume_test
a1.sinks.k1.columnFamily = name
a1.sinks.k1.column = message
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent

flume-ng agent -n a1 -c ../conf -f hbase_simple.conf -Dflume.root.logger=DEBUG,console

Generate a test log entry

echo "hello zzz.org from flume" | nc 192.168.1.246 5140

Agent log

18/08/03 10:01:07 WARN source.SyslogUtils: Event created from Invalid Syslog data.

Check HBase

hbase(main):006:0> scan 'flume_test'
ROW COLUMN+CELL
1533261667472-IamY4IbgS7-0 column=name:payload, timestamp=1533261670851, value=hello zzz.org from flume
1 row(s) in 0.1130 seconds

Case 15

Collecting application logs with the Flume Avro source
The agent configuration is shown below; collected events are written directly to HDFS.

[root@node-246 flume]# cat avro_tag.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node-231:8020/user/hdfs/flume/avro_tag/dataoutput
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in file
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /usr/flume/checkpoint
a1.channels.c1.dataDirs = /usr/flume/data

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Dependencies the application needs to pull in

<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-flume-ng</artifactId>
    <version>${log4j.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.flume.flume-ng-clients</groupId>
    <artifactId>flume-ng-log4jappender</artifactId>
    <version>1.8.0</version>
</dependency>
<!-- log4j-core -->
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
    <version>${log4j.version}</version>
</dependency>
<!-- log4j-api -->
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-api</artifactId>
    <version>${log4j.version}</version>
</dependency>
<!-- log4j-web -->
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-web</artifactId>
    <version>${log4j.version}</version>
</dependency>

log4j2.xml is as follows

<?xml version="1.0" encoding="UTF-8"?>
<!-- Log levels in priority order: OFF > FATAL > ERROR > WARN > INFO > DEBUG > TRACE > ALL -->
<!-- The status attribute controls log4j2's own internal logging; it can be omitted. Set it to trace to see log4j2's detailed internal output. -->
<!-- monitorInterval: log4j2 can detect changes to this file and reconfigure itself; the value is the check interval in seconds -->
<configuration status="INFO" monitorInterval="30">
    <properties>
        <property name="LOG_HOME">../logs</property>
        <property name="TMP_LOG_FILE_NAME">tmp</property>
        <property name="INFO_LOG_FILE_NAME">info</property>
        <property name="WARN_LOG_FILE_NAME">warn</property>
        <property name="ERROR_LOG_FILE_NAME">error</property>
    </properties>
    <!-- First define all appenders -->
    <appenders>
        <!-- Console output -->
        <console name="Console" target="SYSTEM_OUT">
            <!-- Log output pattern -->
            <PatternLayout pattern="[%d{HH:mm:ss:SSS}] [%p] - %l - %m%n"/>
        </console>
        <!-- This file receives everything and is truncated on every run (append="false"); handy for ad-hoc testing -->
        <File name="log" fileName="${LOG_HOME}/${TMP_LOG_FILE_NAME}.log" append="false">
            <PatternLayout pattern="%d{HH:mm:ss.SSS} %-5level %class{36} %L %M - %msg%xEx%n"/>
        </File>
        <!-- Logs info level and above; when the file exceeds the configured size it is rolled over into a dated archive file -->
        <RollingFile name="RollingFileInfo" fileName="${LOG_HOME}/${INFO_LOG_FILE_NAME}.log"
                     filePattern="${LOG_HOME}/${INFO_LOG_FILE_NAME}-%d{yyyy-MM-dd}-%i.log">
            <!-- Accept only events at this level and above (onMatch); reject everything else (onMismatch) -->
            <ThresholdFilter level="info" onMatch="ACCEPT" onMismatch="DENY"/>
            <PatternLayout pattern="[%d{HH:mm:ss:SSS}] [%p] - %l - %m%n"/>
            <Policies>
                <TimeBasedTriggeringPolicy/>
                <SizeBasedTriggeringPolicy size="100 MB"/>
            </Policies>
        </RollingFile>
        <RollingFile name="RollingFileWarn" fileName="${LOG_HOME}/${WARN_LOG_FILE_NAME}.log"
                     filePattern="${LOG_HOME}/${WARN_LOG_FILE_NAME}-%d{yyyy-MM-dd}-%i.log">
            <ThresholdFilter level="warn" onMatch="ACCEPT" onMismatch="DENY"/>
            <PatternLayout pattern="[%d{HH:mm:ss:SSS}] [%p] - %l - %m%n"/>
            <Policies>
                <TimeBasedTriggeringPolicy/>
                <SizeBasedTriggeringPolicy size="100 MB"/>
            </Policies>
            <!-- If DefaultRolloverStrategy is not set, at most 7 files are kept per folder; here it is raised to 20 -->
            <DefaultRolloverStrategy max="20"/>
        </RollingFile>
        <RollingFile name="RollingFileError" fileName="${LOG_HOME}/${ERROR_LOG_FILE_NAME}.log"
                     filePattern="${LOG_HOME}/${ERROR_LOG_FILE_NAME}-%d{yyyy-MM-dd}-%i.log">
            <ThresholdFilter level="error" onMatch="ACCEPT" onMismatch="DENY"/>
            <PatternLayout pattern="[%d{HH:mm:ss:SSS}] [%p] - %l - %m%n"/>
            <Policies>
                <TimeBasedTriggeringPolicy/>
                <SizeBasedTriggeringPolicy size="100 MB"/>
            </Policies>
        </RollingFile>
        <!-- Flume appender -->
        <Flume name="FlumeAppender" compress="true">
            <Agent host="192.168.1.246" port="44444"/>
            <!-- <RFC5424Layout charset="UTF-8" enterpriseNumber="18060" includeMDC="true" appName="myapp"/> -->
            <PatternLayout charset="GBK" pattern="[%d{HH:mm:ss:SSS}] [%p] - %l - %m%n" />
        </Flume>
    </appenders>
    <!-- Then define the loggers; an appender only takes effect once a logger references it -->
    <loggers>
        <!-- Filter out noisy DEBUG output from Spring and MyBatis -->
        <logger name="org.springframework" level="INFO"></logger>
        <logger name="org.mybatis" level="INFO"></logger>
        <!-- <Logger name="sysLog" level="trace">
            <AppenderRef ref="FlumeAppender"/>
        </Logger> -->
        <root level="info">
            <appender-ref ref="Console"/>
            <appender-ref ref="RollingFileInfo"/>
            <appender-ref ref="RollingFileWarn"/>
            <appender-ref ref="RollingFileError"/>
            <!-- Send log events to the Flume source -->
            <appenderRef ref="FlumeAppender"/>
        </root>
    </loggers>
</configuration>

Start the agent

flume-ng agent -n a1 -c ../conf -f avro_tag.conf -Dflume.root.logger=DEBUG,console

Once the application is started, its log output is shipped through Flume and written to HDFS.
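
For reference, any class that logs through log4j2 will feed this pipeline once the configuration above is on the classpath; nothing Flume-specific is needed in the application code. A minimal sketch (the class name and messages are only illustrative):

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class FlumeLoggingDemo {
    private static final Logger LOGGER = LogManager.getLogger(FlumeLoggingDemo.class);

    public static void main(String[] args) {
        // Events reaching the root logger are also handed to the FlumeAppender,
        // which forwards them over Avro to 192.168.1.246:44444 (see log4j2.xml above).
        LOGGER.info("application started, this line should end up in HDFS");
        LOGGER.error("simulated error for testing the flume pipeline");
    }
}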
