Flume Summary (1)

1. Log collection: receive data from a network port and sink it to logger

File netcat-logger.conf:

 # Name the components on this agent

 # Give names to the agent's three components: source, sink, and channel

 a1.sources = r1

 a1.sinks = k1

 a1.channels = c1

 # Describe/configure the source

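 # type = netcat: receive data from a network port. The agent runs on this machine, so it binds to localhost.

 # (Alternatively, type = spooldir is a spooling-directory source: it collects whatever files appear in the directory.)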

 a1.sources.r1.type = netcat

 a1.sources.r1.bind = localhost

 a1.sources.r1.port = 44444

 # Describe the sink

 a1.sinks.k1.type = logger

 # Use a channel which buffers events in memory

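 # Events are delivered to the sink in batches; each item in a batch is one event. Channel parameters:

 # capacity: the maximum number of events the channel can hold

 # transactionCapacity: the maximum number of events taken from a source or delivered to a sink per transaction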

 a1.channels.c1.type = memory

 a1.channels.c1.capacity = 1000

 a1.channels.c1.transactionCapacity = 100

 # Bind the source and sink to the channel

 a1.sources.r1.channels = c1

 a1.sinks.k1.channel = c1
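
How capacity and transactionCapacity relate can be pictured with a toy model. The class below is only a sketch for illustration, not Flume's implementation; the class name, method names, and exception choices are all assumptions.

```python
from collections import deque


class MemoryChannelModel:
    """Toy model of a bounded in-memory channel (hypothetical, not Flume code)."""

    def __init__(self, capacity=1000, transaction_capacity=100):
        self.capacity = capacity                          # max events held at once
        self.transaction_capacity = transaction_capacity  # max events per batch
        self.queue = deque()

    def put_batch(self, events):
        """Source side: add one batch, rejecting it if a limit is exceeded."""
        if len(events) > self.transaction_capacity:
            raise ValueError("batch larger than transactionCapacity")
        if len(self.queue) + len(events) > self.capacity:
            raise OverflowError("channel is full")
        self.queue.extend(events)

    def take_batch(self, max_events):
        """Sink side: remove up to one transaction's worth of events."""
        n = min(max_events, self.transaction_capacity, len(self.queue))
        return [self.queue.popleft() for _ in range(n)]


channel = MemoryChannelModel(capacity=1000, transaction_capacity=100)
channel.put_batch([f"event-{i}" for i in range(100)])
print(len(channel.take_batch(500)))  # 100: capped by transaction_capacity
```

In this model a sink that asks for 500 events still receives at most 100 per call, which is the role transactionCapacity plays in the configuration above.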

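Start command:
# Tell Flume to start an agent. --conf is the configuration directory, --conf-file is the agent's configuration file, and --name is the agent name (it must match the prefix used in the file, here a1).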
flume-ng agent --conf conf --conf-file conf/netcat-logger.conf --name a1 -Dflume.root.logger=INFO,console

Send data:

[root@mini03 ~]# telnet localhost 44444

Trying ::1...

telnet: connect to address ::1: Connection refused

Trying 127.0.0.1...

Connected to localhost.

Escape character is '^]'.

hello world!^H^H^H^H^H^H^H^H^H^H^H^H^H^H

OK

tianjun2012!

OK

Data seen on the console:

2017-05-08 13:41:35,766 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 68 65 6C 6C 6F 20 77 6F 72 6C 64 21 08 08 08 08 hello world!.... }

2017-05-08 13:41:40,153 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 74 69 61 6E 6A 75 6E 32 30 31 32 21 0D tianjun2012!. }
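
The hex dumps in the logger output can be decoded to see exactly which bytes arrived. The short sketch below (the helper name decode_body is made up for illustration) shows that the trailing 08 bytes are the backspaces typed as ^H and that 0D is a carriage return sent by telnet. The first body shows only 16 bytes, most likely because the logger sink prints just the beginning of each event body.

```python
def decode_body(hex_dump: str) -> bytes:
    """Convert a space-separated hex dump, as printed by the logger sink, to raw bytes."""
    return bytes.fromhex(hex_dump)


first = decode_body("68 65 6C 6C 6F 20 77 6F 72 6C 64 21 08 08 08 08")
second = decode_body("74 69 61 6E 6A 75 6E 32 30 31 32 21 0D")

print(first)   # b'hello world!\x08\x08\x08\x08'
print(second)  # b'tianjun2012!\r'
```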

2. Monitoring a directory

Start command:
bin/flume-ng agent -c ./conf -f ./conf/spooldir-logger.conf -n a1 -Dflume.root.logger=INFO,console

Test: move files into /home/hadoop/flumespool (mv ./xxxFile /home/hadoop/flumespool), but do not create or write files directly inside that directory.

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the source

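# Watch a directory: spoolDir sets the directory to watch; fileHeader controls whether each event gets a header holding the source file's absolute path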

a1.sources.r1.type = spooldir

a1.sources.r1.spoolDir = /home/hadoop/flumespool

a1.sources.r1.fileHeader = true

# Describe the sink

a1.sinks.k1.type = logger

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1
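
The advice in the test note (move finished files in, never write inside the spool directory) exists because the spooling-directory source expects a file to be complete and unchanged once it appears. A minimal sketch of that delivery pattern follows; the function name deliver and the staging directory are assumptions for illustration, and the rename is only atomic when both directories sit on the same filesystem.

```python
import os
import tempfile


def deliver(data: bytes, staging_dir: str, spool_dir: str, name: str) -> str:
    """Write the file completely in staging_dir, then move it into spool_dir."""
    os.makedirs(staging_dir, exist_ok=True)
    staged = os.path.join(staging_dir, name)
    with open(staged, "wb") as f:
        f.write(data)                      # all writing happens outside the spool dir
    final = os.path.join(spool_dir, name)
    os.replace(staged, final)              # single rename into the watched directory
    return final


# Demo with throwaway directories instead of /home/hadoop/flumespool.
with tempfile.TemporaryDirectory() as root:
    spool = os.path.join(root, "flumespool")
    os.makedirs(spool)
    path = deliver(b"line 1\nline 2\n", os.path.join(root, "staging"), spool, "app.log")
    print(os.path.basename(path))  # app.log
```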

3. Collect data with the tail command and sink it to HDFS

 # Name the components on this agent

 a1.sources = r1

 a1.sinks = k1

 a1.channels = c1

 # Describe/configure the source

 a1.sources.r1.type = exec

 a1.sources.r1.command = tail -F /home/hadoop/log/test.log

 a1.sources.r1.channels = c1

 # Describe the sink

 a1.sinks.k1.type = hdfs

 a1.sinks.k1.channel = c1

 a1.sinks.k1.hdfs.path = hdfs://mini01:9000/flume/events/%y-%m-%d/%H%M/

 a1.sinks.k1.hdfs.filePrefix = events-

 a1.sinks.k1.hdfs.round = true

 a1.sinks.k1.hdfs.roundValue = 10

 a1.sinks.k1.hdfs.roundUnit = minute

 a1.sinks.k1.hdfs.rollInterval = 3

 a1.sinks.k1.hdfs.rollSize = 20

 a1.sinks.k1.hdfs.rollCount = 5

 a1.sinks.k1.hdfs.batchSize = 1

 a1.sinks.k1.hdfs.useLocalTimeStamp = true

 # Output file type. The default is SequenceFile; DataStream writes plain text instead.

 a1.sinks.k1.hdfs.fileType = DataStream

 # Use a channel which buffers events in memory

 a1.channels.c1.type = memory

 a1.channels.c1.capacity = 1000

 a1.channels.c1.transactionCapacity = 100

 # Bind the source and sink to the channel

 a1.sources.r1.channels = c1

 a1.sinks.k1.channel = c1
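
With round = true, roundValue = 10 and roundUnit = minute, the timestamp used for the escape sequences in hdfs.path is rounded down to a 10-minute boundary, so events land in directories such as 1530, 1540, and so on. The sketch below reproduces that arithmetic; bucket_path is a hypothetical helper, and Python's strftime codes are used here only because they happen to match the %y-%m-%d/%H%M pattern in this config.

```python
from datetime import datetime


def bucket_path(ts: datetime, round_minutes: int = 10) -> str:
    """Round ts down to a multiple of round_minutes and format the directory path."""
    rounded = ts.replace(
        minute=ts.minute - ts.minute % round_minutes,
        second=0,
        microsecond=0,
    )
    return rounded.strftime("/flume/events/%y-%m-%d/%H%M/")


print(bucket_path(datetime(2017, 5, 8, 15, 37, 12)))  # /flume/events/17-05-08/1530/
```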

Start command:
flume-ng agent -c conf -f conf/tail-hdfs.conf -n a1

Simulate log writes:

 [root@mini03 log]# i=1
 while (( i <= 500000 )); do
   echo $i >> /home/hadoop/log/test.log
   sleep 0.5
   let 'i++'
 done

View the file contents on HDFS:

 [root@mini01 ~]# hdfs dfs -cat /flume/events/17-05-08/1530/*

 1

 2

 3

 4

 5

 6

 7

 8

 9

 10

 11

 12

 13

 14

 15

 16

 17

 18

 19

Note: in this example the values below are set very small so that results show up quickly; adjust them for real workloads.

a1.sinks.k1.hdfs.roundValue = 10

a1.sinks.k1.hdfs.rollInterval = 3

a1.sinks.k1.hdfs.rollSize = 20

a1.sinks.k1.hdfs.rollCount = 5

a1.sinks.k1.hdfs.batchSize = 1
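
To get a feel for how small these roll settings are, the following simplified model splits a stream of events into files using only the count and size limits. It is a sketch under stated assumptions: time-based rolling (rollInterval) is ignored, each event is assumed to cost its body length plus one newline byte, and split_into_files is a made-up name, not part of Flume.

```python
def split_into_files(events, roll_count=5, roll_size=20):
    """Group event bodies into files, closing a file when either limit is reached.

    A limit of 0 disables that trigger, mirroring the convention in the config.
    """
    files, current, size = [], [], 0
    for body in events:
        current.append(body)
        size += len(body) + 1  # assumed cost: body bytes plus a newline
        count_hit = roll_count and len(current) >= roll_count
        size_hit = roll_size and size >= roll_size
        if count_hit or size_hit:
            files.append(current)
            current, size = [], 0
    if current:
        files.append(current)  # the file that is still open
    return files


numbers = [str(i) for i in range(1, 20)]
print([len(f) for f in split_into_files(numbers)])  # [5, 5, 5, 4]
```

Under these assumptions the 19 numbers from the test run would end up in four small files, which is why production settings use far larger limits.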

Below is a configuration from a real environment:

agent1.sources = spooldirSource
agent1.channels = fileChannel
agent1.sinks = hdfsSink

agent1.sources.spooldirSource.type=spooldir
agent1.sources.spooldirSource.spoolDir=/home/hadoop/log
agent1.sources.spooldirSource.channels=fileChannel

agent1.sinks.hdfsSink.type=hdfs
agent1.sinks.hdfsSink.hdfs.path=hdfs://mini01:9000/weblog/flume-input/%y-%m-%d
agent1.sinks.hdfsSink.hdfs.filePrefix=flume-
agent1.sinks.hdfsSink.hdfs.round = true
# Number of seconds to wait before rolling current file (0 = never roll based on time interval)
agent1.sinks.hdfsSink.hdfs.rollInterval = 3600
# File size to trigger roll, in bytes (0: never roll based on file size)
agent1.sinks.hdfsSink.hdfs.rollSize = 128000000
agent1.sinks.hdfsSink.hdfs.rollCount = 0
agent1.sinks.hdfsSink.hdfs.batchSize = 1000

#Rounded down to the highest multiple of this (in the unit configured using hdfs.roundUnit), less than current time.
agent1.sinks.hdfsSink.hdfs.roundValue = 1
agent1.sinks.hdfsSink.hdfs.roundUnit = minute
agent1.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfsSink.channel=fileChannel
agent1.sinks.hdfsSink.hdfs.fileType = DataStream

agent1.channels.fileChannel.type = file
agent1.channels.fileChannel.checkpointDir=/tmp/flume/flume-bineckpoint
agent1.channels.fileChannel.dataDirs=/tmp/flume/dataDir

bin/flume-ng agent --conf ./conf/ -f conf/spooldir-hdfs.conf -Dflume.root.logger=DEBUG,console -n agent1 > log.log 2>&1 &
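
In a longer configuration like this one, a property that names a component never declared on the agent is easy to overlook. The helper below is a rough sketch of a pre-flight check for that case; undeclared_components is a hypothetical function, it only understands the simple key = value lines used in these notes, and the sample text is invented for the demonstration.

```python
def undeclared_components(conf_text: str, agent: str):
    """Return property keys that reference a source, channel, or sink not declared."""
    declared = {"sources": set(), "channels": set(), "sinks": set()}
    used = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        parts = key.split(".")
        if parts[0] != agent or len(parts) < 2 or parts[1] not in declared:
            continue
        if len(parts) == 2:
            declared[parts[1]].update(value.split())   # e.g. agent1.sinks = hdfsSink
        else:
            used.append((parts[1], parts[2], key))     # e.g. agent1.sinks.<name>.<prop>
    return [key for kind, name, key in used if name not in declared[kind]]


sample = """
agent1.sources = spooldirSource
agent1.channels = fileChannel
agent1.sinks = hdfsSink
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.typoSink.hdfs.round = true
"""
print(undeclared_components(sample, "agent1"))  # ['agent1.sinks.typoSink.hdfs.round']
```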
