!!!1.Logger Sink
记录INFO级别的日志,通常用于调试。
 
属性说明:
!channel –  
!type – The component type name, needs to be logger
maxBytesToLog 16 Maximum number of bytes of the Event body to log
 
要求必须在 --conf 参数指定的目录下有 log4j的配置文件
也可以通过-Dflume.root.logger=INFO,console在命令启动时手动指定log4j参数
 
案例:
参见入门案例
 
!!!2.File Roll Sink
在本地文件系统中存储事件。
每隔指定时长生成文件保存这段时间内收集到的日志信息。
 
属性说明:
!channel –  
!type – 类型,必须是"file_roll"
!sink.directory – 文件被存储的目录
sink.rollInterval 30 滚动文件每隔30秒(应该是每隔30秒钟单独切割数据到一个文件的意思)。如果设置为0,则禁止滚动,从而导致所有数据被写入到一个文件中。
sink.serializer TEXT Other possible options include avro_event or the FQCN of an implementation of EventSerializer.Builder interface.
batchSize 100  
 
案例:
编写配置文件:
#命名Agent a1的组件
a1.sources  =  r1
a1.sinks  =  k1
a1.channels  =  c1
 
#描述/配置Source
a1.sources.r1.type  = http
a1.sources.r1.port  = 6666
 
#描述Sink
a1.sinks.k1.type  = file_roll
a1.sinks.k1.directory = /home/park/work/apache-flume-1.6.0-bin/mysink
#描述内存Channel
a1.channels.c1.type  =  memory
a1.channels.c1.capacity  =  1000
a1.channels.c1.transactionCapacity  =  100
 
#为Channle绑定Source和Sink
a1.sources.r1.channels  =  c1
a1.sinks.k1.channel  =  c1
 
启动flume:
./flume-ng agent --conf ../conf --conf-file ../conf/template7.conf --name a1 -Dflume.root.logger=INFO,console
!!!3.Avro Sink
是实现多级流动 和 扇出流(1到多) 扇入流(多到1) 的基础。
 
属性说明:
!channel –  
!type – The component type name, needs to be avro.
!hostname – The hostname or IP address to bind to.
!port – The port # to listen on.
batch-size 100 number of event to batch together for send.
connect-timeout 20000 Amount of time (ms) to allow for the first (handshake) request.
request-timeout 20000 Amount of time (ms) to allow for requests after the first.
reset-connection-interval none Amount of time (s) before the connection to the next hop is reset. This will force the Avro Sink to reconnect to the next hop. This will allow the sink to connect to hosts behind a hardware load-balancer when news hosts are added without having to restart the agent.
compression-type none This can be “none” or “deflate”. The compression-type must match the compression-type of matching AvroSource
compression-level 6 The level of compression to compress event. 0 = no compression and 1-9 is compression. The higher the number the more compression
ssl false Set to true to enable SSL for this AvroSink. When configuring SSL, you can optionally set a “truststore”, “truststore-password”, “truststore-type”, and specify whether to “trust-all-certs”.
trust-all-certs false If this is set to true, SSL server certificates for remote servers (Avro Sources) will not be checked. This should NOT be used in production because it makes it easier for an attacker to execute a man-in-the-middle attack and “listen in” on the encrypted connection.
truststore – The path to a custom Java truststore file. Flume uses the certificate authority information in this file to determine whether the remote Avro Source’s SSL authentication credentials should be trusted. If not specified, the default Java JSSE certificate authority files (typically “jssecacerts” or “cacerts” in the Oracle JRE) will be used.
truststore-password – The password for the specified truststore.
truststore-type JKS The type of the Java truststore. This can be “JKS” or other supported Java truststore type.
exclude-protocols SSLv3 Space-separated list of SSL/TLS protocols to exclude. SSLv3 will always be excluded in addition to the protocols specified.
maxIoWorkers 2 * the number of available processors in the machine The maximum number of I/O w
 
案例1 - 多级流动
h2:
配置配置文件:
#命名Agent组件
a1.sources=r1
a1.sinks=k1
a1.channels=c1
 
#描述/配置Source
a1.sources.r1.type=avro
a1.sources.r1.bind=0.0.0.0
a1.sources.r1.port=9988
#描述Sink
a1.sinks.k1.type=logger
#描述内存Channel
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=1000
#为Channel绑定Source和Sink
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
启动flume:
./flume-ng agent --conf ../conf --conf-file ../conf/template8.conf --name a1 -Dflume.root.logger=INFO,console
 
 
h1:
配置配置文件
#命名Agent组件
a1.sources=r1
a1.sinks=k1
a1.channels=c1
 
#描述/配置Source
a1.sources.r1.type=http
a1.sources.r1.port=8888
#描述Sink
a1.sinks.k1.type=avro
a1.sinks.k1.hostname=192.168.242.138
a1.sinks.k1.port=9988
#描述内存Channel
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=1000
#为Channel绑定Source和Sink
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
启动flume:
./flume-ng agent --conf ../conf --conf-file ../conf/template8.conf --name a1 -Dflume.root.logger=INFO,console
 
发送http请求到h1:
curl -X POST -d '[{ "headers" :{"a" : "a1","b" : "b1"},"body" : "hello~http~flume~"}]' http://192.168.242.133:8888
稍等几秒后,发现h2最终收到了这条消息
 
案例2:扇出流 - 复制
h2 h3:
配置配置文件:
#命名Agent组件
a1.sources=r1
a1.sinks=k1
a1.channels=c1
 
#描述/配置Source
a1.sources.r1.type=avro
a1.sources.r1.bind=0.0.0.0
a1.sources.r1.port=9988
#描述Sink
a1.sinks.k1.type=logger
#描述内存Channel
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=1000
#为Channel绑定Source和Sink
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
启动flume:
./flume-ng agent --conf ../conf --conf-file ../conf/template8.conf --name a1 -Dflume.root.logger=INFO,console
 
h1:
配置配置文件
#命名Agent组件
a1.sources=r1
a1.sinks=k1 k2
a1.channels=c1 c2
 
#描述/配置Source
a1.sources.r1.type=http
a1.sources.r1.port=8888
#描述Sink
a1.sinks.k1.type=avro
a1.sinks.k1.hostname=192.168.242.138
a1.sinks.k1.port=9988
a1.sinks.k2.type=avro
a1.sinks.k2.hostname=192.168.242.135
a1.sinks.k2.port=9988
#描述内存Channel
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=1000
a1.channels.c2.type=memory
a1.channels.c2.capacity=1000
a1.channels.c2.transactionCapacity=1000
#为Channel绑定Source和Sink
a1.sources.r1.channels=c1 c2
a1.sinks.k1.channel=c1
a1.sinks.k2.channel=c2
启动flume:
./flume-ng agent --conf ../conf --conf-file ../conf/template8.conf --name a1 -Dflume.root.logger=INFO,console
 
案例3:扇出流 - 多路复用(路由)
h2 h3:
配置配置文件:
#命名Agent组件
a1.sources=r1
a1.sinks=k1
a1.channels=c1
 
#描述/配置Source
a1.sources.r1.type=avro
a1.sources.r1.bind=0.0.0.0
a1.sources.r1.port=9988
#描述Sink
a1.sinks.k1.type=logger
#描述内存Channel
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=1000
#为Channel绑定Source和Sink
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
启动flume:
./flume-ng agent --conf ../conf --conf-file ../conf/template8.conf --name a1 -Dflume.root.logger=INFO,console
 
h1:
配置配置文件
#配置Agent组件
a1.sources=r1
a1.sinks=k1 k2
a1.channels=c1 c2
 
#描述/配置Source
a1.sources.r1.type=http
a1.sources.r1.port=8888
a1.sources.r1.selector.type=multiplexing
a1.sources.r1.selector.header=flag
a1.sources.r1.selector.mapping.aaa=c1
a1.sources.r1.selector.mapping.bbb=c2
a1.sources.r1.selector.default=c1
#描述Sink
a1.sinks.k1.type=avro
a1.sinks.k1.hostname=192.168.242.138
a1.sinks.k1.port=9988
a1.sinks.k2.type=avro
a1.sinks.k2.hostname=192.168.242.135
a1.sinks.k2.port=9988
#描述内存Channel
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=1000
a1.channels.c2.type=memory
a1.channels.c2.capacity=1000
a1.channels.c2.transactionCapacity=1000
#为Channel绑定Source和Sink
a1.sources.r1.channels=c1 c2
a1.sinks.k1.channel=c1
a1.sinks.k2.channel=c2
启动flume:
./flume-ng agent --conf ../conf --conf-file ../conf/template8.conf --name a1 -Dflume.root.logger=INFO,console
发送http请求进行测试。发现可以实现路由效果
 
案例4:扇入流
m3:
编写配置文件:
#命名Agent组件
a1.sources=r1
a1.sinks=k1
a1.channels=c1
#描述/配置Source
a1.sources.r1.type=avro
a1.sources.r1.bind=0.0.0.0
a1.sources.r1.port=4141
#描述Sink
a1.sinks.k1.type=logger
#描述内存Channel
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=1000
#为Channel绑定Source和Sink
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
启动flume:
./flume-ng agent --conf ../conf --conf-file ../conf/template.conf --name a1 -Dflume.root.logger=INFO,console
 
m1、m2:
编写配置文件:
#命名Agent组件
a1.sources=r1
a1.sinks=k1
a1.channels=c1
 
#描述/配置Source
a1.sources.r1.type=http
a1.sources.r1.port=8888
#描述Sink
a1.sinks.k1.type=avro
a1.sinks.k1.hostname=192.168.242.135
a1.sinks.k1.port=4141
#描述内存Channel
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=1000
#为Channel绑定Source和Sink
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
启动flume:
./flume-ng agent --conf ../conf --conf-file ../conf/template9.conf --name a1 -Dflume.root.logger=INFO,console
m1通过curl发送一条http请求,由于默认使用的是jsonHandler,数据格式必须是指定的json格式:
[root@localhost conf]# curl -X POST -d '[{ "headers" :{"flag" : "c"},"body" : "idoall.org_body"}]' http://0.0.0.0:8888
  m2通过curl发送一条http请求,由于默认使用的是jsonHandler,数据格式必须是指定的json格式:
[root@localhost conf]# curl -X POST -d '[{ "headers" :{"flag" : "c"},"body" : "idoall.org_body"}]' http://0.0.0.0:8888
发现m3均能正确收到消息
 
 
!!!!4.HDFS Sink
此Sink将事件写入到Hadoop分布式文件系统HDFS中。
目前它支持创建文本文件和序列化文件。
对这两种格式都支持压缩。
这些文件可以分卷,按照指定的时间或数据量或事件的数量为基础。
它还通过类似时间戳或机器属性对数据进行 buckets/partitions 操作    It also buckets/partitions data by attributes like timestamp or machine where the event originated.
HDFS的目录路径可以包含将要由HDFS替换格式的转移序列用以生成存储事件的目录/文件名。
使用这个Sink要求haddop必须已经安装好,以便Flume可以通过hadoop提供的jar包与HDFS进行通信。
注意,此版本hadoop必须支持sync()调用。
 
属性说明:
!channel –  
!type – 类型名称,必须是“HDFS”
!hdfs.path – HDFS 目录路径 (eg hdfs://namenode/flume/webdata/)
hdfs.filePrefix FlumeData Flume在目录下创建文件的名称前缀
hdfs.fileSuffix – 追加到文件的名称后缀 (eg .avro - 注: 日期时间不会自动添加)
hdfs.inUsePrefix – Flume正在处理的文件所加的前缀
hdfs.inUseSuffix .tmp Flume正在处理的文件所加的后缀
hdfs.rollInterval 30 Number of seconds to wait before rolling current file (0 = never roll based on time interval)
hdfs.rollSize 1024 File size to trigger roll, in bytes (0: never roll based on file size)
hdfs.rollCount 10 Number of events written to file before it rolled (0 = never roll based on number of events)
hdfs.idleTimeout 0 Timeout after which inactive files get closed (0 = disable automatic closing of idle files)
hdfs.batchSize 100 number of events written to file before it is flushed to HDFS
hdfs.codeC – Compression codec. one of following : gzip, bzip2, lzo, lzop, snappy
hdfs.fileType SequenceFile File format: currently SequenceFile, DataStream or CompressedStream (1)DataStream will not compress output file and please don’t set codeC (2)CompressedStream requires set hdfs.codeC with an available codeC
hdfs.maxOpenFiles 5000 Allow only this number of open files. If this number is exceeded, the oldest file is closed.
hdfs.minBlockReplicas – Specify minimum number of replicas per HDFS block. If not specified, it comes from the default Hadoop config in the classpath.
hdfs.writeFormat – Format for sequence file records. One of “Text” or “Writable” (the default).
hdfs.callTimeout 10000 Number of milliseconds allowed for HDFS operations, such as open, write, flush, close. This number should be increased if many HDFS timeout operations are occurring.
hdfs.threadsPoolSize 10 Number of threads per HDFS sink for HDFS IO ops (open, write, etc.)
hdfs.rollTimerPoolSize 1 Number of threads per HDFS sink for scheduling timed file rolling
hdfs.kerberosPrincipal – Kerberos user principal for accessing secure HDFS
hdfs.kerberosKeytab – Kerberos keytab for accessing secure HDFS
hdfs.proxyUser    
hdfs.round false 时间戳是否向下取整(如果是true,会影响所有基于时间的转移序列,除了%T)
hdfs.roundValue 1 舍值的边界值
hdfs.roundUnit 向下舍值的单位 -  second, minute , hour
hdfs.timeZone Local Time Name of the timezone that should be used for resolving the directory path, e.g. America/Los_Angeles.
hdfs.useLocalTimeStamp false Use the local time (instead of the timestamp from the event header) while replacing the escape sequences.
hdfs.closeTries 0 Number of times the sink must try renaming a file, after initiating a close attempt. If set to 1, this sink will not re-try a failed rename (due to, for example, NameNode or DataNode failure), and may leave the file in an open state with a .tmp extension. If set to 0, the sink will try to rename the file until the file is eventually renamed (there is no limit on the number of times it would try). The file may still remain open if the close call fails but the data will be intact and in this case, the file will be closed only after a Flume restart.
hdfs.retryInterval 180 Time in seconds between consecutive attempts to close a file. Each close call costs multiple RPC round-trips to the Namenode, so setting this too low can cause a lot of load on the name node. If set to 0 or less, the sink will not attempt to close the file if the first attempt fails, and may leave the file open or with a ”.tmp” extension.
serializer TEXT Other possible options include avro_event or the fully-qualified class name of an implementation of the EventSerializer.Builder interface.
 
案例:
编写配置文件:
#命名Agent组件
a1.sources=r1
a1.sinks=k1
a1.channels=c1
 
#描述/配置Source
a1.sources.r1.type=http
a1.sources.r1.port=8888
#描述Sink
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=hdfs://0.0.0.0:9000/ppp
#描述内存Channel
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=1000
#为Channel绑定Source和Sink
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
启动flume:
./flume-ng agent --conf ../conf --conf-file ../conf/template9.conf --name a1 -Dflume.root.logger=INFO,console
 
!!!!5.Hive Sink
6.Custom Sink
自定义接收器,是自己实现的接收器接口Sink来实现的。
自定义接收器的类及其依赖类须在Flume启动前放置到Flume类加载目录下。
 
属性说明:
type – 类型,需要指定为自己实现的Sink类的全路径名

Sink - 汇聚点的更多相关文章

  1. 痞子衡嵌入式:飞思卡尔i.MX RTyyyy系列MCU硬件那些事(2.2)- 在串行NOR Flash XIP调试原理

    大家好,我是痞子衡,是正经搞技术的痞子.今天痞子衡给大家介绍的是飞思卡尔i.MX RTyyyy系列EVK在串行NOR Flash调试的原理. 本文是i.MXRT硬件那些事系列第二篇的续集,在第二篇首集 ...

  2. Flink的基本概念

    Stream.Transformation.Operator 用户实现的Flink程序是由Stream和Transformation这两个基本构建块组成,其中Stream是一个中间结果数据,而Tran ...

  3. XSS语义分析的阶段性总结(一)

    本文作者:Kale 前言 由于X3Scan的研发已经有些进展了,所以对这一阶段的工作做一下总结!对于X3Scan的定位,我更加倾向于主动+被动的结合.主动的方面主要体现在可以主动抓取页面链接并发起请求 ...

  4. docker4dotnet #2 容器化主机

    .NET 猿自从认识了小鲸鱼,感觉功力大增.上篇<docker4dotnet #1 前世今生&世界你好>中给大家介绍了如何在Windows上面配置Docker for Window ...

  5. Webpack中hash与chunkhash的区别,以及js与css的hash指纹解耦方案

    文件的hash指纹通常作为前端静态资源实现增量更新的方案之一,Webpack是目前最流行的开源编译工具之一,其强大的功能也带来很多坑(当然,大部分麻烦其实都可以在官方文档中找到答案). 比如,在Web ...

  6. UNIX:高级环境编程 - 第十五章 IPC:进程间通信

    IPC(InterProcess Communication)进程间通信.为啥没有进程间通信,这是因为进程间都是同步的关系,不需要通信. 1.管道 1.1管道特点: (1)半双工的(即数据只能在一个方 ...

  7. CCNA网络工程师学习进程(3)常规网络设计模型与基本的网络协议

        本节介绍分层的网络设计模型与基本的网络协议,包括ARP协议,ICMP协议和IP协议.     (1)三层网络架构: 一个好的园区网设计应该是一个分层的设计.一般分为接入层.汇聚层(分布层).核 ...

  8. Peer-to-Peer 综述

    Peer-To-Peer 网络介绍 最近几年,Peer-to-Peer (对等计算,简称P2P) 迅速成为计算机界关注的热门话题之一,财富杂志更将P2P列为影响Internet未来的四项科技之一. “ ...

  9. Flume 入门--几种不同的Sources

    1.flume概念 flume是分布式的,可靠的,高可用的,用于对不同来源的大量的日志数据进行有效收集.聚集和移动,并以集中式的数据存储的系统. flume目前是apache的一个顶级项目. flum ...

随机推荐

  1. iis与编辑

    hostname:域名initializationPage:对应域名下任意可访问action

  2. Babel 转译 class 过程窥探--------引用

    // Shape 类function Shape(id, x, y) {    this.id = id;    this.setLocation(x, y);}// 设置坐标的原型方法Shape.p ...

  3. golang web实战之三(基于iris框架的 web小应用,数据库采用 sqlite3 )

    一.效果:一个图片应用 1.可上传图片到uploads目录. 2.可浏览和评论图片(用富文本编辑器输入) 二.梳理一下相关知识: 1.iris框架(模板输出,session) 2.富文本编辑器.sql ...

  4. CCPC-Wannafly & Comet OJ 夏季欢乐赛(2019)E

    题面 这个题暴好啊,考了很多东西. 首先设f(x)为离终点还有x步要走的期望步数,我们可以发现 : 1.x>=k时,x可以转移到的点的下标都<x. 2.x<k时,则可能走回到x或者下 ...

  5. 代码审计之seacms v6.54 前台Getshell 复现分析

    1.环境: php5.5.38+apache+seacms v6.54 上一篇文章针对seacms v6.45 进行了分析,官方给出针对修复前台geishell提供的方法为增加: $order = ( ...

  6. Spring Boot 入门之消息中间件篇(转发)

    一.前言 在消息中间件中有 2 个重要的概念:消息代理和目的地.当消息发送者发送消息后,消息就被消息代理接管,消息代理保证消息传递到指定目的地. 我们常用的消息代理有 JMS 和 AMQP 规范.对应 ...

  7. 用 dnSpy 反编译调试 .NET 程序

    dnSpy 官网下载:https://github.com/0xd4d/dnSpy/releases 运行需要 .NET Framework 4 环境:https://dotnet.microsoft ...

  8. python--nolocal

    Compare this, without using nonlocal: x = 0def outer(): x = 1 def inner(): x = 2 print("inner:& ...

  9. 如何修改phpstorm的缓存目录

    相信使用phpstorm的人们都被缓存目录的大小困扰过.怎么修改到其它地方呢? 1. 找到 idea.properties 文件,配置信息都在此文件中,F:\Program Files\JetBrai ...

  10. shell 部分语法

    语法: variable_name=${variable_name:-xxxx} 如果variable 已经有值,则不被新值覆盖,否则将新值赋给variable split命令切割文件