1. 在使用 Regex Filtering Interceptor的时候一个属性是excludeEvents

  当它的值为true 的时候,过滤掉匹配到当前正则表达式的一行

  当它的值为false的时候,就接受匹配到正则表达式的一行

  

Property Name Default Description
type The component type name has to be regex_filter
regex ”.*” Regular expression for matching against events
excludeEvents false If true, regex determines events to exclude, otherwise regex determines events to include.

2. flume interceptors的其它属性

Flume has the capability to modify/drop events in-flight. This is done with the help of interceptors. Interceptors are classes that implement org.apache.flume.interceptor.Interceptor interface. An interceptor can modify or even drop events based on any criteria chosen by the developer of the interceptor. Flume supports chaining of interceptors. This is made possible through by specifying the list of interceptor builder class names in the configuration. Interceptors are specified as a whitespace separated list in the source configuration. The order in which the interceptors are specified is the order in which they are invoked. The list of events returned by one interceptor is passed to the next interceptor in the chain. Interceptors can modify or drop events. If an interceptor needs to drop events, it just does not return that event in the list that it returns. If it is to drop all events, then it simply returns an empty list. Interceptors are named components, here is an example of how they are created through configuration:

a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.interceptors = i1 i2
a1.sources.r1.interceptors.i1.type = org.apache.flume.interceptor.HostInterceptor$Builder
a1.sources.r1.interceptors.i1.preserveExisting = false
a1.sources.r1.interceptors.i1.hostHeader = hostname
a1.sources.r1.interceptors.i2.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
a1.sinks.k1.filePrefix = FlumeData.%{CollectorHost}.%Y-%m-%d
a1.sinks.k1.channel = c1

Note that the interceptor builders are passed to the type config parameter. The interceptors are themselves configurable and can be passed configuration values just like they are passed to any other configurable component. In the above example, events are passed to the HostInterceptor first and the events returned by the HostInterceptor are then passed along to the TimestampInterceptor. You can specify either the fully qualified class name (FQCN) or the alias timestamp. If you have multiple collectors writing to the same HDFS path, then you could also use the HostInterceptor.

Timestamp Interceptor

This interceptor inserts into the event headers, the time in millis at which it processes the event. This interceptor inserts a header with key timestamp whose value is the relevant timestamp. This interceptor can preserve an existing timestamp if it is already present in the configuration.

Property Name Default Description
type The component type name, has to be timestamp or the FQCN
preserveExisting false If the timestamp already exists, should it be preserved - true or false

Example for agent named a1:

a1.sources = r1
a1.channels = c1
a1.sources.r1.channels = c1
a1.sources.r1.type = seq
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp

Host Interceptor

This interceptor inserts the hostname or IP address of the host that this agent is running on. It inserts a header with key host or a configured key whose value is the hostname or IP address of the host, based on configuration.

Property Name Default Description
type The component type name, has to be host
preserveExisting false If the host header already exists, should it be preserved - true or false
useIP true Use the IP Address if true, else use hostname.
hostHeader host The header key to be used.

Example for agent named a1:

a1.sources = r1
a1.channels = c1
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = host
a1.sources.r1.interceptors.i1.hostHeader = hostname

Static Interceptor

Static interceptor allows user to append a static header with static value to all events.

The current implementation does not allow specifying multiple headers at one time. Instead user might chain multiple static interceptors each defining one static header.

Property Name Default Description
type The component type name, has to be static
preserveExisting true If configured header already exists, should it be preserved - true or false
key key Name of header that should be created
value value Static value that should be created

Example for agent named a1:

a1.sources = r1
a1.channels = c1
a1.sources.r1.channels = c1
a1.sources.r1.type = seq
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = datacenter
a1.sources.r1.interceptors.i1.value = NEW_YORK

UUID Interceptor

This interceptor sets a universally unique identifier on all events that are intercepted. An example UUID is b5755073-77a9-43c1-8fad-b7a586fc1b97, which represents a 128-bit value.

Consider using UUIDInterceptor to automatically assign a UUID to an event if no application level unique key for the event is available. It can be important to assign UUIDs to events as soon as they enter the Flume network; that is, in the first Flume Source of the flow. This enables subsequent deduplication of events in the face of replication and redelivery in a Flume network that is designed for high availability and high performance. If an application level key is available, this is preferable over an auto-generated UUID because it enables subsequent updates and deletes of event in data stores using said well known application level key.

Property Name Default Description
type The component type name has to be org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
headerName id The name of the Flume header to modify
preserveExisting true If the UUID header already exists, should it be preserved - true or false
prefix “” The prefix string constant to prepend to each generated UUID

Morphline Interceptor

This interceptor filters the events through a morphline configuration file that defines a chain of transformation commands that pipe records from one command to another. For example the morphline can ignore certain events or alter or insert certain event headers via regular expression based pattern matching, or it can auto-detect and set a MIME type via Apache Tika on events that are intercepted. For example, this kind of packet sniffing can be used for content based dynamic routing in a Flume topology. MorphlineInterceptor can also help to implement dynamic routing to multiple Apache Solr collections (e.g. for multi-tenancy).

Currently, there is a restriction in that the morphline of an interceptor must not generate more than one output record for each input event. This interceptor is not intended for heavy duty ETL processing - if you need this consider moving ETL processing from the Flume Source to a Flume Sink, e.g. to a MorphlineSolrSink.

Required properties are in bold.

Property Name Default Description
type The component type name has to be org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder
morphlineFile The relative or absolute path on the local file system to the morphline configuration file. Example: /etc/flume-ng/conf/morphline.conf
morphlineId null Optional name used to identify a morphline if there are multiple morphlines in a morphline config file

Sample flume.conf file:

a1.sources.avroSrc.interceptors = morphlineinterceptor
a1.sources.avroSrc.interceptors.morphlineinterceptor.type = org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder
a1.sources.avroSrc.interceptors.morphlineinterceptor.morphlineFile = /etc/flume-ng/conf/morphline.conf
a1.sources.avroSrc.interceptors.morphlineinterceptor.morphlineId = morphline1

Search and Replace Interceptor

This interceptor provides simple string-based search-and-replace functionality based on Java regular expressions. Backtracking / group capture is also available. This interceptor uses the same rules as in the Java Matcher.replaceAll() method.

Property Name Default Description
type The component type name has to be search_replace
searchPattern The pattern to search for and replace.
replaceString The replacement string.
charset UTF-8 The charset of the event body. Assumed by default to be UTF-8.

Example configuration:

a1.sources.avroSrc.interceptors = search-replace
a1.sources.avroSrc.interceptors.search-replace.type = search_replace # Remove leading alphanumeric characters in an event body.
a1.sources.avroSrc.interceptors.search-replace.searchPattern = ^[A-Za-z0-9_]+
a1.sources.avroSrc.interceptors.search-replace.replaceString =

Another example:

a1.sources.avroSrc.interceptors = search-replace
a1.sources.avroSrc.interceptors.search-replace.type = search_replace # Use grouping operators to reorder and munge words on a line.
a1.sources.avroSrc.interceptors.search-replace.searchPattern = The quick brown ([a-z]+) jumped over the lazy ([a-z]+)
a1.sources.avroSrc.interceptors.search-replace.replaceString = The hungry $2 ate the careless $1

Regex Filtering Interceptor

This interceptor filters events selectively by interpreting the event body as text and matching the text against a configured regular expression. The supplied regular expression can be used to include events or exclude events.

Property Name Default Description
type The component type name has to be regex_filter
regex ”.*” Regular expression for matching against events
excludeEvents false If true, regex determines events to exclude, otherwise regex determines events to include.

Regex Extractor Interceptor

This interceptor extracts regex match groups using a specified regular expression and appends the match groups as headers on the event. It also supports pluggable serializers for formatting the match groups before adding them as event headers.

Property Name Default Description
type The component type name has to be regex_extractor
regex Regular expression for matching against events
serializers Space-separated list of serializers for mapping matches to header names and serializing their values. (See example below) Flume provides built-in support for the following serializers: org.apache.flume.interceptor.RegexExtractorInterceptorPassThroughSerializer org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer
serializers.<s1>.type default Must be default (org.apache.flume.interceptor.RegexExtractorInterceptorPassThroughSerializer), org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer, or the FQCN of a custom class that implements org.apache.flume.interceptor.RegexExtractorInterceptorSerializer
serializers.<s1>.name  
serializers.* Serializer-specific properties

The serializers are used to map the matches to a header name and a formatted header value; by default, you only need to specify the header name and the default org.apache.flume.interceptor.RegexExtractorInterceptorPassThroughSerializer will be used. This serializer simply maps the matches to the specified header name and passes the value through as it was extracted by the regex. You can plug custom serializer implementations into the extractor using the fully qualified class name (FQCN) to format the matches in anyway you like.

Example 1:

If the Flume event body contained 1:2:3.4foobar5 and the following configuration was used

a1.sources.r1.interceptors.i1.regex = (\\d):(\\d):(\\d)
a1.sources.r1.interceptors.i1.serializers = s1 s2 s3
a1.sources.r1.interceptors.i1.serializers.s1.name = one
a1.sources.r1.interceptors.i1.serializers.s2.name = two
a1.sources.r1.interceptors.i1.serializers.s3.name = three

The extracted event will contain the same body but the following headers will have been added one=>1, two=>2, three=>3

Example 2:

If the Flume event body contained 2012-10-18 18:47:57,614 some log line and the following configuration was used

a1.sources.r1.interceptors.i1.regex = ^(?:\\n)?(\\d\\d\\d\\d-\\d\\d-\\d\\d\\s\\d\\d:\\d\\d)
a1.sources.r1.interceptors.i1.serializers = s1
a1.sources.r1.interceptors.i1.serializers.s1.type = org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer
a1.sources.r1.interceptors.i1.serializers.s1.name = timestamp
a1.sources.r1.interceptors.i1.serializers.s1.pattern = yyyy-MM-dd HH:mm

the extracted event will contain the same body but the following headers will have been added timestamp=>1350611220000

Flume interceptor 使用注意事项的更多相关文章

  1. Flume NG之Interceptor简介

    转载地址:http://www.cnblogs.com/lxf20061900/p/3658172.html 有的时候希望通过Flume将读取的文件再细分存储,比如讲source的数据按照业务类型分开 ...

  2. Flume 拦截器(interceptor)详解

    flume 拦截器(interceptor)1.flume拦截器介绍拦截器是简单的插件式组件,设置在source和channel之间.source接收到的事件event,在写入channel之前,拦截 ...

  3. Hadoop生态圈-Flume的组件之自定义拦截器(interceptor)

    Hadoop生态圈-Flume的组件之自定义拦截器(interceptor) 作者:尹正杰 版权声明:原创作品,谢绝转载!否则将追究法律责任. 本篇博客只是举例了一个自定义拦截器的方法,测试字节传输速 ...

  4. 【翻译】Flume 1.8.0 User Guide(用户指南) Processors

    翻译自官网flume1.8用户指南,原文地址:Flume 1.8.0 User Guide 篇幅限制,分为以下5篇: [翻译]Flume 1.8.0 User Guide(用户指南) [翻译]Flum ...

  5. 一次flume exec source采集日志到kafka因为单条日志数据非常大同步失败的踩坑带来的思考

    本次遇到的问题描述,日志采集同步时,当单条日志(日志文件中一行日志)超过2M大小,数据无法采集同步到kafka,分析后,共踩到如下几个坑.1.flume采集时,通过shell+EXEC(tail -F ...

  6. flume 配置

    [root@dtpweb data]#tar -zxvf apache-flume-1.7.0-bin.tar.gz[root@dtpweb conf]# cp flume-env.sh.templa ...

  7. Flume -- 开源分布式日志收集系统

    Flume是Cloudera提供的一个高可用的.高可靠的开源分布式海量日志收集系统,日志数据可以经过Flume流向需要存储终端目的地.这里的日志是一个统称,泛指文件.操作记录等许多数据. 一.Flum ...

  8. flume+kafka+smart数据接入实施手册

    1.  概述 本手册主要介绍了,一个将传统数据接入到Hadoop集群的数据接入方案和实施方法.供数据接入和集群运维人员参考. 1.1.   整体方案 Flume作为日志收集工具,监控一个文件目录或者一 ...

  9. Flume + HDFS + Hive日志收集系统

    最近一段时间,负责公司的产品日志埋点与收集工作,搭建了基于Flume+HDFS+Hive日志搜集系统. 一.日志搜集系统架构: 简单画了一下日志搜集系统的架构图,可以看出,flume承担了agent与 ...

随机推荐

  1. 如何点击按钮后在加载外部的Js文件

    或许有朋友遇到过,想等自己点击按钮之后才执行某一个js文件,那么,你运气好,看到了我的代码了哈哈, <html> <head> <title></title& ...

  2. [Android Pro] StarUML 版本破解

    参考:http://bbs.chinapyg.com/thread-79022-1-1.html 官网下载地址 : http://staruml.en.softonic.com 各平台版本均适用,本文 ...

  3. 创建DLL、Lib以及使用DLL、Lib

    1.要在生成DLL文件的同时生成Lib文件,函数声明时前面要加__declspec(dllexport). 可在头文件中如下定义: #ifndef __MYDLL_H#define __MYDLL_H ...

  4. python字符串中插入变量

  5. windows 下的tcping 小插件

    如果把插件放在根目录 就要能过cmd切换到根目录 cd \ c:\>tcping -d -t -i 0.06 www.baidu.com 将文件放在c:\WINDOWS\system32目录下, ...

  6. 怎么把MVC的Controller拆分写到别的类库

    以为很难…… 其实直接继承Controller 并且按MVC_Controllser规则命名. 然后网站项目引用该项目即可.

  7. 重温WCF之WCF传输安全(十三)(2)基于SSL的WCF匿名客户端(转)

    转载地址:http://www.cnblogs.com/lxblog/archive/2012/09/13/2683514.html 这一篇我们利用上一篇制作的证书,来演示一个基于SSL的WCF服务, ...

  8. Delphi中函数定义和声明的位置

    当函数(或过程)A定义在函数(或过程)B之前,那么函数B就可以调用函数A,并且编译成功,例如下面的 procedure TForm1.btn1Click(Sender: TObject); 和   f ...

  9. poj 1008:Maya Calendar(模拟题,玛雅日历转换)

    Maya Calendar Time Limit: 1000MS   Memory Limit: 10000K Total Submissions: 64795   Accepted: 19978 D ...

  10. 信号通讯编程,王明学learn

    信号通讯编程 在Linux系统中,信号(signal)同样也是最为古老的进程间通信机制. 一.信号类型 Linux系统支持的所有信号均定义在/usr/include/asm/signal.h(展示), ...