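Using Flume: the HTTP Source
Flume ships with many sources, such as exec and kafka. One of the simplest is the HTTP source: with it, Flume starts a web server on startup that listens on the configured IP and port. A common use case: in environments where the Flume SDK and its dependencies cannot be deployed, application code can send data over HTTP instead of Flume's RPC, and the HTTP source receives that data into Flume.
1. HTTP source parameters: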
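Parameter | Default | Description |
type | - | Component type; must be http (org.apache.flume.source.http.HTTPSource) |
bind | - | IP address or hostname to bind to |
port | - | Port to bind to |
enableSSL | false | Whether to enable SSL |
keystore | - | Path to the keystore file to use |
keystorePassword | - | Password for the keystore |
handler | JSONHandler | Handler class used by the HTTP source |
handler.* | - | Any parameters for the handler class, passed in through keys with this prefix |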
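1) handler:
Flume uses a pluggable handler to convert each HTTP request into events. If none is specified, the default is JSONHandler, which handles events in JSON format as shown below. Users can also write their own handler, which must implement the HTTPSourceHandler interface.
JSON data format: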
- [ { "headers":{"":"","":""
- },
- "body":"the first event"
- },
- { "headers":{"":"","":""
- },
- "body":"the second event"
- }
- ]
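The JSON shape above can be produced from client code without any Flume dependency. The following is a minimal sketch using only the Java standard library. The class `FlumeJsonPayload` and its methods are hypothetical names introduced for illustration, not part of Flume, and the escaping covers only the common cases.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Flume): builds the JSON array that JSONHandler accepts. */
public class FlumeJsonPayload {

    /** Escapes quotes, backslashes and control characters for use inside a JSON string. */
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become 4-digit hex escapes.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Renders one event as {"headers":{...},"body":"..."}. */
    public static String event(Map<String, String> headers, String body) {
        StringBuilder sb = new StringBuilder("{\"headers\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : headers.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
            first = false;
        }
        return sb.append("},\"body\":\"").append(escape(body)).append("\"}").toString();
    }

    /** Wraps already rendered events in a JSON array, i.e. the body of one POST. */
    public static String batch(String... events) {
        return "[" + String.join(",", events) + "]";
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("h1", "v1");
        System.out.println(batch(event(headers, "the first event"),
                                 event(headers, "the second event")));
    }
}
```

One POST can carry several events, so `batch` takes any number of rendered events. A real application would more likely use a JSON library; the hand-written escaping here only keeps the sketch dependency-free.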
2. A brief look at Flume's logger sink:
It logs events at INFO level and is generally used for debugging. This article uses this sink type. Its properties:
- type logger
- maxBytesToLog 16 Maximum number of bytes of the Event body to log
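Note: a log4j configuration file must exist in the directory given by the --conf argument. Alternatively, the log4j settings can be passed by hand at startup with -Dflume.root.logger=INFO,console.
3. A simple HTTP source example:
1) Download and unpack Flume: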
- cd /usr/local/
- wget http://mirror.bit.edu.cn/apache/flume/1.7.0/apache-flume-1.7.0-bin.tar.gz
- tar -xvzf apache-flume-1.7.0-bin.tar.gz
- ln -s /usr/local/apache-flume-1.7.0-bin /usr/local/flume   # the later steps use /usr/local/flume
Configure Flume's environment variables:
- vim /etc/profile
- export PS1="[\u@`/sbin/ifconfig eth0|grep 'inet '|awk -F'[: ]+' '{print $4}'` \W]"'$ '
- export FLUME_HOME=/usr/local/flume
- export PATH=$PATH:$FLUME_HOME/bin
2) Install the JDK and configure its environment variables;
3) Configure Flume:
- cd /usr/local/flume/conf
- vim flume-env.sh
Set JAVA_HOME there, and place the following log4j.properties in the same directory:
- ### set log levels ###
- log4j.rootLogger = info,stdout , D , E
- ###
- log4j.appender.stdout = org.apache.log4j.ConsoleAppender
- log4j.appender.stdout.Target = System.out
- log4j.appender.stdout.layout = org.apache.log4j.PatternLayout
- log4j.appender.stdout.layout.ConversionPattern = [%d{MM-dd HH:mm:ss}] [%p] [%c:%L] %m%n
- ### write to the log file ###
- log4j.appender.D = org.apache.log4j.DailyRollingFileAppender
- log4j.appender.D.File = /data/logs/flume/flume.log
- log4j.appender.D.Append = true
- log4j.appender.D.Threshold = info
- log4j.appender.D.layout = org.apache.log4j.PatternLayout
- log4j.appender.D.layout.ConversionPattern = [%d{MM-dd HH:mm:ss}] [%p] [%c:%L] %m%n
- ### write errors to a separate file ###
- log4j.appender.E = org.apache.log4j.DailyRollingFileAppender
- log4j.appender.E.File =/data/logs/flume/flume_error.log
- log4j.appender.E.Append = true
- log4j.appender.E.Threshold = ERROR
- log4j.appender.E.layout = org.apache.log4j.PatternLayout
- log4j.appender.E.layout.ConversionPattern = [%d{MM-dd HH:mm:ss}] [%p] [%c:%L] %m%n
- ### sink
- log4j.logger.com.iqiyi.ttbrain.log.flume.sink.MysqlSink= INFO, F, EE
- log4j.additivity.com.iqiyi.ttbrain.log.flume.sink.MysqlSink = false
- log4j.appender.F= org.apache.log4j.DailyRollingFileAppender
- log4j.appender.F.File=/data/logs/flume/flume_sink.log
- log4j.appender.F.Append = true
- log4j.appender.F.Threshold = info
- log4j.appender.F.layout=org.apache.log4j.PatternLayout
- log4j.appender.F.layout.ConversionPattern= [%d{MM-dd HH:mm:ss}] [%p] [%c:%L] %m%n
- log4j.appender.EE= org.apache.log4j.DailyRollingFileAppender
- log4j.appender.EE.File=/data/logs/flume/flume_sink_error.log
- log4j.appender.EE.Append = true
- log4j.appender.EE.Threshold = ERROR
- log4j.appender.EE.layout=org.apache.log4j.PatternLayout
- log4j.appender.EE.layout.ConversionPattern= [%d{MM-dd HH:mm:ss}] [%p] [%c:%L] %m%n
4) Configure the HTTP source:
- cd /usr/local/flume/conf
- vim http_test.conf
- a1.sources=r1
- a1.sinks=k1
- a1.channels=c1
- a1.sources.r1.type=http
- a1.sources.r1.bind=localhost
- a1.sources.r1.port=50000
- a1.sources.r1.channels=c1
- a1.sinks.k1.type=logger
- a1.sinks.k1.channel=c1
- a1.channels.c1.type=memory
- a1.channels.c1.capacity=1000
- a1.channels.c1.transactionCapacity=100
5) Start Flume:
- flume-ng agent -c /usr/local/flume/conf/ -f /usr/local/flume/conf/http_test.conf -n a1
6) Test:
Open a shell window and run:
- curl -X POST -d'[{"headers":{"h1":"v1","h2":"v2"},"body":"hello body"}]' http://localhost:50000
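The same request can also be sent from application code, which is the use case described at the start of this article. A minimal sketch, assuming Java 11 or later (for `java.net.http`) and an agent listening on localhost:50000 as configured above. The class `FlumeHttpClient` and its methods are hypothetical names, not part of Flume.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical client (not part of Flume): POSTs a JSON event batch to an HTTP source. */
public class FlumeHttpClient {

    /** Builds the POST request; kept separate so it can be inspected without a running agent. */
    public static HttpRequest buildRequest(String url, String jsonEvents) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(jsonEvents, StandardCharsets.UTF_8))
                .build();
    }

    /** Sends the batch and returns the HTTP status code of the response. */
    public static int send(String url, String jsonEvents) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(url, jsonEvents), HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        String json = "[{\"headers\":{\"h1\":\"v1\",\"h2\":\"v2\"},\"body\":\"hello body\"}]";
        // Requires the agent started in step 5 to be running.
        System.out.println("HTTP status: " + send("http://localhost:50000", json));
    }
}
```

Declaring the charset in the Content-Type header makes the encoding of the body explicit to the handler instead of leaving it to a default.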
The file /data/logs/flume/flume.log (the path set in log4j.properties above) then contains:
- [09-29 10:31:12] [INFO] [org.apache.flume.sink.LoggerSink:94] Event: { headers:{h1=v1, h2=v2} body: 68 65 6C 6C 6F 20 62 6F 64 79 hello body }
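The log line above shows the event body twice, once as hex bytes and once as text. The sketch below imitates the hex rendering, including truncation to a byte limit such as maxBytesToLog, to show why only the first 16 bytes of a longer body would appear with the default setting. It is an illustration only and not Flume's own implementation; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration (not Flume code) of a truncated hex rendering of an event body. */
public class BodyHexDump {

    /** Renders at most maxBytes of body as upper-case hex pairs separated by spaces. */
    public static String hex(byte[] body, int maxBytes) {
        int n = Math.min(body.length, maxBytes);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative byte values are not sign-extended.
            sb.append(String.format("%02X", body[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] body = "hello body".getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(body, 16)); // prints: 68 65 6C 6C 6F 20 62 6F 64 79
    }
}
```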
4. A custom handler:
Assume requests arrive as XML, in the following expected format:
- <events>
- <event>
- <headers><header1>value1</header1></headers>
- <body>test</body>
- </event>
- <event>
- <headers><header1>value1</header1></headers>
- <body>test2</body>
- </event>
- </events>
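Before wiring this format into Flume, the parsing rules can be tried with the JDK's DOM parser alone. A minimal sketch under that assumption: the class `XmlEventsProbe` and its output format (one "headers -> body" string per event) are hypothetical and exist only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical probe (not part of Flume): parses the <events> format with the JDK only. */
public class XmlEventsProbe {

    /** Parses an <events> document and returns one "{headers} -> body" string per <event>. */
    public static List<String> parse(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Request is not well-formed XML", e);
        }
        if (!"events".equalsIgnoreCase(root.getTagName())) {
            throw new IllegalArgumentException("Root element must be <events>");
        }
        List<String> result = new ArrayList<>();
        NodeList events = root.getElementsByTagName("event");
        for (int i = 0; i < events.getLength(); i++) {
            Element event = (Element) events.item(i);
            Map<String, String> headers = new LinkedHashMap<>();
            NodeList sections = event.getElementsByTagName("headers");
            for (int j = 0; j < sections.getLength(); j++) {
                NodeList children = sections.item(j).getChildNodes();
                for (int k = 0; k < children.getLength(); k++) {
                    Node header = children.item(k);
                    // Skip whitespace text nodes; only child elements are headers.
                    if (header.getNodeType() == Node.ELEMENT_NODE) {
                        headers.put(header.getNodeName(), header.getTextContent());
                    }
                }
            }
            String body = event.getElementsByTagName("body").item(0).getTextContent();
            result.add(headers + " -> " + body);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<events>"
                + "<event><headers><header1>value1</header1></headers><body>test</body></event>"
                + "<event><headers><header1>value1</header1></headers><body>test2</body></event>"
                + "</events>";
        parse(xml).forEach(System.out::println);
    }
}
```

Each child element of `<headers>` becomes one header, with the tag name as the key and the text content as the value.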
1) pom.xml
- <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
- xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
- <modelVersion>4.0.0</modelVersion>
- <groupId>org.pq</groupId>
- <artifactId>flume-demo</artifactId>
- <packaging>jar</packaging>
- <version>1.0</version>
- <name>flume-demo Maven jar</name>
- <url>http://maven.apache.org</url>
- <dependencies>
- <dependency>
- <groupId>junit</groupId>
- <artifactId>junit</artifactId>
- <version>4.8.2</version>
- <scope>test</scope>
- </dependency>
- <dependency>
- <groupId>org.slf4j</groupId>
- <artifactId>slf4j-log4j12</artifactId>
- <version>1.7.7</version>
- <scope>compile</scope>
- </dependency>
- <dependency>
- <groupId>org.apache.flume</groupId>
- <artifactId>flume-ng-core</artifactId>
- <version>1.7.0</version>
- <scope>compile</scope>
- </dependency>
- </dependencies>
- <build>
- <finalName>flume-demo</finalName>
- </build>
- </project>
2) The custom handler:
- package org.pq.flumeDemo.sources;
- import com.google.common.base.Preconditions;
- import org.apache.flume.Context;
- import org.apache.flume.Event;
- import org.apache.flume.event.EventBuilder;
- import org.apache.flume.source.http.HTTPBadRequestException;
- import org.apache.flume.source.http.HTTPSourceHandler;
- import org.slf4j.Logger;
- import org.slf4j.LoggerFactory;
- import org.w3c.dom.Document;
- import org.w3c.dom.Element;
- import org.w3c.dom.Node;
- import org.w3c.dom.NodeList;
- import org.xml.sax.SAXException;
- import javax.servlet.http.HttpServletRequest;
- import javax.xml.parsers.DocumentBuilder;
- import javax.xml.parsers.DocumentBuilderFactory;
- import java.util.ArrayList;
- import java.util.HashMap;
- import java.util.List;
- import java.util.Map;
- public class HTTPSourceXMLHandler implements HTTPSourceHandler {
- private final String ROOT = "events";
- private final String EVENT_TAG = "event";
- private final String HEADERS_TAG = "headers";
- private final String BODY_TAG = "body";
- private final String CONF_INSERT_TIMESTAMP = "insertTimestamp";
- private final String TIMESTAMP_HEADER = "timestamp";
- private final DocumentBuilderFactory documentBuilderFactory
- = DocumentBuilderFactory.newInstance();
- // Document builders are not thread-safe.
- // So make sure we have one for each thread.
- private final ThreadLocal<DocumentBuilder> docBuilder
- = new ThreadLocal<DocumentBuilder>();
- private boolean insertTimestamp;
- private static final Logger LOG = LoggerFactory.getLogger(HTTPSourceXMLHandler.class);
- public List<Event> getEvents(HttpServletRequest httpServletRequest) throws HTTPBadRequestException, Exception {
- if (docBuilder.get() == null) {
- docBuilder.set(documentBuilderFactory.newDocumentBuilder());
- }
- Document doc;
- final List<Event> events;
- try {
- doc = docBuilder.get().parse(httpServletRequest.getInputStream());
- Element root = doc.getDocumentElement();
- root.normalize();
- // Verify that the root element is "events"
- Preconditions.checkState(
- ROOT.equalsIgnoreCase(root.getTagName()));
- NodeList nodes = root.getElementsByTagName(EVENT_TAG);
- LOG.info("get nodes={}",nodes);
- int eventCount = nodes.getLength();
- events = new ArrayList<Event>(eventCount);
- for (int i = 0; i < eventCount; i++) {
- Element event = (Element) nodes.item(i);
- // Get all headers. If there are multiple header sections,
- // combine them.
- NodeList headerNodes
- = event.getElementsByTagName(HEADERS_TAG);
- Map<String, String> eventHeaders
- = new HashMap<String, String>();
- for (int j = 0; j < headerNodes.getLength(); j++) {
- Node headerNode = headerNodes.item(j);
- NodeList headers = headerNode.getChildNodes();
- for (int k = 0; k < headers.getLength(); k++) {
- Node header = headers.item(k);
- // Read only element nodes
- if (header.getNodeType() != Node.ELEMENT_NODE) {
- continue;
- }
- // Make sure a header is inserted only once,
- // else the event is malformed
- Preconditions.checkState(
- !eventHeaders.containsKey(header.getNodeName()),
- "Header expected only once " + header.getNodeName());
- eventHeaders.put(
- header.getNodeName(), header.getTextContent());
- }
- }
- Node body = event.getElementsByTagName(BODY_TAG).item(0);
- if (insertTimestamp) {
- eventHeaders.put(TIMESTAMP_HEADER, String.valueOf(System
- .currentTimeMillis()));
- }
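- // getCharacterEncoding() returns null when the request declares no charset,
- // so fall back to UTF-8 instead of failing in getBytes().
- String charset = httpServletRequest.getCharacterEncoding();
- if (charset == null) {
- charset = "UTF-8";
- }
- events.add(EventBuilder.withBody(
- body.getTextContent().getBytes(charset),
- eventHeaders));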
- }
- } catch (SAXException ex) {
- throw new HTTPBadRequestException(
- "Request could not be parsed into valid XML", ex);
- } catch (Exception ex) {
- throw new HTTPBadRequestException(
- "Request is not in expected format. " +
- "Please refer documentation for expected format.", ex);
- }
- return events;
- }
- public void configure(Context context) {
- insertTimestamp = context.getBoolean(CONF_INSERT_TIMESTAMP,
- false);
- }
- }
Package the project as a jar, then copy it into Flume's lib directory.
3) Flume configuration file:
- a1.sources=r1
- a1.sinks=k1
- a1.channels=c1
- a1.sources.r1.type=http
- a1.sources.r1.bind=localhost
- a1.sources.r1.port=50000
- a1.sources.r1.channels=c1
- a1.sources.r1.handler=org.pq.flumeDemo.sources.HTTPSourceXMLHandler
- a1.sources.r1.handler.insertTimestamp=true
- a1.sinks.k1.type=logger
- a1.sinks.k1.channel=c1
- a1.channels.c1.type=memory
- a1.channels.c1.capacity=1000
- a1.channels.c1.transactionCapacity=100
4) Start:
- $ bin/flume-ng agent -c conf -f conf/http_test.conf -n a1 -Dflume.root.logger=INFO,console
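To exercise the custom handler, a client needs a request body in the XML format described in this section. A minimal sketch of such a builder, using only the Java standard library. The class `FlumeXmlPayload` and its methods are hypothetical names, not part of Flume, and header names are assumed to be valid XML element names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Flume): renders events in the XML shape the handler parses. */
public class FlumeXmlPayload {

    /** Escapes the characters that are special in XML text content. */
    public static String escape(String s) {
        // "&" must be replaced first so the other entities are not escaped twice.
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Renders one <event>; each header name is used as a tag name. */
    public static String event(Map<String, String> headers, String body) {
        StringBuilder sb = new StringBuilder("<event><headers>");
        for (Map.Entry<String, String> e : headers.entrySet()) {
            sb.append('<').append(e.getKey()).append('>')
              .append(escape(e.getValue()))
              .append("</").append(e.getKey()).append('>');
        }
        return sb.append("</headers><body>").append(escape(body))
                 .append("</body></event>").toString();
    }

    /** Wraps rendered events in the <events> root element. */
    public static String batch(String... events) {
        return "<events>" + String.join("", events) + "</events>";
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("header1", "value1");
        System.out.println(batch(event(headers, "test"), event(headers, "a < b")));
    }
}
```

The resulting string can be posted to http://localhost:50000 in the same way as the JSON example in section 3. When the handler's insertTimestamp option is enabled, each logged event should also carry a timestamp header.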