Integrating Flume, Kafka, and Storm
Todo:
Rework Flume's sink so that it calls a Kafka producer to send each event as a message;
In Storm, write a spout that implements the IRichSpout interface and calls a Kafka consumer to receive the messages, then pass them through a few custom Bolts that transform and print the content.
Flume -- Kafka
Writing KafkaSink
Copy the following from $KAFKA_HOME/lib:
kafka_2.10-0.8.2.1.jar
kafka-clients-0.8.2.1.jar
scala-library-2.10.4.jar
to $FLUME_HOME/lib.
Create a new project in Eclipse and import the following from $FLUME_HOME/lib:
commons-logging-1.1.1.jar
flume-ng-configuration-1.6.0.jar
flume-ng-core-1.6.0.jar
flume-ng-sdk-1.6.0.jar
zkclient-0.3.jar
kafka_2.10-0.8.2.1.jar
kafka-clients-0.8.2.1.jar
scala-library-2.10.4.jar
into the project.
Create a new file, KafkaSink.java:
import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Transaction;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;

public class KafkaSink extends AbstractSink implements Configurable {

    private static final Log logger = LogFactory.getLog(KafkaSink.class);

    private String topic;
    private Producer<String, String> producer;

    public void configure(Context context) {
        topic = "flume_test";
        Properties props = new Properties();
        props.setProperty("metadata.broker.list", "localhost:9092");
        props.setProperty("serializer.class", "kafka.serializer.StringEncoder");
        props.put("zookeeper.connect", "localhost:2181");
        props.setProperty("num.partitions", "4");
        props.put("request.required.acks", "1");
        ProducerConfig config = new ProducerConfig(props);
        producer = new Producer<String, String>(config);
        logger.info("KafkaSink initialized.");
    }

    public Status process() throws EventDeliveryException {
        Channel channel = getChannel();
        Transaction tx = channel.getTransaction();
        try {
            tx.begin();
            Event e = channel.take();
            if (e == null) {
                tx.rollback();
                return Status.BACKOFF;
            }
            // Forward the event body to Kafka as a string-encoded message.
            KeyedMessage<String, String> data = new KeyedMessage<String, String>(topic, new String(e.getBody()));
            producer.send(data);
            logger.info("Flume sent a message to Kafka: " + new String(e.getBody()));
            tx.commit();
            return Status.READY;
        } catch (Exception e) {
            logger.error("Flume KafkaSinkException:", e);
            tx.rollback();
            return Status.BACKOFF;
        } finally {
            tx.close();
        }
    }
}
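The sketch below shows one way to avoid hard-coding the topic and broker list: Flume hands the sink's properties from the agent file to configure() through the Context object, so they can be read with getString(). The property names topic and brokerList are illustrative assumptions, not part of the original sink; they would correspond to extra lines in kafka.conf such as a1.sinks.k1.topic = flume_test.

    // Sketch only: configurable variant of configure(); property names are assumed.
    public void configure(Context context) {
        // Would be set in kafka.conf, e.g.
        //   a1.sinks.k1.topic = flume_test
        //   a1.sinks.k1.brokerList = localhost:9092
        topic = context.getString("topic", "flume_test");
        String brokerList = context.getString("brokerList", "localhost:9092");

        Properties props = new Properties();
        props.setProperty("metadata.broker.list", brokerList);
        props.setProperty("serializer.class", "kafka.serializer.StringEncoder");
        props.put("request.required.acks", "1");
        producer = new Producer<String, String>(new ProducerConfig(props));
        logger.info("KafkaSink initialized for topic " + topic);
    }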
Export the jar (File -> Export -> Jar File, all default options) and place it in $FLUME_HOME/lib.
Create kafka.conf:
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = localhost
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = KafkaSink

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Testing
Start Kafka:
cd ~/app/kafka
./bin/zookeeper-server-start.sh ./config/zookeeper.properties > /dev/null &
./bin/kafka-server-start.sh ./config/server.properties > /dev/null &
Create the topic:
~/app/kafka_2.10-0.8.2.1/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic flume_test
Start a console consumer:
~/app/kafka_2.10-0.8.2.1/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic flume_test --from-beginning
Start the Flume agent:
flume-ng agent -c conf -f ~/test/kafka.conf --name a1 -Dflume.root.logger=INFO,console
Send messages:
echo "hey manhua" |nc localhost
echo "nice shot" |nc localhost
A ready-made plugin that integrates Flume and Kafka:
https://github.com/kevinjmh/flumeng-kafka-plugin/tree/master/flumeng-kafka-plugin/src/main/java/org/apache/flume/plugins
Kafka -- Storm
http://storm.apache.org/index.html
Download Storm, extract it, and edit /etc/profile.
Create a new Maven project in Eclipse with the following pom.xml:
<?xml version="1.0" encoding="utf-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>manhua</groupId>
<artifactId>kafka-storm-test</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>
<name>kafka-storm</name>
<url>http://maven.apache.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<repositories>
<repository>
<id>github-releases</id>
<url>http://oss.sonatype.org/content/repositories/github-releases/</url>
</repository>
<repository>
<id>clojars.org</id>
<url>http://clojars.org/repo</url>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.10</artifactId>
<version>0.8.2.1</version>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.14</version>
</dependency>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-core</artifactId>
<version>0.10.0</version>
<!-- keep storm out of the jar-with-dependencies -->
<scope>provided</scope>
</dependency>
<dependency>
<groupId>commons-collections</groupId>
<artifactId>commons-collections</artifactId>
<version>3.2.1</version>
</dependency>
</dependencies>
</project>
Create two Java files under src/main/java.
KafkaSpouttest.java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.IRichSpout;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

public class KafkaSpouttest implements IRichSpout {

    private SpoutOutputCollector collector;
    private ConsumerConnector consumer;
    private String topic;

    public KafkaSpouttest() {
    }

    public KafkaSpouttest(String topic) {
        this.topic = topic;
    }

    public void nextTuple() {
    }

    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    public void ack(Object msgId) {
    }

    public void activate() {
        // This example consumes Kafka inside activate(): the loop below blocks this
        // thread and emits tuples directly instead of going through nextTuple().
        consumer = kafka.consumer.Consumer.createJavaConsumerConnector(createConsumerConfig());
        Map<String, Integer> topickMap = new HashMap<String, Integer>();
        topickMap.put(topic, 1);
        System.out.println("*********Results********topic:" + topic);
        Map<String, List<KafkaStream<byte[], byte[]>>> streamMap = consumer.createMessageStreams(topickMap);
        KafkaStream<byte[], byte[]> stream = streamMap.get(topic).get(0);
        ConsumerIterator<byte[], byte[]> it = stream.iterator();
        while (it.hasNext()) {
            String value = new String(it.next().message());
            SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss SSS");
            Date curDate = new Date(System.currentTimeMillis()); // current time
            String str = formatter.format(curDate);
            System.out.println("Storm received a message from Kafka -------> " + value);
            collector.emit(new Values(value, 1, str), value);
        }
    }

    private static ConsumerConfig createConsumerConfig() {
        Properties props = new Properties();
        // ZooKeeper connection string
        props.put("zookeeper.connect", "localhost:2181");
        // Consumer group id
        props.put("group.id", "1");
        // Consumer offsets are stored in ZooKeeper and are not updated in real time,
        // so an explicit commit interval is needed.
        props.put("auto.commit.interval.ms", "1000");
        props.put("zookeeper.session.timeout.ms", "10000");
        return new ConsumerConfig(props);
    }

    public void close() {
    }

    public void deactivate() {
    }

    public void fail(Object msgId) {
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "id", "time"));
    }

    public Map<String, Object> getComponentConfiguration() {
        System.out.println("getComponentConfiguration called");
        // The topology constructs the spout with an empty topic, so it is hard-coded here.
        topic = "flume_test";
        return null;
    }
}
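One caveat with the spout above: it drains Kafka inside activate() and never returns, while nextTuple() is empty. A minimal alternative sketch, assuming the same Kafka 0.8 high-level consumer API, keeps the ConsumerIterator as a field (the field it below is an addition, not in the original spout) and emits one message per nextTuple() call. Note that hasNext() still blocks until a message arrives unless consumer.timeout.ms is set in the consumer properties.

    // Sketch only: alternative open()/nextTuple() structure.
    private ConsumerIterator<byte[], byte[]> it; // added field

    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        consumer = kafka.consumer.Consumer.createJavaConsumerConnector(createConsumerConfig());
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put(topic, 1);
        Map<String, List<KafkaStream<byte[], byte[]>>> streams = consumer.createMessageStreams(topicCountMap);
        // Keep the iterator so nextTuple() can pull one message at a time.
        it = streams.get(topic).get(0).iterator();
    }

    public void nextTuple() {
        // Emit at most one message per call instead of looping forever in activate().
        if (it.hasNext()) {
            String value = new String(it.next().message());
            String time = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss SSS").format(new Date());
            collector.emit(new Values(value, 1, time), value);
        }
    }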
KafkaTopologytest.java
import java.util.HashMap;
import java.util.Map;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

public class KafkaTopologytest {

    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new KafkaSpouttest(""), 1);
        builder.setBolt("bolt1", new Bolt1(), 2).shuffleGrouping("spout");
        builder.setBolt("bolt2", new Bolt2(), 2).fieldsGrouping("bolt1", new Fields("word"));

        Map conf = new HashMap();
        conf.put(Config.TOPOLOGY_WORKERS, 1);
        conf.put(Config.TOPOLOGY_DEBUG, true);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("my-flume-kafka-storm-topology-integration", conf, builder.createTopology());
        Utils.sleep(1000 * 60 * 5); // keep the local cluster test running for 5 minutes
        cluster.shutdown();
    }

    public static class Bolt1 extends BaseBasicBolt {

        public void execute(Tuple input, BasicOutputCollector collector) {
            try {
                String msg = input.getString(0);
                int id = input.getInteger(1);
                String time = input.getString(2);
                msg = msg + "bolt1";
                System.out.println("First transformation-------[arg0]:" + msg + "---[arg1]:" + id + "---[arg2]:" + time + "------->" + msg);
                if (msg != null) {
                    collector.emit(new Values(msg));
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }

    public static class Bolt2 extends BaseBasicBolt {

        Map<String, Integer> counts = new HashMap<String, Integer>();

        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String msg = tuple.getString(0);
            msg = msg + "bolt2";
            System.out.println("Second transformation---------->" + msg);
            collector.emit(new Values(msg, 1));
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word", "count"));
        }
    }
}
Testing
Continuing from the Flume-Kafka test above, make sure Kafka is already running and the topic has been created.
# ZooKeeper must be running before starting Storm
# Start Storm
storm nimbus &
storm supervisor &
storm ui &
# Open http://localhost:8080 in a browser; if the UI appears, Storm started successfully
Test 1
Start a console producer and a console consumer:
~/app/kafka_2.10-0.8.2.1/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic flume_test
~/app/kafka_2.10-0.8.2.1/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic flume_test --from-beginning
Right-click KafkaTopologytest.java in the project and run it to start the Storm program.
Now type values into the producer console; they show up both in the consumer console and in the Eclipse output.
Test 2
Copy the following from $KAFKA_HOME/lib:
kafka_2.10-0.8.2.1.jar
kafka-clients-0.8.2.1.jar
scala-library-2.10.4.jar
metrics-core-2.2.0.jar
zkclient-0.3.jar
zookeeper-3.4.6.jar
to $STORM_HOME/lib.
Export a jar the same way as before (File -> Export -> Jar File, all default options) and place it in any directory.
Run the jar with Storm:
storm jar kafkaSpout.jar KafkaTopologytest
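Note that main() in KafkaTopologytest uses LocalCluster, so storm jar runs the topology inside the client JVM rather than deploying it to nimbus. A hedged sketch of a cluster-mode variant is shown below; the class name KafkaTopologyCluster and the topology name are assumptions for illustration, and it submits through StormSubmitter instead of LocalCluster.

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

// Sketch only: cluster-mode submission of the same topology.
public class KafkaTopologyCluster {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new KafkaSpouttest(""), 1);
        builder.setBolt("bolt1", new KafkaTopologytest.Bolt1(), 2).shuffleGrouping("spout");
        builder.setBolt("bolt2", new KafkaTopologytest.Bolt2(), 2).fieldsGrouping("bolt1", new Fields("word"));

        Config conf = new Config();
        conf.setNumWorkers(1);
        conf.setDebug(true);

        // Deploys the topology to the cluster managed by nimbus; the name must be unique.
        StormSubmitter.submitTopology("flume-kafka-storm-topology", conf, builder.createTopology());
    }
}

It would then be launched with storm jar kafkaSpout.jar KafkaTopologyCluster and stopped later with storm kill.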
Startup order: ZooKeeper - Kafka - Storm - Flume
Ref:http://www.aboutyun.com/thread-8915-1-1.html