Big Data Processing Frameworks, Storm: Kafka and Storm Integration

In this example Storm uses Kafka as its data source. Files, Redis, JDBC, Hive, HDFS, HBase and Netty can also serve as data sources for Storm.

Create a new Maven project:

pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>storm06</groupId>
  <artifactId>storm06</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>storm06</name>
  <url>http://maven.apache.org</url>
  <repositories>
    <!-- Repository where the Storm-related dependencies can be found -->
    <repository>
      <id>clojars.org</id>
      <url>https://repo.clojars.org/</url>
    </repository>
  </repositories>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.apache.storm</groupId>
      <artifactId>storm-core</artifactId>
      <version>0.9.2-incubating</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka_2.10</artifactId>
      <version>0.9.0.1</version>
      <exclusions>
        <exclusion>
          <groupId>com.sun.jdmk</groupId>
          <artifactId>jmxtools</artifactId>
        </exclusion>
        <exclusion>
          <groupId>com.sun.jmx</groupId>
          <artifactId>jmxri</artifactId>
        </exclusion>
        <!-- Keep Kafka from pulling in the slf4j-log4j12 binding (see the note below) -->
        <exclusion>
          <groupId>org.slf4j</groupId>
          <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-slf4j-impl</artifactId>
      <version>2.0-beta9</version>
    </dependency>
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-1.2-api</artifactId>
      <version>2.0-beta9</version>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>log4j-over-slf4j</artifactId>
      <version>1.7.10</version>
    </dependency>
    <!-- slf4j-log4j12 is deliberately not declared: together with log4j-over-slf4j it forms
         a logging loop, and SLF4J aborts at startup to preempt a StackOverflowError -->
    <!-- Kafka spout for Storm -->
    <dependency>
      <groupId>net.wurstmeister.storm</groupId>
      <artifactId>storm-kafka-0.8-plus</artifactId>
      <version>0.4.0</version>
    </dependency>
    <dependency>
      <groupId>commons-collections</groupId>
      <artifactId>commons-collections</artifactId>
      <version>3.2.1</version>
    </dependency>
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>15.0</version>
    </dependency>
  </dependencies>
  <build>
    <finalName>storm06</finalName>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.1</version>
        <configuration>
          <source>1.7</source>
          <target>1.7</target>
        </configuration>
      </plugin>
      <!-- Unit tests (skipped during the build) -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <configuration>
          <skip>true</skip>
          <includes>
            <include>**/*Test*.java</include>
          </includes>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-source-plugin</artifactId>
        <version>2.1.2</version>
        <executions>
          <!-- Bind the jar-no-fork goal of maven-source-plugin to the package phase -->
          <execution>
            <phase>package</phase>
            <goals>
              <goal>jar-no-fork</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

KafkaTopology

package bhz.storm.kafka.example;

import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.generated.AlreadyAliveException;
import backtype.storm.generated.InvalidTopologyException;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.TopologyBuilder;

public class KafkaTopology {

    public static void main(String[] args) throws
        AlreadyAliveException, InvalidTopologyException {
        // ZooKeeper hosts of the Kafka cluster (the spout discovers the brokers through them)
        ZkHosts zkHosts = new ZkHosts("134.32.123.101:2181,134.32.123.102:2181,134.32.123.103:2181");
        // Create the KafkaSpout configuration:
        // 2nd argument: the topic name
        // 3rd argument: ZooKeeper root path under which the spout stores its consumer offsets
        // 4th argument: unique id of this spout, used like a consumer group id (offsets live under zkRoot/id)
        SpoutConfig kafkaConfig = new SpoutConfig(zkHosts, "words_topic", "", "id7");
        // Deserialize the Kafka messages as strings
        kafkaConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
        // Read the topic from the beginning every time the topology is started,
        // which helps when debugging. In production this should be false.
        kafkaConfig.forceFromStart = true;
        // Build the topology
        TopologyBuilder builder = new TopologyBuilder();
        // Kafka spout with a parallelism of 1
        builder.setSpout("KafkaSpout", new KafkaSpout(kafkaConfig), 1);
        // SentenceBolt buffers words in memory, so every word must reach the same task;
        // globalGrouping sends the whole stream to a single task.
        builder.setBolt("SentenceBolt", new SentenceBolt(), 1).globalGrouping("KafkaSpout");
        builder.setBolt("PrinterBolt", new PrinterBolt(), 1).globalGrouping("SentenceBolt");
        // Run the topology in local mode
        LocalCluster cluster = new LocalCluster();
        Config conf = new Config();
        cluster.submitTopology("KafkaTopology", conf, builder.createTopology());
        try {
            // Let the topology consume from Kafka for 10 seconds before exiting
            System.out.println("Waiting to consume from kafka");
            Thread.sleep(10000);
        } catch (InterruptedException e) {
            // Restore the interrupt flag, then continue with the shutdown below
            Thread.currentThread().interrupt();
            System.out.println("Thread interrupted exception : " + e);
        }
        // Kill the topology and shut down the local Storm cluster
        cluster.killTopology("KafkaTopology");
        cluster.shutdown();
    }
}
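Why the topology uses globalGrouping can be illustrated without a cluster. The following is a minimal, JDK-only sketch; the class GroupingSketch and its methods are hypothetical and not part of Storm. It models how four word tuples would be spread over two SentenceBolt tasks: globalGrouping sends everything to one task (Storm picks the task with the lowest id), while a round-robin stand-in for shuffleGrouping (Storm's real shuffle grouping is randomized but balanced) splits the words of one sentence across tasks, so neither task's buffer would ever hold the whole sentence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Cluster-free model of how a stream of word tuples is spread over the tasks
 * of a downstream bolt. This is NOT Storm code; it only approximates the
 * routing of globalGrouping and shuffleGrouping.
 */
public class GroupingSketch {

    /** globalGrouping: every tuple goes to a single task (modelled as task 0). */
    static List<List<String>> global(List<String> words, int numTasks) {
        List<List<String>> tasks = emptyTasks(numTasks);
        for (String w : words) {
            tasks.get(0).add(w);
        }
        return tasks;
    }

    /** Round-robin stand-in for shuffleGrouping (the real one is randomized). */
    static List<List<String>> roundRobin(List<String> words, int numTasks) {
        List<List<String>> tasks = emptyTasks(numTasks);
        for (int i = 0; i < words.size(); i++) {
            tasks.get(i % numTasks).add(words.get(i));
        }
        return tasks;
    }

    // one empty word buffer per task
    private static List<List<String>> emptyTasks(int numTasks) {
        List<List<String>> tasks = new ArrayList<List<String>>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(new ArrayList<String>());
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("storm", "reads", "from", "kafka.");
        System.out.println("global: " + global(words, 2));
        System.out.println("round-robin: " + roundRobin(words, 2));
    }
}
```

Running main prints `global: [[storm, reads, from, kafka.], []]` and `round-robin: [[storm, from], [reads, kafka.]]`. The trade-off is that with globalGrouping SentenceBolt cannot scale out: raising its parallelism above 1 would leave the extra tasks idle.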

SentenceBolt

package bhz.storm.kafka.example;

import java.util.ArrayList;
import java.util.List;

import org.apache.commons.lang.StringUtils;

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;

import com.google.common.collect.ImmutableList;

public class SentenceBolt extends BaseBasicBolt {

    // list used to aggregate the words of the current sentence
    private List<String> words = new ArrayList<String>();

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        // Get the word from the tuple
        String word = input.getString(0);
        if (StringUtils.isBlank(word)) {
            // ignore blank lines
            return;
        }
        System.out.println("Received Word:" + word);
        // add the word to the current list of words
        words.add(word);
        if (word.endsWith(".")) {
            // a word ending with '.' marks the end of the sentence,
            // so publish the whole sentence as one tuple
            collector.emit(ImmutableList.of(
                    (Object) StringUtils.join(words, ' ')));
            // and reset the word list
            words.clear();
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // we emit tuples with a single field called "sentence"
        declarer.declare(new Fields("sentence"));
    }
}
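The buffering in SentenceBolt does not depend on Storm itself, so its rules can be checked in isolation. A minimal sketch, assuming a hypothetical SentenceAssembler class that reproduces those rules with only the JDK: blank input is ignored (the isBlank helper mirrors the whitespace check of commons-lang StringUtils.isBlank), words are buffered, and a word ending with '.' flushes the buffer as one space-joined sentence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical JDK-only version of the buffering done by SentenceBolt.
 * The real bolt emits through BasicOutputCollector instead of returning.
 */
public class SentenceAssembler {

    // words collected since the last completed sentence
    private final List<String> words = new ArrayList<String>();

    /** Returns the finished sentence when word ends with '.', otherwise null. */
    public String accept(String word) {
        if (isBlank(word)) {
            return null; // blank input is skipped, as in the bolt
        }
        words.add(word);
        if (!word.endsWith(".")) {
            return null; // sentence not finished yet, keep buffering
        }
        StringBuilder sentence = new StringBuilder();
        for (int i = 0; i < words.size(); i++) {
            if (i > 0) {
                sentence.append(' ');
            }
            sentence.append(words.get(i));
        }
        words.clear(); // start buffering the next sentence
        return sentence.toString();
    }

    // same check as commons-lang StringUtils.isBlank: null, empty or only whitespace
    private static boolean isBlank(String s) {
        if (s == null) {
            return true;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SentenceAssembler assembler = new SentenceAssembler();
        for (String w : Arrays.asList("hello", " ", "storm", "and", "kafka.", "next")) {
            String sentence = assembler.accept(w);
            if (sentence != null) {
                System.out.println("Received Sentence:" + sentence);
            }
        }
    }
}
```

Here main prints `Received Sentence:hello storm and kafka.`; the trailing word `next` stays buffered until a word ending with '.' arrives, which matches the bolt's behaviour.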

PrinterBolt

package bhz.storm.kafka.example;

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

public class PrinterBolt extends BaseBasicBolt {

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        // get the sentence from the tuple and print it
        String sentence = input.getString(0);
        System.out.println("Received Sentence:" + sentence);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // we don't emit anything
    }
}
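SentenceBolt reads each Kafka message as a single word, so test data for words_topic is naturally published one word per message. A minimal sketch of preparing such messages with the JDK only; the WordMessages class is hypothetical, and actually sending the list (for example with a Kafka producer) is deliberately left out. Because Kafka only guarantees ordering within a partition, this assumes words_topic has a single partition; otherwise words of one sentence may reach the spout out of order.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper for preparing test input: splits text into the
 * one-word messages that SentenceBolt expects to read from words_topic.
 */
public class WordMessages {

    static List<String> split(String text) {
        List<String> messages = new ArrayList<String>();
        // split on runs of whitespace; punctuation stays attached, so "words." keeps its '.'
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                messages.add(token);
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        // each element would become the value of one Kafka message, in this order
        System.out.println(split("Storm reads words.  Kafka stores them."));
    }
}
```

Running main prints `[Storm, reads, words., Kafka, stores, them.]`; fed through the topology, these six messages would produce two sentences.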
