Setting up your development environment

1. download the Java SE 6 JDK from http://www.oracle.com/technetwork/java/javase/downloads/index.html

chmod 775 jdk-6u35-linux-x64.bin
yes | ./jdk-6u35-linux-x64.bin
mv jdk1.6.0_35 /opt
ln -s /opt/jdk1.6.0_35/bin/java /usr/bin
ln -s /opt/jdk1.6.0_35/bin/javac /usr/bin
export JAVA_HOME=/opt/jdk1.6.0_35
export PATH=$PATH:$JAVA_HOME/bin
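Once JAVA_HOME and PATH are set, a quick sanity check is to compile and run a one-line program that reports which JVM the shell resolves to. This is just a verification sketch, not part of the recipe; the class name is illustrative.

```java
// JdkCheck.java - quick check that the JDK on the PATH is being picked up.
public class JdkCheck {
    public static String specVersion() {
        return System.getProperty("java.specification.version");
    }

    public static void main(String[] args) {
        System.out.println("java.home    = " + System.getProperty("java.home"));
        System.out.println("spec version = " + specVersion());
    }
}
```

Compile with `javac JdkCheck.java` and run with `java JdkCheck`; if either command is not found, the symlinks or PATH export above did not take effect.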

2. install Git

sudo apt-get install git

3. install Maven

sudo apt-get install maven

4. install Puppet, Vagrant, and VirtualBox

sudo apt-get install virtualbox puppet vagrant

5. install Eclipse

sudo apt-get install eclipse

Distributed version control

1. create project directory

mkdir FirstGitProject
cd FirstGitProject
git init

2. create some files in the repository

touch README.txt
vim README.txt

3. review the status of the repository

git status

4. add all files and folders manually

git add README.txt

5. commit the file

git commit -a -m "The first commit"

6. add the remote repository to local repository and push the changes

git remote add origin https://[user]@bitbucket.org/[user]/firstgitproject.git

git push origin master

Creating a "Hello World" topology

1. create a new project folder and init Git repository

mkdir HelloWorld
cd HelloWorld
git init

2. create Maven project file

vi pom.xml

3. create the basic XML tags and project metadata

<project xmlns="http://maven.apache.org/POM/4.0.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

    <modelVersion>4.0.0</modelVersion>

    <groupId>storm.cookbook</groupId>
    <artifactId>hello-world</artifactId>
    <version>0.0.1-SNAPSHOT</version>

    <name>hello-world</name>
    <url>https://bitbucket.org/[user]/hello-world</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

</project>

4. declare the Maven repositories to fetch the dependencies from

<repositories>
    <repository>
        <id>github-releases</id>
        <url>http://oss.sonatype.org/content/repositories/github-releases/</url>
    </repository>
    <repository>
        <id>clojars.org</id>
        <url>http://clojars.org/repo</url>
    </repository>
    <repository>
        <id>twitter4j</id>
        <url>http://twitter4j.org/maven2</url>
    </repository>
</repositories>

5. declare the dependencies

<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>3.8.1</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>storm</groupId>
        <artifactId>storm</artifactId>
        <version>0.8.1</version>
        <!-- keep storm out of the jar-with-dependencies -->
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>com.googlecode.json-simple</groupId>
        <artifactId>json-simple</artifactId>
        <version>1.1</version>
    </dependency>
</dependencies>

6. add the build plugin definitions

<build>
    <plugins>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                      <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
                <archive>
                    <manifest>
                        <mainClass></mainClass>
                    </manifest>
                </archive>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <groupId>com.theoryinpractise</groupId>
            <artifactId>clojure-maven-plugin</artifactId>
            <version>1.3.8</version>
            <extensions>true</extensions>
            <configuration>
                <sourceDirectories>
                    <sourceDirectory>src/clj</sourceDirectory>
                </sourceDirectories>
            </configuration>
            <executions>
                <execution>
                    <id>compile</id>
                     <phase>compile</phase>
                     <goals>
                         <goal>compile</goal>
                     </goals>
                </execution>
                <execution>
                    <id>test</id>
                     <phase>test</phase>
                     <goals>
                         <goal>test</goal>
                     </goals>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
                <source>1.6</source>
                <target>1.6</target>
            </configuration>
        </plugin>
    </plugins>
</build>

7. complete the required folder structure

src/main/java
src/test

8. generate the Eclipse project

mvn eclipse:eclipse

9. create the spout class HelloWorldSpout (HelloWorldSpout.java)

package storm.cookbook;

public class HelloWorldSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private int referenceRandom;
    private static final int MAX_RANDOM = 10;

    public HelloWorldSpout() {
        final Random rand = new Random();
        referenceRandom = rand.nextInt(MAX_RANDOM);
    }
}

10. after construction, the Storm cluster will open the spout

public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
    this.collector = collector;
}

11. The Storm cluster will repeatedly call the nextTuple method, which will do all the work of the spout

public void nextTuple() {
    final Random rand = new Random();
    int instanceRandom = rand.nextInt(MAX_RANDOM);

    if (instanceRandom == referenceRandom) {
        collector.emit(new Values("Hello World"));
    }
    else {
        collector.emit(new Values("Other Random Word"));
    }
}
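The spout's behavior can be reasoned about outside Storm: with MAX_RANDOM = 10, each nextTuple call has roughly a one-in-ten chance of matching the reference value and emitting "Hello World". A Storm-free sketch of that logic (class and method names are illustrative, not from the book):

```java
import java.util.Random;

// Storm-free sketch of the spout's randomness: counts how often a draw from
// nextInt(MAX_RANDOM) matches a fixed reference value, as nextTuple does.
public class SpoutOddsSketch {
    static final int MAX_RANDOM = 10;

    // Returns how many of n draws match the reference value.
    public static int countMatches(long seed, int n, int reference) {
        Random rand = new Random(seed);
        int matches = 0;
        for (int i = 0; i < n; i++) {
            if (rand.nextInt(MAX_RANDOM) == reference) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Over many draws the match rate converges on 1/MAX_RANDOM.
        System.out.println("matches in 100000 draws: " + countMatches(42L, 100000, 3));
    }
}
```

So at full speed the topology sees a "Hello World" tuple for roughly 10% of emissions, which is what the bolt in the next step counts.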

12. tell the Storm cluster which fields this spout emits within the declareOutputFields method

public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("sentence"));
}

13. create the bolt class HelloWorldBolt (HelloWorldBolt.java)

package storm.cookbook;

public class HelloWorldBolt extends BaseRichBolt {
    private int myCount;

    public void execute(Tuple input) {
        String test = input.getStringByField("sentence");

        if ("Hello World".equals(test)) {
            myCount++;
            System.out.println("Found a Hello World! My Count is now: " + Integer.toString(myCount));
        }
    }
}

14. create a main class to declare the Storm topology (HelloWorldTopology.java)

package storm.cookbook;

public class HelloWorldTopology {

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        builder.setSpout("randomHelloWorld", new HelloWorldSpout(), 10);
        builder.setBolt("HelloWorldBolt", new HelloWorldBolt(), 2).shuffleGrouping("randomHelloWorld");

        Config conf = new Config();
        conf.setDebug(true);

        if (args != null && args.length > 0) {
            conf.setNumWorkers(3);
            StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
        }
        else {
            LocalCluster cluster = new LocalCluster();

            cluster.submitTopology("test", conf, builder.createTopology());

            Utils.sleep(10000);

            cluster.killTopology("test");
            cluster.shutdown();
        }
    }
}
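Before running inside Storm, the topology's logic is easy to check as a plain-Java simulation: one function stands in for the spout and produces the sentence stream, another stands in for the bolt and counts the "Hello World" sentences. This sketch uses no Storm classes at all; the names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Storm-free simulation of the HelloWorld topology wiring: a "spout" that
// produces sentences and a "bolt" that counts the Hello World ones.
public class HelloWorldSim {

    // Mimics HelloWorldSpout.nextTuple over n calls.
    public static List<String> spout(long seed, int n, int reference) {
        Random rand = new Random(seed);
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < n; i++) {
            out.add(rand.nextInt(10) == reference ? "Hello World" : "Other Random Word");
        }
        return out;
    }

    // Mimics HelloWorldBolt.execute over the whole stream.
    public static int bolt(List<String> sentences) {
        int count = 0;
        for (String s : sentences) {
            if ("Hello World".equals(s)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> stream = spout(1L, 1000, 5);
        System.out.println("Hello World count: " + bolt(stream));
        System.out.println("sanity: " + bolt(Arrays.asList("Hello World", "x"))); // counts exactly 1
    }
}
```

The real classes delegate exactly this logic to Storm's collector and tuple machinery; the simulation only removes the cluster plumbing.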

15. run the topology from the project's root folder

mvn compile exec:java -Dexec.classpathScope=compile -Dexec.mainClass=storm.cookbook.HelloWorldTopology

Creating a Storm Cluster - provisioning the machines

1. create a new project named vagrant-storm-cluster with the following directory structure

vagrant-storm-cluster
vagrant-storm-cluster/data
vagrant-storm-cluster/manifests
vagrant-storm-cluster/modules
vagrant-storm-cluster/scripts

2. create a file in the project root called Vagrantfile

# -*- mode: ruby -*-
# vi: set ft=ruby :
boxes = [
    # IP addresses follow the hosts file later in this recipe;
    # memory sizes (MB) are examples - tune them for your host machine
    { :name => :nimbus,      :ip => '192.168.33.100', :memory => 512 },
    { :name => :supervisor1, :ip => '192.168.33.101', :memory => 256 },
    { :name => :supervisor2, :ip => '192.168.33.102', :memory => 256 },
    { :name => :zookeeper1,  :ip => '192.168.33.201', :memory => 256 }
]

3. define the hardware, networking, and operating system

boxes.each do |opts|
    config.vm.define opts[:name] do |config|
        config.vm.box = "ubuntu12"
        config.vm.box_url = "http://dl.dropbox.com/u/1537815/precise64.box"
        config.vm.network :hostonly, opts[:ip]
        config.vm.host_name = "storm.%s" % opts[:name].to_s
        config.vm.share_folder "v-data", "/vagrant_data", "./data", :transient => false
        config.vm.customize ["modifyvm", :id, "--memory", opts[:memory]]
        config.vm.customize ["modifyvm", :id, "--cpus", opts[:cpus]] if opts[:cpus]

4. configure the provisioning of the application

        config.vm.provision :shell, :inline => "cp -fv /vagrant_data/hosts /etc/hosts"
        config.vm.provision :shell, :inline => "apt-get update"
        # Check if the jdk has been provided
        if File.exist?("./data/jdk-6u35-linux-x64.bin") then
            config.vm.provision :puppet do |puppet|
                puppet.manifests_path = "manifests"
                puppet.manifest_file = "jdk.pp"
            end
        end
        config.vm.provision :puppet do |puppet|
             puppet.manifests_path = "manifests"
             puppet.manifest_file = "provisioningInit.pp"
        end
        # Ask puppet to do the provisioning now.
        config.vm.provision :shell, :inline => "puppet apply /tmp/storm-puppet/manifests/site.pp --verbose --modulepath=/tmp/storm-puppet/modules/ --debug"
    end
end   

5. create the installJdk.sh file in the scripts folder

#!/bin/sh
echo "Installing JDK!"

chmod 775 /vagrant_data/jdk-6u35-linux-x64.bin

cd /root
yes | /vagrant_data/jdk-6u35-linux-x64.bin

mv jdk1.6.0_35 /opt

rm -rv /usr/bin/java
rm -rv /usr/bin/javac

ln -s /opt/jdk1.6.0_35/bin/java /usr/bin
ln -s /opt/jdk1.6.0_35/bin/javac /usr/bin

export JAVA_HOME=/opt/jdk1.6.0_35
export PATH=$PATH:$JAVA_HOME/bin

6. create the jdk.pp file in the manifests folder

$JDK_VERSION = "1.6.0_35"
package { "openjdk":
    ensure => absent,
}
exec { "installJdk":
    command => "installJdk.sh",
    path => "/vagrant/scripts",
    logoutput => true,
    creates => "/opt/jdk${JDK_VERSION}",
}

7. create the provisioningInit.pp file in the manifests folder

$CLONE_URL = "https://bitbucket.org/qanderson/storm-puppet.git"
$CHECKOUT_DIR="/tmp/storm-puppet"

package { 'git': ensure => [latest, installed] }
package { 'puppet': ensure => [latest, installed] }
package { 'ruby': ensure => [latest, installed] }
package { 'rubygems': ensure => [latest, installed] }
package { 'unzip': ensure => [latest, installed] }

exec { "install_hiera":
    command => "gem install hiera hiera-puppet",
    path => "/usr/bin",
    require => Package['rubygems'],
}

8. clone the repository, which contains the second level of provisioning

exec { "clone_storm-puppet":
    command => "git clone ${CLONE_URL}",
    cwd => "/tmp",
    path => "/usr/bin",
    creates => "${CHECKOUT_DIR}",
    require => Package['git'],
}

9. configure Hiera, the Puppet plugin used to externalize properties from the provisioning scripts

exec {"/bin/ln -s /var/lib/gems/1.8/gems/hiera-puppet-1.0.0/ /tmp/storm-puppet/modules/hiera-puppet":
    creates => "/tmp/storm-puppet/modules/hiera-puppet",
    require => [Exec['clone_storm-puppet'],Exec['install_hiera']]
}

#install hiera and the storm configuration
file { "/etc/puppet/hiera.yaml":
    source => "/vagrant_data/hiera.yaml",
    replace => true,
    require => Package['puppet']
}

file { "/etc/puppet/hieradata":
    ensure => directory,
    require => Package['puppet']
}

file {"/etc/puppet/hieradata/storm.yaml":
    source => "${CHECKOUT_DIR}/modules/storm.yaml",
    replace => true,
    require => [Exec['clone_storm-puppet'],File['/etc/puppet/hieradata']]
}

10. create the Hiera base configuration file in the data folder

hiera.yaml:

---
:hierarchy:
    - "%{operatingsystem}"
    - storm
:backends:
    - yaml
:yaml:
    :datadir: '/etc/puppet/hieradata'

11. create the hosts file in the data folder

127.0.0.1 localhost
192.168.33.100 storm.nimbus
192.168.33.101 storm.supervisor1
192.168.33.102 storm.supervisor2
192.168.33.103 storm.supervisor3
192.168.33.104 storm.supervisor4
192.168.33.105 storm.supervisor5
192.168.33.201 storm.zookeeper1
192.168.33.202 storm.zookeeper2
192.168.33.203 storm.zookeeper3
192.168.33.204 storm.zookeeper4

12. init the Git repository for this project and push it to bitbucket.org

Creating a Storm cluster - provisioning Storm

Once you have a base set of virtual machines that are ready for application provisioning, you need to install and configure the appropriate packages on each node.

1. create a new project named storm-puppet with the following directory structure

storm-puppet
storm-puppet/manifests
storm-puppet/modules
storm-puppet/modules/storm
storm-puppet/modules/storm/manifests
storm-puppet/modules/storm/templates

2. create site.pp in the manifests folder

node 'storm.nimbus' {
    $cluster = 'storm1'
    include storm::nimbus
    include storm::ui
}

node /storm.supervisor[1-9]/ {
    $cluster = 'storm1'
    include storm::supervisor
}

node /storm.zookeeper[1-9]/ {
    include storm::zoo
}

3. create init.pp in /modules/storm/manifests

class storm {
    include storm::install
    include storm::config
}

4. create install.pp in /modules/storm/manifests

class storm::install {
    $BASE_URL = "https://bitbucket.org/qanderson/storm-deb-packaging/downloads/"
    $ZMQ_FILE = "libzmq0_2.1.7_amd64.deb"
    $JZMQ_FILE = "libjzmq_2.1.7_amd64.deb"
    $STORM_FILE = "storm_0.8.1_all.deb"

    package { "wget":
        ensure => latest
    }

    # fetch each file
    exec { "wget_storm":
        command => "/usr/bin/wget ${BASE_URL}${STORM_FILE}"
    }

    exec { "wget_zmq":
        command => "/usr/bin/wget ${BASE_URL}${ZMQ_FILE}"
    }

    exec { "wget_jzmq":
        command => "/usr/bin/wget ${BASE_URL}${JZMQ_FILE}"
    }

    # install each downloaded package with dpkg
    package { "libzmq0":
        provider => dpkg,
        ensure => installed,
        source => "${ZMQ_FILE}",
        require => Exec['wget_zmq']
    }

    package { "libjzmq":
        provider => dpkg,
        ensure => installed,
        source => "${JZMQ_FILE}",
        require => [Exec['wget_jzmq'], Package['libzmq0']]
    }

    package { "storm":
        provider => dpkg,
        ensure => installed,
        source => "${STORM_FILE}",
        require => [Exec['wget_storm'], Package['libjzmq']]
    }
}

5. create config.pp in the storm manifests folder

class storm::config {
    require storm::install
    include storm::params

    file { '/etc/storm/storm.yaml':
        require => Package['storm'],
        content => template('storm/storm.yaml.erb'),
        owner   => 'root',
        group   => 'root',
        mode    => '0644',
    }

    file { '/etc/default/storm':
        require => Package['storm'],
        content => template('storm/default.erb'),
        owner   => 'root',
        group   => 'root',
        mode    => '0644',
    }
}

6. create params.pp in the storm manifests folder for Hiera

class storm::params {
    #_ STORM DEFAULTS _#
    $java_library_path = hiera_array('java_library_path', ['/usr/local/lib', '/opt/local/lib', '/usr/lib'])
}

7. specify the nimbus, supervisor, ui, and zoo classes

class storm::nimbus {
    require storm::install
    include storm::config
    include storm::params

    # Install nimbus /etc/default
    storm::service { 'nimbus':
        start => 'yes',
        jvm_memory => $storm::params::nimbus_mem
    }
}

class storm::supervisor {
    require storm::install
    include storm::config
    include storm::params

    # Install supervisor /etc/default
    storm::service { 'supervisor':
        start => 'yes',
        jvm_memory => $storm::params::supervisor_mem
    }
}

class storm::ui {
    require storm::install
    include storm::config
    include storm::params

    # Install ui /etc/default
    storm::service { 'ui':
        start => 'yes',
        jvm_memory => $storm::params::ui_mem
    }
}

class storm::zoo {
    package { ['zookeeper', 'zookeeper-bin', 'zookeeperd']:
        ensure => latest,
    }
}

8. init the Git repository and push it to bitbucket.org

9. navigate to the vagrant-storm-cluster folder to run the provisioning

vagrant up

10. connect to the Nimbus node

vagrant ssh nimbus

Deriving basic click statistics

Getting ready

wget http://download.redis.io/redis-stable.tar.gz
tar xvzf redis-stable.tar.gz
cd redis-stable
make
sudo cp src/redis-server /usr/local/bin/
sudo cp src/redis-cli /usr/local/bin/

Then start the Redis Server.

1. create a new Java project named ClickTopology, and create the pom.xml file and folder structure as per the "Hello World" topology project

mkdir ClickTopology

src/test
src/main/java

vi pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
    http://maven.apache.org/xsd/maven-4.0.0.xsd">

    <modelVersion>4.0.0</modelVersion>

    <groupId>storm.cookbook</groupId>
    <artifactId>click-topology</artifactId>
    <version>0.0.1-SNAPSHOT</version>

    <packaging>jar</packaging>
    <name>click-topology</name>
    <url>https://bitbucket.org/[user]/hello-world</url>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
</project>

2. add the dependencies inside a <dependencies> tag

<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.11</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.jmock</groupId>
        <artifactId>jmock-junit4</artifactId>
        <version>2.5.1</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.jmock</groupId>
        <artifactId>jmock-legacy</artifactId>
        <version>2.5.1</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>redis.clients</groupId>
        <artifactId>jedis</artifactId>
        <version>2.1.0</version>
    </dependency>
</dependencies>

3. create the ClickTopology main class in the storm.cookbook package under src/main/java

public class ClickTopology {

    // not shown in the original listing; assume the standard Redis port
    public static final int DEFAULT_JEDIS_PORT = 6379;

    private TopologyBuilder builder = new TopologyBuilder();
    private Config conf = new Config();
    private LocalCluster cluster;

    public ClickTopology() {
        builder.setSpout("clickSpout", new ClickSpout(), 10);

        //First layer of bolts
        builder.setBolt("repeatsBolt", new RepeatVisitBolt(), 10).shuffleGrouping("clickSpout");
        builder.setBolt("geographyBolt", new GeographyBolt(new HttpIPResolver()), 10).shuffleGrouping("clickSpout");

        //second layer of bolts, commutative in nature
        builder.setBolt("totalStats", new VisitStatsBolt(), 1).globalGrouping("repeatsBolt");
        builder.setBolt("geoStats", new GeoStatsBolt(), 10).fieldsGrouping("geographyBolt", new Fields(storm.cookbook.Fields.COUNTRY));

        conf.put(Conf.REDIS_PORT_KEY, DEFAULT_JEDIS_PORT);
    }

    public void runLocal(int runTime) {
        conf.setDebug(true);
        conf.put(Conf.REDIS_HOST_KEY, "localhost");

        cluster = new LocalCluster();
        cluster.submitTopology("test", conf, builder.createTopology());

        if (runTime > 0) {
            Utils.sleep(runTime);
            shutDownLocal();
        }
    }

    public void shutDownLocal() {
        if (cluster != null) {
            cluster.killTopology("test");
            cluster.shutdown();
        }
    }

    public void runCluster(String name, String redisHost) throws AlreadyAliveException, InvalidTopologyException {
        conf.setNumWorkers(20);
        conf.put(Conf.REDIS_HOST_KEY, redisHost);
        StormSubmitter.submitTopology(name, conf, builder.createTopology());
    }

}

4. add the main method.

public static void main(String[] args) throws Exception {

    ClickTopology topology = new ClickTopology();

    if(args!=null && args.length > 1) {
        topology.runCluster(args[0], args[1]);
    } else {
        if(args!=null && args.length == 1) {
            System.out.println("Running in local mode, redis ip missing for cluster run");
        }
        topology.runLocal(10000);
    }
}
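The dispatch rule in the main method is worth isolating: two or more arguments mean a cluster run (name plus Redis host), anything less falls back to local mode, with a warning when exactly one argument was given. Sketched as pure functions so the rule can be tested without Storm (class and method names are illustrative, not from the book):

```java
// Sketch of the main method's mode-selection rule as pure functions.
public class ModeDispatchSketch {

    // "cluster" when a topology name and a Redis host are both supplied.
    public static String mode(String[] args) {
        if (args != null && args.length > 1) {
            return "cluster";
        }
        return "local";
    }

    // True when exactly one arg was given: the Redis IP is missing.
    public static boolean warnMissingRedisIp(String[] args) {
        return args != null && args.length == 1;
    }

    public static void main(String[] args) {
        System.out.println(mode(new String[]{"name", "redisHost"})); // cluster mode
        System.out.println(mode(new String[]{}));                    // local mode
    }
}
```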

5. the topology assumes that the web server pushes messages onto a Redis queue. You must create a spout to inject these messages into the Storm cluster as a stream. Create the ClickSpout class, which connects to Redis when it is opened by the cluster.

public class ClickSpout extends BaseRichSpout {

    public static Logger logger = Logger.getLogger(ClickSpout.class);

    private Jedis jedis;
    private String host;
    private int port;
    private SpoutOutputCollector collector;

    @Override
    public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
        outputFieldsDeclarer.declare(new Fields(storm.cookbook.Fields.IP, storm.cookbook.Fields.URL, storm.cookbook.Fields.CLIENT_KEY));
    }

    @Override
    public void open(Map conf, TopologyContext topologyContext, SpoutOutputCollector spoutOutputCollector) {
        host = conf.get(Conf.REDIS_HOST_KEY).toString();
        port = Integer.valueOf(conf.get(Conf.REDIS_PORT_KEY).toString());
        this.collector = spoutOutputCollector;
        connectToRedis();
    }

    private void connectToRedis() {
        jedis = new Jedis(host, port);
    }
}

6. the cluster will then poll the spout for new tuples through the nextTuple method

public void nextTuple() {

    String content = jedis.rpop("count");

    if (content == null || "nil".equals(content)) {
        try {
            Thread.sleep(300);
        }
        catch (InterruptedException e) {
        }
    }
    else {
        JSONObject obj = (JSONObject)JSONValue.parse(content);

        String ip = obj.get(storm.cookbook.Fields.IP).toString();
        String url = obj.get(storm.cookbook.Fields.URL).toString();
        String clientKey = obj.get(storm.cookbook.Fields.CLIENT_KEY).toString();

        collector.emit(new Values(ip, url, clientKey));
    }
}
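The polling pattern in nextTuple can be sketched without a running Redis: an in-memory deque stands in for the "count" list, and a null return is the signal to back off briefly rather than busy-spin. All names here are illustrative; the real spout uses Jedis.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the nextTuple polling pattern with an in-memory deque standing
// in for the Redis "count" list (no Jedis dependency).
public class PollSketch {
    private final Deque<String> queue = new ArrayDeque<String>();

    // stands in for the web server pushing a click message onto the queue
    public void push(String msg) {
        queue.addLast(msg);
    }

    // stands in for jedis.rpop("count"); null means "sleep briefly and retry",
    // which is what the real spout does with Thread.sleep(300)
    public String poll() {
        return queue.pollFirst();
    }

    public static void main(String[] args) {
        PollSketch s = new PollSketch();
        s.push("{\"ip\":\"10.0.0.1\",\"url\":\"a.com\",\"clientKey\":\"C1\"}");
        System.out.println(s.poll());
    }
}
```

The sleep on an empty queue matters in practice: without it, an idle spout would spin on rpop and waste a worker's CPU.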

7. create the bolts that will enrich the basic data through the database or remote API lookups.

public class RepeatVisitBolt extends BaseRichBolt {

    private OutputCollector collector;

    private Jedis jedis;
    private String host;
    private int port;

    @Override
    public void prepare(Map conf, TopologyContext topologyContext, OutputCollector outputCollector) {
        this.collector = outputCollector;
        host = conf.get(Conf.REDIS_HOST_KEY).toString();
        port = Integer.valueOf(conf.get(Conf.REDIS_PORT_KEY).toString());

        connectToRedis();
    }

    private void connectToRedis() {
        jedis = new Jedis(host, port);
        jedis.connect();
    }
}

8. add the execute method, which looks up the previous visit flags from Redis based on the fields in the tuple, and emits the enriched tuple

public void execute(Tuple tuple) {

    String ip = tuple.getStringByField(storm.cookbook.Fields.IP);
    String clientKey = tuple.getStringByField(storm.cookbook.Fields.CLIENT_KEY);
    String url = tuple.getStringByField(storm.cookbook.Fields.URL);
    String key = url + ":" + clientKey;
    String value = jedis.get(key);

    if(value == null) {
        jedis.set(key, "visited");
        collector.emit(new Values(clientKey, url, Boolean.TRUE.toString()));
    }
    else {
        collector.emit(new Values(clientKey, url, Boolean.FALSE.toString()));
    }
}
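The uniqueness check itself is simple enough to verify in isolation: the first time a url:clientKey pair is seen it is marked visited and flagged unique, and every later visit is flagged as a repeat. This sketch uses a HashMap as a stand-in for Redis (names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// The repeat-visit check from execute(), with a HashMap standing in for Redis.
public class RepeatVisitSketch {
    private final Map<String, String> store = new HashMap<String, String>();

    // True on the first visit for this url:clientKey pair, false afterwards.
    public boolean isUniqueVisit(String url, String clientKey) {
        String key = url + ":" + clientKey;
        if (store.get(key) == null) {
            store.put(key, "visited");
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        RepeatVisitSketch s = new RepeatVisitSketch();
        System.out.println(s.isUniqueVisit("myintranet.com", "Client1")); // first visit: true
        System.out.println(s.isUniqueVisit("myintranet.com", "Client1")); // repeat: false
    }
}
```

This is also exactly the behavior the parameterized unit test at the end of this chapter asserts against the real bolt.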

9. create the GeographyBolt class

package storm.cookbook;

public class GeographyBolt extends BaseRichBolt {

    private IPResolver resolver;
    private OutputCollector collector;

    public GeographyBolt(IPResolver resolver) {
        this.resolver = resolver;
    }

    public void execute(Tuple tuple) {
        String ip = tuple.getStringByField(storm.cookbook.Fields.IP);
        JSONObject json = resolver.resolveIP(ip);

        String city = (String) json.get(storm.cookbook.Fields.CITY);
        String country = (String) json.get(storm.cookbook.Fields.COUNTRY_NAME);

        collector.emit(new Values(country, city));
    }
}
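The point of injecting the resolver is testability: the remote geo-IP lookup can be swapped for a stub without touching the bolt. A minimal Storm-free sketch of that pattern, with the interface simplified to return a {country, city} pair instead of JSON (all names here are illustrative):

```java
// Sketch of resolver injection: the lookup is behind an interface, so a
// stub can replace the real HTTP geo-IP call in tests.
public class ResolverSketch {

    interface IPResolver {
        String[] resolve(String ip); // returns {country, city}
    }

    // Stub with fixed test data, standing in for the real HttpIPResolver.
    static class StubResolver implements IPResolver {
        public String[] resolve(String ip) {
            return new String[]{"South Africa", "Cape Town"};
        }
    }

    // What the bolt does with whatever resolver it was given.
    public static String[] enrich(IPResolver resolver, String ip) {
        return resolver.resolve(ip);
    }

    public static void main(String[] args) {
        String[] loc = enrich(new StubResolver(), "192.168.33.100");
        System.out.println(loc[0] + " / " + loc[1]);
    }
}
```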

10. create the HttpIPResolver class and inject it into the GeographyBolt at design time

public class HttpIPResolver implements IPResolver, Serializable {
    // resolves an IP address to city and country via a remote geo-IP HTTP API
    // (implementation omitted here)
}

11. split the stream into the GeoStatsBolt using a fields grouping on the country field

builder.setBolt("geoStats", new GeoStatsBolt(), 10).fieldsGrouping("geographyBolt", new Fields(storm.cookbook.Fields.COUNTRY));

12. create the GeoStatsBolt class

public class GeoStatsBolt extends BaseRichBolt {

    private Map<String, CountryStats> stats = new HashMap<String, CountryStats>();
    private OutputCollector collector;

    public void execute(Tuple tuple) {
        String country = tuple.getStringByField(storm.cookbook.Fields.COUNTRY);
        String city = tuple.getStringByField(storm.cookbook.Fields.CITY);

        if(!stats.containsKey(country)){
            stats.put(country, new CountryStats(country));
        }

        stats.get(country).cityFound(city);

        collector.emit(new Values(country, stats.get(country).getCountryTotal(), city, stats.get(country).getCityTotal(city)));
    }
}

13. create the CountryStats class

public class CountryStats {

    private int countryTotal = 0;
    private static final int COUNT_INDEX = 0;
    private static final int PERCENTAGE_INDEX = 1;
    private String countryName;

    private Map<String, List<Integer>> cityStats = new HashMap<String, List<Integer>>();

    public CountryStats(String countryName) {
        this.countryName = countryName;
    }

    public void cityFound(String cityName) {
        countryTotal++;

        if(cityStats.containsKey(cityName)){
            cityStats.get(cityName).set(COUNT_INDEX, cityStats.get(cityName).get(COUNT_INDEX).intValue() + 1);
        }
        else {
            List<Integer> list = new LinkedList<Integer>();
            list.add(1);
            list.add(0);

            cityStats.put(cityName, list);
        }

        double percent = (double)cityStats.get(cityName).get(COUNT_INDEX)/(double)countryTotal;

        // store as a whole-number percentage; a bare (int) cast of the fraction would always truncate to 0
        cityStats.get(cityName).set(PERCENTAGE_INDEX, (int)(percent * 100));
    }

    public int getCountryTotal(){
        return countryTotal;
    }

    public int getCityTotal(String cityName){
        return cityStats.get(cityName).get(COUNT_INDEX).intValue();
    }
}
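The counting behavior of CountryStats is easy to check in isolation. This minimal re-implementation (percentage bookkeeping omitted; the class name is illustrative) exercises the same per-country and per-city totals:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal re-implementation of the CountryStats counting logic, so the
// rollup can be verified outside Storm.
public class StatsSketch {
    private int countryTotal = 0;
    private final Map<String, Integer> cityCounts = new HashMap<String, Integer>();

    // Mirrors CountryStats.cityFound: bump the country total and the city count.
    public void cityFound(String city) {
        countryTotal++;
        Integer c = cityCounts.get(city);
        cityCounts.put(city, c == null ? 1 : c + 1);
    }

    public int getCountryTotal() {
        return countryTotal;
    }

    public int getCityTotal(String city) {
        Integer c = cityCounts.get(city);
        return c == null ? 0 : c;
    }

    public static void main(String[] args) {
        StatsSketch za = new StatsSketch();
        za.cityFound("Cape Town");
        za.cityFound("Cape Town");
        za.cityFound("Durban");
        System.out.println(za.getCountryTotal() + " visits, "
                + za.getCityTotal("Cape Town") + " from Cape Town");
    }
}
```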

14. wire in the final counting of total and unique visitors

builder.setBolt("totalStats", new VisitStatsBolt(), 1).globalGrouping("repeatsBolt");

15. create the VisitStatsBolt class

public class VisitStatsBolt extends BaseRichBolt {

    private int total = 0;
    private int uniqueCount = 0;
    private OutputCollector collector;

    public void execute(Tuple tuple) {
        boolean unique = Boolean.parseBoolean(tuple.getStringByField(storm.cookbook.Fields.UNIQUE));
        total++;

        if(unique) {
            uniqueCount++;
        }

        collector.emit(new Values(total, uniqueCount));
    }
}
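Because this bolt runs with a parallelism of 1 on a global grouping, its rollup is just a fold over the stream of unique flags. Sketched as a pure function (names are illustrative):

```java
// The VisitStatsBolt rollup as a pure fold over the stream of unique-visit
// flags: returns {total visits, unique visits}.
public class VisitStatsSketch {

    public static int[] tally(String[] uniqueFlags) {
        int total = 0;
        int unique = 0;
        for (String flag : uniqueFlags) {
            total++;
            if (Boolean.parseBoolean(flag)) {
                unique++;
            }
        }
        return new int[]{total, unique};
    }

    public static void main(String[] args) {
        int[] t = tally(new String[]{"true", "false", "true"});
        System.out.println(t[0] + " total, " + t[1] + " unique");
    }
}
```

The single-instance, global-grouping arrangement is what makes keeping these counters in plain instance fields safe; with more than one task the totals would be split across instances.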

Unit testing a bolt

1. create the StormTestCase class under src/test/java

package storm.cookbook;

public class StormTestCase {
    protected Mockery context = new Mockery() {{
        setImposteriser(ClassImposteriser.INSTANCE);
    }};

    protected Tuple getTuple() {
        final Tuple tuple = context.mock(Tuple.class);
        return tuple;
    }
}

2. create the TestRepeatVisitBolt class

@RunWith(value = Parameterized.class)
public class TestRepeatVisitBolt extends StormTestCase {
    // the Parameterized runner supplies ip, clientKey, url, and expected via the constructor (see step 4)
}

3. add the testExecute method

@Test
public void testExecute() {

    jedis = new Jedis("localhost", 6379);
    RepeatVisitBolt bolt = new RepeatVisitBolt();

    Map config = new HashMap();
    config.put("redis-host", "localhost");
    config.put("redis-port", "6379");

    final OutputCollector collector = context.mock(OutputCollector.class);

    bolt.prepare(config, null, collector);

    final Tuple tuple = getTuple();
    context.checking(new Expectations() {{
        oneOf(tuple).getStringByField(Fields.IP);
        will(returnValue(ip));

        oneOf(tuple).getStringByField(Fields.CLIENT_KEY);
        will(returnValue(clientKey));

        oneOf(tuple).getStringByField(Fields.URL);
        will(returnValue(url));

        oneOf(collector).emit(new Values(clientKey, url, expected));
    }});

    bolt.execute(tuple);
    context.assertIsSatisfied();

    if(jedis != null) {
        jedis.disconnect();
    }
}

4. define the parameters

@Parameterized.Parameters
public static Collection<Object[]> data() {

    Object[][] data = new Object[][] {
        { "192.168.33.100", "Client1", "myintranet.com", "false" },
        { "192.168.33.100", "Client1", "myintranet.com", "false" },
        { "192.168.33.101", "Client2", "myintranet1.com", "true" },
        { "192.168.33.102", "Client3", "myintranet2.com", "false" }
    };

    return Arrays.asList(data);
}

5. add the base provisioning of the values using Redis

@BeforeClass
public static void setupJedis() {
    Jedis jedis = new Jedis("localhost",6379);
    jedis.flushDB();
    Iterator<Object[]> it = data().iterator();

    while (it.hasNext()) {
        Object[] values = it.next();

        if (values[3].equals("false")) {
            String key = values[2] + ":" + values[1];
            jedis.set(key, "visited");
        }
    }
}
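The seeding step and the parameter table work together: rows whose expected flag is "false" are pre-seeded as visited, so the bolt must report them as repeats, while "true" rows are first-time visits. The key-building rule (row[2]:row[1], i.e. url:clientKey) can be checked on its own (class name is illustrative):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the seeding rule from setupJedis: rows expecting "false" are
// pre-marked as visited, keyed by url:clientKey.
public class ParamTableSketch {

    public static Set<String> seededKeys(Object[][] data) {
        Set<String> keys = new HashSet<String>();
        for (Object[] row : data) {
            // row layout: { ip, clientKey, url, expected }
            if ("false".equals(row[3])) {
                keys.add(row[2] + ":" + row[1]);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        Object[][] data = {
            { "192.168.33.100", "Client1", "myintranet.com", "false" },
            { "192.168.33.101", "Client2", "myintranet1.com", "true" }
        };
        System.out.println(seededKeys(data));
    }
}
```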

You create Storm topologies and deploy them to a Storm cluster.

A Storm cluster is superficially similar to a Hadoop cluster, but where Hadoop runs MapReduce jobs, Storm runs topologies. One key difference is that a MapReduce job eventually finishes, whereas a topology processes messages forever (or until you kill it).

The master node runs a daemon called Nimbus, comparable to Hadoop's JobTracker: it distributes code around the cluster, assigns tasks to machines, and monitors for failures.

Each worker node runs a daemon called the Supervisor, which listens for work assigned to its machine and starts and stops worker processes as necessary.

To submit a topology, you use the storm client:

storm jar all-my-code.jar org.apache.storm.MyTopology arg1 arg2

The storm jar part takes care of connecting to Nimbus and uploading the JAR.

Streams

A stream is an unbounded sequence of tuples. Storm provides the primitives for transforming streams into new streams: spouts and bolts, each with an interface you implement.

A spout is a source of streams.

A bolt consumes input streams, does some processing, and possibly emits new streams. Bolts can run functions, filter tuples, do streaming aggregations, do streaming joins, and talk to databases.

A tuple is Storm's data model: a named list of values.
