Storm Source Code Analysis: bolt (backtype.storm.task)

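The key interface of a bolt is execute, which holds the actual tuple-processing logic: it emits new tuples through OutputCollector.emit and calls ack or fail on the tuples it has processed.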

/**
 * An IBolt represents a component that takes tuples as input and produces tuples
 * as output. An IBolt can do everything from filtering to joining to functions
 * to aggregations. It does not have to process a tuple immediately and may
 * hold onto tuples to process later.
 *
 * <p>A bolt's lifecycle is as follows:</p>
 *
 * <p>IBolt object created on client machine. The IBolt is serialized into the topology
 * (using Java serialization) and submitted to the master machine of the cluster (Nimbus).
 * Nimbus then launches workers which deserialize the object, call prepare on it, and then
 * start processing tuples.</p>
 *
 * <p>If you want to parameterize an IBolt, you should set the parameters through its
 * constructor and save the parameterization state as instance variables (which will
 * then get serialized and shipped to every task executing this bolt across the cluster).</p>
 *
 * <p>When defining bolts in Java, you should use the IRichBolt interface which adds
 * necessary methods for using the Java TopologyBuilder API.</p>
 */
public interface IBolt extends Serializable {
    /**
     * Called when a task for this component is initialized within a worker on the cluster.
     * It provides the bolt with the environment in which the bolt executes.
     *
     * <p>This includes the:</p>
     *
     * @param stormConf The Storm configuration for this bolt. This is the configuration provided to the topology merged in with cluster configuration on this machine.
     * @param context This object can be used to get information about this task's place within the topology, including the task id and component id of this task, input and output information, etc.
     * @param collector The collector is used to emit tuples from this bolt. Tuples can be emitted at any time, including the prepare and cleanup methods. The collector is thread-safe and should be saved as an instance variable of this bolt object.
     */
    void prepare(Map stormConf, TopologyContext context, OutputCollector collector);

    /**
     * Process a single tuple of input. The Tuple object contains metadata on it
     * about which component/stream/task it came from. The values of the Tuple can
     * be accessed using Tuple#getValue. The IBolt does not have to process the Tuple
     * immediately. It is perfectly fine to hang onto a tuple and process it later
     * (for instance, to do an aggregation or join).
     *
     * <p>Tuples should be emitted using the OutputCollector provided through the prepare method.
     * It is required that all input tuples are acked or failed at some point using the OutputCollector.
     * Otherwise, Storm will be unable to determine when tuples coming off the spouts
     * have been completed.</p>
     *
     * <p>For the common case of acking an input tuple at the end of the execute method,
     * see IBasicBolt which automates this.</p>
     *
     * @param input The input tuple to be processed.
     */
    void execute(Tuple input);

    /**
     * Called when an IBolt is going to be shut down. There is no guarantee that cleanup
     * will be called, because the supervisor kill -9's worker processes on the cluster.
     *
     * <p>The one context where cleanup is guaranteed to be called is when a topology
     * is killed when running Storm in local mode.</p>
     */
    void cleanup();
}
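The lifecycle described in the Javadoc (parameters through the constructor, collector saved in prepare, every input acked or failed in execute) can be illustrated with a small sketch. It is not Storm code: `Tuple` and `Collector` are simplified, hypothetical stand-ins so the example compiles without Storm on the classpath, and `UppercaseBolt` is an invented example bolt that mirrors the shape of IBolt without implementing it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class BoltLifecycleSketch {
    /** Hypothetical, simplified stand-in for Storm's Tuple. */
    static final class Tuple {
        final List<Object> values;
        Tuple(Object... values) { this.values = Arrays.asList(values); }
        Object getValue(int i) { return values.get(i); }
    }

    /** Hypothetical stand-in for OutputCollector: records calls instead of sending. */
    static final class Collector {
        final List<List<Object>> emitted = new ArrayList<>();
        final List<Tuple> acked = new ArrayList<>();
        final List<Tuple> failed = new ArrayList<>();
        void emit(Tuple anchor, List<Object> values) { emitted.add(values); }
        void ack(Tuple input) { acked.add(input); }
        void fail(Tuple input) { failed.add(input); }
    }

    /** Invented example bolt: upper-cases the first field and adds a prefix. */
    static final class UppercaseBolt {
        private final String prefix;   // parameter set through the constructor
        private Collector collector;   // saved in prepare, used in execute

        UppercaseBolt(String prefix) { this.prefix = prefix; }

        void prepare(Map<String, Object> conf, Collector collector) {
            this.collector = collector;
        }

        void execute(Tuple input) {
            Object v = input.getValue(0);
            if (!(v instanceof String)) {
                collector.fail(input);   // every input must end in ack or fail
                return;
            }
            String out = prefix + ((String) v).toUpperCase(Locale.ROOT);
            collector.emit(input, Arrays.asList((Object) out));   // anchored to the input
            collector.ack(input);
        }

        void cleanup() { }   // may never run on a real cluster
    }

    public static void main(String[] args) {
        Collector c = new Collector();
        UppercaseBolt bolt = new UppercaseBolt("word:");
        bolt.prepare(new HashMap<String, Object>(), c);
        bolt.execute(new Tuple("storm"));
        bolt.execute(new Tuple(42));
        System.out.println(c.emitted + " acked=" + c.acked.size() + " failed=" + c.failed.size());
    }
}
```

In a real topology the same structure would be written against IRichBolt, with the OutputCollector passed to prepare taking the place of the recording collector used here.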

First, OutputCollector. Its main interfaces are emit and emitDirect.

List<Integer> emit(String streamId, Tuple anchor, List<Object> tuple)

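emit takes three parameters: the stream id to send to, the anchors (the source tuples), and the tuple (the list of values).

If no stream id is given, the tuple is sent to the default stream, Utils.DEFAULT_STREAM_ID.

If no anchors are given, the emitted tuple is unanchored.

It returns one value: the ids of the tasks the tuple was ultimately sent to.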
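The three rules above (default stream, unanchored emit, returned task ids) can be sketched with a recording collector. Everything here is a stand-in for illustration: `RecordingCollector`, `Emit`, and the empty `Tuple` class are hypothetical, the value `"default"` for the default stream id is an assumption, and the returned task ids are fixed rather than computed by any grouping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

final class EmitOverloadsSketch {
    /** Assumed value standing in for Utils.DEFAULT_STREAM_ID. */
    static final String DEFAULT_STREAM_ID = "default";

    /** Hypothetical, empty stand-in for Storm's Tuple; only its presence matters here. */
    static final class Tuple { }

    /** One recorded emit call: which stream, how many anchors, which values. */
    static final class Emit {
        final String streamId;
        final int anchorCount;
        final List<Object> values;

        Emit(String streamId, int anchorCount, List<Object> values) {
            this.streamId = streamId;
            this.anchorCount = anchorCount;
            this.values = values;
        }
    }

    /** Records emits and returns a fixed task-id list instead of routing for real. */
    static final class RecordingCollector {
        final List<Emit> log = new ArrayList<>();
        private final List<Integer> targetTasks;

        RecordingCollector(Integer... targetTasks) {
            this.targetTasks = Arrays.asList(targetTasks);
        }

        /** Core form: stream id, anchors, values. Returns the target task ids. */
        List<Integer> emit(String streamId, Collection<Tuple> anchors, List<Object> values) {
            log.add(new Emit(streamId, anchors.size(), values));
            return targetTasks;
        }

        /** Single anchor: wrapped into a one-element list before forwarding. */
        List<Integer> emit(String streamId, Tuple anchor, List<Object> values) {
            return emit(streamId, Arrays.asList(anchor), values);
        }

        /** No anchors: the emitted tuple is unanchored. */
        List<Integer> emit(String streamId, List<Object> values) {
            return emit(streamId, Collections.<Tuple>emptyList(), values);
        }

        /** No stream id: falls back to the default stream. */
        List<Integer> emit(List<Object> values) {
            return emit(DEFAULT_STREAM_ID, values);
        }
    }
}
```

Calling `emit(values)` on this collector records an entry on the default stream with zero anchors, while `emit("words", anchor, values)` records one anchor on the `words` stream; both return the fixed task-id list.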

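Compared with emit in SpoutOutputCollector, the parameters differ: there is no message-id, and anchors are added.

In a bolt, the ack and fail interfaces live in IOutputCollector. The bolt calls them from execute once it has finished processing, and emitting for, a tuple from the upstream component.

In a spout, the ack and fail interfaces live in ISpout. They are called when the spout receives an ack or a fail for a tuple.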
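The difference in direction can be shown with a toy model: on the bolt side ack and fail are calls the bolt makes, while on the spout side they are callbacks that receive a message id. The types below are hypothetical stand-ins, and the collector notifies the spout directly, which is a deliberate simplification. In Storm the notification goes through the acking mechanism and reaches the spout only after the whole tuple tree has completed or a tuple has failed.

```java
import java.util.ArrayList;
import java.util.List;

final class AckPathSketch {
    /** Hypothetical stand-in tuple that remembers the spout message id it came from. */
    static final class Tuple {
        final Object rootMsgId;
        final int value;

        Tuple(Object rootMsgId, int value) {
            this.rootMsgId = rootMsgId;
            this.value = value;
        }
    }

    /** Spout side: ack and fail are callbacks that receive the message id. */
    static final class RecordingSpout {
        final List<Object> acked = new ArrayList<>();
        final List<Object> failed = new ArrayList<>();
        void ack(Object msgId) { acked.add(msgId); }
        void fail(Object msgId) { failed.add(msgId); }
    }

    /** Bolt side: ack and fail are calls the bolt makes on its collector. */
    static final class Collector {
        private final RecordingSpout spout;

        Collector(RecordingSpout spout) { this.spout = spout; }

        // Toy shortcut: notify the spout directly, with no tuple-tree tracking.
        void ack(Tuple input) { spout.ack(input.rootMsgId); }
        void fail(Tuple input) { spout.fail(input.rootMsgId); }
    }

    /** Invented bolt logic: negative values count as failures. */
    static void execute(Tuple input, Collector collector) {
        if (input.value < 0) {
            collector.fail(input);
        } else {
            collector.ack(input);
        }
    }

    public static void main(String[] args) {
        RecordingSpout spout = new RecordingSpout();
        Collector collector = new Collector(spout);
        execute(new Tuple("msg-1", 10), collector);
        execute(new Tuple("msg-2", -1), collector);
        System.out.println("acked=" + spout.acked + " failed=" + spout.failed);
    }
}
```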

/**
 * This output collector exposes the API for emitting tuples from an IRichBolt.
 * This is the core API for emitting tuples. For a simpler API, and a more restricted
 * form of stream processing, see IBasicBolt and BasicOutputCollector.
 */
public class OutputCollector implements IOutputCollector {
    private IOutputCollector _delegate;

    public OutputCollector(IOutputCollector delegate) {
        _delegate = delegate;
    }

    /**
     * Emits a new tuple to a specific stream with a single anchor. The emitted values must be
     * immutable.
     *
     * @param streamId the stream to emit to
     * @param anchor the tuple to anchor to
     * @param tuple the new output tuple from this bolt
     * @return the list of task ids that this new tuple was sent to
     */
    public List<Integer> emit(String streamId, Tuple anchor, List<Object> tuple) {
        return emit(streamId, Arrays.asList(anchor), tuple);
    }

    /**
     * Emits a tuple directly to the specified task id on the specified stream.
     * If the target bolt does not subscribe to this bolt using a direct grouping,
     * the tuple will not be sent. If the specified output stream is not declared
     * as direct, or the target bolt subscribes with a non-direct grouping,
     * an error will occur at runtime. The emitted values must be
     * immutable.
     *
     * @param taskId the taskId to send the new tuple to
     * @param streamId the stream to send the tuple on. It must be declared as a direct stream in the topology definition.
     * @param anchor the tuple to anchor to
     * @param tuple the new output tuple from this bolt
     */
    public void emitDirect(int taskId, String streamId, Tuple anchor, List<Object> tuple) {
        emitDirect(taskId, streamId, Arrays.asList(anchor), tuple);
    }

    @Override
    public List<Integer> emit(String streamId, Collection<Tuple> anchors, List<Object> tuple) {
        return _delegate.emit(streamId, anchors, tuple);
    }

    @Override
    public void emitDirect(int taskId, String streamId, Collection<Tuple> anchors, List<Object> tuple) {
        _delegate.emitDirect(taskId, streamId, anchors, tuple);
    }

    @Override
    public void ack(Tuple input) {
        _delegate.ack(input);
    }

    @Override
    public void fail(Tuple input) {
        _delegate.fail(input);
    }

    @Override
    public void reportError(Throwable error) {
        _delegate.reportError(error);
    }
}
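OutputCollector is a thin wrapper: the single-anchor overloads wrap the anchor in a list, and everything else is forwarded to the delegate. One consequence is that the delegate can be swapped, for example for a fake that records calls. The sketch below illustrates that idea with hypothetical stand-ins: `Sink` is a reduced version of IOutputCollector with only the two emit forms, `Wrapper` plays the role of OutputCollector, and `FakeSink` with its fixed task ids is invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

final class DelegatingCollectorSketch {
    /** Hypothetical, empty stand-in for Storm's Tuple. */
    static final class Tuple { }

    /** Reduced stand-in for IOutputCollector: only the two emit forms. */
    interface Sink {
        List<Integer> emit(String streamId, Collection<Tuple> anchors, List<Object> values);
        void emitDirect(int taskId, String streamId, Collection<Tuple> anchors, List<Object> values);
    }

    /** Thin wrapper in the style of OutputCollector: adapt arguments, then forward. */
    static final class Wrapper implements Sink {
        private final Sink delegate;

        Wrapper(Sink delegate) { this.delegate = delegate; }

        List<Integer> emit(String streamId, Tuple anchor, List<Object> values) {
            return emit(streamId, Arrays.asList(anchor), values);
        }

        void emitDirect(int taskId, String streamId, Tuple anchor, List<Object> values) {
            emitDirect(taskId, streamId, Arrays.asList(anchor), values);
        }

        @Override
        public List<Integer> emit(String streamId, Collection<Tuple> anchors, List<Object> values) {
            return delegate.emit(streamId, anchors, values);
        }

        @Override
        public void emitDirect(int taskId, String streamId, Collection<Tuple> anchors, List<Object> values) {
            delegate.emitDirect(taskId, streamId, anchors, values);
        }
    }

    /** Invented fake delegate: logs each call as a string and returns fixed task ids. */
    static final class FakeSink implements Sink {
        final List<String> calls = new ArrayList<>();

        @Override
        public List<Integer> emit(String streamId, Collection<Tuple> anchors, List<Object> values) {
            calls.add("emit " + streamId + " anchors=" + anchors.size());
            return Arrays.asList(1, 2);
        }

        @Override
        public void emitDirect(int taskId, String streamId, Collection<Tuple> anchors, List<Object> values) {
            calls.add("emitDirect task=" + taskId + " " + streamId + " anchors=" + anchors.size());
        }
    }
}
```

With a `FakeSink` behind the `Wrapper`, a single-anchor `emit` shows up in the log with `anchors=1`, which confirms that the wrapping into a one-element list happened before the call reached the delegate.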
