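mk-task is fairly simple, because a task is only a conceptual structure: unlike a worker or an executor, it needs no process or thread of its own.
So its core is really just mk-task-data, which does three things:
1. Creates the TopologyContext objects. This essentially combines the topology object with the worker-data, so that a running task can look up the topology information it needs.
2. Creates the task object, a spout object or bolt object that encapsulates the component logic, such as nextTuple or execute.
3. Generates tasks-fn. The name is misleading, since it suggests the function runs the task. In fact it only does the preparation before an emit: the most important part is calling the grouper to compute the target tasks, and it also invokes the metrics and hooks.

In short, mk-task itself does very little.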

(defn mk-task [executor-data task-id]
  (let [task-data (mk-task-data executor-data task-id) ;; 1 mk-task-data
        storm-conf (:storm-conf executor-data)]
    (doseq [klass (storm-conf TOPOLOGY-AUTO-TASK-HOOKS)] ;; register the hooks predefined in the config
      (.addTaskHook ^TopologyContext (:user-context task-data) (-> klass Class/forName .newInstance)))
    ;; when this is called, the threads for the executor haven't been started yet,
    ;; so we won't be risking trampling on the single-threaded claim strategy disruptor queue
    (send-unanchored task-data SYSTEM-STREAM-ID ["startup"]) ;; send a "startup" notification on the system stream; who receives SYSTEM-STREAM…?
    task-data))
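
The doseq loop in mk-task turns each configured class name into a live hook object through reflection. The minimal sketch below shows only that step. It uses two JDK classes as stand-ins so that it runs without Storm on the classpath; hook-class-names and instantiate-all are hypothetical names introduced for illustration.

```clojure
;; Hypothetical stand-in for (storm-conf TOPOLOGY-AUTO-TASK-HOOKS):
;; a list of fully qualified class names. JDK classes are used here only
;; so that the sketch runs without Storm.
(def hook-class-names ["java.util.ArrayList" "java.util.HashMap"])

(defn instantiate-all
  "Loads each class by name and calls its public no-arg constructor,
   the same reflective step mk-task applies to every hook class."
  [class-names]
  (mapv (fn [klass] (-> klass Class/forName .newInstance)) class-names))

(def instances (instantiate-all hook-class-names))

(println (mapv class instances))
```

Because the objects are created with .newInstance, each configured class needs a public no-argument constructor.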

 

1 mk-task-data

(defn mk-task-data [executor-data task-id]
  (recursive-map
    :executor-data executor-data
    :task-id task-id
    :system-context (system-topology-context (:worker executor-data) executor-data task-id)
    :user-context (user-topology-context (:worker executor-data) executor-data task-id)
    :builtin-metrics (builtin-metrics/make-data (:type executor-data))
    :tasks-fn (mk-tasks-fn <>)
    :object (get-task-object (.getRawTopology ^TopologyContext (:system-context <>)) (:component-id executor-data))))
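
In mk-task-data, the symbol <> inside recursive-map appears to stand for the map built so far, which is how :tasks-fn and :object can read entries defined above them. The macro below is a simplified reconstruction written for this note, under that assumption. It is named rec-map to make clear that it is not Storm's actual implementation.

```clojure
(defmacro rec-map
  "Builds a map entry by entry. Inside each value expression the symbol <>
   is bound to the map made of the entries that precede it.
   Hypothetical helper, not Storm's recursive-map."
  [& kvs]
  (reduce (fn [acc [k v]]
            ;; wrap the map built so far in a let that exposes it as <>
            `(let [~'<> ~acc] (assoc ~'<> ~k ~v)))
          {}
          (partition 2 kvs)))

(def task-data
  (rec-map
    :task-id 3
    :label   (str "task-" (:task-id <>))    ; reads :task-id
    :summary {:label (:label <>)}))         ; reads :label

(println task-data)
```

Entries are evaluated in order, so a value can only refer to keys that appear before it.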

1.1 TopologyContext

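See the companion article "Storm source code analysis: Topology Submit-Task-TopologyContext".

:system-context and :user-context differ only in the topology object held by the context: the system context holds the system-topology.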

1.2 builtin-metrics/make-data

The builtin-metrics created here record metrics about how the spout or bolt is executing.

1.3 mk-tasks-fn

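This returns tasks-fn, a function that does the preparation before an emit and returns the list of target tasks:

1. Calls the grouper to produce the target tasks

2. Runs the emit hooks

3. When the sampler condition is met, updates stats and builtin-metrics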
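
Step 3 depends on a sampler: a zero-argument function that returns true for only a fraction of calls, so that the metrics bookkeeping does not run on every emit. The sketch below is a deliberately simple sampler that fires on every n-th call. mk-every-nth-sampler is a hypothetical name, and Storm's own mk-stats-sampler may choose its samples differently.

```clojure
(defn mk-every-nth-sampler
  "Returns a zero-argument function that yields true on every n-th call.
   Hypothetical helper, not Storm's mk-stats-sampler."
  [n]
  (let [calls (atom 0)]
    (fn [] (zero? (mod (swap! calls inc) n)))))

(def emit-sampler (mk-every-nth-sampler 3))
(def emitted (atom 0))

;; Simulate nine emits; the counter is updated only when the sampler fires.
(dotimes [_ 9]
  (when (emit-sampler)
    (swap! emitted inc)))

(println @emitted)
```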

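tasks-fn comes in two arities.

[^String stream ^List values]: this is the easier one to follow. It collects the target tasks of every component subscribed to the stream (a stream may have several downstream components, so one tuple may need to go to several bolts).

[^Integer out-task-id ^String stream ^List values]: this one names out-task-id explicitly, i.e. direct grouping.

This arity validates out-task-id:

out-task-id (if grouping out-task-id) checks that the grouper found via out-task-id -> component -> grouper is not nil (and, given the exception check that follows it, the grouper must be :direct). In other words, it verifies that this stream really is subscribed to by the component that owns out-task-id.

If the check fails, out-task-id is set to nil.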

(defn mk-tasks-fn [task-data]
  (let [task-id (:task-id task-data)
        executor-data (:executor-data task-data)
        component-id (:component-id executor-data)
        ^WorkerTopologyContext worker-context (:worker-context executor-data)
        storm-conf (:storm-conf executor-data)
        emit-sampler (mk-stats-sampler storm-conf)
        stream->component->grouper (:stream->component->grouper executor-data) ;; see "Storm source code analysis: Streaming Grouping"
        user-context (:user-context task-data)
        executor-stats (:stats executor-data)
        debug? (= true (storm-conf TOPOLOGY-DEBUG))]
    (fn ([^Integer out-task-id ^String stream ^List values]
          (when debug?
            (log-message "Emitting direct: " out-task-id "; " component-id " " stream " " values))
          (let [target-component (.getComponentId worker-context out-task-id)
                component->grouping (get stream->component->grouper stream)
                grouping (get component->grouping target-component)
                out-task-id (if grouping out-task-id)] ;; nil when the target component does not subscribe to this stream
            (when (and (not-nil? grouping) (not= :direct grouping))
              (throw (IllegalArgumentException. "Cannot emitDirect to a task expecting a regular grouping")))
            (apply-hooks user-context .emit (EmitInfo. values stream task-id [out-task-id]))
            (when (emit-sampler)
              (builtin-metrics/emitted-tuple! (:builtin-metrics task-data) executor-stats stream)
              (stats/emitted-tuple! executor-stats stream)
              ;; note: this is an if, so only one of the two calls below runs
              (if out-task-id
                (stats/transferred-tuples! executor-stats stream 1)
                (builtin-metrics/transferred-tuple! (:builtin-metrics task-data) executor-stats stream 1)))
            (if out-task-id [out-task-id])))
        ([^String stream ^List values]
          (when debug?
            (log-message "Emitting: " component-id " " stream " " values))
          (let [out-tasks (ArrayList.)]
            (fast-map-iter [[out-component grouper] (get stream->component->grouper stream)]
              (when (= :direct grouper)
                ;; TODO: this is wrong, need to check how the stream was declared
                (throw (IllegalArgumentException. "Cannot do regular emit to direct stream")))
              (let [comp-tasks (grouper task-id values)] ;; run the grouper to produce the target tasks
                (if (or (sequential? comp-tasks) (instance? Collection comp-tasks))
                  (.addAll out-tasks comp-tasks)
                  (.add out-tasks comp-tasks))))
            (apply-hooks user-context .emit (EmitInfo. values stream task-id out-tasks)) ;; run the emit hooks registered earlier
            (when (emit-sampler) ;; when sampled, update the emitted and transferred metrics in stats and builtin-metrics
              (stats/emitted-tuple! executor-stats stream)
              (builtin-metrics/emitted-tuple! (:builtin-metrics task-data) executor-stats stream)
              (stats/transferred-tuples! executor-stats stream (count out-tasks))
              (builtin-metrics/transferred-tuple! (:builtin-metrics task-data) executor-stats stream (count out-tasks)))
            out-tasks)))))
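
To see the two arities in isolation, here is a minimal sketch that keeps only the routing logic and leaves out hooks, sampling, metrics and logging. Everything in it is an assumption made for illustration: mk-tasks-fn-sketch, the plain task->component map (standing in for the worker context lookup), and the toy groupers.

```clojure
(defn mk-tasks-fn-sketch
  "Simplified stand-in for mk-tasks-fn. task->component maps a task id to its
   component id; stream->component->grouper maps stream -> component -> grouper,
   where a grouper is either :direct or a function (fn [task-id values])."
  [task-id task->component stream->component->grouper]
  (fn
    ;; Direct emit: the caller names the target task.
    ([out-task-id stream values]
     (let [target-component (get task->component out-task-id)
           grouping (get-in stream->component->grouper [stream target-component])]
       (when (and (not (nil? grouping)) (not= :direct grouping))
         (throw (IllegalArgumentException.
                 "Cannot emitDirect to a task expecting a regular grouping")))
       ;; nil grouping: the target component does not subscribe to this stream
       (if grouping [out-task-id])))
    ;; Regular emit: ask every subscriber's grouper for its target tasks.
    ([stream values]
     (vec (mapcat (fn [[_ grouper]]
                    (when (= :direct grouper)
                      (throw (IllegalArgumentException.
                              "Cannot do regular emit to direct stream")))
                    (let [comp-tasks (grouper task-id values)]
                      (if (sequential? comp-tasks) comp-tasks [comp-tasks])))
                  (get stream->component->grouper stream))))))

(def tasks-fn
  (mk-tasks-fn-sketch
   1
   {4 "counter", 5 "counter", 6 "logger"}
   {"default" {"counter" (fn [_ values] (nth [4 5] (mod (count (first values)) 2)))
               "logger"  (fn [_ _] [6])}
    "control" {"logger" :direct}}))

(println (tasks-fn "default" ["ab"]))      ; one task per subscribing component
(println (tasks-fn 6 "control" ["ping"]))  ; direct emit to a valid subscriber
(println (tasks-fn 4 "control" ["ping"]))  ; task 4 does not subscribe to "control"
```

The regular arity fans a tuple out to one or more tasks per subscribing component, while the direct arity returns either the single requested task or nil.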

1.4 get-task-object

Fetches the component's object.

For a spout, for example, it takes the ComponentObject spout_object out of the SpoutSpec. That object carries the spout's logic, such as nextTuple().

(defn- get-task-object [^TopologyContext topology component-id]
  (let [spouts (.get_spouts topology)
        bolts (.get_bolts topology)
        state-spouts (.get_state_spouts topology)
        obj (Utils/getSetComponentObject
              (cond
                (contains? spouts component-id) (.get_spout_object ^SpoutSpec (get spouts component-id))
                (contains? bolts component-id) (.get_bolt_object ^Bolt (get bolts component-id))
                (contains? state-spouts component-id) (.get_state_spout_object ^StateSpoutSpec (get state-spouts component-id))
                true (throw-runtime "Could not find " component-id " in " topology)))
        obj (if (instance? ShellComponent obj)
              (if (contains? spouts component-id)
                (ShellSpout. obj)
                (ShellBolt. obj))
              obj)
        obj (if (instance? JavaObject obj)
              (thrift/instantiate-java-object obj)
              obj)]
    obj))
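
The lookup at the heart of get-task-object is a cond that tries the spouts, then the bolts, then the state spouts, and fails loudly if the id is in none of them. The sketch below reproduces only that dispatch over plain Clojure maps. find-component-object and the sample topology map are hypothetical, and the Thrift accessors and the ShellComponent/JavaObject unwrapping are left out.

```clojure
(defn find-component-object
  "Looks a component id up in the spouts, bolts and state-spouts maps, in that
   order, and throws if it is in none of them. Hypothetical helper."
  [{:keys [spouts bolts state-spouts]} component-id]
  (cond
    (contains? spouts component-id)       (get spouts component-id)
    (contains? bolts component-id)        (get bolts component-id)
    (contains? state-spouts component-id) (get state-spouts component-id)
    :else (throw (RuntimeException. (str "Could not find " component-id)))))

;; Made-up topology: one spout and two bolts, described by plain maps.
(def topology
  {:spouts       {"sentences" {:kind :spout}}
   :bolts        {"split" {:kind :bolt}, "count" {:kind :bolt}}
   :state-spouts {}})

(println (find-component-object topology "split"))
```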
