Storm Source Code Analysis - Topology Submit - Task
mk-task is fairly simple: a task is only a conceptual structure, and unlike a worker or an executor it does not need its own process or thread.
So its core is really just mk-task-data, which does three things:
1. Create the TopologyContext objects, essentially mixing the topology object with worker-data so that the task can look up whatever topology information it needs while running.
2. Create the task object (spout-object or bolt-object), which encapsulates the user logic such as nextTuple or execute.
3. Build tasks-fn. The name is misleading, it sounds as if it executes the task; in fact it just does the preparation work before an emit, the most important part being the call to the grouper that produces the target tasks, plus some metrics and hook invocations.
Put plainly, mk-task does not do much.
(defn mk-task [executor-data task-id]
  (let [task-data (mk-task-data executor-data task-id) ;; 1 mk-task-data
        storm-conf (:storm-conf executor-data)]
    (doseq [klass (storm-conf TOPOLOGY-AUTO-TASK-HOOKS)] ;; add the pre-configured task hooks
      (.addTaskHook ^TopologyContext (:user-context task-data) (-> klass Class/forName .newInstance)))
    ;; when this is called, the threads for the executor haven't been started yet,
    ;; so we won't be risking trampling on the single-threaded claim strategy disruptor queue
    (send-unanchored task-data SYSTEM-STREAM-ID ["startup"]) ;; send a "startup" notification on the system stream; who consumes SYSTEM-STREAM-ID…?
    task-data
    ))
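The classes listed under TOPOLOGY-AUTO-TASK-HOOKS are instantiated via Class/forName and .newInstance above, so they need a public no-arg constructor. Hooks can also be attached from component code through TopologyContext.addTaskHook. Below is a minimal sketch of such a hook; the hook itself and its log output are illustrative, not part of Storm:

(import '[backtype.storm.hooks BaseTaskHook]
        '[backtype.storm.hooks.info EmitInfo])

;; Hypothetical hook: BaseTaskHook provides no-op defaults for all callbacks,
;; here we only override emit to log every emitted tuple.
(defn mk-logging-hook []
  (proxy [BaseTaskHook] []
    (emit [^EmitInfo info]
      (println "emit on stream" (.stream info) "to tasks" (.outTasks info)))))

;; e.g. inside a bolt's prepare or a spout's open:
;; (.addTaskHook ^TopologyContext context (mk-logging-hook))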
1 mk-task-data
(defn mk-task-data [executor-data task-id]
  (recursive-map
    :executor-data executor-data
    :task-id task-id
    :system-context (system-topology-context (:worker executor-data) executor-data task-id)
    :user-context (user-topology-context (:worker executor-data) executor-data task-id)
    :builtin-metrics (builtin-metrics/make-data (:type executor-data))
    :tasks-fn (mk-tasks-fn <>)
    :object (get-task-object (.getRawTopology ^TopologyContext (:system-context <>)) (:component-id executor-data))))
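A note on recursive-map (defined in backtype.storm.util): it builds the map entry by entry, and the <> placeholder lets a later entry refer to the entries already built, which is why :tasks-fn and :object can use values computed just above them. As a rough sketch of the idea (not the actual macro expansion):

;; Each entry is assoc'ed in order; <> stands for the map built so far.
(let [m {:executor-data executor-data
         :task-id task-id}
      m (assoc m :system-context (system-topology-context (:worker executor-data) executor-data task-id))
      m (assoc m :user-context (user-topology-context (:worker executor-data) executor-data task-id))
      m (assoc m :builtin-metrics (builtin-metrics/make-data (:type executor-data)))
      m (assoc m :tasks-fn (mk-tasks-fn m)) ;; <> == m at this point
      m (assoc m :object (get-task-object (.getRawTopology ^TopologyContext (:system-context m))
                                          (:component-id executor-data)))]
  m)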
1.1 TopologyContext
See: Storm Source Code Analysis - Topology Submit - Task - TopologyContext
:system-context and :user-context differ only in the topology object held by the context: the system context is built from system-topology!, which adds the system components (such as the acker), while the user context holds the raw user topology.
1.2 builtin-metrics/make-data
The builtin-metrics here are used to record how the spout or bolt is executing (emitted, transferred, acked, failed counts and the like).
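Conceptually, such a bundle is just a set of named metric objects that later get registered on the TopologyContext so they are reported through the metrics system. A hedged sketch using Storm's public metric API (the map keys, metric name and bucket size below are illustrative, this is not the actual make-data body):

(import '[backtype.storm.metric.api MultiCountMetric])

;; Illustrative only: one MultiCountMetric per kind of counter; when incremented it is
;; scoped by stream id.
(defn make-metrics-sketch []
  {:emit-count     (MultiCountMetric.)
   :transfer-count (MultiCountMetric.)})

;; Registration happens once per task, roughly:
;; (.registerMetric ^TopologyContext user-context "__emit-count" (:emit-count m) 60)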
1.3 mk-tasks-fn
Returns tasks-fn. This function mainly does the preparation work before an emit and returns the list of target tasks:
1. Call the grouper to produce the target tasks
2. Invoke the emit hooks
3. When the sampler fires, update stats and builtin-metrics
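The sampler used in step 3 comes from mk-stats-sampler and is driven by TOPOLOGY-STATS-SAMPLE-RATE: stats are not updated on every single emit, only on sampled ones. A simplified sketch of the idea (not the actual implementation, which randomizes the position inside each sampling window):

;; Simplified sketch: with a sample rate such as 0.05, return a fn that is true roughly
;; once every (/ 1 rate) calls, so the counters are scaled samples rather than exact counts.
(defn mk-stats-sampler-sketch [rate]
  (let [freq (int (/ 1 rate))
        i    (atom -1)]
    (fn []
      (zero? (mod (swap! i inc) freq)))))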
tasks-fn has two arities:
[^String stream ^List values]: the easier one to understand; it collects the target tasks of every component that consumes this stream (one stream can have several downstream components, so one piece of data may have to be sent to several bolts).
[^Integer out-task-id ^String stream ^List values]: an out-task-id is given explicitly, i.e. direct grouping.
Here out-task-id is validated:
out-task-id (if grouping out-task-id), i.e. the grouper looked up via out-task-id -> component -> grouper must not be nil (and the following check requires it to be :direct); in other words, the stream really does go to the component that out-task-id belongs to.
If the validation fails, out-task-id is set to nil.
(defn mk-tasks-fn [task-data]
  (let [task-id (:task-id task-data)
        executor-data (:executor-data task-data)
        component-id (:component-id executor-data)
        ^WorkerTopologyContext worker-context (:worker-context executor-data)
        storm-conf (:storm-conf executor-data)
        emit-sampler (mk-stats-sampler storm-conf)
        stream->component->grouper (:stream->component->grouper executor-data) ;; see: Storm Source Code Analysis - Streaming Grouping
        user-context (:user-context task-data)
        executor-stats (:stats executor-data)
        debug? (= true (storm-conf TOPOLOGY-DEBUG))]
    (fn ([^Integer out-task-id ^String stream ^List values]
          (when debug?
            (log-message "Emitting direct: " out-task-id "; " component-id " " stream " " values))
          (let [target-component (.getComponentId worker-context out-task-id)
                component->grouping (get stream->component->grouper stream)
                grouping (get component->grouping target-component)
                out-task-id (if grouping out-task-id)]
            (when (and (not-nil? grouping) (not= :direct grouping))
              (throw (IllegalArgumentException. "Cannot emitDirect to a task expecting a regular grouping")))
            (apply-hooks user-context .emit (EmitInfo. values stream task-id [out-task-id]))
            (when (emit-sampler)
              (builtin-metrics/emitted-tuple! (:builtin-metrics task-data) executor-stats stream)
              (stats/emitted-tuple! executor-stats stream)
              (if out-task-id
                (stats/transferred-tuples! executor-stats stream 1)
                (builtin-metrics/transferred-tuple! (:builtin-metrics task-data) executor-stats stream 1)))
            (if out-task-id [out-task-id])
            ))
        ([^String stream ^List values]
          (when debug?
            (log-message "Emitting: " component-id " " stream " " values))
          (let [out-tasks (ArrayList.)]
            (fast-map-iter [[out-component grouper] (get stream->component->grouper stream)]
              (when (= :direct grouper)
                ;; TODO: this is wrong, need to check how the stream was declared
                (throw (IllegalArgumentException. "Cannot do regular emit to direct stream")))
              (let [comp-tasks (grouper task-id values)] ;; run the grouper to produce the target tasks
                (if (or (sequential? comp-tasks) (instance? Collection comp-tasks))
                  (.addAll out-tasks comp-tasks)
                  (.add out-tasks comp-tasks)
                  )))
            (apply-hooks user-context .emit (EmitInfo. values stream task-id out-tasks)) ;; invoke the emit hooks registered earlier
            (when (emit-sampler) ;; when the sampler fires, update the emitted and transferred counters in stats and builtin-metrics
              (stats/emitted-tuple! executor-stats stream)
              (builtin-metrics/emitted-tuple! (:builtin-metrics task-data) executor-stats stream)
              (stats/transferred-tuples! executor-stats stream (count out-tasks))
              (builtin-metrics/transferred-tuple! (:builtin-metrics task-data) executor-stats stream (count out-tasks)))
            out-tasks)))
    ))
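To see how tasks-fn is consumed, here is a hedged sketch of the caller side; the real logic lives in the executor's emit handling in executor.clj, and the names emit-sketch and transfer-fn below are illustrative. The executor asks tasks-fn for the target task ids and then hands the tuple to the worker's transfer function once per target:

;; Hedged sketch of a caller: choose the arity depending on whether this is an emitDirect,
;; then send the values to every returned task id.
(defn emit-sketch [tasks-fn transfer-fn out-task-id stream values]
  (let [out-tasks (if out-task-id
                    (tasks-fn out-task-id stream values) ;; direct grouping: at most one target
                    (tasks-fn stream values))]           ;; regular grouping: every consumer of the stream
    (doseq [t out-tasks]
      (transfer-fn t values))
    out-tasks))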
1.4 get-task-object
Retrieves the component's object.
For a Spout, for example, it pulls the ComponentObject spout_object out of the SpoutSpec; this object contains the spout's logic, e.g. nextTuple().
(defn- get-task-object [^TopologyContext topology component-id]
  (let [spouts (.get_spouts topology)
        bolts (.get_bolts topology)
        state-spouts (.get_state_spouts topology)
        obj (Utils/getSetComponentObject
              (cond
                (contains? spouts component-id) (.get_spout_object ^SpoutSpec (get spouts component-id))
                (contains? bolts component-id) (.get_bolt_object ^Bolt (get bolts component-id))
                (contains? state-spouts component-id) (.get_state_spout_object ^StateSpoutSpec (get state-spouts component-id))
                true (throw-runtime "Could not find " component-id " in " topology)))
        obj (if (instance? ShellComponent obj)
              (if (contains? spouts component-id)
                (ShellSpout. obj)
                (ShellBolt. obj))
              obj)
        obj (if (instance? JavaObject obj)
              (thrift/instantiate-java-object obj)
              obj)]
    obj
    ))
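The object returned here is the user's spout or bolt instance (or its ShellSpout/ShellBolt wrapper, or an object instantiated from a JavaObject); the executor later drives it through the usual component lifecycle. A small hedged sketch of that dispatch; the real wiring, collectors and acking included, lives in executor.clj's mk-threads:

(import '[backtype.storm.spout ISpout]
        '[backtype.storm.task IBolt])

;; Illustrative only: the kind of calls the executor will make on (:object task-data).
(defn lifecycle-of [obj]
  (cond
    (instance? ISpout obj) "spout: .open once, then .nextTuple / .ack / .fail in the event loop"
    (instance? IBolt obj)  "bolt: .prepare once, then .execute for every incoming tuple"
    :else                  "unexpected component object"))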