Storm Source Code Analysis - Topology Submit - Task - TopologyContext (backtype.storm.task)
1. GeneralTopologyContext
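Records the basic information of a topology: the StormTopology and the StormConf,
together with what is derived from them, namely the task/component mapping, each component's streams, and its input/output information.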
public class GeneralTopologyContext implements JSONAware {
private StormTopology _topology;
private Map<Integer, String> _taskToComponent;
private Map<String, List<Integer>> _componentToTasks;
private Map<String, Map<String, Fields>> _componentToStreamToFields; //ComponentCommon.streams, map<string, StreamInfo>
private String _stormId; // topology id
protected Map _stormConf;
}
StormTopology: the worker reads it from stormcode.ser on disk.
struct StormTopology {
//ids must be unique across maps
// #workers to use is in conf
1: required map<string, SpoutSpec> spouts;
2: required map<string, Bolt> bolts;
3: required map<string, StateSpoutSpec> state_spouts;
}
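StormConf: the worker reads it from stormconf.ser on disk.
taskToComponent, componentToTasks: the mapping between tasks and components, in both directions.
componentToStreamToFields: which streams each component declares, and which fields each stream contains.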
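The two task/component maps are inverses of each other. As a minimal sketch of that relationship (this is not how Storm itself builds them; the class name `TaskMappingSketch` and the component names are made up for illustration), one map can be derived from the other like this:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TaskMappingSketch {
    // Derives component id -> sorted task ids from task id -> component id.
    static Map<String, List<Integer>> invert(Map<Integer, String> taskToComponent) {
        Map<String, List<Integer>> componentToTasks = new TreeMap<>();
        for (Map.Entry<Integer, String> e : taskToComponent.entrySet()) {
            componentToTasks
                .computeIfAbsent(e.getValue(), k -> new ArrayList<>())
                .add(e.getKey());
        }
        // Keep each task list sorted, mirroring the "componentToSortedTasks" naming.
        for (List<Integer> tasks : componentToTasks.values()) {
            Collections.sort(tasks);
        }
        return componentToTasks;
    }

    public static void main(String[] args) {
        Map<Integer, String> taskToComponent = new TreeMap<>();
        taskToComponent.put(1, "spout");   // hypothetical component ids
        taskToComponent.put(2, "split");
        taskToComponent.put(3, "split");
        taskToComponent.put(4, "count");
        System.out.println(invert(taskToComponent));
    }
}
```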
Besides the obvious accessors, the following operations return a component's inputs and outputs:
/**
* Gets the declared inputs to the specified component.
*
* @return A map from subscribed component/stream to the grouping subscribed with.
*/
public Map<GlobalStreamId, Grouping> getSources(String componentId) {
return getComponentCommon(componentId).get_inputs(); //ComponentCommon.inputs,map<GlobalStreamId, Grouping>
}
/**
* Gets information about who is consuming the outputs of the specified component,
* and how.
*
* @return Map from stream id to component id to the Grouping used.
*/
public Map<String, Map<String, Grouping>> getTargets(String componentId) {
Map<String, Map<String, Grouping>> ret = new HashMap<String, Map<String, Grouping>>();
for(String otherComponentId: getComponentIds()) { // for every component id in the topology
Map<GlobalStreamId, Grouping> inputs = getComponentCommon(otherComponentId).get_inputs(); // fetch that component's inputs
for(GlobalStreamId id: inputs.keySet()) { // for each stream id among those inputs
if(id.get_componentId().equals(componentId)) { // is the stream's source component the one we were asked about?
Map<String, Grouping> curr = ret.get(id.get_streamId());
if(curr==null) curr = new HashMap<String, Grouping>();
curr.put(otherComponentId, inputs.get(id));
ret.put(id.get_streamId(), curr);
}
}
}
return ret; // [stream-id, [target-component-id, grouping]]
}
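The getComponentCommon and getComponentIds calls used here come from the ThriftTopologyUtils class.
Do not be misled by the name: they do not fetch anything from nimbus through the Thrift API. They only read from the StormTopology object, whose class happens to be generated by Thrift.
Thrift-generated classes carry a metaDataMap, so the implementation is as follows: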
public static Set<String> getComponentIds(StormTopology topology) {
Set<String> ret = new HashSet<String>();
for(StormTopology._Fields f: StormTopology.metaDataMap.keySet()) {
Map<String, Object> componentMap = (Map<String, Object>) topology.getFieldValue(f);
ret.addAll(componentMap.keySet());
}
return ret;
}
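metaDataMap tells us which fields StormTopology has (spouts, bolts, state_spouts); the code iterates over them, calls getFieldValue on each, and collects the key set of each value.
The benefit is that this is dynamic: when a field is added to StormTopology, the code does not need to change, as long as the new field is also a map keyed by component id. A plain Java class has no such metadata map, so the same effect would require reflection; in a dynamic language such as Python it would be simple.
Of course, ThriftTopologyUtils is not strictly necessary here; one could also hard-code the reads from StormTopology.spouts… directly.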
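The metadata-driven enumeration can be sketched without Thrift. The following is only an illustration under stated assumptions: `MiniTopology`, `Field`, and its `metaDataMap` are hypothetical stand-ins that imitate the shape of a generated struct, not the real StormTopology API.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class MetaDataSketch {
    // Hypothetical stand-in for the generated StormTopology._Fields enum.
    enum Field { SPOUTS, BOLTS, STATE_SPOUTS }

    // Hypothetical stand-in for a Thrift-generated struct; not the real StormTopology.
    static class MiniTopology {
        // Plays the role of metaDataMap: one entry per declared field.
        static final Map<Field, String> metaDataMap = new EnumMap<>(Field.class);
        static {
            metaDataMap.put(Field.SPOUTS, "spouts");
            metaDataMap.put(Field.BOLTS, "bolts");
            metaDataMap.put(Field.STATE_SPOUTS, "state_spouts");
        }

        // Every field holds a map keyed by component id.
        private final Map<Field, Map<String, Object>> values = new EnumMap<>(Field.class);

        MiniTopology() {
            for (Field f : Field.values()) {
                values.put(f, new HashMap<>());
            }
        }

        void put(Field f, String componentId, Object spec) {
            values.get(f).put(componentId, spec);
        }

        Map<String, Object> getFieldValue(Field f) {
            return values.get(f);
        }
    }

    // Same loop shape as getComponentIds: no field is named explicitly.
    static Set<String> getComponentIds(MiniTopology topology) {
        Set<String> ret = new TreeSet<>();
        for (Field f : MiniTopology.metaDataMap.keySet()) {
            ret.addAll(topology.getFieldValue(f).keySet());
        }
        return ret;
    }

    public static void main(String[] args) {
        MiniTopology t = new MiniTopology();
        t.put(Field.SPOUTS, "sentences", new Object());
        t.put(Field.BOLTS, "split", new Object());
        t.put(Field.BOLTS, "count", new Object());
        System.out.println(getComponentIds(t));
    }
}
```

Adding a fourth constant to `Field` (with its metadata entry) would be picked up by `getComponentIds` without touching the loop, which is the point of going through the metadata.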
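Looking at the definition of ComponentCommon in storm.thrift makes the two functions above easy to understand.
The implementation of getTargets deserves a closer look, because it derives the outputs from the inputs.
ComponentCommon records only the stream id and fields of each output; it does not say which component the stream is sent to.
For an input, however, the key is a GlobalStreamId, which contains not only the stream id but also the component id of the source component.
So the targets can be derived in reverse: whenever the source component of an input is the current component, the component that declares that input is one of the current component's targets.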
struct ComponentCommon {
1: required map<GlobalStreamId, Grouping> inputs;
2: required map<string, StreamInfo> streams; //key is stream id, outputs
3: optional i32 parallelism_hint; //how many threads across the cluster should be dedicated to this component
4: optional string json_conf;
}
struct SpoutSpec {
1: required ComponentObject spout_object;
2: required ComponentCommon common;
// can force a spout to be non-distributed by overriding the component configuration
// and setting TOPOLOGY_MAX_TASK_PARALLELISM to 1
}
struct Bolt {
1: required ComponentObject bolt_object;
2: required ComponentCommon common;
}
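The inversion that getTargets performs (from "who do I subscribe to" to "who subscribes to me") can be tried out in isolation. This is a simplified sketch, not Storm code: `StreamRef` is a hypothetical stand-in for GlobalStreamId, groupings are reduced to plain strings, and the component names are invented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class TargetsSketch {
    // Hypothetical stand-in for GlobalStreamId: (source component id, stream id).
    static final class StreamRef {
        final String componentId;
        final String streamId;

        StreamRef(String componentId, String streamId) {
            this.componentId = componentId;
            this.streamId = streamId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof StreamRef)) return false;
            StreamRef other = (StreamRef) o;
            return componentId.equals(other.componentId) && streamId.equals(other.streamId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(componentId, streamId);
        }
    }

    // inputsByComponent: consumer component id -> (subscribed stream -> grouping name).
    // Returns: stream id -> (target component id -> grouping name).
    static Map<String, Map<String, String>> getTargets(
            String componentId, Map<String, Map<StreamRef, String>> inputsByComponent) {
        Map<String, Map<String, String>> ret = new HashMap<>();
        for (Map.Entry<String, Map<StreamRef, String>> consumer : inputsByComponent.entrySet()) {
            for (Map.Entry<StreamRef, String> input : consumer.getValue().entrySet()) {
                // An input whose source is componentId makes the consumer a target.
                if (input.getKey().componentId.equals(componentId)) {
                    ret.computeIfAbsent(input.getKey().streamId, k -> new HashMap<>())
                       .put(consumer.getKey(), input.getValue());
                }
            }
        }
        return ret;
    }

    public static void main(String[] args) {
        Map<String, Map<StreamRef, String>> inputs = new HashMap<>();
        inputs.put("sentences", new HashMap<>()); // a spout has no inputs
        Map<StreamRef, String> splitInputs = new HashMap<>();
        splitInputs.put(new StreamRef("sentences", "default"), "shuffle");
        inputs.put("split", splitInputs);
        Map<StreamRef, String> countInputs = new HashMap<>();
        countInputs.put(new StreamRef("split", "default"), "fields");
        inputs.put("count", countInputs);

        System.out.println(getTargets("sentences", inputs));
        System.out.println(getTargets("count", inputs));
    }
}
```

Note the cost of this design: every call scans the inputs of all components, which is acceptable because topologies are small and the result is typically computed once during setup.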
2. WorkerTopologyContext
WorkerTopologyContext wraps some worker-related information.
public class WorkerTopologyContext extends GeneralTopologyContext {
public static final String SHARED_EXECUTOR = "executor";
private Integer _workerPort; // port of the worker process
private List<Integer> _workerTasks; // ids of the tasks contained in this worker
private String _codeDir; // code directory on the supervisor, stormdist/stormid
private String _pidDir; // directory recording the pids of the worker's running processes (possibly several), workid/pids
Map<String, Object> _userResources;
Map<String, Object> _defaultResources;
}
3. TopologyContext
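As the Javadoc says, a TopologyContext is passed as the argument to a bolt's prepare (or a spout's open) method.
Hence openOrPrepareWasCalled, which indicates whether prepare or open has already been called with this TopologyContext.
registerMetric registers metrics into _registeredMetrics.
The registry is structured as [timeBucketSizeInSecs, [taskId, [name, metric]]].
_hooks is used to register task hooks.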
/**
* A TopologyContext is given to bolts and spouts in their "prepare" and "open"
* methods, respectively. This object provides information about the component's
* place within the topology, such as task ids, inputs and outputs, etc.
*
* <p>The TopologyContext is also used to declare ISubscribedState objects to
* synchronize state with StateSpouts this object is subscribed to.</p>
*/
public class TopologyContext extends WorkerTopologyContext implements IMetricsContext {
private Integer _taskId;
private Map<String, Object> _taskData = new HashMap<String, Object>();
private List<ITaskHook> _hooks = new ArrayList<ITaskHook>();
private Map<String, Object> _executorData;
private Map<Integer,Map<Integer, Map<String, IMetric>>> _registeredMetrics;
private clojure.lang.Atom _openOrPrepareWasCalled;
public TopologyContext(StormTopology topology, Map stormConf,
Map<Integer, String> taskToComponent, Map<String, List<Integer>> componentToSortedTasks,
Map<String, Map<String, Fields>> componentToStreamToFields,
String stormId, String codeDir, String pidDir, Integer taskId,
Integer workerPort, List<Integer> workerTasks, Map<String, Object> defaultResources,
Map<String, Object> userResources, Map<String, Object> executorData, Map registeredMetrics,
clojure.lang.Atom openOrPrepareWasCalled) {
super(topology, stormConf, taskToComponent, componentToSortedTasks,
componentToStreamToFields, stormId, codeDir, pidDir,
workerPort, workerTasks, defaultResources, userResources);
_taskId = taskId;
_executorData = executorData;
_registeredMetrics = registeredMetrics;
_openOrPrepareWasCalled = openOrPrepareWasCalled;
}
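The nested registry layout [timeBucketSizeInSecs, [taskId, [name, metric]]] can be illustrated with plain maps. This is a sketch under assumptions, not the Storm implementation: `MetricRegistrySketch` is a hypothetical class, metrics are plain objects rather than IMetric, and rejecting duplicate names is a choice made for this example.

```java
import java.util.HashMap;
import java.util.Map;

class MetricRegistrySketch {
    // Shared registry: timeBucketSizeInSecs -> taskId -> metric name -> metric.
    private final Map<Integer, Map<Integer, Map<String, Object>>> registered;
    private final int taskId;

    // The registry is passed in, like registeredMetrics in the constructor above,
    // so several task-level contexts can write into one structure.
    MetricRegistrySketch(int taskId, Map<Integer, Map<Integer, Map<String, Object>>> registered) {
        this.taskId = taskId;
        this.registered = registered;
    }

    <T> T registerMetric(String name, T metric, int timeBucketSizeInSecs) {
        Map<String, Object> byName = registered
            .computeIfAbsent(timeBucketSizeInSecs, k -> new HashMap<>())
            .computeIfAbsent(taskId, k -> new HashMap<>());
        if (byName.containsKey(name)) {
            // Assumption of this sketch: a name may be used once per task and bucket.
            throw new IllegalArgumentException("metric already registered: " + name);
        }
        byName.put(name, metric);
        return metric;
    }

    public static void main(String[] args) {
        Map<Integer, Map<Integer, Map<String, Object>>> shared = new HashMap<>();
        new MetricRegistrySketch(1, shared).registerMetric("emitted", new Object(), 60);
        new MetricRegistrySketch(2, shared).registerMetric("emitted", new Object(), 60);
        new MetricRegistrySketch(2, shared).registerMetric("latency", new Object(), 10);
        // Buckets 60 and 10; bucket 60 holds entries for tasks 1 and 2.
        System.out.println(shared.keySet() + " " + shared.get(60).keySet());
    }
}
```

Keying by time bucket first lets a consumer walk one bucket and visit every task's metrics that share the same reporting interval.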
4. Usage
mk-task-data creates the topology context for each task:
user-context (user-topology-context (:worker executor-data) executor-data task-id)
(defn user-topology-context [worker executor-data tid]
((mk-topology-context-builder
worker
executor-data
(:topology worker))
tid))
(defn mk-topology-context-builder [worker executor-data topology]
(let [conf (:conf worker)]
#(TopologyContext.
topology
(:storm-conf worker)
(:task->component worker)
(:component->sorted-tasks worker)
(:component->stream->fields worker)
(:storm-id worker)
(supervisor-storm-resources-path
(supervisor-stormdist-root conf (:storm-id worker)))
(worker-pids-root conf (:worker-id worker))
(int %)
(:port worker)
(:task-ids worker)
(:default-shared-resources worker)
(:user-shared-resources worker)
(:shared-executor-data executor-data)
(:interval->task->metric-registry executor-data)
(:open-or-prepare-was-called? executor-data))))
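mk-topology-context-builder returns a function that closes over the worker-level values and takes only the task id, so one builder can produce a context per task. A rough Java analogue of that pattern follows; `Ctx` and `mkContextBuilder` are hypothetical names carrying only three of the real constructor's arguments, and the topology id and port values are invented.

```java
import java.util.function.IntFunction;

class ContextBuilderSketch {
    // Hypothetical, heavily reduced stand-in for TopologyContext.
    static final class Ctx {
        final String stormId;
        final int workerPort;
        final int taskId;

        Ctx(String stormId, int workerPort, int taskId) {
            this.stormId = stormId;
            this.workerPort = workerPort;
            this.taskId = taskId;
        }
    }

    // Captures the values shared by all tasks of a worker; the returned
    // function plays the role of the #(TopologyContext. ...) closure.
    static IntFunction<Ctx> mkContextBuilder(String stormId, int workerPort) {
        return taskId -> new Ctx(stormId, workerPort, taskId);
    }

    public static void main(String[] args) {
        IntFunction<Ctx> builder = mkContextBuilder("demo-topology", 6700);
        for (int tid = 1; tid <= 3; tid++) {
            Ctx ctx = builder.apply(tid);
            System.out.println(ctx.stormId + " port=" + ctx.workerPort + " task=" + ctx.taskId);
        }
    }
}
```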