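Today I read through Storm's command-line script, ${STORM_HOME}/bin/storm, and I am writing the walkthrough down here as a record. Note: the Storm version used is 0.8.0.

${STORM_HOME}/bin/storm is written in Python, and it is quite concise and clear.

Execution starts at main(), which is mainly responsible for parsing the command and its arguments and for reading the configuration.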

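def main():
  if len(sys.argv) <= 1:
    print_usage()
    sys.exit(-1)
  global CONFIG_OPTS
  # Read the config options and the arguments. argv[0] is the storm script itself,
  # so it is skipped here; "-c" introduces a config option, everything else is an argument.
  config_list, args = parse_config_opts(sys.argv[1:])
  parse_config(config_list)
  COMMAND = args[0]  # the first item is the command to run
  ARGS = args[1:]    # the rest are the arguments passed to that command
  # Look the command up in the COMMANDS dict and call the function mapped to it
  (COMMANDS.get(COMMAND, "help"))(*ARGS)

if __name__ == "__main__":
  main()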
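
The split between "-c" options and ordinary arguments can be sketched as follows. This is only an illustration in Python 3, not Storm's parse_config_opts itself: the function name split_config_opts and the sample option topology.debug=true are assumptions made for the example.

```python
def split_config_opts(argv):
    """Separate "-c key=value" pairs from the remaining arguments (illustrative only)."""
    config_list = []
    args = []
    it = iter(argv)
    for token in it:
        if token == "-c":
            # The token after "-c" is the config option; a trailing "-c" is ignored.
            value = next(it, None)
            if value is not None:
                config_list.append(value)
        else:
            args.append(token)
    return config_list, args


config_list, args = split_config_opts(
    ["jar", "xxx.jar", "MAINCLASS", "-c", "topology.debug=true", "arg1"]
)
print(config_list)
print(args)
```

With this input the first list holds the single "-c" option and the second list holds the command followed by its arguments, which is the shape main() expects when it reads args[0] and args[1:].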

Taking the command storm jar xxx.jar MAINCLASS arg1 arg2 as an example, the code above ends up calling jar("xxx.jar", "MAINCLASS", "arg1", "arg2").
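
The dictionary lookup in main() is a plain dispatch table. A minimal Python 3 sketch follows; the two command functions and the strings they return are made up for the example. Unlike the snippet above, it falls back to a callable rather than to the string "help", because calling a string raises a TypeError.

```python
def jar(jarfile, klass, *args):
    """Stand-in command: just report what it was called with."""
    return "jar %s %s %s" % (jarfile, klass, list(args))


def print_help(*args):
    """Stand-in fallback command; accepts and ignores any arguments."""
    return "usage: storm <command> [args]"


# Command name -> function implementing it.
COMMANDS = {"jar": jar, "help": print_help}


def dispatch(argv):
    command, rest = argv[0], argv[1:]
    handler = COMMANDS.get(command, print_help)  # unknown commands fall back to help
    return handler(*rest)


print(dispatch(["jar", "xxx.jar", "MAINCLASS", "arg1", "arg2"]))
print(dispatch(["no-such-command"]))
```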

Now let's look closely at how a command function executes, again using jar() as the example.

def jar(jarfile, klass, *args):
  """Syntax: [storm jar topology-jar-path class ...]

  Runs the main method of class with the specified arguments.
  The storm jars and configs in ~/.storm are put on the classpath.
  The process is configured so that StormSubmitter
  (http://nathanmarz.github.com/storm/doc/backtype/storm/StormSubmitter.html)
  will upload the jar at topology-jar-path when the topology is submitted.
  """
  exec_storm_class(
    klass,
    jvmtype="-client",
    extrajars=[jarfile, CONF_DIR, STORM_DIR + "/bin"],
    args=args,
    jvmopts=["-Dstorm.jar=" + jarfile])

def exec_storm_class(klass, jvmtype="-server", jvmopts=[], extrajars=[], args=[], fork=False):
  all_args = [
    "java", jvmtype, get_config_opts(),
    "-Dstorm.home=" + STORM_DIR,
    "-Djava.library.path=" + confvalue("java.library.path", extrajars),
    "-cp", get_classpath(extrajars),
  ] + jvmopts + [klass] + list(args)
  print "Running: " + " ".join(all_args)
  if fork:
    os.spawnvp(os.P_WAIT, "java", all_args)
  else:
    os.execvp("java", all_args) # replaces the current process and never returns

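What actually runs is exec_storm_class(klass, jvmtype="-client", jvmopts=["-Dstorm.jar=" + jarfile], extrajars=[jarfile, CONF_DIR, STORM_DIR + "/bin"], args=args), with fork left at its default of False. exec_storm_class() does the real work for the storm commands that launch a JVM. It has two key steps: 1. build the all_args list; 2. start the java process with a system call. Each is examined in turn below.

1. Building all_args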

all_args = [
  "java", jvmtype, get_config_opts(),
  "-Dstorm.home=" + STORM_DIR,
  "-Djava.library.path=" + confvalue("java.library.path", extrajars),
  "-cp", get_classpath(extrajars),
] + jvmopts + [klass] + list(args)

def get_config_opts():
  """Build the -Dstorm.options JVM argument."""
  global CONFIG_OPTS
  return "-Dstorm.options=" + (','.join(CONFIG_OPTS)).replace(' ', "%%%%")

def confvalue(name, extrapaths):
  """Launch "java -client ... backtype.storm.command.config_value $name" to read a config value."""
  command = [
    "java", "-client", get_config_opts(), "-cp", get_classpath(extrapaths), "backtype.storm.command.config_value", name
  ]
  p = sub.Popen(command, stdout=sub.PIPE)  # run the command with its stdout connected to a pipe
  output, errors = p.communicate()  # read all output from the pipe; errors is None because stderr is not piped
  lines = output.split("\n")
  for line in lines:
    tokens = line.split(" ")
    if tokens[0] == "VALUE:":
      return " ".join(tokens[1:])
  return ""

def get_classpath(extrajars):
  """Build the classpath from the jars in STORM_DIR and STORM_DIR/lib plus the extrajars entries."""
  ret = get_jars_full(STORM_DIR)
  ret.extend(get_jars_full(STORM_DIR + "/lib"))
  ret.extend(extrajars)
  return normclasspath(":".join(ret))
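
To make the two string-building helpers concrete, here is a small Python 3 sketch. It is a stand-in written under assumptions, not Storm's code: jars_in() is a hypothetical replacement for get_jars_full (whose body is not shown in this post), it sorts the entries so the result is deterministic, and the file names and option values are invented.

```python
import os
import tempfile


def encode_options(config_opts):
    """Join the options with commas and replace spaces with "%%%%", as get_config_opts does."""
    return "-Dstorm.options=" + ",".join(config_opts).replace(" ", "%%%%")


def jars_in(directory):
    """Hypothetical helper: full paths of the .jar files directly inside directory."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(".jar")
    )


def build_classpath(storm_dir, extrajars):
    """Jars in storm_dir, then jars in storm_dir/lib, then the extra entries as given."""
    entries = jars_in(storm_dir) + jars_in(os.path.join(storm_dir, "lib")) + list(extrajars)
    return ":".join(entries)


def demo():
    """Build a classpath against a throwaway directory that mimics a Storm install."""
    with tempfile.TemporaryDirectory() as storm_dir:
        os.mkdir(os.path.join(storm_dir, "lib"))
        for rel in ("storm-0.8.0.jar", os.path.join("lib", "clojure-1.4.0.jar"), "README"):
            open(os.path.join(storm_dir, rel), "w").close()
        return build_classpath(storm_dir, ["xxx.jar"]).split(":")


print(demo())
print(encode_options(["topology.workers=2", "topology.name=my topology"]))
```

The README file is skipped because it is not a jar, while xxx.jar is appended untouched, which matches the tail of the classpath in the test run shown further down.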

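Pay particular attention to confvalue: it uses the subprocess module to start a java process running backtype.storm.command.config_value with $name as its argument, and reads the configured value of $name from that process's output.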
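
The "VALUE:" convention is easy to try without a JVM. The Python 3 sketch below swaps the java process for a child Python interpreter that prints a line in the same format; the child command and the sample value are assumptions for illustration only.

```python
import subprocess
import sys


def extract_value(output):
    """Return the text after "VALUE:" on the first matching line, or "" if none matches."""
    for line in output.split("\n"):
        tokens = line.split(" ")
        if tokens[0] == "VALUE:":
            return " ".join(tokens[1:])
    return ""


def read_value_from_child(command):
    """Run command, capture its stdout through a pipe, and pull out the value."""
    proc = subprocess.Popen(command, stdout=subprocess.PIPE, universal_newlines=True)
    output, _ = proc.communicate()
    return extract_value(output)


# Stand-in for the java process that prints the value of java.library.path.
fake_child = [
    sys.executable,
    "-c",
    "print('some log line'); print('VALUE: /usr/local/lib:/opt/local/lib')",
]
print(read_value_from_child(fake_child))
```

Prefixing the line with a fixed marker lets the parent ignore anything else the child writes to stdout, such as log lines.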

Note: subprocess - Subprocesses with accessible I/O streams. This module allows you to spawn processes, connect to their input/output/error pipes, and obtain their return codes.

backtype.storm.command.config_value is generated from backtype/storm/command/config_value.clj.

The code of config_value.clj is as follows:

(ns backtype.storm.command.config-value   ; declare the namespace
  (:use [backtype.storm config log])      ; pull in config.clj and log.clj
  (:gen-class))                           ; generate a Java class so -main can be started with java

(defn -main [^String name]
  (let [conf (read-storm-config)]
    (println "VALUE:" (conf name))))

The -main function of config_value.clj calls read-storm-config, looks up name in the returned map, and prints the result prefixed with "VALUE:".

read-storm-config is implemented in config.clj:

  (defn read-storm-config []
  (clojurify-structure (Utils/readStormConfig)))

Here Utils/readStormConfig refers to the readStormConfig method of the Utils class in the Java package backtype.storm.utils, which is defined as follows:

public static Map readStormConfig() {
    Map ret = readDefaultConfig();
    Map storm = findAndReadConfigFile("storm.yaml", false); // read storm.yaml (optional)
    ret.putAll(storm);
    ret.putAll(readCommandLineOpts());
    return ret;
}

public static Map readDefaultConfig() {
    return findAndReadConfigFile("defaults.yaml", true); // read defaults.yaml (must exist)
}

public static Map findAndReadConfigFile(String name) {
    return findAndReadConfigFile(name, true);
}

public static Map findAndReadConfigFile(String name, boolean mustExist) {
    try {
        List<URL> resources = findResources(name);
        if(resources.isEmpty()) {
            if(mustExist) throw new RuntimeException("Could not find config file on classpath " + name);
            else return new HashMap();
        }
        if(resources.size() > 1) {
            throw new RuntimeException("Found multiple " + name + " resources. You're probably bundling the Storm jars with your topology jar.");
        }
        URL resource = resources.get(0);
        Yaml yaml = new Yaml();
        Map ret = (Map) yaml.load(new InputStreamReader(resource.openStream()));
        if(ret==null) ret = new HashMap();
        return new HashMap(ret);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
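
The precedence that readStormConfig establishes (defaults first, then storm.yaml, then command-line options) is just a sequence of map merges. A rough Python 3 equivalent is sketched below, using plain dicts in place of parsed YAML files; the keys and values are examples, not taken from a real configuration.

```python
def read_storm_config(defaults, storm_yaml, command_line_opts):
    """Merge three config layers; later layers override earlier ones."""
    merged = dict(defaults)           # copy, so the defaults are not mutated
    merged.update(storm_yaml)         # like ret.putAll(storm)
    merged.update(command_line_opts)  # like ret.putAll(readCommandLineOpts())
    return merged


defaults = {"nimbus.host": "localhost", "topology.workers": 1, "topology.debug": False}
storm_yaml = {"nimbus.host": "nimbus.example.com"}
command_line_opts = {"topology.debug": True}

print(read_storm_config(defaults, storm_yaml, command_line_opts))
```

A key that appears in no override layer, such as topology.workers here, keeps its default value.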

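So Storm's configuration comes from defaults.yaml and storm.yaml, with storm.yaml overriding the defaults and command-line options overriding both. Here is the final process command line from a test run: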

Running: java -client -Dstorm.options= -Dstorm.home=/opt/storm/storm-0.8.0 -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib -cp /opt/storm/storm-0.8.0/storm-0.8.0.jar:/opt/storm/storm-0.8.0/lib/jgrapht-0.8.3.jar:/opt/storm/storm-0.8.0/lib/servlet-api-2.5-20081211.jar:/opt/storm/storm-0.8.0/lib/curator-client-1.0.1.jar:/opt/storm/storm-0.8.0/lib/tools.cli-0.2.2.jar:/opt/storm/storm-0.8.0/lib/clout-0.4.1.jar:/opt/storm/storm-0.8.0/lib/guava-10.0.1.jar:/opt/storm/storm-0.8.0/lib/jsr305-1.3.9.jar:/opt/storm/storm-0.8.0/lib/jetty-util-6.1.26.jar:/opt/storm/storm-0.8.0/lib/commons-logging-1.1.1.jar:/opt/storm/storm-0.8.0/lib/curator-framework-1.0.1.jar:/opt/storm/storm-0.8.0/lib/commons-exec-1.1.jar:/opt/storm/storm-0.8.0/lib/kryo-2.17.jar:/opt/storm/storm-0.8.0/lib/ring-servlet-0.3.11.jar:/opt/storm/storm-0.8.0/lib/math.numeric-tower-0.0.1.jar:/opt/storm/storm-0.8.0/lib/reflectasm-1.07-shaded.jar:/opt/storm/storm-0.8.0/lib/ring-jetty-adapter-0.3.11.jar:/opt/storm/storm-0.8.0/lib/jline-0.9.94.jar:/opt/storm/storm-0.8.0/lib/httpcore-4.1.jar:/opt/storm/storm-0.8.0/lib/jetty-6.1.26.jar:/opt/storm/storm-0.8.0/lib/slf4j-log4j12-1.5.8.jar:/opt/storm/storm-0.8.0/lib/commons-fileupload-1.2.1.jar:/opt/storm/storm-0.8.0/lib/slf4j-api-1.5.8.jar:/opt/storm/storm-0.8.0/lib/clojure-1.4.0.jar:/opt/storm/storm-0.8.0/lib/json-simple-1.1.jar:/opt/storm/storm-0.8.0/lib/asm-4.0.jar:/opt/storm/storm-0.8.0/lib/ring-core-0.3.10.jar:/opt/storm/storm-0.8.0/lib/commons-io-1.4.jar:/opt/storm/storm-0.8.0/lib/junit-3.8.1.jar:/opt/storm/storm-0.8.0/lib/httpclient-4.1.1.jar:/opt/storm/storm-0.8.0/lib/disruptor-2.10.1.jar:/opt/storm/storm-0.8.0/lib/tools.logging-0.2.3.jar:/opt/storm/storm-0.8.0/lib/tools.macro-0.1.0.jar:/opt/storm/storm-0.8.0/lib/commons-codec-1.4.jar:/opt/storm/storm-0.8.0/lib/minlog-1.2.jar:/opt/storm/storm-0.8.0/lib/joda-time-2.0.jar:/opt/storm/storm-0.8.0/lib/snakeyaml-1.9.jar:/opt/storm/storm-0.8.0/lib/commons-lang-2.5.jar:/opt/storm/storm-0.8.0/lib/log4j-1.2.16.jar:/opt/storm/storm-0.8.0/lib/servlet-api-2.5.jar:/opt/storm/storm-0.8.0/lib/hiccup-0.3.6.jar:/opt/storm/storm-0.8.0/lib/zookeeper-3.3.3.jar:/opt/storm/storm-0.8.0/lib/core.incubator-0.1.0.jar:/opt/storm/storm-0.8.0/lib/carbonite-1.5.0.jar:/opt/storm/storm-0.8.0/lib/libthrift7-0.7.0.jar:/opt/storm/storm-0.8.0/lib/objenesis-1.2.jar:/opt/storm/storm-0.8.0/lib/clj-time-0.4.1.jar:/opt/storm/storm-0.8.0/lib/compojure-0.6.4.jar:/opt/storm/storm-0.8.0/lib/jzmq-2.1.0.jar:xxx.jar:/home/storm/.storm:/opt/storm/storm-0.8.0/bin -Dstorm.jar=xxx.jar Test arg1

2. Starting the java process with a system call

  if fork:
    os.spawnvp(os.P_WAIT, "java", all_args)
  else:
    os.execvp("java", all_args) # replaces the current process and never returns

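The process is started either with os.spawnvp (when fork is true) or with os.execvp. Since fork defaults to False and jar() does not override it, exec is what actually runs here. For the difference between fork and exec, see http://www.cnblogs.com/jerryshao2015/p/4432060.html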
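
The two branches behave quite differently, which the following Python 3 sketch tries to show on a POSIX system (os.spawnvp is not available on Windows). It uses the current Python interpreter in place of java, and the helper names are invented for the example. Because execvp replaces the calling process, the exec case is run inside a throwaway child so the demonstration itself survives.

```python
import os
import subprocess
import sys


def run_and_wait(argv):
    """Like the fork=True branch: start a child, wait for it, return its exit code."""
    return os.spawnvp(os.P_WAIT, argv[0], argv)


# Source of a tiny program that replaces itself via execvp.
# The print after execvp never runs, because that process image is gone.
EXEC_DEMO = (
    "import os, sys\n"
    "os.execvp(sys.executable, [sys.executable, '-c', 'print(\"from the new program\")'])\n"
    "print('never reached')\n"
)


def run_exec_demo():
    """Run EXEC_DEMO in a child interpreter and return what it printed."""
    return subprocess.check_output([sys.executable, "-c", EXEC_DEMO]).decode()


print(run_and_wait([sys.executable, "-c", "import sys; sys.exit(3)"]))
print(run_exec_demo().strip())
```

With spawnvp the caller keeps running and can inspect the exit code; with execvp the storm script simply becomes the java process, so the shell that invoked storm sees java's exit status directly.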
