【原】 Spark中Worker源码分析(二)
继续前一篇的内容。前一篇内容为:
Spark中Worker源码分析(一)http://www.cnblogs.com/yourarebest/p/5300202.html
4.receive方法,
receive方法主要分为以下14种情况:
(1)worker向master注册成功后,详见代码
(2)worker向master发送心跳消息,如果还没有注册到master上,该消息将被忽略,详见代码
(3)worker的工作空间的清理,详见代码
(4)更换master,详见代码
(5)worker注册失败,详见代码
(6)再次连接worker,详见代码
(7)创建executor,详见代码
(8)executor的转态发生改变时,详见代码
(9)kill executor,详见代码
(10)创建driver,详见代码
(11)kill driver,详见代码
(12)driver的状态发生变化时,详见代码
(13)将worker注册到master上,详见代码
(14)app执行完毕,详见代码
worker与master相关的交互为(1)(2)(4)(6)(13)
worker与driver相关的交互为(10)(11)(12)
worker与executor相关的交互为(3)(7)(8)(9)(14),需要说明的是(3)(14)它们的完成都与executor有着密切的联系。
<code>
override def receive: PartialFunction[Any, Unit] = {
//(1)注册成功的Woker
case RegisteredWorker(masterRef, masterWebUiUrl) =>
logInfo("Successfully registered with master " + masterRef.address.toSparkURL)
registered = true
changeMaster(masterRef, masterWebUiUrl)
//守护线程15s发送一次心跳消息
forwordMessageScheduler.scheduleAtFixedRate(new Runnable {
override def run(): Unit = Utils.tryLogNonFatalError {
self.send(SendHeartbeat)
}
}, 0, HEARTBEAT_MILLIS, TimeUnit.MILLISECONDS)
//如果允许清理
if (CLEANUP_ENABLED) {
logInfo(s"Worker cleanup enabled; old application directories will be deleted in: $workDir")
forwordMessageScheduler.scheduleAtFixedRate(new Runnable {
override def run(): Unit = Utils.tryLogNonFatalError {
//守护线程30min清理app文件夹
self.send(WorkDirCleanup)
}
}, CLEANUP_INTERVAL_MILLIS, CLEANUP_INTERVAL_MILLIS, TimeUnit.MILLISECONDS)
}
//(2)worker向master发送心跳消息,如果还没有注册到master上,该消息将被忽略
case SendHeartbeat =>
if (connected) { sendToMaster(Heartbeat(workerId, self)) }
//(3)worker的工作空间的清理
case WorkDirCleanup =>
//为了加快独立将来独立线程的清理工作,不要占用worker rpcEndpoint的端口号,拷贝ids所以它可以被清理线程使用
val appIds = executors.values.map(_.appId).toSet
val cleanupFuture = concurrent.future {
val appDirs = workDir.listFiles()
if (appDirs == null) {
throw new IOException("ERROR: Failed to list files in " + appDirs)
}
appDirs.filter { dir =>
//目录正在被app使用-当清理时检查app是否在运行
val appIdFromDir = dir.getName
val isAppStillRunning = appIds.contains(appIdFromDir)
dir.isDirectory && !isAppStillRunning &&
!Utils.doesDirectoryContainAnyNewFiles(dir, APP_DATA_RETENTION_SECONDS)
}.foreach { dir =>
logInfo(s"Removing directory: ${dir.getPath}")
Utils.deleteRecursively(dir)
}
}(cleanupThreadExecutor)
cleanupFuture.onFailure {
case e: Throwable =>
logError("App dir cleanup failed: " + e.getMessage, e)
}(cleanupThreadExecutor)
//(4)更换master
case MasterChanged(masterRef, masterWebUiUrl) =>
logInfo("Master has changed, new master is at " + masterRef.address.toSparkURL)
changeMaster(masterRef, masterWebUiUrl)
val execs = executors.values.
map(e => new ExecutorDescription(e.appId, e.execId, e.cores, e.state))
masterRef.send(WorkerSchedulerStateResponse(workerId, execs.toList, drivers.keys.toSeq))
//(5)worker注册失败
case RegisterWorkerFailed(message) =>
if (!registered) {
logError("Worker registration failed: " + message)
System.exit(1)
}
//(6)再次连接Worker
case ReconnectWorker(masterUrl) =>
logInfo(s"Master with url $masterUrl requested this worker to reconnect.")
//再次将worker注册到masters上
registerWithMaster()
//(7)创建Executor
case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>if (masterUrl != activeMasterUrl) {
logWarning("Invalid Master (" + masterUrl + ") attempted to launch executor.")
} else {
try {
logInfo("Asked to launch executor %s/%d for %s".format(appId, execId, appDesc.name))
//创建executor的工作目录
val executorDir = new File(workDir, appId + "/" + execId)
if (!executorDir.mkdirs()) {
throw new IOException("Failed to create directory " + executorDir)
}
//为executors创建本地目录,通过SPARK_EXECUTOR_DIRS环境变量设置,当app执行完后并删除
val appLocalDirs = appDirectories.get(appId).getOrElse {
Utils.getOrCreateLocalRootDirs(conf).map { dir =>
val appDir = Utils.createDirectory(dir, namePrefix = "executor")
Utils.chmod700(appDir)
appDir.getAbsolutePath()
}.toSeq
}
appDirectories(appId) = appLocalDirs
val manager = new ExecutorRunner(
appId,
execId,
appDesc.copy(command = Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
cores_,
memory_,
self,
workerId,
host,
webUi.boundPort,
publicAddress,
sparkHome,
executorDir,
workerUri,
conf,
appLocalDirs, ExecutorState.LOADING)
executors(appId + "/" + execId) = manager
manager.start()
coresUsed += cores_
memoryUsed += memory_
sendToMaster(ExecutorStateChanged(appId, execId, manager.state, None, None))
} catch {
case e: Exception => {
logError(s"Failed to launch executor $appId/$execId for ${appDesc.name}.", e)
if (executors.contains(appId + "/" + execId)) {
executors(appId + "/" + execId).kill()
executors -= appId + "/" + execId
}
sendToMaster(ExecutorStateChanged(appId, execId, ExecutorState.FAILED,
Some(e.toString), None))
}
}
}
//(8)executor的转态发生改变时
case executorStateChanged @ ExecutorStateChanged(appId, execId, state, message, exitStatus) =>
handleExecutorStateChanged(executorStateChanged)
//(9)kill executor
case KillExecutor(masterUrl, appId, execId) =>
if (masterUrl != activeMasterUrl) {
logWarning("Invalid Master (" + masterUrl + ") attempted to launch executor " + execId)
} else {
val fullId = appId + "/" + execId
executors.get(fullId) match {
case Some(executor) =>
logInfo("Asked to kill executor " + fullId)
executor.kill()
case None =>
logInfo("Asked to kill unknown executor " + fullId)
}
}
//(10)创建Driver
case LaunchDriver(driverId, driverDesc) => {
logInfo(s"Asked to launch driver $driverId")
val driver = new DriverRunner(
conf,
driverId,
workDir,
sparkHome,
driverDesc.copy(command = Worker.maybeUpdateSSLSettings(driverDesc.command, conf)),
self,
workerUri,
securityMgr)
drivers(driverId) = driver
driver.start(
coresUsed += driverDesc.cores
memoryUsed += driverDesc.mem
}
//(11)kill Driver
case KillDriver(driverId) => {
logInfo(s"Asked to kill driver $driverId")
drivers.get(driverId) match {
case Some(runner) =>
runner.kill()
case None =>
logError(s"Asked to kill unknown driver $driverId")
}
}
//(12)driver的状态发生变化时
case driverStateChanged @ DriverStateChanged(driverId, state, exception) => {
handleDriverStateChanged(driverStateChanged)
}
//(13)将worker注册到master上
case ReregisterWithMaster =>
reregisterWithMaster()
//(14)app执行完毕
case ApplicationFinished(id) =>
finishedApps += id
//删除执行完的app在执行过程中创建的本地文件
maybeCleanupApplication(id)
}
</code>
【原】 Spark中Worker源码分析(二)的更多相关文章
- 【原】 Spark中Worker源码分析(一)
Worker作为对于Spark集群的健壮运行起着举足轻重的作用,作为Master的奴隶,每15s向Master告诉自己还活着,一旦主人(Master>有了任务(Application),立马交给 ...
- 【原】Spark中Client源码分析(二)
继续前一篇的内容.前一篇内容为: Spark中Client源码分析(一)http://www.cnblogs.com/yourarebest/p/5313006.html DriverClient中的 ...
- 【原】Spark中Master源码分析(二)
继续上一篇的内容.上一篇的内容为: Spark中Master源码分析(一) http://www.cnblogs.com/yourarebest/p/5312965.html 4.receive方法, ...
- 【原】Spark中Master源码分析(一)
Master作为集群的Manager,对于集群的健壮运行发挥着十分重要的作用.下面,我们一起了解一下Master是听从Client(Leader)的号召,如何管理好Worker的吧. 1.家当(静态属 ...
- Spark中决策树源码分析
1.Example 使用Spark MLlib中决策树分类器API,训练出一个决策树模型,使用Python开发. """ Decision Tree Classifica ...
- 【原】Spark中Client源码分析(一)
在Spark Standalone中我们所谓的Client,它的任务其实是由AppClient和DriverClient共同完成的.AppClient是一个允许app(Client)和Spark集群通 ...
- Spark RPC框架源码分析(二)RPC运行时序
前情提要: Spark RPC框架源码分析(一)简述 一. Spark RPC概述 上一篇我们已经说明了Spark RPC框架的一个简单例子,Spark RPC相关的两个编程模型,Actor模型和Re ...
- Spark Scheduler模块源码分析之DAGScheduler
本文主要结合Spark-1.6.0的源码,对Spark中任务调度模块的执行过程进行分析.Spark Application在遇到Action操作时才会真正的提交任务并进行计算.这时Spark会根据Ac ...
- Spark RPC框架源码分析(三)Spark心跳机制分析
一.Spark心跳概述 前面两节中介绍了Spark RPC的基本知识,以及深入剖析了Spark RPC中一些源码的实现流程. 具体可以看这里: Spark RPC框架源码分析(二)运行时序 Spark ...
随机推荐
- Sublime Text 3使用技巧总结--快捷键及常用插件
1.Goto Anything(快速搜索) |--Ctrl+p 输入|--①文件名 |--②@+函数名 |--③:+数字 ->跳转到相应行 |--④#+变量名 2.多行游标 |--|--Alt+ ...
- Challenge Checkio(python)—初尝python练习网站
最近在找点python语言练习的网站,发现这个网站不错 http://www.checkio.org/ 页面设计的也比较漂亮,比较适合学习python的语法知识.不过注册这个网站 开始就得解决一个py ...
- 第四章 Web表单
4.1 跨站请求伪造保护 安装flask-wtf app = Flask(__name__) app.config['SECRET_KEY'] = 'hard to guess string' 密钥不 ...
- MSSql ID自动增长删除数据重1开始
dbcc checkident('db_Tome1.dbo.员工信息表',reseed,0) 注:dbcc checkident('表名',reseed,0)
- Hive 自定义函数(转)
Hive是一种构建在Hadoop上的数据仓库,Hive把SQL查询转换为一系列在Hadoop集群中运行的MapReduce作业,是MapReduce更高层次的抽象,不用编写具体的MapReduce方法 ...
- 深入浅出分析C#接口的作用
1.C#接口的作用 :C#接口是一个让很多初学C#者容易迷糊的东西,用起来好像很简单,定义接口,里面包含方法,但没有方法具体实现的代码,然后在继承该接口的类里面要实现接口的所有方法的代码,但没有真正认 ...
- 长达半年的苹果发布会:亮点与槽点(iPhone5s,iPhone5c)
不知出于什么原因,今天凌晨召开的苹果发布会并没有视频直播,所以大家都守着The Verge家的图文直播.结果,苹果再一次用事实证明了他们没有保密体系,或者,故意没有保密体系. 整场发布会正经的亮点如下 ...
- sencha touch json store
js: Ext.define('MyApp.store.MyJsonStore', { extend: 'Ext.data.Store', requires: [ 'MyApp.model.Perso ...
- linux环境下验证码不显示的几种情况
linux环境下验证码不显示的几种情况 gd库扩展没有安装. 查看phpinfo(),看看有没有安装gd库 yum安装gd库或者phpize安装 安装完成后记得重启php-fpm bom头的原因 在生 ...
- DataGridView自动行号
最近又用了一下DataGridView控件,需要显示行号,我们知道在.net中DataGridView控件默认是不显示行号(数据的记录行数)的,后来通过查资料发现可以在DataGridView控件的R ...