Spark Source Code Analysis, Part 2: The Job Scheduling Model and Execution Feedback

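In the earlier post "Spark源码分析之Job提交运行总流程概述" (an overview of the overall Job submission and execution flow), we noted that the first phase of Job submission and execution, stage division and submission, can be broken into three smaller steps:

1. The Job scheduling model and execution feedback;

2. Stage division;

3. Stage submission, which corresponds to generating the TaskSet.

Today we walk through the source code for the first step: the Job scheduling model and execution feedback.

First, DAGScheduler submits the Job to the event queue eventProcessLoop, where it waits to be scheduled. The entry point is DAGScheduler's runJob() method: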

/**
 * Run an action job on the given RDD and pass all the results to the resultHandler function as
 * they arrive.
 *
 * @param rdd target RDD to run tasks on
 * @param func a function to run on each partition of the RDD
 * @param partitions set of partitions to run on; some jobs may not want to compute on all
 *   partitions of the target RDD, e.g. for operations like first()
 * @param callSite where in the user program this job was called
 * @param resultHandler callback to pass each result to
 * @param properties scheduler properties to attach to this job, e.g. fair scheduler pool name
 *
 * @throws Exception when the job fails
 */
def runJob[T, U](
    rdd: RDD[T],
    func: (TaskContext, Iterator[T]) => U,
    partitions: Seq[Int],
    callSite: CallSite,
    resultHandler: (Int, U) => Unit,
    properties: Properties): Unit = {
  // Record the start time
  val start = System.nanoTime
  // Call submitJob() to submit the Job; it returns a JobWaiter
  // rdd is the final RDD, i.e. the target RDD to run tasks on
  // func is the function to run on each partition of that RDD
  // partitions is the set of partitions of that RDD to run on
  // callSite is where in the user program this job was called
  val waiter = submitJob(rdd, func, partitions, callSite, resultHandler, properties)
  // Block in JobWaiter.awaitResult() until the result is available
  waiter.awaitResult() match {
    case JobSucceeded => // the Job succeeded
      logInfo("Job %d finished: %s, took %f s".format
        (waiter.jobId, callSite.shortForm, (System.nanoTime - start) / 1e9))
    case JobFailed(exception: Exception) => // the Job failed
      logInfo("Job %d failed: %s, took %f s".format
        (waiter.jobId, callSite.shortForm, (System.nanoTime - start) / 1e9))
      // SPARK-8644: Include user stack trace in exceptions coming from DAGScheduler.
      val callerStackTrace = Thread.currentThread().getStackTrace.tail
      exception.setStackTrace(exception.getStackTrace ++ callerStackTrace)
      throw exception
  }
}

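runJob() does just three things:

First, it records the start time so that the Job's running time can be computed at the end;

Second, it calls submitJob() to submit the Job, which returns a JobWaiter object named waiter;

Finally, it calls waiter's awaitResult() method to wait for the Job's result. There are only two possible results: JobSucceeded for success and JobFailed for failure.

awaitResult() checks the flag _jobFinished in a loop. While the flag is false it calls this.wait(), which blocks until another thread calls notifyAll() on the waiter; once the flag is true the Job has finished and the method returns the JobResult. The code is as follows: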

def awaitResult(): JobResult = synchronized {
  // Keep waiting while _jobFinished is false; once it is true, leave the loop and return the JobResult
  while (!_jobFinished) {
    this.wait()
  }
  return jobResult
}

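The _jobFinished flag is set when a task completes and the number of finished tasks equals the total number of tasks, or when the whole Job fails. The Job result jobResult is set at the same time as the flag: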

// A task finished successfully
override def taskSucceeded(index: Int, result: Any): Unit = synchronized {
  if (_jobFinished) {
    throw new UnsupportedOperationException("taskSucceeded() called on a finished JobWaiter")
  }
  resultHandler(index, result.asInstanceOf[T])
  finishedTasks += 1
  // Has the number of finished tasks reached the total number of tasks?
  if (finishedTasks == totalTasks) {
    // Set the flag _jobFinished to true
    _jobFinished = true
    // The Job result is success
    jobResult = JobSucceeded
    this.notifyAll()
  }
}

// The Job failed
override def jobFailed(exception: Exception): Unit = synchronized {
  // Set the flag _jobFinished to true
  _jobFinished = true
  // The Job result is failure
  jobResult = JobFailed(exception)
  this.notifyAll()
}
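
The wait/notify handshake described above can be sketched in isolation. The following is a simplified, hypothetical stand-in written for this article, not Spark's JobWaiter: the names SimpleJobWaiter, SimpleJobResult, SimpleJobSucceeded and SimpleJobFailed are assumptions, and the DAGScheduler reference and job id are left out.

```scala
// Hypothetical, simplified stand-in for the waiter pattern; not Spark code.
sealed trait SimpleJobResult
case object SimpleJobSucceeded extends SimpleJobResult
final case class SimpleJobFailed(exception: Exception) extends SimpleJobResult

class SimpleJobWaiter[T](totalTasks: Int, resultHandler: (Int, T) => Unit) {
  private var finishedTasks = 0
  // Assumption: a job with zero tasks counts as finished immediately.
  private var jobFinished = totalTasks == 0
  private var jobResult: SimpleJobResult = SimpleJobSucceeded

  // Called by the scheduling side once per successful task.
  def taskSucceeded(index: Int, result: T): Unit = synchronized {
    if (jobFinished) {
      throw new UnsupportedOperationException("taskSucceeded() called on a finished waiter")
    }
    resultHandler(index, result)
    finishedTasks += 1
    if (finishedTasks == totalTasks) {
      jobFinished = true
      jobResult = SimpleJobSucceeded
      notifyAll() // wake up every thread blocked in awaitResult()
    }
  }

  // Called by the scheduling side when the whole job fails.
  def jobFailed(exception: Exception): Unit = synchronized {
    jobFinished = true
    jobResult = SimpleJobFailed(exception)
    notifyAll()
  }

  // Called by the submitting thread; blocks until the flag is set.
  def awaitResult(): SimpleJobResult = synchronized {
    // The loop guards against spurious wake-ups: wait() may return
    // even though the flag has not been set yet.
    while (!jobFinished) {
      wait()
    }
    jobResult
  }
}
```

Because wait() releases the monitor while blocked, the thread that reports task completion can still enter the synchronized methods and set the flag.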

Next, let's look at submitJob(), which is defined as follows:

/**
 * Submit an action job to the scheduler.
 *
 * @param rdd target RDD to run tasks on
 * @param func a function to run on each partition of the RDD
 * @param partitions set of partitions to run on; some jobs may not want to compute on all
 *   partitions of the target RDD, e.g. for operations like first()
 * @param callSite where in the user program this job was called
 * @param resultHandler callback to pass each result to
 * @param properties scheduler properties to attach to this job, e.g. fair scheduler pool name
 *
 * @return a JobWaiter object that can be used to block until the job finishes executing
 *         or can be used to cancel the job.
 *
 * @throws IllegalArgumentException when partitions ids are illegal
 */
def submitJob[T, U](
    rdd: RDD[T],
    func: (TaskContext, Iterator[T]) => U,
    partitions: Seq[Int],
    callSite: CallSite,
    resultHandler: (Int, U) => Unit,
    properties: Properties): JobWaiter[U] = {
  // Check to make sure we are not launching a task on a partition that does not exist.
  val maxPartitions = rdd.partitions.length
  partitions.find(p => p >= maxPartitions || p < 0).foreach { p =>
    throw new IllegalArgumentException(
      "Attempting to access a non-existent partition: " + p + ". " +
        "Total number of partitions: " + maxPartitions)
  }
  // Generate a jobId for the Job. nextJobId is an AtomicInteger; getAndIncrement()
  // atomically returns the current value and then increments the counter
  val jobId = nextJobId.getAndIncrement()
  // If partitions is empty, i.e. there is no partition to run tasks on, return right away
  if (partitions.size == 0) {
    // Return immediately if the job is running 0 tasks
    return new JobWaiter[U](this, jobId, 0, resultHandler)
  }
  assert(partitions.size > 0)
  // Cast func, otherwise JobSubmitted cannot accept it: T becomes _
  val func2 = func.asInstanceOf[(TaskContext, Iterator[_]) => _]
  // Create a JobWaiter
  val waiter = new JobWaiter(this, jobId, partitions.size, resultHandler)
  // Post a JobSubmitted event to the event queue via eventProcessLoop
  eventProcessLoop.post(JobSubmitted(
    jobId, rdd, func2, partitions.toArray, callSite, waiter,
    SerializationUtils.clone(properties)))
  // Return the JobWaiter
  waiter
}

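submitJob() does five things:

First, validation: it checks the RDD's partitions so that we never launch a task on a partition that does not exist, and if partitions is empty, i.e. there is no partition to run tasks on, it returns immediately;

Second, it generates a jobId for the Job. The counter nextJobId is an AtomicInteger, and getAndIncrement() atomically returns the current value and then increments the counter, so every Job gets a distinct id;

Third, it casts func, because JobSubmitted cannot otherwise accept this argument: the type parameter T becomes _;

Fourth, it creates a JobWaiter object waiter, which is returned to the caller when the method ends and is used to monitor the Job's result;

Fifth, it posts a JobSubmitted event to the event queue eventProcessLoop, where it waits for the event thread to take it off the queue and process it (this normally happens quickly).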
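
The first two steps, partition validation and id generation, are easy to try out on their own. Below is a minimal sketch under stated assumptions: JobIdDemo, validatePartitions and newJobId are hypothetical names introduced here, and maxPartitions is passed in as a plain Int instead of being read from an RDD.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical helper object for illustration; not part of Spark.
object JobIdDemo {
  // Shared counter; getAndIncrement() returns the old value and
  // increments atomically, so concurrent callers never get the same id.
  private val nextJobId = new AtomicInteger(0)

  // Rejects any partition index outside [0, maxPartitions).
  def validatePartitions(partitions: Seq[Int], maxPartitions: Int): Unit =
    partitions.find(p => p >= maxPartitions || p < 0).foreach { p =>
      throw new IllegalArgumentException(
        "Attempting to access a non-existent partition: " + p + ". " +
          "Total number of partitions: " + maxPartitions)
    }

  def newJobId(): Int = nextJobId.getAndIncrement()
}
```

With three partitions, validatePartitions(Seq(0, 1, 2), 3) returns normally, while validatePartitions(Seq(0, 3), 3) throws an IllegalArgumentException because index 3 does not exist.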

Here it is worth taking a closer look at the event queue eventProcessLoop. It is of type DAGSchedulerEventProcessLoop and is defined and assigned when DAGScheduler is initialized:

// Create the member eventProcessLoop, of type DAGSchedulerEventProcessLoop
private[scheduler] val eventProcessLoop = new DAGSchedulerEventProcessLoop(this)

DAGSchedulerEventProcessLoop extends EventLoop, so let's first look at the definition of EventLoop.

/**
 * An event loop to receive events from the caller and process all events in the event thread. It
 * will start an exclusive event thread to process all events.
 *
 * Note: The event queue will grow indefinitely. So subclasses should make sure `onReceive` can
 * handle events in time to avoid the potential OOM.
 */
private[spark] abstract class EventLoop[E](name: String) extends Logging {

  // Event queue backed by a LinkedBlockingDeque; its elements are of type E
  private val eventQueue: BlockingQueue[E] = new LinkedBlockingDeque[E]()

  // Stop flag
  private val stopped = new AtomicBoolean(false)

  // Event-processing thread
  private val eventThread = new Thread(name) {
    // Run as a daemon thread
    setDaemon(true)

    override def run(): Unit = {
      try {
        // Keep looping until the stopped flag is set to true
        while (!stopped.get) {
          // Take one event from the queue (blocks while the queue is empty)
          val event = eventQueue.take()
          try {
            // Hand the event to onReceive() for processing
            onReceive(event)
          } catch {
            case NonFatal(e) => {
              try {
                onError(e)
              } catch {
                case NonFatal(e) => logError("Unexpected error in " + name, e)
              }
            }
          }
        }
      } catch {
        case ie: InterruptedException => // exit even if eventQueue is not empty
        case NonFatal(e) => logError("Unexpected error in " + name, e)
      }
    }
  }

  def start(): Unit = {
    if (stopped.get) {
      throw new IllegalStateException(name + " has already been stopped")
    }
    // Call onStart before starting the event thread to make sure it happens before onReceive
    onStart()
    eventThread.start()
  }

  def stop(): Unit = {
    if (stopped.compareAndSet(false, true)) {
      eventThread.interrupt()
      var onStopCalled = false
      try {
        eventThread.join()
        // Call onStop after the event thread exits to make sure onReceive happens before onStop
        onStopCalled = true
        onStop()
      } catch {
        case ie: InterruptedException =>
          Thread.currentThread().interrupt()
          if (!onStopCalled) {
            // ie is thrown from `eventThread.join()`. Otherwise, we should not call `onStop` since
            // it's already called.
            onStop()
          }
      }
    } else {
      // Keep quiet to allow calling `stop` multiple times.
    }
  }

  /**
   * Put the event into the event queue. The event thread will process it later.
   */
  def post(event: E): Unit = {
    // Add the event to the queue of pending events
    eventQueue.put(event)
  }

  /**
   * Return if the event thread has already been started but not yet stopped.
   */
  def isActive: Boolean = eventThread.isAlive

  /**
   * Invoked when `start()` is called but before the event thread starts.
   */
  protected def onStart(): Unit = {}

  /**
   * Invoked when `stop()` is called and the event thread exits.
   */
  protected def onStop(): Unit = {}

  /**
   * Invoked in the event thread when polling events from the event queue.
   *
   * Note: Should avoid calling blocking actions in `onReceive`, or the event thread will be blocked
   * and cannot process events in time. If you want to call some blocking actions, run them in
   * another thread.
   */
  protected def onReceive(event: E): Unit

  /**
   * Invoked if `onReceive` throws any non fatal error. Any non fatal error thrown from `onError`
   * will be ignored.
   */
  protected def onError(e: Throwable): Unit
}

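As we can see, EventLoop is essentially an event queue plus a wrapper around the operations on that queue. Internally it first defines an event queue of type LinkedBlockingDeque whose elements are of type E; in the case of DAGSchedulerEventProcessLoop, the queue holds events of type DAGSchedulerEvent: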

// Event queue backed by a LinkedBlockingDeque; its elements are of type E
private val eventQueue: BlockingQueue[E] = new LinkedBlockingDeque[E]()

It also provides a daemon thread dedicated to taking events off the queue and passing each one to onReceive() for processing:

// Event-processing thread
private val eventThread = new Thread(name) {
  // Run as a daemon thread
  setDaemon(true)

  override def run(): Unit = {
    try {
      // Keep looping until the stopped flag is set to true
      while (!stopped.get) {
        // Take one event from the queue (blocks while the queue is empty)
        val event = eventQueue.take()
        try {
          // Hand the event to onReceive() for processing
          onReceive(event)
        } catch {
          case NonFatal(e) => {
            try {
              onError(e)
            } catch {
              case NonFatal(e) => logError("Unexpected error in " + name, e)
            }
          }
        }
      }
    } catch {
      case ie: InterruptedException => // exit even if eventQueue is not empty
      case NonFatal(e) => logError("Unexpected error in " + name, e)
    }
  }
}

So how do we add an event to the queue? Call its post() method and pass in the event:

/**
 * Put the event into the event queue. The event thread will process it later.
 */
def post(event: E): Unit = {
  // Add the event to the queue of pending events
  eventQueue.put(event)
}
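
To make the producer/consumer relationship between post() and the event thread concrete, here is a stripped-down sketch of the same pattern. It is an illustration only: MiniEventLoop is a hypothetical name, and logging, the onStart/onStop hooks and the interrupt handling around join() are all omitted.

```scala
import java.util.concurrent.{BlockingQueue, LinkedBlockingDeque}
import java.util.concurrent.atomic.AtomicBoolean
import scala.util.control.NonFatal

// Hypothetical, minimal event loop for illustration; not Spark's EventLoop.
abstract class MiniEventLoop[E](name: String) {
  private val eventQueue: BlockingQueue[E] = new LinkedBlockingDeque[E]()
  private val stopped = new AtomicBoolean(false)

  // Single consumer thread: events are handled one at a time, in FIFO order.
  private val eventThread = new Thread(name) {
    setDaemon(true)
    override def run(): Unit = {
      try {
        while (!stopped.get) {
          val event = eventQueue.take() // blocks while the queue is empty
          try onReceive(event)
          catch { case NonFatal(e) => onError(e) }
        }
      } catch {
        // stop() interrupts the blocking take(); exit quietly.
        case _: InterruptedException =>
      }
    }
  }

  def start(): Unit = eventThread.start()

  def stop(): Unit =
    if (stopped.compareAndSet(false, true)) {
      eventThread.interrupt()
      eventThread.join()
    }

  // Producer side: any thread may call post(); it only enqueues.
  def post(event: E): Unit = eventQueue.put(event)

  protected def onReceive(event: E): Unit
  protected def onError(e: Throwable): Unit
}
```

A concrete subclass only has to implement onReceive() and onError(). Callers of post() return as soon as the event is enqueued, which is why submitJob() can hand back the JobWaiter without waiting for the event to be processed.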

Back to the main topic. As mentioned above, submitJob() uses eventProcessLoop's post() method to add a JobSubmitted event to the event queue. How does DAGSchedulerEventProcessLoop handle a JobSubmitted event? Let's look at its onReceive() method:

/**
 * The main event loop of the DAG scheduler.
 */
override def onReceive(event: DAGSchedulerEvent): Unit = {
  val timerContext = timer.time()
  try {
    // Call doOnReceive(), passing along the DAGSchedulerEvent
    doOnReceive(event)
  } finally {
    timerContext.stop()
  }
}

Next, the doOnReceive() method:

// Dispatches each event to its handler
private def doOnReceive(event: DAGSchedulerEvent): Unit = event match {
  // A JobSubmitted event is handled by dagScheduler.handleJobSubmitted()
  case JobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties) =>
    dagScheduler.handleJobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties)

  // A MapStageSubmitted event is handled by dagScheduler.handleMapStageSubmitted()
  case MapStageSubmitted(jobId, dependency, callSite, listener, properties) =>
    dagScheduler.handleMapStageSubmitted(jobId, dependency, callSite, listener, properties)

  case StageCancelled(stageId) =>
    dagScheduler.handleStageCancellation(stageId)

  case JobCancelled(jobId) =>
    dagScheduler.handleJobCancellation(jobId)

  case JobGroupCancelled(groupId) =>
    dagScheduler.handleJobGroupCancelled(groupId)

  case AllJobsCancelled =>
    dagScheduler.doCancelAllJobs()

  case ExecutorAdded(execId, host) =>
    dagScheduler.handleExecutorAdded(execId, host)

  case ExecutorLost(execId) =>
    dagScheduler.handleExecutorLost(execId, fetchFailed = false)

  case BeginEvent(task, taskInfo) =>
    dagScheduler.handleBeginEvent(task, taskInfo)

  case GettingResultEvent(taskInfo) =>
    dagScheduler.handleGetTaskResult(taskInfo)

  case completion @ CompletionEvent(task, reason, _, _, taskInfo, taskMetrics) =>
    dagScheduler.handleTaskCompletion(completion)

  case TaskSetFailed(taskSet, reason, exception) =>
    dagScheduler.handleTaskSetFailed(taskSet, reason, exception)

  case ResubmitFailedStages =>
    dagScheduler.resubmitFailedStages()
}

A JobSubmitted event is handled by calling DAGScheduler's handleJobSubmitted() method.
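
The dispatch style used by doOnReceive(), one case class per event type and a single pattern match that routes each event to its handler, can be reduced to a few lines. The event types and the DemoDispatcher object below are hypothetical and much smaller than Spark's DAGSchedulerEvent hierarchy; the handlers simply return a description string instead of doing scheduling work.

```scala
// Hypothetical event hierarchy for illustration; not Spark's DAGSchedulerEvent.
sealed trait DemoEvent
final case class DemoJobSubmitted(jobId: Int, partitions: Array[Int]) extends DemoEvent
final case class DemoJobCancelled(jobId: Int) extends DemoEvent
case object DemoAllJobsCancelled extends DemoEvent

object DemoDispatcher {
  // Because DemoEvent is sealed, the compiler can warn when a case is missing.
  def dispatch(event: DemoEvent): String = event match {
    case DemoJobSubmitted(jobId, partitions) =>
      s"submit job $jobId with ${partitions.length} partitions"
    case DemoJobCancelled(jobId) =>
      s"cancel job $jobId"
    case DemoAllJobsCancelled =>
      "cancel all jobs"
  }
}
```

The extractor patterns unpack the event's fields in the same step that selects the handler, which is what lets doOnReceive() pass jobId, rdd, func and the rest straight through to handleJobSubmitted().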

That largely wraps up the first step, the Job scheduling model and execution feedback. The second and third steps are left for later posts.

Original blog post: http://blog.csdn.net/lipeng_bigdata/article/details/50667966
