In the earlier notes on SparkContext and RDD we already saw that the real computation is always triggered by calling the DAGScheduler's runJob method, so this is an important class. Before reading its implementation it helps to know a little about the actor model: http://en.wikipedia.org/wiki/Actor_model http://www.slideshare.net/YungLinHo/introduction-to-actor-model-and-akka ; a rough idea of how actors work is enough. It is also worth reviewing the DAG concept and related papers first: http://en.wikipedia.org/wiki/Directed_acyclic_graph    http://www.netlib.org/utk/people/JackDongarra/PAPERS/DAGuE_technical_report.pdf
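As a quick reminder of where that call comes from: every RDD action submits a job through SparkContext.runJob, which forwards it to the DAGScheduler. A minimal usage sketch (the app name and local master URL are just placeholders):

import org.apache.spark.{SparkConf, SparkContext}

object DagNotesDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("dag-notes").setMaster("local[2]"))
    val rdd = sc.parallelize(1 to 100, 4)
    rdd.count()                    // action -> SparkContext.runJob -> DAGScheduler.runJob
    rdd.map(_ * 2).reduce(_ + _)   // another action submits another job
    sc.stop()
  }
}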



===========================Job submission flow======================================
DAGSchedulerEventProcessActor::submitJob   -- every action eventually reaches a submitJob call
    -> send: JobSubmitted  -- it sends a message to the DAGScheduler (because the machine submitting the job may not be the master?)
        -> handleJobSubmitted   -- the DAGScheduler handles the received message
            -> newStage   -- create the final stage
            -> new ActiveJob   -- create an ActiveJob to track this job
            -> [runLocally]  -- if the job is simple enough, run it directly on the local machine:
                localExecutionEnabled && allowLocal && finalStage.parents.isEmpty && partitions.length == 1
            -> runLocally(job)  -- don't block the DAGScheduler event loop or other concurrent jobs
                -> runLocallyWithinThread(job)  -- run the local job in a new thread so the DAG event loop is not blocked
                    -> TaskContext(job.finalStage.id, job.partitions(0), 0, runningLocally = true)
                    -> result = job.func(taskContext, rdd.iterator(split, taskContext))  -- run the job
                    -> job.listener.taskSucceeded(0, result)  -- report the result to the job listener
                    -> listenerBus.post(SparkListenerJobEnd(job.jobId, jobResult))  -- announce that the job has ended
            -> submitStage(finalStage)   -- submits the stage, but first recursively submits any missing parents (see the toy sketch after this trace)
                -> activeJobForStage   -- finds the earliest-created active job that needs the stage (looked up in jobIdToActiveJob)
                -> getMissingParentStages   -- a parent is "missing" when the stage depends on a shuffle stage whose output is not yet available
                    -> waitingForVisit.push(stage.rdd)
                    -> waitingForVisit.pop()
                    -> getShuffleMapStage
                        -> registerShuffleDependencies  -- register the shuffles of all ancestors in shuffleToMapStage and the mapOutputTracker
                            -> getAncestorShuffleDependencies  -- returns a stack holding the ancestor dependencies that involve a shuffle
                            -> newOrUsedStage  -- create a shuffle stage for that RDD; if the shuffle already exists, reuse the old output locations (locs) for the new stage
                                -> mapOutputTracker.getSerializedMapOutputStatuses(shuffleDep.shuffleId) or
                                -> mapOutputTracker.registerShuffle(shuffleDep.shuffleId, rdd.partitions.size)
                            -> shuffleToMapStage(currentShufDep.shuffleId) = stage  -- record it in the DAGScheduler's shuffleToMapStage hash map
                        -> newOrUsedStage  -- create a shuffle stage for the current RDD
                        -> shuffleToMapStage(shuffleDep.shuffleId) = stage   -- record it in shuffleToMapStage as well
                    -> NarrowDependency  -> waitingForVisit.push(narrowDep.rdd)  -- narrow dependencies are not turned into stages; the parent RDD is just pushed onto the stack so traversal continues
                -> submitMissingTasks  -- called when the stage's parents are available and we can now run its tasks (no missing dependencies remain)
                    -> stage.pendingTasks.clear()  -- clear the set of pending tasks
                    -> partitionsToCompute = ?  -- first figure out the indexes of the partition ids to compute
                       (a shuffle map stage may have to compute more partitions than the final result stage)
                    -> runningStages += stage  -- record the stage as running
                    -> listenerBus.post(SparkListenerStageSubmitted(stage.latestInfo, properties))  -- tell listeners that the stage has been submitted
                    -> broadcast the serialized task binary used to dispatch tasks to executors; each task deserializes its own
                       copy of the RDD, which is necessary in Hadoop where the JobConf/Configuration object is not thread-safe
                        -> // For ShuffleMapTask, serialize and broadcast (rdd, shuffleDep).
                        -> // For ResultTask, serialize and broadcast (rdd, func).
                    -> new ShuffleMapTask(stage.id, taskBinary, part, locs)  -- create the tasks
                    -> new ResultTask(stage.id, taskBinary, part, locs, id)
                    -> preemptively serialize one task to make sure the tasks are serializable, so any exception is caught here
                    -> stage.pendingTasks ++= tasks
                    -> taskScheduler.submitTasks  -- hand the task set over to the TaskScheduler
                -> submitStage(parent)  -- (recursion) if missing parent stages were found, submit those parents first and park this stage in waitingStages
======================end=========================================
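To make the submitStage/getMissingParentStages recursion concrete, here is a self-contained toy model (my own names, not Spark code): a stage whose shuffle parents have no output yet is parked in waitingStages and its parents are submitted first.

import scala.collection.mutable.HashSet

object SubmitStageSketch {
  // Toy stand-in for a Stage: it only knows its parents and whether its output is available.
  case class ToyStage(id: Int, parents: Seq[ToyStage], var outputAvailable: Boolean = false)

  val waitingStages = new HashSet[ToyStage]
  val runningStages = new HashSet[ToyStage]

  def getMissingParentStages(stage: ToyStage): Seq[ToyStage] =
    stage.parents.filterNot(_.outputAvailable)

  def submitStage(stage: ToyStage): Unit = {
    val missing = getMissingParentStages(stage).sortBy(_.id)
    if (missing.isEmpty) {
      submitMissingTasks(stage)        // all parents available: run this stage's tasks now
    } else {
      missing.foreach(submitStage)     // recursively submit the parents first
      waitingStages += stage           // and park this stage until they finish
    }
  }

  def submitMissingTasks(stage: ToyStage): Unit = {
    runningStages += stage
    println(s"submitting tasks for stage ${stage.id}")
  }

  def main(args: Array[String]): Unit = {
    val shuffleParent = ToyStage(0, Nil)
    val finalStage = ToyStage(1, Seq(shuffleParent))
    submitStage(finalStage)            // stage 0 runs first, stage 1 ends up in waitingStages
  }
}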
There is one DAGScheduler per SparkContext. For each job it builds a DAG of stages, tracks which RDDs and stage outputs are materialized, and looks for a minimal (optimal?) schedule to run the job. It then submits a TaskSet to the TaskScheduler, which runs the tasks on the cluster.
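The hand-off at the end of submitMissingTasks looks roughly like this in the 1.x source (paraphrased from memory, not verbatim):

    stage.pendingTasks ++= tasks
    taskScheduler.submitTasks(
      new TaskSet(tasks.toArray, stage.id, stage.newAttemptId(), stage.jobId, properties))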
/**
* The high-level scheduling layer that implements stage-oriented scheduling. It computes a DAG of
* stages for each job, keeps track of which RDDs and stage outputs are materialized, and finds a
* minimal schedule to run the job. It then submits stages as TaskSets to an underlying
* TaskScheduler implementation that runs them on the cluster.
*
* In addition to coming up with a DAG of stages, this class also determines the preferred
* locations to run each task on, based on the current cache status, and passes these to the
* low-level TaskScheduler. Furthermore, it handles failures due to shuffle output files being
* lost, in which case old stages may need to be resubmitted. Failures *within* a stage that are
* not caused by shuffle file loss are handled by the TaskScheduler, which will retry each task
* a small number of times before cancelling the whole stage.
*
*/
package org.apache.spark.scheduler

private[spark]
class DAGScheduler(
    private[scheduler] val sc: SparkContext,
    private[scheduler] val taskScheduler: TaskScheduler,
    listenerBus: LiveListenerBus,
    mapOutputTracker: MapOutputTrackerMaster,
    blockManagerMaster: BlockManagerMaster,
    env: SparkEnv,
    clock: Clock = SystemClock)
  extends Logging {
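The primary constructor takes every collaborator explicitly; if I remember the 1.x source correctly, the auxiliary constructors (paraphrased below) pull them out of SparkContext and SparkEnv:

  def this(sc: SparkContext, taskScheduler: TaskScheduler) = {
    this(
      sc,
      taskScheduler,
      sc.listenerBus,
      sc.env.mapOutputTracker.asInstanceOf[MapOutputTrackerMaster],
      sc.env.blockManager.master,
      sc.env)
  }

  def this(sc: SparkContext) = this(sc, sc.taskScheduler)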

State machine (actor message handling):
private[scheduler] class DAGSchedulerEventProcessActor(dagScheduler: DAGScheduler)
  extends Actor with Logging {

  override def preStart() {
    // set DAGScheduler for taskScheduler to ensure eventProcessActor is always
    // valid when the messages arrive
    dagScheduler.taskScheduler.setDAGScheduler(dagScheduler)
  }

  /**
   * The main event loop of the DAG scheduler.
   */
  def receive = {
    case JobSubmitted(jobId, rdd, func, partitions, allowLocal, callSite, listener, properties) =>
      dagScheduler.handleJobSubmitted(jobId, rdd, func, partitions, allowLocal, callSite,
        listener, properties)

    case StageCancelled(stageId) =>
      dagScheduler.handleStageCancellation(stageId)

    case JobCancelled(jobId) =>
      dagScheduler.handleJobCancellation(jobId)

    case JobGroupCancelled(groupId) =>
      dagScheduler.handleJobGroupCancelled(groupId)

    case AllJobsCancelled =>
      dagScheduler.doCancelAllJobs()

    case ExecutorAdded(execId, host) =>
      dagScheduler.handleExecutorAdded(execId, host)

    case ExecutorLost(execId) =>
      dagScheduler.handleExecutorLost(execId)

    case BeginEvent(task, taskInfo) =>
      dagScheduler.handleBeginEvent(task, taskInfo)

    case GettingResultEvent(taskInfo) =>
      dagScheduler.handleGetTaskResult(taskInfo)

    case completion @ CompletionEvent(task, reason, _, _, taskInfo, taskMetrics) =>
      dagScheduler.handleTaskCompletion(completion)

    case TaskSetFailed(taskSet, reason) =>
      dagScheduler.handleTaskSetFailed(taskSet, reason)

    case ResubmitFailedStages =>
      dagScheduler.resubmitFailedStages()
  }
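To see how events enter this loop: the public entry points do not do the work themselves, they only post a message and return. A simplified paraphrase (not verbatim; the partition sanity checks are omitted) of submitJob, which backs runJob:

  def submitJob[T, U](
      rdd: RDD[T],
      func: (TaskContext, Iterator[T]) => U,
      partitions: Seq[Int],
      callSite: CallSite,
      allowLocal: Boolean,
      resultHandler: (Int, U) => Unit,
      properties: Properties = null): JobWaiter[U] = {
    val jobId = nextJobId.getAndIncrement()
    val waiter = new JobWaiter(this, jobId, partitions.size, resultHandler)
    // Fire-and-forget: the caller's thread gets the JobWaiter back immediately;
    // the single actor thread later picks up the event and calls handleJobSubmitted.
    eventProcessActor ! JobSubmitted(
      jobId, rdd, func.asInstanceOf[(TaskContext, Iterator[_]) => _],
      partitions.toArray, allowLocal, callSite, waiter, properties)
    waiter
  }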

Important fields:
private val nextStageId = new AtomicInteger(0)
private[scheduler] val nextJobId = new AtomicInteger(0)

private[scheduler] val jobIdToStageIds = new HashMap[Int, HashSet[Int]]
private[scheduler] val stageIdToStage = new HashMap[Int, Stage]
private[scheduler] val shuffleToMapStage = new HashMap[Int, Stage]
private[scheduler] val jobIdToActiveJob = new HashMap[Int, ActiveJob]

// Stages we need to run whose parents aren't done
private[scheduler] val waitingStages = new HashSet[Stage]
// Stages we are running right now
private[scheduler] val runningStages = new HashSet[Stage]
// Stages that must be resubmitted due to fetch failures
private[scheduler] val failedStages = new HashSet[Stage]
private[scheduler] val activeJobs = new HashSet[ActiveJob]
// Contains the locations that each RDD's partitions are cached on
private val cacheLocs = new HashMap[Int, Array[Seq[TaskLocation]]]
private val dagSchedulerActorSupervisor =
  env.actorSystem.actorOf(Props(new DAGSchedulerActorSupervisor(this)))
// A closure serializer that we reuse.
// This is only safe because DAGScheduler runs in a single thread.
private val closureSerializer = SparkEnv.get.closureSerializer.newInstance()

private[scheduler] var eventProcessActor: ActorRef = _
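These maps are filled in as stages are created. Paraphrasing newStage from the 1.x source (not verbatim):

  private def newStage(
      rdd: RDD[_],
      numTasks: Int,
      shuffleDep: Option[ShuffleDependency[_, _, _]],
      jobId: Int,
      callSite: CallSite): Stage = {
    val id = nextStageId.getAndIncrement()
    val stage = new Stage(id, rdd, numTasks, shuffleDep, getParentStages(rdd, jobId), jobId, callSite)
    stageIdToStage(id) = stage            // every stage is indexed by its id
    updateJobIdStageIdMaps(jobId, stage)  // and recorded in jobIdToStageIds for the jobs that need it
    stage
  }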

private[scheduler] def handleJobSubmitted(jobId: Int,
    finalRDD: RDD[_],
    func: (TaskContext, Iterator[_]) => _,
    partitions: Array[Int],
    allowLocal: Boolean,
    callSite: CallSite,
    listener: JobListener,
    properties: Properties = null)
{
/** Submits stage, but first recursively submits any missing parents. */
private def submitStage(stage: Stage) {
/** Called when stage's parents are available and we can now do its task. */
private def submitMissingTasks(stage: Stage, jobId: Int) {

/** Finds the earliest-created active job that needs the stage */
// TODO: Probably should actually find among the active jobs that need this
// stage the one with the highest priority (highest-priority pool, earliest created).
// That should take care of at least part of the priority inversion problem with
// cross-job dependencies.
private def activeJobForStage(stage: Stage): Option[Int] = {
  val jobsThatUseStage: Array[Int] = stage.jobIds.toArray.sorted
  jobsThatUseStage.find(jobIdToActiveJob.contains)
}

/**
* Types of events that can be handled by the DAGScheduler. The DAGScheduler uses an event queue
* architecture where any thread can post an event (e.g. a task finishing or a new job being
* submitted) but there is a single "logic" thread that reads these events and takes decisions.
* This greatly simplifies synchronization.
*/
private[scheduler] sealed trait DAGSchedulerEvent
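A few of the concrete events, paraphrased from DAGSchedulerEvent.scala (field lists may differ slightly between versions); they match the cases handled in receive above:

  private[scheduler] case class JobSubmitted(
      jobId: Int,
      finalRDD: RDD[_],
      func: (TaskContext, Iterator[_]) => _,
      partitions: Array[Int],
      allowLocal: Boolean,
      callSite: CallSite,
      listener: JobListener,
      properties: Properties = null)
    extends DAGSchedulerEvent

  private[scheduler] case class CompletionEvent(
      task: Task[_],
      reason: TaskEndReason,
      result: Any,
      accumUpdates: Map[Long, Any],
      taskInfo: TaskInfo,
      taskMetrics: TaskMetrics)
    extends DAGSchedulerEvent

  private[scheduler] case object ResubmitFailedStages extends DAGSchedulerEvent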
/**
* Asynchronously passes SparkListenerEvents to registered SparkListeners.
*
* Until start() is called, all posted events are only buffered. Only after this listener bus
* has started will events be actually propagated to all attached listeners. This listener bus
* is stopped when it receives a SparkListenerShutdown event, which is posted using stop().
*/
private[spark] class LiveListenerBus extends SparkListenerBus with Logging {
/**
* A SparkListenerEvent bus that relays events to its listeners
*/
private[spark] trait SparkListenerBus extends Logging {

// SparkListeners attached to this event bus
protected val sparkListeners = new ArrayBuffer[SparkListener]
  with mutable.SynchronizedBuffer[SparkListener]

def addListener(listener: SparkListener) {
  sparkListeners += listener
}

/**
* Post an event to all attached listeners.
* This does nothing if the event is SparkListenerShutdown.
*/
def postToAll(event: SparkListenerEvent) {

/**
* Apply the given function to all attached listeners, catching and logging any exception.
*/
private def foreachListener(f: SparkListener => Unit): Unit = {
  sparkListeners.foreach { listener =>
    try {
      f(listener)
    } catch {
      case e: Exception =>
        logError(s"Listener ${Utils.getFormattedClassName(listener)} threw an exception", e)
    }
  }
}

}
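The body of postToAll, elided above, is essentially one pattern match over the event type that invokes the matching SparkListener callback on every attached listener through foreachListener. Roughly (paraphrased, only a few of the cases shown):

  def postToAll(event: SparkListenerEvent) {
    event match {
      case stageSubmitted: SparkListenerStageSubmitted =>
        foreachListener(_.onStageSubmitted(stageSubmitted))
      case stageCompleted: SparkListenerStageCompleted =>
        foreachListener(_.onStageCompleted(stageCompleted))
      case jobStart: SparkListenerJobStart =>
        foreachListener(_.onJobStart(jobStart))
      case jobEnd: SparkListenerJobEnd =>
        foreachListener(_.onJobEnd(jobEnd))
      case taskEnd: SparkListenerTaskEnd =>
        foreachListener(_.onTaskEnd(taskEnd))
      case SparkListenerShutdown =>  // per the doc above, a shutdown event is not forwarded
      case _ =>                      // remaining event types are dispatched the same way (omitted)
    }
  }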






