在前面的sparkContex和RDD都可以看到,真正的计算工作都是同过调用DAGScheduler的runjob方法来实现的。这是一个很重要的类。在看这个类实现之前,需要对actor模式有一点了解:http://en.wikipedia.org/wiki/Actor_model http://www.slideshare.net/YungLinHo/introduction-to-actor-model-and-akka 粗略知道actor模式怎么实现就可以了。另外,应该先看看DAG相关的概念和论文 http://en.wikipedia.org/wiki/Directed_acyclic_graph    http://www.netlib.org/utk/people/JackDongarra/PAPERS/DAGuE_technical_report.pdf 



===========================Job 提交流程======================================
DAGSchedulerEventProcessActor::submitJob   --每个action都会调用到一个submitJob的操作
    -> send: JobSubmitted --它发送一个消息给DAGScheduler(因为提交job的机器可能不是master?)
        -> handleJobSubmitted   --DAGScheduler处理接收到的消息
            -> newStage   --创建一个stage
            -> new ActiveJob   ---找到一个active状态的
            -> [runLocally]}  --如果是简单的job,直接在本地执行。
            localExecutionEnabled && allowLocal && finalStage.parents.isEmpty && partitions.length == 1
            ->runLocally(job)  --don't block the DAGScheduler event loop or other concurrent jobs
                ->runLocallyWithinThread(job)  --创建新的线程执行本地job,不阻塞DAG进程
                    ->TaskContext(job.finalStage.id, job.partitions(0), 0, runningLocally = true)
                    ->result = job.func(taskContext, rdd.iterator(split, taskContext))  执行job
                    ->job.listener.taskSucceeded(0, result)  --通知监听者job结果
                    ->listenerBus.post(SparkListenerJobEnd(job.jobId, jobResult))  --通知job结束
            ->submitStage(finalStage)   -- Submits stage, but first recursively submits any missing parents递归提交
                -> activeJobForStage   --Finds the earliest-created active job that needs the stage。在jobIdToActiveJob找
                -> getMissingParentStages   --如果一个stage依赖于一个shuffle stage,这个RDD就是missing的
                     ->waitingForVisit.push(stage.rdd)
            ->waitingForVisit.pop()
            ->getShuffleMapStage
                ->registerShuffleDependencies 将所有父节点的shuffle注册到shuffleToMapStage和mapOutputTracker
                    ->getAncestorShuffleDependencies :返回一个栈,里面装着包含shuffle的父依赖节点;
                    ->newOrUsedStage  --给RDD创建shuffle stage;如果存在,使用老的loc覆盖新的loc
                        ->mapOutputTracker.getSerializedMapOutputStatuses(shuffleDep.shuffleId) or
                        ->mapOutputTracker.registerShuffle(shuffleDep.shuffleId,rdd.partitions.size)
                    ->shuffleToMapStage(currentShufDep.shuffleId) = stage  --加入DAG的hash属性中
                ->newOrUsedStage -- 给当前RDD创建shuffle stage
                ->shuffleToMapStage(shuffleDep.shuffleId) = stage   --加入DAG的hash属性中
            ->NarrowDependency  ->waitingForVisit.push(narrowDep.rdd) --narrowDeps的不分析,直接加入栈去找它的父节点。
                -> submitMissingTasks  --Called when stage's parents are available and we can now do its task。这个stage没有依赖缺失了。
            -> stage.pendingTasks.clear() 清空正在执行的task。
            -> partitionsToCompute = ? --First figure out the indexes of partition ids to compute. 
                找出需要执行的分片。shuffle要执行更多分片
            ->runningStages += stage  更新running记录
            ->listenerBus.post(SparkListenerStageSubmitted(stage.latestInfo, properties))  --通知应用程序stage被提交。
            ->Broadcasted binary for the task, used to dispatch tasks to executors. serialized copy of the RDD and for each task,
                which means each task gets a different copy of the RDD, This is necessary in Hadoop 
                where the JobConf/Configuration object is not thread-safe
                     ->// For ShuffleMapTask, serialize and broadcast (rdd, shuffleDep).
            ->// For ResultTask, serialize and broadcast (rdd, func).
            ->new ShuffleMapTask(stage.id, taskBinary, part, locs)  创建task
            ->new ResultTask(stage.id, taskBinary, part, locs, id)
            -> Preemptively serialize a task to make sure it can be serialized. For catch exception.
            ->stage.pendingTasks ++= tasks
            ->taskScheduler.submitTasks  --将task提交到taskScheduler
               -> submitStage(parent) --(递归)如果能找到一个stage是missing状态,那就将它的依赖节点submit
======================end=========================================
每个job都有一个DAG调度器,跟踪RDD和Stage的实例化,并寻找一个最优(?)的调度来执行这个job。它提交一个taskSet给TaskScheduler在集群上执行task。
/**
* The high-level scheduling layer that implements stage-oriented scheduling. It computes a DAG of
* stages for each job, keeps track of which RDDs and stage outputs are materialized, and finds a
* minimal schedule to run the job. It then submits stages as TaskSets to an underlying
* TaskScheduler implementation that runs them on the cluster.
*
* In addition to coming up with a DAG of stages, this class also determines the preferred
* locations to run each task on, based on the current cache status, and passes these to the
* low-level TaskScheduler. Furthermore, it handles failures due to shuffle output files being
* lost, in which case old stages may need to be resubmitted. Failures *within* a stage that are
* not caused by shuffle file loss are handled by the TaskScheduler, which will retry each task
* a small number of times before cancelling the whole stage.
*
*/
package org.apache.spark.scheduler

private[spark]
class DAGScheduler(
private[scheduler] val sc: SparkContext,
private[scheduler] val taskScheduler: TaskScheduler,
listenerBus: LiveListenerBus,
mapOutputTracker: MapOutputTrackerMaster,
blockManagerMaster: BlockManagerMaster,
env: SparkEnv,
clock: Clock = SystemClock)
extends Logging {

状态机(actor 消息响应):
private[scheduler] class DAGSchedulerEventProcessActor(dagScheduler: DAGScheduler)
extends Actor with Logging {

override def preStart() {
// set DAGScheduler for taskScheduler to ensure eventProcessActor is always
// valid when the messages arrive
dagScheduler.taskScheduler.setDAGScheduler(dagScheduler)
}

/**
* The main event loop of the DAG scheduler.
*/
def receive = {
case JobSubmitted(jobId, rdd, func, partitions, allowLocal, callSite, listener, properties) =>
dagScheduler.handleJobSubmitted(jobId, rdd, func, partitions, allowLocal, callSite,
listener, properties)

case StageCancelled(stageId) =>
dagScheduler.handleStageCancellation(stageId)

case JobCancelled(jobId) =>
dagScheduler.handleJobCancellation(jobId)

case JobGroupCancelled(groupId) =>
dagScheduler.handleJobGroupCancelled(groupId)

case AllJobsCancelled =>
dagScheduler.doCancelAllJobs()

case ExecutorAdded(execId, host) =>
dagScheduler.handleExecutorAdded(execId, host)

case ExecutorLost(execId) =>
dagScheduler.handleExecutorLost(execId)

case BeginEvent(task, taskInfo) =>
dagScheduler.handleBeginEvent(task, taskInfo)

case GettingResultEvent(taskInfo) =>
dagScheduler.handleGetTaskResult(taskInfo)

case completion @ CompletionEvent(task, reason, _, _, taskInfo, taskMetrics) =>
dagScheduler.handleTaskCompletion(completion)

case TaskSetFailed(taskSet, reason) =>
dagScheduler.handleTaskSetFailed(taskSet, reason)

case ResubmitFailedStages =>
dagScheduler.resubmitFailedStages()
}

重要的属性:
private val nextStageId = new AtomicInteger(0)
private[scheduler] val nextJobId = new AtomicInteger(0)

private[scheduler] val jobIdToStageIds = new HashMap[Int, HashSet[Int]]
private[scheduler] val stageIdToStage = new HashMap[Int, Stage]
private[scheduler] val shuffleToMapStage = new HashMap[Int, Stage]
private[scheduler] val jobIdToActiveJob = new HashMap[Int, ActiveJob]

// Stages we need to run whose parents aren't done
private[scheduler] val waitingStages = new HashSet[Stage]
// Stages we are running right now
private[scheduler] val runningStages = new HashSet[Stage]
// Stages that must be resubmitted due to fetch failures
private[scheduler] val failedStages = new HashSet[Stage]
private[scheduler] val activeJobs = new HashSet[ActiveJob]
// Contains the locations that each RDD's partitions are cached on
private val cacheLocs = new HashMap[Int, Array[Seq[TaskLocation]]]
private val dagSchedulerActorSupervisor =
env.actorSystem.actorOf(Props(new DAGSchedulerActorSupervisor(this)))
// A closure serializer that we reuse.
// This is only safe because DAGScheduler runs in a single thread.
private val closureSerializer = SparkEnv.get.closureSerializer.newInstance()

private[scheduler] var eventProcessActor: ActorRef = _

private[scheduler] def handleJobSubmitted(jobId: Int,
finalRDD: RDD[_],
func: (TaskContext, Iterator[_]) => _,
partitions: Array[Int],
allowLocal: Boolean,
callSite: CallSite,
listener: JobListener,
properties: Properties = null)
{
/** Submits stage, but first recursively submits any missing parents. */
private def submitStage(stage: Stage) {
/** Called when stage's parents are available and we can now do its task. */
private def submitMissingTasks(stage: Stage, jobId: Int) {

/** Finds the earliest-created active job that needs the stage */
// TODO: Probably should actually find among the active jobs that need this
// stage the one with the highest priority (highest-priority pool, earliest created).
// That should take care of at least part of the priority inversion problem with
// cross-job dependencies.
private def activeJobForStage(stage: Stage): Option[Int] = {
val jobsThatUseStage: Array[Int] = stage.jobIds.toArray.sorted
jobsThatUseStage.find(jobIdToActiveJob.contains)
}

/**
* Types of events that can be handled by the DAGScheduler. The DAGScheduler uses an event queue
* architecture where any thread can post an event (e.g. a task finishing or a new job being
* submitted) but there is a single "logic" thread that reads these events and takes decisions.
* This greatly simplifies synchronization.
*/
private[scheduler] sealed trait DAGSchedulerEvent
/**
* Asynchronously passes SparkListenerEvents to registered SparkListeners.
*
* Until start() is called, all posted events are only buffered. Only after this listener bus
* has started will events be actually propagated to all attached listeners. This listener bus
* is stopped when it receives a SparkListenerShutdown event, which is posted using stop().
*/
private[spark] class LiveListenerBus extends SparkListenerBus with Logging {
/**
* A SparkListenerEvent bus that relays events to its listeners
*/
private[spark] trait SparkListenerBus extends Logging {

// SparkListeners attached to this event bus
protected val sparkListeners = new ArrayBuffer[SparkListener]
with mutable.SynchronizedBuffer[SparkListener]

def addListener(listener: SparkListener) {
sparkListeners += listener
}

/**
* Post an event to all attached listeners.
* This does nothing if the event is SparkListenerShutdown.
*/
def postToAll(event: SparkListenerEvent) {

/**
* Apply the given function to all attached listeners, catching and logging any exception.
*/
private def foreachListener(f: SparkListener => Unit): Unit = {
sparkListeners.foreach { listener =>
try {
f(listener)
} catch {
case e: Exception =>
logError(s"Listener ${Utils.getFormattedClassName(listener)} threw an exception", e)
}
}
}

}







spark 笔记 7: DAGScheduler的更多相关文章

  1. spark笔记 环境配置

    spark笔记 spark简介 saprk 有六个核心组件: SparkCore.SparkSQL.SparkStreaming.StructedStreaming.MLlib,Graphx Spar ...

  2. spark 笔记 13: 再看DAGScheduler,stage状态更新流程

    当某个task完成后,某个shuffle Stage X可能已完成,那么就可能会一些仅依赖Stage X的Stage现在可以执行了,所以要有响应task完成的状态更新流程. ============= ...

  3. 大数据学习——spark笔记

    变量的定义 val a: Int = 1 var b = 2 方法和函数 区别:函数可以作为参数传递给方法 方法: def test(arg: Int): Int=>Int ={ 方法体 } v ...

  4. spark 笔记 14: spark中的delay scheduling实现

    延迟调度算法的实现是在TaskSetManager类中的,它通过将task存放在四个不同级别的hash表里,当有可用的资源时,resourceOffer函数的参数之一(maxLocality)就是这些 ...

  5. spark 笔记 8: Stage

    Stage 是一组独立的任务,他们在一个job中执行相同的功能(function),功能的划分是以shuffle为边界的.DAG调度器以拓扑顺序执行同一个Stage中的task. /** * A st ...

  6. spark 笔记 9: Task/TaskContext

    DAGScheduler最终创建了task set,并提交给了taskScheduler.那先得看看task是怎么定义和执行的. Task是execution执行的一个单元. Task: execut ...

  7. spark 笔记 5: SparkContext,SparkConf

    SparkContext 是spark的程序入口,相当于熟悉的'main'函数.它负责链接spark集群.创建RDD.创建累加计数器.创建广播变量. ) scheduler.initialize(ba ...

  8. Spark笔记——技术点汇总

    目录 概况 手工搭建集群 引言 安装Scala 配置文件 启动与测试 应用部署 部署架构 应用程序部署 核心原理 RDD概念 RDD核心组成 RDD依赖关系 DAG图 RDD故障恢复机制 Standa ...

  9. Spark分析之DAGScheduler

    DAGScheduler概述:是一个面向Stage层面的调度器: 主要入参有: dagScheduler.runJob(rdd, cleanedFunc, partitions, callSite, ...

随机推荐

  1. python视频学习笔记4(函数)

    函数中return和print的区别,没有return会默认返回None值 函数定义:所谓**函数**,就是把 **具有独立功能的代码块** 组织为一个小模块,在需要的时候 **调用** 1.函数的步 ...

  2. Delphi 程序调试

  3. 七、Vue Cli+ApiCloud

    一.api.js (参考) 顶部注释: 底部注释: 二.导入 效果: 继续使用: 运行环境:用APP打开,浏览器没有api对象,只有APP运行环境才有API对象 代码如下: <template& ...

  4. 第02章 新手必须掌握的 Linux 命令

  5. JDK8之Stream新特性

    https://www.cnblogs.com/cbxBlog/p/9123106.html /** *JDK8 Stream特性 * Created by chengbx on 2018/5/27. ...

  6. ln: /usr/bin/mysql: Operation not permitted

    一.背景 前段时间装mysql,就遇到了ln: /usr/bin/mysql: Operation not permitted的错误,网上好多方法都过时了,下边是我的解决方法 执行 sudo ln - ...

  7. fedora29 安装mongodb 4.0,6问题记录

    如果运行mongod命令时提示 无加载共享库libcrypto.so.10,那就到页面下载http://www.rpmfind.net/linux/rpm2html/search.php?query= ...

  8. 初识容器和Docker

    什么是Docker? Docker 是一个用于开发,交付和运行应用程序的开放平台.能够就应用程序和基础架构分开,从而可以快速的交付软件. 借助Docker可以和管理应用程序的方式来管理基础架构. 使用 ...

  9. 网络编程之套接字socket

    目录 socket套接字 引子 为何学习socket一定要先学习互联网协议 socket是什么 套接字类型 基于文件类型的套接字家族 基于网络类型的套接字家族 套接字工作流程 基于TCP的套接字 简单 ...

  10. i3wm

    1.音量调节(alsa-utils) alsamixer: alsamixer is a graphical mixer program for the Advanced Linux Sound Ar ...