Recap

In the previous post we walked through how a task runs on the executor side. When the task finishes, the last thing Executor.launchTask does is call execBackend.statusUpdate, which ships the task result and the task state back to the driver. Back on the driver side, the handling of the StatusUpdate message lives in the receive method of DriverEndpoint, the driver's RPC endpoint.
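
For reference, the executor-side half of this hand-off looks roughly like the sketch below (simplified from CoarseGrainedExecutorBackend in Spark 2.x; treat the exact fields and the log message as approximate): the backend wraps the task state and the serialized result into a StatusUpdate message and sends it to the driver over RPC.

// Simplified sketch of the executor-side call (CoarseGrainedExecutorBackend)
override def statusUpdate(taskId: Long, state: TaskState, data: ByteBuffer) {
  // Wrap the task state and serialized result into a StatusUpdate message
  val msg = StatusUpdate(executorId, taskId, state, data)
  driver match {
    // Send the message to the DriverEndpoint over RPC
    case Some(driverRef) => driverRef.send(msg)
    case None => logWarning(s"Drop $msg because has not yet connected to driver")
  }
}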

DriverEndpoint.receive

case StatusUpdate(executorId, taskId, state, data) =>
  // Notify the TaskScheduler that the task has finished
  scheduler.statusUpdate(taskId, state, data.value)
  // If the task has reached a terminal state (FINISHED, FAILED, KILLED or LOST),
  // the resources it occupied have been released, so the freed cores can be
  // reclaimed and offered out to new tasks.
  if (TaskState.isFinished(state)) {
    executorDataMap.get(executorId) match {
      case Some(executorInfo) =>
        executorInfo.freeCores += scheduler.CPUS_PER_TASK
        makeOffers(executorId)
      case None =>
        // Ignoring the update since we don't know about the executor.
        logWarning(s"Ignored task status update ($taskId state $state) " +
          s"from unknown executor with ID $executorId")
    }
  }
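
The makeOffers(executorId) call above is what actually puts the freed cores back to work. Roughly (a simplified sketch of CoarseGrainedSchedulerBackend.makeOffers in Spark 2.x; the exact locking and liveness checks vary by version), it builds a single WorkerOffer for that executor and asks the TaskSchedulerImpl which tasks to launch on it:

// Simplified sketch: re-offer the resources of a single executor
private def makeOffers(executorId: String) {
  val taskDescs = synchronized {
    if (executorIsAlive(executorId)) {
      val executorData = executorDataMap(executorId)
      // Offer only this executor's free cores to the task scheduler
      val workOffers = IndexedSeq(
        new WorkerOffer(executorId, executorData.executorHost, executorData.freeCores))
      scheduler.resourceOffers(workOffers)
    } else {
      Seq.empty
    }
  }
  if (taskDescs.nonEmpty) {
    // Send LaunchTask messages for whatever the scheduler decided to run
    launchTasks(taskDescs)
  }
}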

The call we want to follow here, though, is scheduler.statusUpdate.

TaskSchedulerImpl.statusUpdate

def statusUpdate(tid: Long, state: TaskState, serializedData: ByteBuffer) {
  var failedExecutor: Option[String] = None
  var reason: Option[ExecutorLossReason] = None
  synchronized {
    try {
      taskIdToTaskSetManager.get(tid) match {
        case Some(taskSet) =>
          // TaskState.LOST is rarely seen; as the source comment below notes, it only
          // comes from the deprecated Mesos fine-grained mode.
          if (state == TaskState.LOST) {
            // TaskState.LOST is only used by the deprecated Mesos fine-grained scheduling mode,
            // where each executor corresponds to a single task, so mark the executor as failed.
            val execId = taskIdToExecutorId.getOrElse(tid, throw new IllegalStateException(
              "taskIdToTaskSetManager.contains(tid) <=> taskIdToExecutorId.contains(tid)"))
            if (executorIdToRunningTaskIds.contains(execId)) {
              reason = Some(
                SlaveLost(s"Task $tid was lost, so marking the executor as lost as well."))
              removeExecutor(execId, reason.get)
              failedExecutor = Some(execId)
            }
          }
          // The task has reached a terminal state: FINISHED, FAILED, KILLED or LOST
          if (TaskState.isFinished(state)) {
            // Clean up the bookkeeping kept for this task
            cleanupTaskState(tid)
            // Remove the task from the set of running tasks
            taskSet.removeRunningTask(tid)
            if (state == TaskState.FINISHED) {
              // Hand the successful result off to a separate thread for asynchronous processing
              taskResultGetter.enqueueSuccessfulTask(taskSet, tid, serializedData)
            } else if (Set(TaskState.FAILED, TaskState.KILLED, TaskState.LOST).contains(state)) {
              taskResultGetter.enqueueFailedTask(taskSet, tid, state, serializedData)
            }
          }
        case None =>
          logError(
            ("Ignoring update with state %s for TID %s because its task set is gone (this is " +
              "likely the result of receiving duplicate task finished status updates) or its " +
              "executor has been marked as failed.")
              .format(state, tid))
      }
    } catch {
      case e: Exception => logError("Exception in statusUpdate", e)
    }
  }
  // Update the DAGScheduler without holding a lock on this, since that can deadlock
  if (failedExecutor.isDefined) {
    assert(reason.isDefined)
    dagScheduler.executorLost(failedExecutor.get, reason.get)
    backend.reviveOffers()
  }
}

Here the successful case is handed off to be processed asynchronously, so let's look at what that asynchronous handler does.
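
"Asynchronously" here simply means the work is submitted to TaskResultGetter's internal thread pool. As a rough sketch (in the Spark 2.x versions I have looked at this is a small daemon pool sized by spark.resultGetter.threads, defaulting to 4; treat the exact construction below as an assumption):

// Sketch: the thread pool TaskResultGetter uses to fetch and deserialize results
private val THREADS = sparkEnv.conf.getInt("spark.resultGetter.threads", 4)
protected val getTaskResultExecutor: ExecutorService =
  ThreadUtils.newDaemonFixedThreadPool(THREADS, "task-result-getter")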

TaskResultGetter.enqueueSuccessfulTask

def enqueueSuccessfulTask(
    taskSetManager: TaskSetManager,
    tid: Long,
    serializedData: ByteBuffer): Unit = {
  // Run the result handling asynchronously on the task-result-getter thread pool
  getTaskResultExecutor.execute(new Runnable {
    override def run(): Unit = Utils.logUncaughtExceptions {
      try {
        // Deserialize the data sent back from the executor
        val (result, size) = serializer.get().deserialize[TaskResult[_]](serializedData) match {
          // If the result was sent back directly, we can read it straight
          // from the deserialized object
          case directResult: DirectTaskResult[_] =>
            // First check that the result does not exceed the limit on how much result
            // data the driver is willing to hold (spark.driver.maxResultSize, 1g by default)
            if (!taskSetManager.canFetchMoreResults(serializedData.limit())) {
              return
            }
            // deserialize "value" without holding any lock so that it won't block other threads.
            // We should call it here, so that when it's called again in
            // "TaskSetManager.handleSuccessfulTask", it does not need to deserialize the value.
            directResult.value(taskResultSerializer.get())
            (directResult, serializedData.limit())
          case IndirectTaskResult(blockId, size) =>
            // Check whether the result exceeds the size limit
            if (!taskSetManager.canFetchMoreResults(size)) {
              // dropped by executor if size is larger than maxResultSize
              // If we give up on this result, remove the corresponding block
              // from the block manager
              sparkEnv.blockManager.master.removeBlock(blockId)
              return
            }
            logDebug("Fetching indirect task result for TID %s".format(tid))
            // This eventually posts a TaskGettingResult event to the event bus
            // via the DAGScheduler
            scheduler.handleTaskGettingResult(taskSetManager, tid)
            // Fetch the result data remotely through the block manager;
            // the location of this block was reported back by the executor earlier
            val serializedTaskResult = sparkEnv.blockManager.getRemoteBytes(blockId)
            if (!serializedTaskResult.isDefined) {
              /* We won't be able to get the task result if the machine that ran the task failed
               * between when the task ended and when we tried to fetch the result, or if the
               * block manager had to flush the result. */
              // The fetch can fail either because the executor dropped the result
              // (it was too large) or because of an executor/network failure.
              // Both cases count as a failed task, and handleFailedTask will
              // arrange for the task to be rerun.
              scheduler.handleFailedTask(
                taskSetManager, tid, TaskState.FINISHED, TaskResultLost)
              return
            }
            // Deserialize the data fetched from the block manager
            val deserializedResult = serializer.get().deserialize[DirectTaskResult[_]](
              serializedTaskResult.get.toByteBuffer)
            // force deserialization of referenced value
            deserializedResult.value(taskResultSerializer.get())
            // The data has been fetched, so the remote block can be removed
            sparkEnv.blockManager.master.removeBlock(blockId)
            (deserializedResult, size)
        }

        // Set the task result size in the accumulator updates received from the executors.
        // We need to do this here on the driver because if we did this on the executors then
        // we would have to serialize the result again after updating the size.
        // The accumulator that tracks the result size needs special handling here
        result.accumUpdates = result.accumUpdates.map { a =>
          if (a.name == Some(InternalAccumulator.RESULT_SIZE)) {
            val acc = a.asInstanceOf[LongAccumulator]
            assert(acc.sum == 0L, "task result size should not have been set on the executors")
            acc.setValue(size.toLong)
            acc
          } else {
            a
          }
        }

        // Hand the deserialized result to TaskSchedulerImpl for further processing
        scheduler.handleSuccessfulTask(taskSetManager, tid, result)
      } catch {
        case cnf: ClassNotFoundException =>
          val loader = Thread.currentThread.getContextClassLoader
          taskSetManager.abort("ClassNotFound with classloader: " + loader)
        // Matching NonFatal so we don't catch the ControlThrowable from the "return" above.
        case NonFatal(ex) =>
          logError("Exception while getting task result", ex)
          taskSetManager.abort("Exception while getting task result: %s".format(ex))
      }
    }
  })
}

There are several rounds of deserialization here, because the result went through several rounds of serialization on the executor side:

  • First, the task's return value is serialized and, together with the accumulator updates, wrapped into a DirectTaskResult object
  • The DirectTaskResult object is then serialized
  • If the result is too large to be sent back directly and has to go through the blockManager, it is wrapped in an IndirectTaskResult object
  • Finally, the IndirectTaskResult object is serialized as well

As you can see, once the result arrives at the driver it is deserialized in exactly the reverse order.
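
To make that reverse order concrete, the executor-side packaging looks roughly like the sketch below (simplified from Executor.TaskRunner in Spark 2.x; the surrounding names such as ser, env, valueBytes, accumUpdates, maxResultSize and maxDirectResultSize are taken from that context and should be treated as assumptions here):

// Sketch of how the executor packages a task result before sending it back
val directResult = new DirectTaskResult(valueBytes, accumUpdates)
val serializedDirectResult = ser.serialize(directResult)
val resultSize = serializedDirectResult.limit()

val serializedResult: ByteBuffer =
  if (maxResultSize > 0 && resultSize > maxResultSize) {
    // Larger than spark.driver.maxResultSize: drop the result and only report its size
    ser.serialize(new IndirectTaskResult[Any](TaskResultBlockId(taskId), resultSize))
  } else if (resultSize > maxDirectResultSize) {
    // Too big to send over RPC directly: store it in the block manager and
    // send back an IndirectTaskResult that points at the block
    val blockId = TaskResultBlockId(taskId)
    env.blockManager.putBytes(
      blockId, new ChunkedByteBuffer(serializedDirectResult.duplicate()),
      StorageLevel.MEMORY_AND_DISK_SER)
    ser.serialize(new IndirectTaskResult[Any](blockId, resultSize))
  } else {
    // Small result: ship the serialized DirectTaskResult itself
    serializedDirectResult
  }

execBackend.statusUpdate(taskId, TaskState.FINISHED, serializedResult)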

Once the result data has finally been obtained, it is handed to TaskSchedulerImpl for further processing.

TaskSchedulerImpl.handleSuccessfulTask

def handleSuccessfulTask(
    taskSetManager: TaskSetManager,
    tid: Long,
    taskResult: DirectTaskResult[_]): Unit = synchronized {
  taskSetManager.handleSuccessfulTask(tid, taskResult)
}

TaskSetManager.handleSuccessfulTask

def handleSuccessfulTask(tid: Long, result: DirectTaskResult[_]): Unit = {
  // Update the bookkeeping for this task
  val info = taskInfos(tid)
  val index = info.index
  info.markFinished(TaskState.FINISHED, clock.getTimeMillis())
  if (speculationEnabled) {
    successfulTaskDurations.insert(info.duration)
  }
  removeRunningTask(tid)

  // Kill any other attempts for the same task (since those are unnecessary now that one
  // attempt completed successfully).
  // Speculative execution may have launched several copies of the same task;
  // all the other running attempts must now be killed.
  for (attemptInfo <- taskAttempts(index) if attemptInfo.running) {
    logInfo(s"Killing attempt ${attemptInfo.attemptNumber} for task ${attemptInfo.id} " +
      s"in stage ${taskSet.id} (TID ${attemptInfo.taskId}) on ${attemptInfo.host} " +
      s"as the attempt ${info.attemptNumber} succeeded on ${info.host}")
    killedByOtherAttempt(index) = true
    // Ask the scheduler backend to kill the redundant attempt
    sched.backend.killTask(
      attemptInfo.taskId,
      attemptInfo.executorId,
      interruptThread = true,
      reason = "another attempt succeeded")
  }
  // Only the first successful attempt updates these counters; this prevents
  // multiple attempts of the same task from being counted more than once
  if (!successful(index)) {
    tasksSuccessful += 1
    logInfo(s"Finished task ${info.id} in stage ${taskSet.id} (TID ${info.taskId}) in" +
      s" ${info.duration} ms on ${info.host} (executor ${info.executorId})" +
      s" ($tasksSuccessful/$numTasks)")
    // Mark successful and stop if all the tasks have succeeded.
    successful(index) = true
    // Once every task has succeeded, this task set (i.e. the stage) is done
    if (tasksSuccessful == numTasks) {
      isZombie = true
    }
  } else {
    logInfo("Ignoring task-finished event for " + info.id + " in stage " + taskSet.id +
      " because task " + index + " has already completed successfully")
  }
  // This method is called by "TaskSchedulerImpl.handleSuccessfulTask" which holds the
  // "TaskSchedulerImpl" lock until exiting. To avoid the SPARK-7655 issue, we should not
  // "deserialize" the value when holding a lock to avoid blocking other threads. So we call
  // "result.value()" in "TaskResultGetter.enqueueSuccessfulTask" before reaching here.
  // Note: "result.value()" only deserializes the value when it's called at the first time, so
  // here "result.value()" just returns the value and won't block other threads.
  // Notify the DAGScheduler so it can do its own processing. Note the symmetry: when a task
  // is submitted, the call chain is DAGScheduler -> TaskScheduler -> SchedulerBackend -> executor,
  // and when the result comes back it flows through the same components in the reverse order.
  // TaskScheduler thus also acts as the middleman between DAGScheduler and SchedulerBackend.
  sched.dagScheduler.taskEnded(tasks(index), Success, result.value(), result.accumUpdates, info)
  // Update the task-set level bookkeeping
  maybeFinishTaskSet()
}

The main work of this method, then, is to update the bookkeeping and kill the other running attempts of the same task, and then to notify the DAGScheduler so it can do its part.
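
That hand-off goes through sched.dagScheduler.taskEnded, which does little more than post a CompletionEvent onto the DAGScheduler's single-threaded event loop (a simplified sketch based on Spark 2.x; newer versions carry extra fields such as executor metric peaks). This is also why handleTaskCompletion below needs no locking: all events are processed serially by that one thread.

// Sketch: the TaskSetManager's callback just enqueues an event for the DAGScheduler
def taskEnded(
    task: Task[_],
    reason: TaskEndReason,
    result: Any,
    accumUpdates: Seq[AccumulatorV2[_, _]],
    taskInfo: TaskInfo): Unit = {
  // Handled later by handleTaskCompletion on the DAGScheduler's event-processing thread
  eventProcessLoop.post(CompletionEvent(task, reason, result, accumUpdates, taskInfo))
}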

DAGScheduler.handleTaskCompletion

This method is quite long, so let's first summarize its main logic:

  • Handle accumulator updates. For a ResultTask the accumulators are only counted once, while for a ShuffleMapTask duplicate attempts (from speculative execution) are counted again.

  • Post a task-end event to the event bus.

  • Handle the case where the task succeeded. For a ResultTask this means updating some bookkeeping and, once every task of the stage has finished, marking the stage as finished and notifying the job listener. For a ShuffleMapTask the handling is a bit more involved: besides the bookkeeping, the task's output block is registered with the mapOutputTracker, and once all partitions are complete the stage is marked as finished as well.

  • Handle fetch failures. Besides updating some bookkeeping, the main decision is whether the stage can be resubmitted. If it cannot (the number of resubmission attempts has exceeded the threshold), the associated job is aborted; otherwise the stage is resubmitted. Note that resubmitting a stage does not rerun all of its tasks: only the tasks that did not complete because of the failure are submitted again, which is possible because the output of the ShuffleMap tasks is tracked by the mapOutputTrackerMaster component.

private[scheduler] def handleTaskCompletion(event: CompletionEvent) {
  val task = event.task
  val taskId = event.taskInfo.id
  val stageId = task.stageId
  val taskType = Utils.getFormattedClassName(task)

  // Let the outputCommitCoordinator know that this task attempt has completed;
  // it also needs to know about failed attempts
  outputCommitCoordinator.taskCompleted(
    stageId,
    task.partitionId,
    event.taskInfo.attemptNumber, // this is a task attempt number
    event.reason)

  if (!stageIdToStage.contains(task.stageId)) {
    // The stage may have already finished when we get this event -- eg. maybe it was a
    // speculative task. It is important that we send the TaskEnd event in any case, so listeners
    // are properly notified and can chose to handle it. For instance, some listeners are
    // doing their own accounting and if they don't get the task end event they think
    // tasks are still running when they really aren't.
    // The stage may already be finished by the time this event arrives, for example because
    // the current task is a speculative copy. We still post the task-end event so that
    // listeners are notified correctly; some listeners count every finished task
    // (including speculative copies) and would otherwise believe tasks are still running.
    postTaskEnd(event)

    // Skip all the actions if the stage has been cancelled.
    // The stage has already been dealt with, so just return
    return
  }

  val stage = stageIdToStage(task.stageId)

  // Make sure the task's accumulators are updated before any other processing happens, so that
  // we can post a task end event before any jobs or stages are updated. The accumulators are
  // only updated in certain cases.
  // A question worth asking: several copies of the same task may finish at around the same
  // time, so several task-end events may arrive close together. Why is there no locking, CAS
  // or other synchronization here? The answer lies in EventLoop: events are handled by a
  // single thread, so they are processed serially and there is no thread-safety issue.
  // This first match mainly handles the accumulator updates.
  event.reason match {
    case Success =>
      task match {
        case rt: ResultTask[_, _] =>
          val resultStage = stage.asInstanceOf[ResultStage]
          resultStage.activeJob match {
            case Some(job) =>
              // Only update the accumulator once for each result task.
              // Accumulators of a ResultTask are only counted once
              if (!job.finished(rt.outputId)) {
                updateAccumulators(event)
              }
            case None => // Ignore update if task's job has finished.
          }
        case _ =>
          // For a ShuffleMapTask there is no such deduplication, which means accumulators
          // updated inside a ShuffleMapTask may be counted more than once
          updateAccumulators(event)
      }
    case _: ExceptionFailure => updateAccumulators(event)
    case _ =>
  }

  // Post a task-end event to the event bus
  postTaskEnd(event)

  // The second match maintains the bookkeeping of the job:
  // if all partitions of the job are finished, the job is removed,
  // together with the stages inside it that no other job depends on
  event.reason match {
    case Success =>
      task match {
        case rt: ResultTask[_, _] =>
          // Cast to ResultStage here because it's part of the ResultTask
          // TODO Refactor this out to a function that accepts a ResultStage
          val resultStage = stage.asInstanceOf[ResultStage]
          resultStage.activeJob match {
            case Some(job) =>
              if (!job.finished(rt.outputId)) {
                job.finished(rt.outputId) = true
                job.numFinished += 1
                // If the whole job has finished, remove it,
                // together with the stages that no other job depends on
                if (job.numFinished == job.numPartitions) {
                  // Mark the stage as finished
                  markStageAsFinished(resultStage)
                  // Remove the stages within the job that no other job depends on
                  cleanupStateForJobAndIndependentStages(job)
                  // Post a job-end event to the event bus
                  listenerBus.post(
                    SparkListenerJobEnd(job.jobId, clock.getTimeMillis(), JobSucceeded))
                }

                // taskSucceeded runs some user code that might throw an exception. Make sure
                // we are resilient against that.
                // Finally, invoke the job listener's callback to notify it of the result
                try {
                  job.listener.taskSucceeded(rt.outputId, event.result)
                } catch {
                  case e: Exception =>
                    // TODO: Perhaps we want to mark the resultStage as failed?
                    job.listener.jobFailed(new SparkDriverExecutionException(e))
                }
              }
            case None =>
              logInfo("Ignoring result from " + rt + " because its job has finished")
          }

        // Handle the ShuffleMapTask case
        case smt: ShuffleMapTask =>
          val shuffleStage = stage.asInstanceOf[ShuffleMapStage]
          val status = event.result.asInstanceOf[MapStatus]
          val execId = status.location.executorId
          logDebug("ShuffleMapTask finished on " + execId)
          if (stageIdToStage(task.stageId).latestInfo.attemptNumber == task.stageAttemptId) {
            // This task was for the currently running attempt of the stage. Since the task
            // completed successfully from the perspective of the TaskSetManager, mark it as
            // no longer pending (the TaskSetManager may consider the task complete even
            // when the output needs to be ignored because the task's epoch is too small below.
            // In this case, when pending partitions is empty, there will still be missing
            // output locations, which will cause the DAGScheduler to resubmit the stage below.)
            // The task belongs to the latest attempt of this stage, so its partition
            // is no longer pending
            shuffleStage.pendingPartitions -= task.partitionId
          }
          // If the epoch of this task is older than the epoch at which the executor
          // was marked as failed, ignore this result
          if (failedEpoch.contains(execId) && smt.epoch <= failedEpoch(execId)) {
            logInfo(s"Ignoring possibly bogus $smt completion from executor $execId")
          } else {
            // The epoch of the task is acceptable (i.e., the task was launched after the most
            // recent failure we're aware of for the executor), so mark the task's output as
            // available.
            // The task's epoch is accepted, so register its map output with the
            // mapOutputTracker; the location of this partition's result can then be
            // looked up through the mapOutputTracker
            mapOutputTracker.registerMapOutput(
              shuffleStage.shuffleDep.shuffleId, smt.partitionId, status)
            // Remove the task's partition from pending partitions. This may have already been
            // done above, but will not have been done yet in cases where the task attempt was
            // from an earlier attempt of the stage (i.e., not the attempt that's currently
            // running). This allows the DAGScheduler to mark the stage as complete when one
            // copy of each task has finished successfully, even if the currently active stage
            // still has tasks running.
            // Likewise mark this partition as completed
            shuffleStage.pendingPartitions -= task.partitionId
          }

          // If every partition of the stage is done, mark the stage as finished
          if (runningStages.contains(shuffleStage) && shuffleStage.pendingPartitions.isEmpty) {
            markStageAsFinished(shuffleStage)
            logInfo("looking for newly runnable stages")
            logInfo("running: " + runningStages)
            logInfo("waiting: " + waitingStages)
            logInfo("failed: " + failedStages)

            // This call to increment the epoch may not be strictly necessary, but it is retained
            // for now in order to minimize the changes in behavior from an earlier version of the
            // code. This existing behavior of always incrementing the epoch following any
            // successful shuffle map stage completion may have benefits by causing unneeded
            // cached map outputs to be cleaned up earlier on executors. In the future we can
            // consider removing this call, but this will require some extra investigation.
            // See https://github.com/apache/spark/pull/17955/files#r117385673 for more details.
            mapOutputTracker.incrementEpoch()

            // Clear the cached locations of RDD partition results, so that the next lookup
            // fetches fresh block locations from the block manager
            clearCacheLocs()

            if (!shuffleStage.isAvailable) {
              // Some tasks had failed; let's resubmit this shuffleStage.
              // If some tasks failed, this stage needs to be resubmitted
              // TODO: Lower-level scheduler should also deal with this
              logInfo("Resubmitting " + shuffleStage + " (" + shuffleStage.name +
                ") because some of its tasks had failed: " +
                shuffleStage.findMissingPartitions().mkString(", "))
              submitStage(shuffleStage)
            } else {
              // Mark any map-stage jobs waiting on this stage as finished
              if (shuffleStage.mapStageJobs.nonEmpty) {
                val stats = mapOutputTracker.getStatistics(shuffleStage.shuffleDep)
                for (job <- shuffleStage.mapStageJobs) {
                  markMapStageJobAsFinished(job, stats)
                }
              }
              // Submit the waiting child stages downstream of this stage
              submitWaitingChildStages(shuffleStage)
            }
          }
      }

    // Handle the case where the task has been resubmitted
    case Resubmitted =>
      logInfo("Resubmitted " + task + ", so marking it as still running")
      stage match {
        case sms: ShuffleMapStage =>
          sms.pendingPartitions += task.partitionId
        case _ =>
          assert(false, "TaskSetManagers should only send Resubmitted task statuses for " +
            "tasks in ShuffleMapStages.")
      }

    // Handle fetch failures
    case FetchFailed(bmAddress, shuffleId, mapId, reduceId, failureMessage) =>
      val failedStage = stageIdToStage(task.stageId)
      val mapStage = shuffleIdToMapStage(shuffleId)
      // If this task's stage attempt id differs from the stage's latest attempt id,
      // ignore the failure: a more recent attempt of the stage is already running
      if (failedStage.latestInfo.attemptNumber != task.stageAttemptId) {
        logInfo(s"Ignoring fetch failure from $task as it's from $failedStage attempt" +
          s" ${task.stageAttemptId} and there is a more recent attempt for that stage " +
          s"(attempt ${failedStage.latestInfo.attemptNumber}) running")
      } else {
        // It is likely that we receive multiple FetchFailed for a single stage (because we have
        // multiple tasks running concurrently on different executors). In that case, it is
        // possible the fetch failure has already been handled by the scheduler.
        // Mark the failed stage as finished
        if (runningStages.contains(failedStage)) {
          logInfo(s"Marking $failedStage (${failedStage.name}) as failed " +
            s"due to a fetch failure from $mapStage (${mapStage.name})")
          markStageAsFinished(failedStage, Some(failureMessage))
        } else {
          logDebug(s"Received fetch failure from $task, but its from $failedStage which is no " +
            s"longer running")
        }

        // Record the attempt id of the stage attempt that hit the fetch failure
        failedStage.fetchFailedAttemptIds.add(task.stageAttemptId)
        // If the stage has already been attempted more times than allowed, abort it
        val shouldAbortStage =
          failedStage.fetchFailedAttemptIds.size >= maxConsecutiveStageAttempts ||
          disallowStageRetryForTest

        if (shouldAbortStage) {
          val abortMessage = if (disallowStageRetryForTest) {
            "Fetch failure will not retry stage due to testing config"
          } else {
            s"""$failedStage (${failedStage.name})
               |has failed the maximum allowable number of
               |times: $maxConsecutiveStageAttempts.
               |Most recent failure reason: $failureMessage""".stripMargin.replaceAll("\n", " ")
          }
          // Abort the stage and clean up
          abortStage(failedStage, abortMessage, None)
        } else { // update failedStages and make sure a ResubmitFailedStages event is enqueued
          // TODO: Cancel running tasks in the failed stage -- cf. SPARK-17064
          val noResubmitEnqueued = !failedStages.contains(failedStage)
          // Add these stages to the queue of failed stages waiting to be resubmitted
          failedStages += failedStage
          failedStages += mapStage
          if (noResubmitEnqueued) {
            // We expect one executor failure to trigger many FetchFailures in rapid succession,
            // but all of those task failures can typically be handled by a single resubmission of
            // the failed stage. We avoid flooding the scheduler's event queue with resubmit
            // messages by checking whether a resubmit is already in the event queue for the
            // failed stage. If there is already a resubmit enqueued for a different failed
            // stage, that event would also be sufficient to handle the current failed stage, but
            // producing a resubmit for each failed stage makes debugging and logging a little
            // simpler while not producing an overwhelming number of scheduler events.
            logInfo(
              s"Resubmitting $mapStage (${mapStage.name}) and " +
              s"$failedStage (${failedStage.name}) due to fetch failure"
            )
            // After a 200 ms delay, post a ResubmitFailedStages event to the internal
            // event-processing thread so that the DAGScheduler resubmits the failed stages
            messageScheduler.schedule(
              new Runnable {
                override def run(): Unit = eventProcessLoop.post(ResubmitFailedStages)
              },
              DAGScheduler.RESUBMIT_TIMEOUT,
              TimeUnit.MILLISECONDS
            )
          }
        }
        // Mark the map whose fetch failed as broken in the map stage
        // Remove this map task's output information from the mapOutputTracker
        if (mapId != -1) {
          mapOutputTracker.unregisterMapOutput(shuffleId, mapId, bmAddress)
        }

        // TODO: mark the executor as failed only if there were lots of fetch failures on it
        // Remove the executor hosting the blocks that failed to be fetched: the
        // DriverEndpoint is told to drop it, and all of its block information is
        // removed from the blockManagerMaster
        if (bmAddress != null) {
          val hostToUnregisterOutputs = if (env.blockManager.externalShuffleServiceEnabled &&
            unRegisterOutputOnHostOnFetchFailure) {
            // We had a fetch failure with the external shuffle service, so we
            // assume all shuffle data on the node is bad.
            Some(bmAddress.host)
          } else {
            // Unregister shuffle data just for one executor (we don't have any
            // reason to believe shuffle data has been lost for the entire host).
            None
          }
          removeExecutorAndUnregisterOutputs(
            execId = bmAddress.executorId,
            fileLost = true,
            hostToUnregisterOutputs = hostToUnregisterOutputs,
            maybeEpoch = Some(task.epoch))
        }
      }

    case commitDenied: TaskCommitDenied =>
      // Do nothing here, left up to the TaskScheduler to decide how to handle denied commit

    case exceptionFailure: ExceptionFailure =>
      // Nothing left to do, already handled above for accumulator updates.

    case TaskResultLost =>
      // Do nothing here; the TaskScheduler handles these failures and resubmits the task.

    case _: ExecutorLostFailure | _: TaskKilled | UnknownReason =>
      // Unrecognized failure - also do nothing. If the task fails repeatedly, the TaskScheduler
      // will abort the job.
  }
}
