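The delay scheduling algorithm is implemented in the TaskSetManager class. Pending tasks are kept in four collections, one per locality level: hash tables keyed by executor, host and rack, plus a list of tasks with no locality preference. When resources become available, the maxLocality parameter of resourceOffer gives the loosest locality level at which this offer may be used. If some task satisfies that level, the search starts from the most local (best) collection. In other words, both tasks and executors have locality levels, and among the tasks that match an executor, the most local one is chosen.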
  
=====================Delay scheduling algorithm=============================
->TaskSetManager::resourceOffer(execId: String, host: String, maxLocality: TaskLocality.TaskLocality): Option[TaskDescription]
->if (maxLocality != TaskLocality.NO_PREF) --if the offer carries a locality constraint (NO_PREF offers skip delay scheduling)
->allowedLocality = getAllowedLocalityLevel(curTime) --get the locality level this TaskSet currently allows; getAllowedLocalityLevel changes over time
->if (allowedLocality > maxLocality)  --if delay scheduling would allow a looser level than the offer permits
->allowedLocality = maxLocality --clamp it to the offer's maxLocality
->task = findTask(execId, host, allowedLocality) --look for a task that satisfies the allowed locality level
->the search starts at the most local level (PROCESS_LOCAL) and returns the first task that satisfies the constraint, i.e. the most local one available
->task match case Some((index, taskLocality, speculative)) --a task was found
-> val info = new TaskInfo(taskId, index, attemptNum, curTime, execId, host, taskLocality, speculative)
->if (maxLocality != TaskLocality.NO_PREF) // NO_PREF will not affect the variables related to delay scheduling
->currentLocalityIndex = getLocalityIndex(taskLocality) // Update our locality level for delay scheduling
->lastLaunchTime = curTime --record when a task was last launched; needed to compute the current locality level
->addRunningTask(taskId) --add it to the set of running tasks
->logInfo("Starting %s (TID %d, %s, %s, %d bytes)"
->sched.dagScheduler.taskStarted(task, info) --notify the DAGScheduler that the task has started running
->eventProcessActor ! BeginEvent(task, taskInfo)
->return Some(new TaskDescription(taskId, execId, taskName, index, serializedTask)) --return the task
->case _ => return None --no task satisfies the locality constraint; return None
=====================end==================================
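The clamp in the trace is easy to read backwards: TaskLocality values are ordered from most local (PROCESS_LOCAL) to least local (ANY), so `allowedLocality > maxLocality` means delay scheduling would accept a looser level than the offer permits. The Scala sketch below isolates only that level-selection step. It assumes the enum ordering defined later in this post, and `OfferSketch.chooseSearchLevel` is a hypothetical helper, not a Spark method.

```scala
object TaskLocality extends Enumeration {
  // Declaration order defines the ordering: earlier values are more local.
  val PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY = Value
}
import TaskLocality._

object OfferSketch {
  // Hypothetical helper: the level findTask would search up to for one offer.
  // `allowedByDelay` stands for the result of getAllowedLocalityLevel(curTime).
  def chooseSearchLevel(maxLocality: TaskLocality.Value,
                        allowedByDelay: TaskLocality.Value): TaskLocality.Value =
    if (maxLocality == NO_PREF) maxLocality            // NO_PREF offers skip delay scheduling
    else if (allowedByDelay > maxLocality) maxLocality // never looser than the offer allows
    else allowedByDelay                                 // delay scheduling is the tighter bound
}
```

For example, with maxLocality = NODE_LOCAL and a TaskSet that has already waited long enough to allow ANY, the search is still limited to NODE_LOCAL.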

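myLocalityLevels: all locality levels currently valid for this TaskSet
localityWaits: the wait time for each of those locality levels
currentLocalityIndex: index of the current locality level; it changes as waiting time accumulates
pendingTasksForExecutor: tasks at the PROCESS_LOCAL (process) level, keyed by executor
pendingTasksForHost: tasks at the NODE_LOCAL (host) level, keyed by host
pendingTasksForRack: tasks at the RACK_LOCAL (rack) level, keyed by rack
pendingTasksWithNoPrefs: tasks with no locality preference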
// Figure out which locality levels we have in our TaskSet, so we can do delay scheduling
var myLocalityLevels = computeValidLocalityLevels()
var localityWaits = myLocalityLevels.map(getLocalityWait) // Time to wait at each level
// Delay scheduling variables: we keep track of our current locality level and the time we
// last launched a task at that level, and move up a level when localityWaits[curLevel] expires.
// We then move down if we manage to launch a "more local" task.
var currentLocalityIndex = 0 // Index of our current locality level in validLocalityLevels
// Set of pending tasks for each executor. These collections are actually
// treated as stacks, in which new tasks are added to the end of the
// ArrayBuffer and removed from the end. This makes it faster to detect
// tasks that repeatedly fail because whenever a task failed, it is put
// back at the head of the stack. They are also only cleaned up lazily;
// when a task is launched, it remains in all the pending lists except
// the one that it was launched from, but gets removed from them later.
private val pendingTasksForExecutor = new HashMap[String, ArrayBuffer[Int]]

// Set of pending tasks for each host. Similar to pendingTasksForExecutor,
// but at host level.
private val pendingTasksForHost = new HashMap[String, ArrayBuffer[Int]]

// Set of pending tasks for each rack -- similar to the above.
private val pendingTasksForRack = new HashMap[String, ArrayBuffer[Int]]

// Set containing pending tasks with no locality preferences.
var pendingTasksWithNoPrefs = new ArrayBuffer[Int]
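The comments above describe the pending lists as stacks with lazy cleanup. A minimal Scala sketch of that idea follows. It assumes a simplified model with only executor and host lists and a `launched` set standing in for Spark's own bookkeeping; failure handling and blacklisting are left out, and `PendingLists`, `addPendingTask` and `dequeue` here are hypothetical simplifications, not Spark's code.

```scala
import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet}

// Hypothetical, simplified model of the pending-task bookkeeping.
class PendingLists {
  val forExecutor = new HashMap[String, ArrayBuffer[Int]]
  val forHost = new HashMap[String, ArrayBuffer[Int]]
  val launched = new HashSet[Int]

  // A new task index is appended to the end of every list it belongs to.
  def addPendingTask(index: Int, execs: Seq[String], hosts: Seq[String]): Unit = {
    execs.foreach(e => forExecutor.getOrElseUpdate(e, new ArrayBuffer[Int]) += index)
    hosts.foreach(h => forHost.getOrElseUpdate(h, new ArrayBuffer[Int]) += index)
  }

  // Pop from the end (stack order). Entries for tasks that were already launched
  // from some other list are discarded here: this is the lazy cleanup.
  def dequeue(list: ArrayBuffer[Int]): Option[Int] = {
    while (list.nonEmpty) {
      val index = list.remove(list.size - 1)
      if (!launched.contains(index)) {
        launched += index
        return Some(index)
      }
    }
    None
  }
}
```

Because one task index can sit in several lists at once, after task 1 is launched from exec1's list it is still present in hostA's list; it is only dropped the next time that list is popped.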
var lastLaunchTime = clock.getTime()  // Time we last launched a task at this level

Computing the locality levels that are valid for this TaskSet:
/**
 * Compute the locality levels used in this TaskSet. Assumes that all tasks have already been
 * added to queues using addPendingTask.
 *
 */
private def computeValidLocalityLevels(): Array[TaskLocality.TaskLocality] = {
  import TaskLocality.{PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY}
  val levels = new ArrayBuffer[TaskLocality.TaskLocality]
  if (!pendingTasksForExecutor.isEmpty && getLocalityWait(PROCESS_LOCAL) != 0 &&
      pendingTasksForExecutor.keySet.exists(sched.isExecutorAlive(_))) {
    levels += PROCESS_LOCAL
  }
  if (!pendingTasksForHost.isEmpty && getLocalityWait(NODE_LOCAL) != 0 &&
      pendingTasksForHost.keySet.exists(sched.hasExecutorsAliveOnHost(_))) {
    levels += NODE_LOCAL
  }
  if (!pendingTasksWithNoPrefs.isEmpty) {
    levels += NO_PREF
  }
  if (!pendingTasksForRack.isEmpty && getLocalityWait(RACK_LOCAL) != 0 &&
      pendingTasksForRack.keySet.exists(sched.hasHostAliveOnRack(_))) {
    levels += RACK_LOCAL
  }
  levels += ANY
  logDebug("Valid locality levels for " + taskSet + ": " + levels.mkString(", "))
  levels.toArray
}
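To see which levels end up in myLocalityLevels, the sketch below restates computeValidLocalityLevels as a pure Scala function. It assumes each Boolean flag already combines "this collection is non-empty" with "some of its executors, hosts or racks are alive", and it takes the wait lookup as a function parameter; `validLevels` is a hypothetical name.

```scala
import scala.collection.mutable.ArrayBuffer

object TaskLocality extends Enumeration {
  val PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY = Value
}
import TaskLocality._

// Hypothetical pure version of computeValidLocalityLevels.
def validLevels(hasProcess: Boolean, hasNode: Boolean, hasNoPref: Boolean, hasRack: Boolean,
                wait: TaskLocality.Value => Long): Array[TaskLocality.Value] = {
  val levels = ArrayBuffer[TaskLocality.Value]()
  if (hasProcess && wait(PROCESS_LOCAL) != 0) levels += PROCESS_LOCAL
  if (hasNode && wait(NODE_LOCAL) != 0) levels += NODE_LOCAL
  if (hasNoPref) levels += NO_PREF // NO_PREF has no wait check
  if (hasRack && wait(RACK_LOCAL) != 0) levels += RACK_LOCAL
  levels += ANY // always present, so the array is never empty
  levels.toArray
}
```

One consequence: a level whose wait is configured as 0 is dropped from the list, so delay scheduling never pauses at it. NO_PREF is added whenever such tasks exist, and ANY is always the last element.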
Getting the wait time for each locality level (the per-level settings fall back to spark.locality.wait, which defaults to 3000 ms):
private def getLocalityWait(level: TaskLocality.TaskLocality): Long = {
  val defaultWait = conf.get("spark.locality.wait", "3000")
  level match {
    case TaskLocality.PROCESS_LOCAL =>
      conf.get("spark.locality.wait.process", defaultWait).toLong
    case TaskLocality.NODE_LOCAL =>
      conf.get("spark.locality.wait.node", defaultWait).toLong
    case TaskLocality.RACK_LOCAL =>
      conf.get("spark.locality.wait.rack", defaultWait).toLong
    case _ => 0L
  }
}
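The fallback chain in getLocalityWait can be exercised without a SparkConf. This sketch assumes a plain `Map[String, String]` stands in for `conf.get(key, default)`; `localityWait` is a hypothetical name, and values are plain milliseconds as in the version quoted here.

```scala
object TaskLocality extends Enumeration {
  val PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY = Value
}
import TaskLocality._

// Hypothetical stand-in for getLocalityWait, reading from a Map instead of SparkConf.
def localityWait(conf: Map[String, String], level: TaskLocality.Value): Long = {
  // Per-level keys fall back to spark.locality.wait, which itself defaults to "3000".
  val defaultWait = conf.getOrElse("spark.locality.wait", "3000")
  level match {
    case PROCESS_LOCAL => conf.getOrElse("spark.locality.wait.process", defaultWait).toLong
    case NODE_LOCAL    => conf.getOrElse("spark.locality.wait.node", defaultWait).toLong
    case RACK_LOCAL    => conf.getOrElse("spark.locality.wait.rack", defaultWait).toLong
    case _             => 0L // NO_PREF and ANY never wait
  }
}
```

With `spark.locality.wait.node` set to 0, NODE_LOCAL gets no wait, and the `!= 0` checks in computeValidLocalityLevels then leave it out of myLocalityLevels.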
Definition of the locality levels (declaration order defines the ordering, from PROCESS_LOCAL, the most local, to ANY, the least local):
@DeveloperApi
object TaskLocality extends Enumeration {
  // Process local is expected to be used ONLY within TaskSetManager for now.
  val PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY = Value

  type TaskLocality = Value

  def isAllowed(constraint: TaskLocality, condition: TaskLocality): Boolean = {
    condition <= constraint
  }
}
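Because TaskLocality is a Scala Enumeration, declaration order is the order used by `<`, `>` and `isAllowed`: a smaller value is more local, and NO_PREF sits between NODE_LOCAL and RACK_LOCAL. The sketch below copies the enum and adds a hypothetical helper, `permittedLevels`, that lists every task locality an offer constrained to a given level would accept.

```scala
object TaskLocality extends Enumeration {
  // Same declaration order as above: the id grows as locality gets looser.
  val PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY = Value
  type TaskLocality = Value

  def isAllowed(constraint: TaskLocality, condition: TaskLocality): Boolean =
    condition <= constraint
}
import TaskLocality._

// Hypothetical helper: all levels a task may have and still run under `constraint`.
def permittedLevels(constraint: TaskLocality.Value): List[TaskLocality.Value] =
  TaskLocality.values.toList.filter(level => isAllowed(constraint, level))
```

So an offer limited to NO_PREF accepts PROCESS_LOCAL, NODE_LOCAL and NO_PREF tasks, but not RACK_LOCAL ones.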
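Given a locality level, find its index among the levels that are valid for this TaskSet. Because the TaskSet may have no tasks at some levels, a level that is not in the list is rounded toward the next less-local (lower-priority) level: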
/**
 * Find the index in myLocalityLevels for a given locality. This is also designed to work with
 * localities that are not in myLocalityLevels (in case we somehow get those) by returning the
 * next-biggest level we have. Uses the fact that the last value in myLocalityLevels is ANY.
 */
def getLocalityIndex(locality: TaskLocality.TaskLocality): Int = {
  var index = 0
  while (locality > myLocalityLevels(index)) {
    index += 1
  }
  index
}
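A small sketch of the same loop, with myLocalityLevels passed in as a parameter (`localityIndex` is a hypothetical name), shows the rounding rule: a level missing from the list maps to the next less-local level that is present.

```scala
object TaskLocality extends Enumeration {
  val PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY = Value
}
import TaskLocality._

// Same loop as getLocalityIndex. It relies on the last element being ANY,
// otherwise a level looser than every entry would run past the end of the array.
def localityIndex(levels: Array[TaskLocality.Value], locality: TaskLocality.Value): Int = {
  var index = 0
  while (locality > levels(index)) {
    index += 1
  }
  index
}
```

With levels NODE_LOCAL, RACK_LOCAL, ANY, a PROCESS_LOCAL request maps to index 0 (NODE_LOCAL) and NO_PREF maps to index 1 (RACK_LOCAL).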

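Getting the currently allowed locality level. It compares the time waited since the last launch with the wait configured for each level to decide which locality level the TaskSet is currently at: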
/**
 * Get the level we can launch tasks according to delay scheduling, based on current wait time.
 */
private def getAllowedLocalityLevel(curTime: Long): TaskLocality.TaskLocality = {
  while (curTime - lastLaunchTime >= localityWaits(currentLocalityIndex) &&
      currentLocalityIndex < myLocalityLevels.length - 1) {
    // Jump to the next locality level, and remove our waiting time for the current one since
    // we don't want to count it again on the next one
    lastLaunchTime += localityWaits(currentLocalityIndex)
    currentLocalityIndex += 1
  }
  myLocalityLevels(currentLocalityIndex)
}
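The waiting logic is easiest to follow with concrete times. The sketch below wraps the loop of getAllowedLocalityLevel in a hypothetical `DelayState` class and assumes three valid levels with 3000 ms waits (ANY has no wait).

```scala
object TaskLocality extends Enumeration {
  val PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY = Value
}
import TaskLocality._

// Hypothetical holder for the delay-scheduling variables of one TaskSet.
class DelayState(levels: Array[TaskLocality.Value], waits: Array[Long], start: Long) {
  var currentLocalityIndex = 0
  var lastLaunchTime = start

  // Same loop as getAllowedLocalityLevel: move one level looser per expired wait,
  // charging that wait to lastLaunchTime so it is not counted again.
  def allowedLevel(curTime: Long): TaskLocality.Value = {
    while (curTime - lastLaunchTime >= waits(currentLocalityIndex) &&
        currentLocalityIndex < levels.length - 1) {
      lastLaunchTime += waits(currentLocalityIndex)
      currentLocalityIndex += 1
    }
    levels(currentLocalityIndex)
  }
}
```

Starting at time 0, a call at 2000 ms still returns PROCESS_LOCAL, a call at 3500 ms returns NODE_LOCAL, and by 6000 ms the TaskSet has fallen back to ANY. In resourceOffer, launching a task resets lastLaunchTime and sets currentLocalityIndex to the launched task's level.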
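Recomputing the locality levels when the set of executors changes (for example, TaskSetManager.executorAdded() calls it when a new executor registers), since a new executor can make a locality level valid again:
def recomputeLocality() {
  val previousLocalityLevel = myLocalityLevels(currentLocalityIndex)
  myLocalityLevels = computeValidLocalityLevels()
  localityWaits = myLocalityLevels.map(getLocalityWait)
  currentLocalityIndex = getLocalityIndex(previousLocalityLevel)
}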
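The effect of recomputeLocality is that the TaskSet keeps the level it had reached (or the next less-local one) instead of restarting from the most local level. The hypothetical pure function `remapIndex` below models only that index remapping, assuming the old and new level lists are given.

```scala
object TaskLocality extends Enumeration {
  val PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY = Value
}
import TaskLocality._

// Hypothetical model of recomputeLocality's last step: remember the old level,
// then locate it (or the next less-local level) in the new list.
def remapIndex(oldLevels: Array[TaskLocality.Value], oldIndex: Int,
               newLevels: Array[TaskLocality.Value]): Int = {
  val previous = oldLevels(oldIndex)
  var index = 0
  while (previous > newLevels(index)) index += 1 // same loop as getLocalityIndex
  index
}
```

For example, if a new executor makes PROCESS_LOCAL valid again while the TaskSet was at NODE_LOCAL, the index moves from 0 to 1 so that the TaskSet stays at NODE_LOCAL.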
Finding a task that satisfies the locality constraint. The search starts from the most local level, so the most local tasks are always launched first:
/**
 * Dequeue a pending task for a given node and return its index and locality level.
 * Only search for tasks matching the given locality constraint.
 *
 * @return An option containing (task index within the task set, locality, is speculative?)
 */
private def findTask(execId: String, host: String, maxLocality: TaskLocality.Value)
  : Option[(Int, TaskLocality.Value, Boolean)] =
{
  for (index <- findTaskFromList(execId, getPendingTasksForExecutor(execId))) {
    return Some((index, TaskLocality.PROCESS_LOCAL, false))
  }
  ...
  // find a speculative task if all other tasks have been scheduled
  findSpeculativeTask(execId, host, maxLocality).map {
    case (taskIndex, allowedLocality) => (taskIndex, allowedLocality, true)}
}
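The search order in findTask can be modeled generically. This sketch assumes one pending list per level, ordered from most to least local, and ignores speculation, blacklisting and lazy cleanup. `findTaskSketch` is hypothetical and does not reproduce the elided part of Spark's findTask.

```scala
import scala.collection.mutable.ArrayBuffer

object TaskLocality extends Enumeration {
  val PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY = Value
  def isAllowed(constraint: Value, condition: Value): Boolean = condition <= constraint
}
import TaskLocality._

// Hypothetical simplified findTask: `queues` holds one pending list per level,
// ordered from most to least local. The first non-empty list whose level is
// permitted by maxLocality wins, and its last element is popped (stack order).
def findTaskSketch(queues: Seq[(TaskLocality.Value, ArrayBuffer[Int])],
                   maxLocality: TaskLocality.Value): Option[(Int, TaskLocality.Value)] =
  queues
    .find { case (level, list) => isAllowed(maxLocality, level) && list.nonEmpty }
    .map { case (level, list) => (list.remove(list.size - 1), level) }
```

Called twice with maxLocality = RACK_LOCAL on lists where PROCESS_LOCAL is empty, this first returns the NODE_LOCAL task and then, once that list is empty, the RACK_LOCAL one. The ANY list is never considered, because ANY is not allowed under RACK_LOCAL.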

