For a typical small company, YARN's scheduling capability is probably sufficient, but on large clusters (1000 or 2000+ nodes) YARN's scheduling performance falls short.

I happened to come across a very good article on this topic: https://tech.meituan.com/2019/08/01/hadoop-yarn-scheduling-performance-optimization-practice.html

Following YARN-5969, I found that Hadoop 2.9.0 has already fixed this issue, and testing confirmed it improves scheduling performance.

FairScheduler supports two scheduling modes:

Heartbeat scheduling: each NodeManager periodically reports its state to the ResourceManager via heartbeat. That RPC raises an event which causes the ResourceManager to invoke nodeUpdate(), performing one round of resource scheduling for that node.

Continuous scheduling: a dedicated daemon thread runs a scheduling pass at a short fixed interval, allocating resources in near real time, asynchronously and in parallel with the heartbeat-triggered scheduling.
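The relationship between the two paths can be sketched with a minimal simulation (hypothetical simplified names, not the actual ResourceManager classes): both triggers funnel into the same per-node scheduling routine.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch (hypothetical names): both trigger paths end up in the
// same attemptScheduling() routine, one driven by heartbeats, one by a
// daemon sweep over all nodes.
class SchedulingTriggers {
    static final AtomicInteger passes = new AtomicInteger();

    // Shared entry point: one scheduling pass over one node.
    static void attemptScheduling(String node) {
        passes.incrementAndGet();
    }

    // Heartbeat path: NodeManager heartbeat -> nodeUpdate -> attemptScheduling.
    static void onNodeHeartbeat(String node) {
        attemptScheduling(node);
    }

    // Continuous path: a daemon thread sweeps every node each interval.
    static void continuousSchedulingAttempt(List<String> nodes) {
        for (String node : nodes) {
            attemptScheduling(node);
        }
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("nm-1", "nm-2", "nm-3");
        onNodeHeartbeat("nm-1");            // one pass for the reporting node
        continuousSchedulingAttempt(nodes); // one pass per node in the sweep
        System.out.println(passes.get());   // prints "4"
    }
}
```

The important point for Optimization 2 later is that these two paths share the same entry point, and therefore the same lock.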

Every heartbeat sent by a NodeManager arrives as an event and goes through `FairScheduler.handle()`:

```java
@Override
public void handle(SchedulerEvent event) {
  switch (event.getType()) {
  case NODE_ADDED:
    if (!(event instanceof NodeAddedSchedulerEvent)) {
      throw new RuntimeException("Unexpected event type: " + event);
    }
    NodeAddedSchedulerEvent nodeAddedEvent = (NodeAddedSchedulerEvent)event;
    addNode(nodeAddedEvent.getContainerReports(),
        nodeAddedEvent.getAddedRMNode());
    break;
  case NODE_REMOVED:
    if (!(event instanceof NodeRemovedSchedulerEvent)) {
      throw new RuntimeException("Unexpected event type: " + event);
    }
    NodeRemovedSchedulerEvent nodeRemovedEvent = (NodeRemovedSchedulerEvent)event;
    removeNode(nodeRemovedEvent.getRemovedRMNode());
    break;
  case NODE_UPDATE:
    if (!(event instanceof NodeUpdateSchedulerEvent)) {
      throw new RuntimeException("Unexpected event type: " + event);
    }
    NodeUpdateSchedulerEvent nodeUpdatedEvent = (NodeUpdateSchedulerEvent)event;
    nodeUpdate(nodeUpdatedEvent.getRMNode());
    break;
  case APP_ADDED:
    if (!(event instanceof AppAddedSchedulerEvent)) {
      throw new RuntimeException("Unexpected event type: " + event);
    }
    AppAddedSchedulerEvent appAddedEvent = (AppAddedSchedulerEvent) event;
    // ... (remaining cases omitted)
```

Every nodeUpdate goes through the same logic.

Both continuous scheduling and heartbeat scheduling end up in attemptScheduling(node); the tail of nodeUpdate() shows where the heartbeat path triggers it:

```java
// If the node is decommissioning, send an update to have the total
// resource equal to the used resource, so no available resource to
// schedule.
if (nm.getState() == NodeState.DECOMMISSIONING) {
  this.rmContext
      .getDispatcher()
      .getEventHandler()
      .handle(
          new RMNodeResourceUpdateEvent(nm.getNodeID(), ResourceOption
              .newInstance(getSchedulerNode(nm.getNodeID())
                  .getUsedResource(), 0)));
}

if (continuousSchedulingEnabled) {
  if (!completedContainers.isEmpty()) { // when continuous scheduling is enabled
    attemptScheduling(node);
  }
} else {
  attemptScheduling(node); // heartbeat scheduling
}

// Updating node resource utilization
node.setAggregatedContainersUtilization(
    nm.getAggregatedContainersUtilization());
node.setNodeUtilization(nm.getNodeUtilization());
```

Continuous scheduling runs on a separate daemon thread, which invokes continuousSchedulingAttempt() every getContinuousSchedulingSleepMs() milliseconds:

```java
/**
 * Thread which attempts scheduling resources continuously,
 * asynchronous to the node heartbeats.
 */
private class ContinuousSchedulingThread extends Thread {

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      try {
        continuousSchedulingAttempt();
        Thread.sleep(getContinuousSchedulingSleepMs());
      } catch (InterruptedException e) {
        LOG.warn("Continuous scheduling thread interrupted. Exiting.", e);
        return;
      }
    }
  }
}
```

It then sorts the nodes by how much free resource each one has:

```java
void continuousSchedulingAttempt() throws InterruptedException {
  long start = getClock().getTime();
  List<NodeId> nodeIdList = new ArrayList<NodeId>(nodes.keySet());
  // Sort the nodes by space available on them, so that we offer
  // containers on emptier nodes first, facilitating an even spread. This
  // requires holding the scheduler lock, so that the space available on a
  // node doesn't change during the sort.
  synchronized (this) {
    Collections.sort(nodeIdList, nodeAvailableResourceComparator);
  }

  // iterate all nodes
  for (NodeId nodeId : nodeIdList) {
    FSSchedulerNode node = getFSSchedulerNode(nodeId);
    try {
      if (node != null && Resources.fitsIn(minimumAllocation,
          node.getAvailableResource())) {
        attemptScheduling(node);
      }
    } catch (Throwable ex) {
      LOG.error("Error while attempting scheduling for node " + node +
          ": " + ex.toString(), ex);
      if ((ex instanceof YarnRuntimeException) &&
          (ex.getCause() instanceof InterruptedException)) {
        // AsyncDispatcher translates InterruptedException to
        // YarnRuntimeException with cause InterruptedException.
        // Need to throw InterruptedException to stop schedulingThread.
        throw (InterruptedException)ex.getCause();
      }
    }
  }
}
```

It then iterates over the nodes, assigning containers to each in turn.

queueMgr.getRootQueue().assignContainer(node) walks the queue tree from root, traversing the abstract application resources:

```java
boolean validReservation = false;
FSAppAttempt reservedAppSchedulable = node.getReservedAppSchedulable();
if (reservedAppSchedulable != null) {
  validReservation = reservedAppSchedulable.assignReservedContainer(node);
}
if (!validReservation) {
  // No reservation, schedule at queue which is farthest below fair share
  int assignedContainers = 0;
  Resource assignedResource = Resources.clone(Resources.none());
  Resource maxResourcesToAssign =
      Resources.multiply(node.getAvailableResource(), 0.5f);
  while (node.getReservedContainer() == null) {
    boolean assignedContainer = false;
    Resource assignment = queueMgr.getRootQueue().assignContainer(node);
    if (!assignment.equals(Resources.none())) { // did we get a container?
      assignedContainers++;
      assignedContainer = true;
      Resources.addTo(assignedResource, assignment);
    }
    if (!assignedContainer) { break; }
    if (!shouldContinueAssigning(assignedContainers,
        maxResourcesToAssign, assignedResource)) {
      break;
    }
  }
}
```
Next, inside assignContainer(), the child queues are sorted with the policy's comparator; here that is the FairScheduler policy:

```java
@Override
public Resource assignContainer(FSSchedulerNode node) {
  // For each node, do one recursive search of the resource tree
  Resource assigned = Resources.none();

  // If this queue is over its limit, reject
  if (!assignContainerPreCheck(node)) {
    return assigned;
  }

  // Hold the write lock when sorting childQueues
  writeLock.lock();
  try {
    Collections.sort(childQueues, policy.getComparator());
  } finally {
    writeLock.unlock();
  }
```

It then iterates the sorted children, handing the node down until something is assigned:

```java
  /*
   * We are releasing the lock between the sort and iteration of the
   * "sorted" list. There could be changes to the list here:
   * 1. Add a child queue to the end of the list, this doesn't affect
   * container assignment.
   * 2. Remove a child queue, this is probably good to take care of so we
   * don't assign to a queue that is going to be removed shortly.
   */
  readLock.lock();
  try {
    for (FSQueue child : childQueues) {
      assigned = child.assignContainer(node);
      if (!Resources.equals(assigned, Resources.none())) {
        break;
      }
    }
  } finally {
    readLock.unlock();
  }
  return assigned;
}
```

assignContainer() may be handed either an app or a queue. If it is a queue, it recurses: starting from root (an FSParentQueue), assignContainer() calls itself down the tree until it reaches a leaf's assignContainer() method, where the actual allocation happens.
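The recursive descent described above can be modeled with a minimal composite sketch (hypothetical simplified classes, not the real FSQueue hierarchy; resources are reduced to plain ints):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Minimal composite sketch of the queue tree (hypothetical classes).
abstract class Queue {
    final String name;
    final int usage; // stand-in for the queue's current resource usage

    Queue(String name, int usage) {
        this.name = name;
        this.usage = usage;
    }

    // Returns the amount assigned on the node, or 0 if nothing fit.
    abstract int assignContainer(int nodeAvailable);
}

class LeafQueue extends Queue {
    final int demand; // resources this queue's apps still want

    LeafQueue(String name, int usage, int demand) {
        super(name, usage);
        this.demand = demand;
    }

    @Override
    int assignContainer(int nodeAvailable) {
        // Only a leaf actually allocates: take what fits, bounded by demand.
        return Math.min(demand, nodeAvailable);
    }
}

class ParentQueue extends Queue {
    final List<Queue> children = new ArrayList<>();

    ParentQueue(String name) {
        super(name, 0);
    }

    @Override
    int assignContainer(int nodeAvailable) {
        // Sort children so the least-loaded queue is offered the node first
        // (the real scheduler sorts with policy.getComparator()).
        children.sort(Comparator.comparingInt(q -> q.usage));
        for (Queue child : children) {
            int assigned = child.assignContainer(nodeAvailable);
            if (assigned > 0) {
                return assigned; // stop at the first successful assignment
            }
        }
        return 0;
    }
}

class QueueTreeSketch {
    public static void main(String[] args) {
        ParentQueue root = new ParentQueue("root");
        root.children.add(new LeafQueue("etl", 80, 10));
        root.children.add(new LeafQueue("adhoc", 20, 5));
        System.out.println(root.assignContainer(8)); // prints "5": adhoc sorts first
    }
}
```

Note that every parent re-sorts its children on every call, which is exactly the cost the next section attacks.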

Optimization 1: optimize the queue comparator

What we care about here is the sorting.

In Hadoop 2.8.4, the comparator in FairSharePolicy sorts by weight, demanded resources, and memory usage ratio, and it fetches these values repeatedly.

getResourceUsage() is computed dynamically on every call (which is expensive), so calling it repeatedly inside compare() produces a large amount of duplicated work:

```java
@Override
public int compare(Schedulable s1, Schedulable s2) {
  double minShareRatio1, minShareRatio2;
  double useToWeightRatio1, useToWeightRatio2;
  Resource minShare1 = Resources.min(RESOURCE_CALCULATOR, null,
      s1.getMinShare(), s1.getDemand());
  Resource minShare2 = Resources.min(RESOURCE_CALCULATOR, null,
      s2.getMinShare(), s2.getDemand());
  boolean s1Needy = Resources.lessThan(RESOURCE_CALCULATOR, null,
      s1.getResourceUsage(), minShare1);
  boolean s2Needy = Resources.lessThan(RESOURCE_CALCULATOR, null,
      s2.getResourceUsage(), minShare2);
  minShareRatio1 = (double) s1.getResourceUsage().getMemorySize()
      / Resources.max(RESOURCE_CALCULATOR, null, minShare1, ONE).getMemorySize();
  minShareRatio2 = (double) s2.getResourceUsage().getMemorySize()
      / Resources.max(RESOURCE_CALCULATOR, null, minShare2, ONE).getMemorySize();
  useToWeightRatio1 = s1.getResourceUsage().getMemorySize() /
      s1.getWeights().getWeight(ResourceType.MEMORY);
  useToWeightRatio2 = s2.getResourceUsage().getMemorySize() /
      s2.getWeights().getWeight(ResourceType.MEMORY);
  int res = 0;
  if (s1Needy && !s2Needy)
    res = -1;
  else if (s2Needy && !s1Needy)
    res = 1;
  else if (s1Needy && s2Needy)
    res = (int) Math.signum(minShareRatio1 - minShareRatio2);
  else
    // Neither schedulable is needy
    res = (int) Math.signum(useToWeightRatio1 - useToWeightRatio2);
  if (res == 0) {
    // Apps are tied in fairness ratio. Break the tie by submit time and job
    // name to get a deterministic ordering, which is useful for unit tests.
    res = (int) Math.signum(s1.getStartTime() - s2.getStartTime());
    if (res == 0)
      res = s1.getName().compareTo(s2.getName());
  }
  return res;
}
```
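Why repeated getter calls inside compare() hurt can be shown with a toy example (hypothetical classes, not the real Schedulable API): a sort invokes the comparator O(n log n) times, and each comparison touches the getter twice, whereas a pre-computed snapshot touches it only once per element.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Toy illustration (hypothetical classes): pre-computing an expensive value
// once per element, instead of on every comparison, cuts getter calls from
// O(n log n) to O(n) during a sort.
class UsageCountingSchedulable {
    static final AtomicLong getterCalls = new AtomicLong();
    private final long usage;

    UsageCountingSchedulable(long usage) { this.usage = usage; }

    // Stands in for the dynamically computed (expensive) getResourceUsage().
    long getResourceUsage() {
        getterCalls.incrementAndGet();
        return usage;
    }
}

class ComparatorCachingDemo {
    public static void main(String[] args) {
        List<UsageCountingSchedulable> apps = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            apps.add(new UsageCountingSchedulable((i * 31) % 997));
        }

        // Naive: the expensive getter runs inside every comparison.
        UsageCountingSchedulable.getterCalls.set(0);
        apps.sort(Comparator.comparingLong(UsageCountingSchedulable::getResourceUsage));
        long naive = UsageCountingSchedulable.getterCalls.get();

        // Cached: snapshot the getter once per element, then sort by the snapshot.
        UsageCountingSchedulable.getterCalls.set(0);
        Map<UsageCountingSchedulable, Long> snapshot = new IdentityHashMap<>();
        for (UsageCountingSchedulable app : apps) {
            snapshot.put(app, app.getResourceUsage());
        }
        apps.sort(Comparator.comparingLong(snapshot::get));
        long cached = UsageCountingSchedulable.getterCalls.get();

        System.out.println(naive > cached); // prints "true"
    }
}
```

The YARN-5969 fix applies the same idea inside a single compare() call: fetch each side's usage once and pass it to the sub-comparisons.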

The optimized version looks like this:

```java
@Override
public int compare(Schedulable s1, Schedulable s2) {
  int res = compareDemand(s1, s2);

  // Pre-compute resource usages to avoid duplicate calculation
  Resource resourceUsage1 = s1.getResourceUsage();
  Resource resourceUsage2 = s2.getResourceUsage();

  if (res == 0) {
    res = compareMinShareUsage(s1, s2, resourceUsage1, resourceUsage2);
  }

  if (res == 0) {
    res = compareFairShareUsage(s1, s2, resourceUsage1, resourceUsage2);
  }

  // Break the tie by submit time
  if (res == 0) {
    res = (int) Math.signum(s1.getStartTime() - s2.getStartTime());
  }

  // Break the tie by job name
  if (res == 0) {
    res = s1.getName().compareTo(s2.getName());
  }

  return res;
}

private int compareDemand(Schedulable s1, Schedulable s2) {
  int res = 0;
  Resource demand1 = s1.getDemand();
  Resource demand2 = s2.getDemand();
  if (demand1.equals(Resources.none()) && Resources.greaterThan(
      RESOURCE_CALCULATOR, null, demand2, Resources.none())) {
    res = 1;
  } else if (demand2.equals(Resources.none()) && Resources.greaterThan(
      RESOURCE_CALCULATOR, null, demand1, Resources.none())) {
    res = -1;
  }
  return res;
}

private int compareMinShareUsage(Schedulable s1, Schedulable s2,
    Resource resourceUsage1, Resource resourceUsage2) {
  int res;
  Resource minShare1 = Resources.min(RESOURCE_CALCULATOR, null,
      s1.getMinShare(), s1.getDemand());
  Resource minShare2 = Resources.min(RESOURCE_CALCULATOR, null,
      s2.getMinShare(), s2.getDemand());
  boolean s1Needy = Resources.lessThan(RESOURCE_CALCULATOR, null,
      resourceUsage1, minShare1);
  boolean s2Needy = Resources.lessThan(RESOURCE_CALCULATOR, null,
      resourceUsage2, minShare2);

  if (s1Needy && !s2Needy) {
    res = -1;
  } else if (s2Needy && !s1Needy) {
    res = 1;
  } else if (s1Needy && s2Needy) {
    double minShareRatio1 = (double) resourceUsage1.getMemorySize() /
        Resources.max(RESOURCE_CALCULATOR, null, minShare1, ONE)
            .getMemorySize();
    double minShareRatio2 = (double) resourceUsage2.getMemorySize() /
        Resources.max(RESOURCE_CALCULATOR, null, minShare2, ONE)
            .getMemorySize();
    res = (int) Math.signum(minShareRatio1 - minShareRatio2);
  } else {
    res = 0;
  }

  return res;
}

/**
 * To simplify computation, use weights instead of fair shares to calculate
 * fair share usage.
 */
private int compareFairShareUsage(Schedulable s1, Schedulable s2,
    Resource resourceUsage1, Resource resourceUsage2) {
  double weight1 = s1.getWeights().getWeight(ResourceType.MEMORY);
  double weight2 = s2.getWeights().getWeight(ResourceType.MEMORY);
  double useToWeightRatio1;
  double useToWeightRatio2;
  if (weight1 > 0.0 && weight2 > 0.0) {
    useToWeightRatio1 = resourceUsage1.getMemorySize() / weight1;
    useToWeightRatio2 = resourceUsage2.getMemorySize() / weight2;
  } else { // Either weight1 or weight2 equals to 0
    if (weight1 == weight2) {
      // If they have same weight, just compare usage
      useToWeightRatio1 = resourceUsage1.getMemorySize();
      useToWeightRatio2 = resourceUsage2.getMemorySize();
    } else {
      // By setting useToWeightRatios to negative weights, we give the
      // zero-weight one less priority, so the non-zero weight one will
      // be given slots.
      useToWeightRatio1 = -weight1;
      useToWeightRatio2 = -weight2;
    }
  }

  return (int) Math.signum(useToWeightRatio1 - useToWeightRatio2);
}
```

On a test cluster, I compared the time spent in queue sorting before and after the change.

(The comparison in the figure is done rather crudely; please bear with it. ^-^)

The upper red box is the new version, the lower one the old version. This was not a stress test, but under the same scheduling workload the comparison is convincing: on a large cluster where this method is invoked tens of millions or even hundreds of millions of times, the optimization becomes significant.

In a stress test before going live, with 1000 queues, 1500 pending and 600 running applications, scheduling performance roughly doubled, which is a considerable improvement.

Optimization 2: optimize the YARN scheduling logic

The idea: on large clusters, resource utilization was poor, so we enabled continuous scheduling to improve it. In practice, utilization did go up, but the cluster's scheduling throughput remained weak: the rate of containers processed and released did not improve.

The root cause: heartbeat scheduling and continuous scheduling both go through the same synchronized attemptScheduling() method, so they contend for the lock, making both allocation and release slow; moreover, queue sorting and assignment become extremely slow when the cluster has a huge number of pending applications.

The optimizations:

1. Enable continuous scheduling and disable heartbeat scheduling.
2. Run continuous scheduling in batches, indirectly reducing the cost of queue sorting.
3. Release unimportant locks to free up performance.

Let's get to it.

Enable YARN continuous scheduling with the following configuration:

```xml
<property>
  <name>yarn.scheduler.fair.continuous-scheduling-enabled</name>
  <value>true</value>
  <description>Whether to enable continuous scheduling</description>
</property>
```
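The interval between passes is controlled by the companion property yarn.scheduler.fair.continuous-scheduling-sleep-ms, whose default is 5 ms (shown here with the default value for completeness):

```xml
<property>
  <name>yarn.scheduler.fair.continuous-scheduling-sleep-ms</name>
  <value>5</value>
  <description>Milliseconds to sleep between continuous scheduling passes</description>
</property>
```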

Continuous scheduling then runs the method above every 5 ms, iterating over the nodes in turn:

```java
void continuousSchedulingAttempt() throws InterruptedException {
  long start = getClock().getTime();
  List<NodeId> nodeIdList = new ArrayList<NodeId>(nodes.keySet());
  // Sort the nodes by space available on them, so that we offer
  // containers on emptier nodes first, facilitating an even spread. This
  // requires holding the scheduler lock, so that the space available on a
  // node doesn't change during the sort.
  synchronized (this) {
    Collections.sort(nodeIdList, nodeAvailableResourceComparator); // sort all nodes by available resources
  }

  // iterate all nodes
  for (NodeId nodeId : nodeIdList) { // walk every node
    FSSchedulerNode node = getFSSchedulerNode(nodeId);
    try {
      if (node != null && Resources.fitsIn(minimumAllocation,
          node.getAvailableResource())) { // does the node still have at least the minimum allocation free?
        attemptScheduling(node); // run attemptScheduling
      }
    } catch (Throwable ex) {
      LOG.error("Error while attempting scheduling for node " + node +
          ": " + ex.toString(), ex);
      if ((ex instanceof YarnRuntimeException) &&
          (ex.getCause() instanceof InterruptedException)) {
        // AsyncDispatcher translates InterruptedException to
        // YarnRuntimeException with cause InterruptedException.
        // Need to throw InterruptedException to stop schedulingThread.
        throw (InterruptedException)ex.getCause();
      }
    }
  }
}
```

Now let's look at attemptScheduling():

```java
@VisibleForTesting
synchronized void attemptScheduling(FSSchedulerNode node) {
  if (rmContext.isWorkPreservingRecoveryEnabled()
      && !rmContext.isSchedulerReadyForAllocatingContainers()) {
    return;
  }

  final NodeId nodeID = node.getNodeID();
  if (!nodes.containsKey(nodeID)) { // validity check
    // The node might have just been removed while this thread was waiting
    // on the synchronized lock before it entered this synchronized method
    LOG.info("Skipping scheduling as the node " + nodeID +
        " has been removed");
    return;
  }

  // Assign new containers...
  // 1. Check for reserved applications
  // 2. Schedule if there are no reservations

  boolean validReservation = false;
  FSAppAttempt reservedAppSchedulable = node.getReservedAppSchedulable();
  if (reservedAppSchedulable != null) {
    validReservation = reservedAppSchedulable.assignReservedContainer(node);
  }
  if (!validReservation) { // no valid reservation
    // No reservation, schedule at queue which is farthest below fair share
    int assignedContainers = 0;
    Resource assignedResource = Resources.clone(Resources.none());
    Resource maxResourcesToAssign =
        Resources.multiply(node.getAvailableResource(), 0.5f); // by default use at most 50% of the node's available resources
    while (node.getReservedContainer() == null) {
      boolean assignedContainer = false;
      Resource assignment = queueMgr.getRootQueue().assignContainer(node); // the key call: walk the tree from root down to an app and assign a container on this node
      if (!assignment.equals(Resources.none())) { // got resources
        assignedContainers++; // one more container assigned
        assignedContainer = true;
        Resources.addTo(assignedResource, assignment);
      }
      if (!assignedContainer) { break; } // nothing matched, bail out
      if (!shouldContinueAssigning(assignedContainers, // per configuration: has this pass exceeded the node's max container count, or dropped below the minimum free resources?
          maxResourcesToAssign, assignedResource)) {
        break;
      }
    }
  }
  updateRootQueueMetrics();
}
```
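The per-pass limits checked by shouldContinueAssigning come from the multiple-assign settings in FairSchedulerConfiguration; a typical tuning looks like this (the values here are examples, not recommendations):

```xml
<property>
  <name>yarn.scheduler.fair.assignmultiple</name>
  <value>true</value>
  <description>Allow assigning more than one container per scheduling pass on a node</description>
</property>
<property>
  <name>yarn.scheduler.fair.max.assign</name>
  <value>10</value>
  <description>Cap on containers assigned in one pass when assignmultiple is enabled</description>
</property>
```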

Based on the source above, I made the following changes.

A new method is added to the Schedulable interface:

```java
/**
 * Assign a list of containers across this list of nodes if possible, and
 * return the amount of resources assigned.
 */
public List<Resource> assignContainers(List<FSSchedulerNode> nodes);
```
A batched attemptSchedulings() replaces the per-node entry point:

```java
@VisibleForTesting
protected void attemptSchedulings(ArrayList<FSSchedulerNode> fsSchedulerNodeList) {
  if (rmContext.isWorkPreservingRecoveryEnabled()
      && !rmContext.isSchedulerReadyForAllocatingContainers()) {
    return;
  }
  List<FSSchedulerNode> fsSchedulerNodes = new ArrayList(); // new collection of node abstractions that pass the checks
  fsSchedulerNodeList.stream().forEach(node -> {
    final NodeId nodeID = node.getNodeID();
    if (nodes.containsKey(nodeID)) {
      // Assign new containers...
      // 1. Check for reserved applications
      // 2. Schedule if there are no reservations
      boolean validReservation = false;
      FSAppAttempt reservedAppSchedulable = node.getReservedAppSchedulable();
      if (reservedAppSchedulable != null) {
        validReservation = reservedAppSchedulable.assignReservedContainer(node);
      }
      if (!validReservation) { // passed the validity check
        if (node.getReservedContainer() == null) { // the node is not reserved by some container
          fsSchedulerNodes.add(node);
        }
      }
    } else {
      LOG.info("Skipping scheduling as the node " + nodeID +
          " has been removed");
    }
  });
  if (fsSchedulerNodes.isEmpty()) {
    LOG.error("Handle fsSchedulerNodes empty and return");
    return;
  }
  LOG.info("Eligible nodes: " + fsSchedulerNodeList.size());
  List<Resource> resources = queueMgr.getRootQueue().assignContainers(fsSchedulerNodes); // pass in the node list and assign in batch
  fsOpDurations.addDistributiveContainer(resources.size());
  LOG.info("Containers assigned in this round: " + resources.size());
  updateRootQueueMetrics();
}
```
The implementation added in FSParentQueue:

```java
@Override
public List<Resource> assignContainers(List<FSSchedulerNode> nodes) {
  List<Resource> assignedsNeed = new ArrayList<>();
  ArrayList<FSSchedulerNode> fsSchedulerNodes = new ArrayList<>();
  for (FSSchedulerNode node : nodes) {
    if (assignContainerPreCheck(node)) {
      fsSchedulerNodes.add(node);
    }
  }
  if (fsSchedulerNodes.isEmpty()) {
    LOG.info("Nodes is empty, skip this assign around");
    return assignedsNeed;
  }

  // Hold the write lock when sorting childQueues
  writeLock.lock();
  try {
    Collections.sort(childQueues, policy.getComparator()); // sorting, again
  } finally {
    writeLock.unlock();
  }

  /*
   * We are releasing the lock between the sort and iteration of the
   * "sorted" list. There could be changes to the list here:
   * 1. Add a child queue to the end of the list, this doesn't affect
   * container assignment.
   * 2. Remove a child queue, this is probably good to take care of so we
   * don't assign to a queue that is going to be removed shortly.
   */
  readLock.lock();
  try {
    for (FSQueue child : childQueues) {
      List<Resource> assigneds = child.assignContainers(fsSchedulerNodes); // again, pass the node list down
      if (!assigneds.isEmpty()) {
        for (Resource assign : assigneds) {
          assignedsNeed.add(assign);
        }
        break;
      }
    }
  } finally {
    readLock.unlock();
  }

  return assignedsNeed;
}
```

The apps are finally handled at the FSLeafQueue level:

```java
@Override
public List<Resource> assignContainers(List<FSSchedulerNode> nodes) {
  Resource assigned = Resources.none();
  List<Resource> assigneds = new ArrayList<>();
  ArrayList<FSSchedulerNode> fsSchedulerNodes = new ArrayList<>();
  for (FSSchedulerNode node : nodes) {
    if (assignContainerPreCheck(node)) {
      fsSchedulerNodes.add(node);
    }
  }
  if (fsSchedulerNodes.isEmpty()) {
    LOG.info("Nodes is empty, skip this assign around");
    return assigneds;
  }
  // Apps that have resource demands.
  TreeSet<FSAppAttempt> pendingForResourceApps =
      new TreeSet<FSAppAttempt>(policy.getComparator());
  readLock.lock();
  try {
    for (FSAppAttempt app : runnableApps) { // sort all apps, running or pending
      Resource pending = app.getAppAttemptResourceUsage().getPending();
      if (!pending.equals(Resources.none())) { // only apps that still demand resources join the sorted set
        pendingForResourceApps.add(app);
      }
    }
  } finally {
    readLock.unlock();
  }

  int count = 0; // containers assigned per node
  Set<String> repeatApp = new HashSet<>(); // dedup set
  for (FSSchedulerNode node : fsSchedulerNodes) { // iterate nodes
    count = 0;
    for (FSAppAttempt sched : pendingForResourceApps) { // iterate apps
      // One node just allocate for one app once
      if (repeatApp.contains(sched.getId())) { // dedup
        continue;
      }
      if (SchedulerAppUtils.isPlaceBlacklisted(sched, node, LOG)) { // is this node blacklisted for the app?
        continue;
      }
      if (node.getReservedContainer() == null
          && Resources.fitsIn(minimumAllocation, node.getAvailableResource())) { // does the node still have resources?
        assigned = sched.assignContainer(node); // the actual container assignment
        if (!assigned.equals(Resources.none())) { // the container got resources on this node
          count++;
          repeatApp.add(sched.getId());
          assigneds.add(assigned);
          if (LOG.isDebugEnabled()) {
            LOG.debug("Assigned container in queue:" + getName() + " " +
                "container:" + assigned);
          }
        }
      }
      if (count >= maxNodeContainerAssign) { // this node has reached the configured max; move on to the next node
        break;
      }
    }
  }
  return assigneds;
}
```

With this round of optimization done, scheduling performance improved roughly fourfold compared with before, and the backlog problem in production was effectively resolved.

The nodeUpdate latency before and after the optimization compares as follows:
