Apache Hadoop NextGen MapReduce (YARN)

MapReduce has undergone a complete overhaul in hadoop-0.23 and we now have, what we call, MapReduce 2.0 (MRv2) or YARN.

The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker, resource management and job scheduling/monitoring, into separate daemons. The idea is to have a global ResourceManager (RM) and per-application ApplicationMaster (AM). An application is either a single job in the classical sense of Map-Reduce jobs or a DAG of jobs.

The ResourceManager and per-node slave, the NodeManager (NM), form the data-computation framework. The ResourceManager is the ultimate authority that arbitrates resources among all the applications in the system.

The per-application ApplicationMaster is, in effect, a framework specific library and is tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks.

The ResourceManager has two main components: Scheduler and ApplicationsManager.

The Scheduler is responsible for allocating resources to the various running applications subject to familiar constraints of capacities, queues etc. The Scheduler is pure scheduler in the sense that it performs no monitoring or tracking of status for the application. Also, it offers no guarantees about restarting failed tasks either due to application failure or hardware failures. The Scheduler performs its scheduling function based the resource requirements of the applications; it does so based on the abstract notion of a resource Container which incorporates elements such as memory, cpu, disk, network etc. In the first version, only memory is supported.

The Scheduler has a pluggable policy plug-in, which is responsible for partitioning the cluster resources among the various queues, applications etc. The current Map-Reduce schedulers such as the CapacityScheduler and the FairScheduler would be some examples of the plug-in.

The CapacityScheduler supports hierarchical queues to allow for more predictable sharing of cluster resources

The ApplicationsManager is responsible for accepting job-submissions, negotiating the first container for executing the application specific ApplicationMaster and provides the service for restarting the ApplicationMaster container on failure.

The NodeManager is the per-machine framework agent who is responsible for containers, monitoring their resource usage (cpu, memory, disk, network) and reporting the same to the ResourceManager/Scheduler.

The per-application ApplicationMaster has the responsibility of negotiating appropriate resource containers from the Scheduler, tracking their status and monitoring for progress.

MRV2 maintains API compatibility with previous stable release (hadoop-1.x). This means that all Map-Reduce jobs should still run unchanged on top of MRv2 with just a recompile.

from:http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html

YARN :Architecture的更多相关文章

  1. Storm on Yarn :原理分析+平台搭建

    Storm on YARN: Storm on YARN被视为大规模Web应用与传统企业应用之间的桥梁.它将Storm事件处理平台与YARN(Yet Another Resource Negotiat ...

  2. [BigData - Hadoop - YARN] YARN:下一代 Hadoop 计算平台

    Apache Hadoop 是最流行的大数据处理工具之一.它多年来被许多公司成功部署在生产中.尽管 Hadoop 被视为可靠的.可扩展的.富有成本效益的解决方案,但大型开发人员社区仍在不断改进它.最终 ...

  3. Apache Hadoop YARN: 背景及概述

    从2012年8月开始Apache Hadoop YARN(YARN = Yet Another Resource Negotiator)成了Apache Hadoop的一项子工程.自此Apache H ...

  4. Spark on Yarn:任务提交参数配置

    当在YARN上运行Spark作业,每个Spark executor作为一个YARN容器运行.Spark可以使得多个Tasks在同一个容器里面运行. 以下参数配置为例子: spark-submit -- ...

  5. Spark On Yarn:提交Spark应用程序到Yarn

    转载自:http://lxw1234.com/archives/2015/07/416.htm 关键字:Spark On Yarn.Spark Yarn Cluster.Spark Yarn Clie ...

  6. 【原创】大数据基础之Spark(2)Spark on Yarn:container memory allocation容器内存分配

    spark 2.1.1 最近spark任务(spark on yarn)有一个报错 Diagnostics: Container [pid=5901,containerID=container_154 ...

  7. AeroSpike踩坑手记1:Architecture of a Real Time Operational DBMS论文导读

    又开了一个新的坑,笔者工作之后维护着一个 NoSQL 数据库.而笔者维护的数据库正是基于社区版本的 Aerospike打造而来.所以这个踩坑系列的文章属于工作总结型的内容,会将使用开发 Aerospi ...

  8. 第1节 yarn:15、关于yarn中常用的参数设置

    第一个参数:container分配最小内存 yarn.scheduler.minimum-allocation-mb     1024   给应用程序container分配的最小内存 第二个参数:co ...

  9. 第1节 yarn:14、yarn集群当中的三种调度器

    yarn当中的调度器介绍: 第一种调度器:FIFO Scheduler  (队列调度器) 把应用按提交的顺序排成一个队列,这是一个先进先出队列,在进行资源分配的时候,先给队列中最头上的应用进行分配资源 ...

随机推荐

  1. ubuntu14.04安装sipp3.2

    本来在centos里不好装的软件,往往ubuntu里会很好装,但sipp恰恰相反,ubuntu里能装死你. 做VOIP测试的话,有时候为了模拟通话中更好的抓包,在环境简陋,又不想使用集线器引起广播风暴 ...

  2. 今天看了shell大神的写的一个统计脚本

    通过nginx日志统计接口耗时排行 grep '/bigbox?' access_log | awk '{print $7"&process="$NF}'| sed -r ...

  3. 002商城项目:maven工程的测试以及svn的使用

    我们上一篇文章搭建了maven工程,这一篇文章我们就要测试这个工程. 1: 由于这个工程还没有页面,我们要首先建立一个页面.在建立页面的jsp的过程中,我发现了一个问题,我这个eclipse由于缺少J ...

  4. 转 FileStream Read File

    FileStream Read File [C#] This example shows how to safely read file using FileStream in C#. To be s ...

  5. Mysql的操作说明

    Mysql对于用户的操作权限的控制都在:mysql.user表中 User字段:表示用户名称: Host字段:表示允许该用户访问的地址,可以是域名(如localhost).主机名.ip和%:%表示不限 ...

  6. IntelliJ IDEA 13试用手记(附详细截图)

    从去年开始转java以来,一直在寻找一款趁手的兵器,eclipse虽然是很多java程序员的首选,但是我发现一旦安装了一些插件,workspace中的项目达到数10个以后,经常崩溃,实在影响编程的心情 ...

  7. 跟我学习Storm_Storm简介

    Storm是由专业数据分析公司BackType开发的一个分布式实时数据处理软件,可以简单.高效.可靠地处理大量的数据流.Twitter在2011年7月收购该公司,并于2011年9月底正式将Storm项 ...

  8. Theano2.1.9-基础知识之条件

    来自:http://deeplearning.net/software/theano/tutorial/conditions.html conditions 一.IfElse vs Switch 这两 ...

  9. 腾讯 or 华为 =》 求职者的困惑

    本文目的: 希望有老司机指点迷津 个人背景: 本人软件工程专业,硕士研究生,2017年7月毕业,个人喜欢Java开发,希望有机会从事Java分布式应用开发 故事背景一: 本人2016年4月份参加了腾讯 ...

  10. php基础入门

    一.序言 由于新公司的需要,我也就从原来的asp专向了,php的学习中.希望通过自己的学习能够尽快的熟悉了解php 二.php独特的语法特色  1.引号问题 在php中单引号和双引号的作用基本相同,但 ...