MR execution in YARN
Overview
YARN provides API not for application developers but for the great developers working on new computing engines. YARN make it easy and unified for resource management for the computing engines. It fills the gap between mputation and storage. NoSQL database like HBase use slider apdaters to YARN.
With YARN
Withou YARN
Entities in YARN
The base of Distribution is HDFS and YARN. HDFS for managing storage. YARN for managing computing.
Client: who submits the job: connects to MR or HDFS framework.
YARN Resource Manager: allocate computing resource required by the job.
Scheduleer:job scheduling,locate the resources.
Application Manager:performan any monitoring or tracking of application/job status.
YARN Node Manager: on all slave nodes. launch / manager containers.
MR Application Master:carry out execution of the job associated with it. different between computing engines. It coordinates the tasks running and monitors the progress and aggregates it and since reports to its client . It is spawn under node manager on the instruction by RM. spawn for every job and end with the job done.
YARN Child: manages the run of the map and reduce tasks,send updates / progress to application master.
HDFS:i/o
The process of job run in YARN
Job submission:Your program triggers the job client and the job client contacts the RM for the new job id. copy the job resource to HDFS with high replica and then submit the job.
Job Initialization: Then RM (the scheduler)picks up the job from the job queue(FIFO,capacity,fair) and contacts NM,sponsor new container (Linux kernel feature, a abstruct of resource like cpu,mem,disk,network bandwidth. doker uses it too) and launches AM for the job.
Job Assigement: AM creates new objects , it retrives the input splits from HDFS and crete one task per input split. AM then decides if the job is samll or not. If it is small job , run its jvm on a single node. If not,contacts RM locate computing resources.
Job Execution:RM considers data locality while assigning the resources(Scheduler at this time knows where the splits are located.It gathers this info from the heartbeats of NM. Based on it, it consider data locality when allocating resources. try as best, then consider the rack local nodes ,if still fails, it will pick random from available noedes).AM then communicates node managers which launches the yarn child(a java program the main class is YarnChild,seperate JVM from long running system demons from the suer code). yarn child retrieves the code and other resource from HDFS and then run the tasks(mr).Yan child sends the progress to AM(every 3 seconds) which aggregrates(each yarn client's information) the report and sends the report to the client.
Job Exmpletion:On job completion , yarn child and AM terminates themseves for the next job.
MR execution in YARN的更多相关文章
- Yarn源码分析之MRAppMaster上MapReduce作业处理总流程(一)
我们知道,如果想要在Yarn上运行MapReduce作业,仅需实现一个ApplicationMaster组件即可,而MRAppMaster正是MapReduce在Yarn上ApplicationMas ...
- hadoop多机安装HA+YARN
HA 相比于Hadoop1.0,Hadoop 2.0中的HDFS增加了两个重大特性,HA(热备)和Federation(联邦).HA即为High Availability,用于解决NameNode单点 ...
- hadoop多机安装YARN
hadoop伪分布安装称为测试环境安装,多机分布称为生成环境安装.以下安装没有进行HA(热备)和Federation(联邦).除非是性能需要,否则没必要安装Federation,HA可以一试,涉及到Z ...
- Hadoop2.4.1 64-Bit QJM HA and YARN HA + Zookeeper-3.4.6 + Hbase-0.98.8-hadoop2-bin HA Install
Hadoop2.4.1 64-Bit QJM HA and YARN HA Install + Zookeeper-3.4.6 + Hbase-0.98.8-hadoop2-bin HA(Hadoop ...
- Hadoop 5、HDFS HA 和 YARN
Hadoop 2.0 产生的背景Hadoop 1.0 中HDFS和MapReduce存在高可用和扩展方面的问题 HDFS存在的问题 NameNode单点故障,难以用于在线场景 NameNode压力过大 ...
- YARN的基础配置
基于HADOOP3.0+Centos7.0的yarn基础配置: 执行步骤:(1)配置集群yarn (2)启动.测试集群(3)在yarn上执行wordcount案例 一.配置yarn集群 1.配置yar ...
- YARN的三种调度器的使用
YRAN提供了三种调度策略 一.FIFO-先进先出调度器 YRAN默认情况下使用的是该调度器,即所有的应用程序都是按照提交的顺序来执行的,这些应用程序都放在一个队列中,只有在前面的一个任务执行完成之后 ...
- Hadoop YARN上运行MapReduce程序
(1)配置集群 (a)配置hadoop-2.7.2/etc/hadoop/yarn-env.sh 配置一下JAVA_HOME export JAVA_HOME=/home/hadoop/bigdata ...
- hadoop3.1集成yarn ha
1.角色分配
随机推荐
- 暂存,本人博客有bug,正在全力修复。
当阳光洒满大地,当清晨的凝露如水滴滋润着世间万物,我就在这里.我在这里静静的看着这一切,这宁静的美好.耳边传来的英文歌曲.手里拿着的带着书香的书,时光倒流仿佛回到了多年前的清晨,那时的我每天读书背英语 ...
- 技巧:Vimdiff 使用(改)
技巧:Vimdiff 使用(改) 各种 IDE 大行其道的同时,传统的命令行工具以其短小精悍,随手可得的特点仍有很大的生存空间,这篇短文介绍了一个文本比较和合并的小工具:vimdiff.希望能对在 U ...
- chromium之MessagePump.h
上代码,注释已经写得很详细了. 粗看一下,这是个纯虚类,用于跨平台的通用接口. MessagePump,Pump的意思是泵,,MessagePump也就是消息泵,输送消息 namespace base ...
- redhat6 快速部署percona
1.首先得能访问外网 2.yum install http://www.percona.com/downloads/percona-release/redhat/0.1-4/percona-relea ...
- jQuery 动画效果 与 动画队列
基础效果 .hide([duration ] [,easing ] [,complete ]) 用于隐藏元素,没有参数的时候等同于直接设置 display 属性 $('.target').hide() ...
- MySQL 参数slave_pending_jobs_size_max设置
今天生产环境上从库出现SQL进程停止的异常,错误信息如下: Slave_IO_Running: Yes Slave_SQL_Running: No Replicate_Do_DB: Replicate ...
- 网站用户行为分析——HBase的安装与配置
Hbase介绍 HBase是一个分布式的.面向列的开源数据库,源于Google的一篇论文<BigTable:一个结构化数据的分布式存储系统>.HBase以表的形式存储数据,表有行和列组成, ...
- js分片上传大文件,前端代码
首先导入jQuery.form.js文件,下面src是相对于改js文件位置, <script type="text/JavaScript" src="jquery/ ...
- SAS中的宏语言
一.sas 宏变量 1) 宏变量属于SAS宏语言,与普通变量的区别是可以独立于DATA步 2) 可以再SAS程序中除数据行之外的任何地方定义并使用宏变量 3) %let语句定义宏变量并分配一个值给宏变 ...
- AS 3.1 项目打包成jar或aar
1.首先明白一个道理. Android Studio编译的时候会自动将项目生成jar和aar的,我一开始以为jar需要自己单独生成,其实AS已经自动生成了,网上找的很多资料都是一个复制的过程而已. 只 ...