Anatomy of a MapReduce Job


In MapReduce, a YARN application is called a Job. The implementation of the Application Master provided by the MapReduce
framework is called MRAppMaster.

Timeline of a MapReduce Job

This is the timeline of a MapReduce Job execution:

  • Map Phase: several Map Tasks are executed
  • Reduce Phase: several Reduce Tasks are executed

Notice that the Reduce Phase may start before the end of the Map Phase, so the two phases can interleave.

Map Phase

We now focus our discussion on the Map Phase. A key decision is how many MapTasks the Application Master needs to start for the current job.

What does the user give us?

Let’s take a step back. When a client submits an application, several kinds of information are provided to the YARN infrastructure. In particular:

  • a configuration: this may be partial (some parameters may be left unspecified by the user), in which case the default values are used for the job. Notice that these default values may be the ones chosen by a Hadoop provider like Amazon.
  • a JAR containing:
    • map() implementation
    • a combiner implementation
    • reduce() implementation
  • input and output information:
    • input directory: is the input directory on HDFS? On S3? How many files?
    • output directory: where will we store the output? On HDFS? On S3?

The number of files inside the input directory is used for deciding the number of Map Tasks of a job.
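
To make this concrete, here is a minimal driver sketch showing how these three pieces (configuration, JAR classes, input/output paths) are wired together. The class names MyMapper and MyReducer and the two paths are illustrative placeholders, not part of the original article:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Driver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();       // unspecified parameters fall back to the defaults
        Job job = Job.getInstance(conf, "anatomy-demo");
        job.setJarByClass(Driver.class);                // the JAR shipped to the cluster
        job.setMapperClass(MyMapper.class);             // map() implementation
        job.setReducerClass(MyReducer.class);           // reduce() implementation
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("hdfs:///user/demo/input"));    // input directory
        FileOutputFormat.setOutputPath(job, new Path("hdfs:///user/demo/output")); // output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}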

How many Map Tasks?

The Application Master will launch one MapTask for each map split. Typically, there is a map split for each input file. If the input file is too big (bigger than the HDFS block size) then we have two or more map splits associated with the same input file. This is the pseudocode used inside the method getSplits() of the FileInputFormat class:

num_splits = 0
for each input file f:
    remaining = f.length
    while remaining / split_size > split_slope:
        num_splits += 1
        remaining -= split_size
    if remaining > 0:
        num_splits += 1    # one final (possibly smaller) split for the leftover bytes

where:

split_slope = 1.1
split_size =~ dfs.blocksize
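
For example, with split_size = 128MB: a 140MB file gives 140/128 ≈ 1.09 ≤ 1.1, so the loop never runs and the file becomes a single 140MB split (the slope avoids producing a tiny 12MB split); a 300MB file produces two full 128MB splits plus a final 44MB split, i.e. 3 map splits.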

Notice that the configuration parameter mapreduce.job.maps is ignored in MRv2 (in the past it was just a hint).

MapTask Launch

The MapReduce Application Master asks the Resource Manager for the Containers needed by the Job: one MapTask container request for each MapTask (map split).

A container request for a MapTask tries to exploit data locality of the map split. The Application Master asks for:

  • a container located on the same Node Manager where the map split is stored (a map split may be stored on multiple nodes due to the HDFS replication factor);
  • otherwise, a container located on a Node Manager in the same rack where the map split is stored;
  • otherwise, a container on any other Node Manager of the cluster

This is just a hint to the Resource Scheduler. The Resource Scheduler is free to ignore data locality if the suggested assignment conflicts with the Resource Scheduler’s goal.

When a Container is assigned to the Application Master, the MapTask is launched.

Map Phase: example of an execution scenario

This is a possible execution scenario of the Map Phase:

  • there are two Node Managers: each Node Manager has 2GB of RAM (NM capacity) and each MapTask requires 1GB, so we can run 2 containers in parallel on each Node Manager (this is the best scenario; the Resource Scheduler may decide differently)
  • there are no other YARN applications running in the cluster
  • our job has 8 map splits (e.g., there are 7 files inside the input directory, but only one of them is bigger than the HDFS block size so we split it into 2 map splits): we need to run 8 Map Tasks.
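
Under these assumptions, at most 4 containers (2 per Node Manager) can run at any time, so the 8 Map Tasks complete in two waves of 4 parallel tasks each.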

Map Task Execution Timeline

Let’s now focus on a single Map Task. This is the Map Task execution timeline:

  • INIT phase: we set up the Map Task
  • EXECUTION phase: for each (key, value) tuple inside the map split we run the map() function
  • SPILLING phase: the map output is stored in an in-memory buffer; when this buffer is almost full then we start
    (in parallel) the spilling phase in order to remove data from it
  • SHUFFLE phase: at the end of the spilling phase, we merge all the map outputs and package them for the reduce phase

MapTask: INIT

During the INIT phase, we:

  1. create a context (TaskAttemptContext.class)
  2. create an instance of the user Mapper.class
  3. setup the input (e.g., InputFormat.class → InputSplit.class → RecordReader.class)
  4. setup the output (NewOutputCollector.class)
  5. create a mapper context (MapContext.class → Mapper.Context.class)
  6. initialize the input, e.g.:
     • create a SplitLineReader.class object
     • create a HdfsDataInputStream.class object

MapTask: EXECUTION

The EXECUTION phase is performed by the run method of the Mapper class. The user can override it, but by default it will start by calling the setup method: by default this function does nothing useful, but it can be overridden by the user in order to set up the Task (e.g., initialize class variables). After the setup, for each <key, value> tuple contained in the map split, the map() is invoked. Therefore, map() receives: a key, a value, and a mapper context. Using the context, a map stores its output to a buffer.
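
As an illustration, a minimal word-count-style mapper could look like the sketch below (the class and its logic are assumed for illustration and match the MyMapper placeholder used in the driver sketch above):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // key: byte offset of the line inside the map split; value: the line itself
        for (String token : value.toString().split("\\s+")) {
            word.set(token);
            context.write(word, ONE); // the context stores the tuple in the in-memory buffer
        }
    }
}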

Notice that the map split is fetched chunk by chunk (e.g., 64KB) and each chunk is split in several (key, value) tuples (e.g., using SplitLineReader.class). This is done inside the Mapper.Context.nextKeyValue method.

When the map split has been completely processed, the run function calls the cleanup method: by default, no action is performed but the user may decide to override it.
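
Putting setup, map, and cleanup together, the default run method looks roughly like this simplified sketch of the Hadoop Mapper implementation:

public void run(Context context) throws IOException, InterruptedException {
    setup(context);                      // user hook, no-op by default
    try {
        while (context.nextKeyValue()) { // fetch the next tuple from the map split
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
    } finally {
        cleanup(context);                // user hook, no-op by default
    }
}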

MapTask: SPILLING

As seen in the EXECUTION phase, the map will write (using Mapper.Context.write()) its output into a circular in-memory buffer (MapTask.MapOutputBuffer).
The size of this buffer is fixed and determined by the configuration parameter mapreduce.task.io.sort.mb (default:
100MB).

Whenever this circular buffer is almost full (mapreduce.map.sort.spill.percent: 80% by default), the SPILLING phase is performed (in parallel using a separate thread). Notice that if the spilling thread is too slow and the buffer is 100% full, then the map() cannot be executed and thus it has to wait.
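
Both thresholds can be tuned through the job configuration; a minimal sketch, to be placed in a driver before Job.getInstance() (the values shown are arbitrary examples):

Configuration conf = new Configuration();
conf.setInt("mapreduce.task.io.sort.mb", 256);            // grow the circular buffer to 256MB
conf.setFloat("mapreduce.map.sort.spill.percent", 0.80f); // start spilling at 80% occupancy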

The SPILLING thread performs the following actions:

  1. it creates a SpillRecord and FSOutputStream (local
    filesystem)
  2. it sorts the used chunk of the buffer in memory: the output tuples are sorted by (partitionIdx, key) using a quicksort algorithm.
  3. the sorted output is split into partitions: one partition for each ReduceTask of the job (see later).
  4. Partitions are sequentially written into the local file.

How Many Reduce Tasks?

The number of ReduceTasks for the job is decided by the configuration parameter mapreduce.job.reduces.
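
Programmatically, the same parameter is usually set from the driver, e.g. job.setNumReduceTasks(4); a job configured this way has 4 ReduceTasks and therefore 4 partitions per spill.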

What is the partitionIdx associated with an output tuple?

The partitionIdx of an output tuple is the index of a partition. It is decided inside the Mapper.Context.write():

partitionIdx = (key.hashCode() & Integer.MAX_VALUE) % numReducers

It is stored as metadata in the circular buffer alongside the output tuple. The user can customize the partitioner by setting the configuration parameter mapreduce.job.partitioner.class.
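
The formula above is the behaviour of the default HashPartitioner. As a sketch, a custom partitioner (here a hypothetical one that routes Text keys by their first character) would look like this:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class FirstCharPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        if (key.getLength() == 0) {
            return 0; // empty keys all go to the first partition
        }
        // route tuples by the first character of the key instead of its full hash
        return (key.toString().charAt(0) & Integer.MAX_VALUE) % numPartitions;
    }
}

It is registered in the driver with job.setPartitionerClass(FirstCharPartitioner.class), which fills mapreduce.job.partitioner.class.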

When do we apply the combiner?

If the user specifies a combiner, then the SPILLING thread, before writing the tuples to the file (step 4 above), executes the combiner on the tuples contained in each partition. Basically, we:

  1. create an instance of the user Reducer.class (the one specified
    for the combiner!)
  2. create a Reducer.Context: the output will be stored on the
    local filesystem
  3. execute Reducer.run(): see the Reduce Task description

The combiner typically uses the same implementation as the standard reduce() function and thus can be seen as a local reducer.
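
Wiring a combiner up is one line in the driver, e.g. job.setCombinerClass(MyReducer.class); this is only safe when the reduce logic is commutative and associative, since the combiner may run zero or more times per partition.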

MapTask: end of EXECUTION

At the end of the EXECUTION phase, the SPILLING thread is triggered for the last time. In more detail, we:

  1. sort and spill the remaining unspilled tuples
  2. start the SHUFFLE phase

Notice that each time the buffer was almost full, we get one spill file (SpillRecord + output file). Each spill file contains several partitions (segments).

MapTask: SHUFFLE

Reduce Phase

[…]

YARN and MapReduce interaction
