Anatomy of a MapReduce Job

In MapReduce, a YARN application is called a Job. The implementation of the Application Master provided by the MapReduce
framework is called MRAppMaster.

Timeline of a MapReduce Job

This is the timeline of a MapReduce Job execution:

  • Map Phase: several Map Tasks are executed
  • Reduce Phase: several Reduce Tasks are executed

Notice that the Reduce Phase may start before the end of the Map Phase. Hence, an interleaving between them is possible.

Map Phase

We now focus our discussion on the Map Phase. A key decision is how many MapTasks the Application Master needs to start for the current job.

What does the user give us?

Let’s take a step back. When a client submits an application, it provides several kinds of information to the YARN infrastructure. In particular:

  • a configuration: this may be partial (the user may leave some parameters unspecified), in which case the default values are used for the job. Notice that these defaults may be the ones chosen by a Hadoop provider like Amazon.
  • a JAR containing:
    • a map() implementation
    • a combiner implementation (optional)
    • a reduce() implementation
  • input and output information:
    • input directory: where is the input stored, on HDFS or on S3? How many files does it contain?
    • output directory: where will we store the output? On HDFS? On S3?

The number of files inside the input directory is used for deciding the number of Map Tasks of a job.

How many Map Tasks?

The Application Master will launch one MapTask for each map split. Typically, there is a map split for each input file. If an input file is too big (bigger than the HDFS block size), then we have two or more map splits associated with the same input file. This is the pseudocode used inside the method getSplits() of the FileInputFormat class:

num_splits = 0
for each input file f:
    remaining = f.length
    while remaining / split_size > split_slope:
        num_splits += 1
        remaining -= split_size
    if remaining > 0:
        num_splits += 1   # the leftover bytes (at most split_slope * split_size) form the last split

where:

split_slope = 1.1
split_size ≈ dfs.blocksize

For example, with split_size = 128 MB, a 300 MB file yields three map splits (128 MB, 128 MB, and 44 MB), while a 140 MB file yields a single 140 MB split, since 140/128 ≈ 1.09 ≤ 1.1: the slope exists precisely to avoid tiny trailing splits.

Notice that the configuration parameter mapreduce.job.maps is ignored in MRv2 (in the past it was just a hint).

MapTask Launch

The MapReduce Application Master asks the Resource Manager for the Containers needed by the Job: one container request for each MapTask (map split).

A container request for a MapTask tries to exploit data locality of the map split. The Application Master asks for:

  • a container located on the same Node Manager where the map split is stored (a map split may be stored on multiple nodes due to the HDFS replication factor);
  • otherwise, a container located on a Node Manager in the same rack where the map split is stored;
  • otherwise, a container on any other Node Manager of the cluster

These are just hints to the Resource Scheduler: the scheduler is free to ignore data locality if the suggested assignment is in conflict with its own goals.
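As a minimal sketch of this mechanism (assuming Hadoop 2.x; the host and rack names are hypothetical, and the real MRAppMaster code is more involved), an Application Master can express such locality preferences through the YARN client API:

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

AMRMClient<ContainerRequest> amRMClient = AMRMClient.createAMRMClient();

// Capability of one MapTask container: 1GB of RAM, 1 virtual core.
Resource capability = Resource.newInstance(1024, 1);
Priority priority = Priority.newInstance(20); // illustrative priority value

// Nodes holding a replica of the map split, and their rack (hypothetical names).
String[] nodes = {"node1.example.com", "node2.example.com"};
String[] racks = {"/rack1"};

// relaxLocality is true by default, so the Resource Scheduler may fall back to a
// rack-local container, or to any node of the cluster, if no node-local one is free.
ContainerRequest request = new ContainerRequest(capability, nodes, racks, priority);
amRMClient.addContainerRequest(request);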

When a Container is assigned to the Application Master, the MapTask is launched.

Map Phase: example of an execution scenario

This is a possible execution scenario of the Map Phase:

  • there are two Node Managers, each with 2GB of RAM (NM capacity); since each MapTask requires 1GB, we can run 2 containers in parallel on each Node Manager (this is the best scenario; the Resource Scheduler may decide differently)
  • there are no other YARN applications running in the cluster
  • our job has 8 map splits (e.g., there are 7 files inside the input directory, and one of them is bigger than the HDFS block size, so it is split into 2 map splits): we need to run 8 Map Tasks.

With at most 4 containers running in parallel, the 8 Map Tasks are executed in (at least) two waves of 4 tasks each.

Map Task Execution Timeline

Let’s now focus on a single Map Task. This is the Map Task execution timeline:

  • INIT phase: we set up the Map Task
  • EXECUTION phase: for each (key, value) tuple inside the map split we run the map() function
  • SPILLING phase: the map output is stored in an in-memory buffer; when this buffer is almost full, the spilling phase starts (in parallel) in order to remove data from it
  • SHUFFLE phase: at the end of the spilling phase, we merge all the map outputs and package them for the reduce phase

MapTask: INIT

During the INIT phase, we:

  1. create a context (TaskAttemptContext.class)
  2. create an instance of the user Mapper.class
  3. setup the input (e.g., InputFormat.class → InputSplit.class → RecordReader.class)
  4. setup the output (NewOutputCollector.class)
  5. create a mapper context (MapContext.class, which wraps Mapper.Context.class)
  6. initialize the input, e.g.:
     • create a SplitLineReader.class object
     • create a HdfsDataInputStream.class object

MapTask: EXECUTION

The EXECUTION phase is performed by the run method of the Mapper class. The user can override it, but by default it starts by calling the setup method: this function does nothing by default, but it can be overridden by the user in order to set up the Task (e.g., initialize class variables). After the setup, for each <key, value> tuple contained in the map split, the map() function is invoked. Therefore, map() receives a key, a value, and a mapper context. Using the context, the map stores its output into a buffer.
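For reference, the default run() of recent Hadoop 2.x releases is essentially the following (paraphrased from the Hadoop source; exact details vary across versions):

// Inside org.apache.hadoop.mapreduce.Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>:
public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  try {
    while (context.nextKeyValue()) {
      map(context.getCurrentKey(), context.getCurrentValue(), context);
    }
  } finally {
    cleanup(context); // always invoked, even if map() throws
  }
}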

Notice that the map split is fetched chunk by chunk (e.g., 64KB) and each chunk is split into several (key, value) tuples (e.g., using SplitLineReader.class). This is done inside the Mapper.Context.nextKeyValue method.

When the map split has been completely processed, the run function calls the cleanup method: by default, no action is performed, but the user may decide to override it.
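Putting it together, a minimal, hypothetical user Mapper that overrides setup() and map() (word-count style) could look like this:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void setup(Context context) {
    // e.g., initialize class variables or read settings from context.getConfiguration()
  }

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    for (String token : line.toString().split("\\s+")) {
      if (!token.isEmpty()) {
        word.set(token);
        context.write(word, ONE); // stored in the circular in-memory buffer (see below)
      }
    }
  }
}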

MapTask: SPILLING

As seen in the EXECUTION phase, the map writes (using Mapper.Context.write()) its output into a circular in-memory buffer (MapTask.MapOutputBuffer). The size of this buffer is fixed and determined by the configuration parameter mapreduce.task.io.sort.mb (default: 100MB).

Whenever this circular buffer is almost full (mapreduce.map.sort.spill.percent: 80% by default), the SPILLING phase is performed (in parallel, using a separate thread). Notice that if the spilling thread is too slow and the buffer becomes 100% full, then map() cannot be executed and has to wait.
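As an illustrative sketch (the keys are the real MRv2 parameter names, but the values here are arbitrary examples), these knobs can be tuned on the job configuration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Job job = Job.getInstance(new Configuration(), "example-job");
Configuration conf = job.getConfiguration();
conf.setInt("mapreduce.task.io.sort.mb", 200);            // buffer size, in MB
conf.setFloat("mapreduce.map.sort.spill.percent", 0.80f); // spill threshold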

The SPILLING thread performs the following actions:

  1. it creates a SpillRecord and an FSOutputStream (on the local filesystem)
  2. it sorts, in memory, the used chunk of the buffer: the output tuples are sorted by (partitionIdx, key) using a quicksort algorithm
  3. the sorted output is split into partitions: one partition for each ReduceTask of the job (see later)
  4. the partitions are sequentially written into the local file

How Many Reduce Tasks?

The number of ReduceTasks for the job is decided by the configuration parameter mapreduce.job.reduces.
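Equivalently, it can be set programmatically; a minimal sketch:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Job job = Job.getInstance(new Configuration(), "example-job");
job.setNumReduceTasks(4); // same effect as setting mapreduce.job.reduces=4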

What is the partitionIdx associated with an output tuple?

The partitionIdx of an output tuple is the index of a partition. It is decided inside Mapper.Context.write():

partitionIdx = (key.hashCode() & Integer.MAX_VALUE) % numReducers

It is stored as metadata in the circular buffer alongside the output tuple. The user can customize the partitioner by setting the configuration parameter mapreduce.job.partitioner.class.
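For instance, a minimal custom Partitioner (hypothetical: it routes tuples by the first character of the key instead of the key’s full hash) might look like this:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class FirstCharPartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    if (key.getLength() == 0) {
      return 0;
    }
    // Text.charAt() returns the Unicode code point at the given byte offset.
    return (key.charAt(0) & Integer.MAX_VALUE) % numPartitions;
  }
}

It would be registered either with job.setPartitionerClass(FirstCharPartitioner.class) or through the mapreduce.job.partitioner.class key.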

When do we apply the combiner?

If the user specifies a combiner, then the SPILLING thread, before writing the tuples to the file (step 4 above), executes the combiner on the tuples contained in each partition. Basically, we:

  1. create an instance of the user Reducer.class (the one specified for the combiner!)
  2. create a Reducer.Context: the output will be stored on the local filesystem
  3. execute Reducer.run(): see the Reduce Task description

The combiner typically uses the same implementation as the standard reduce() function, and thus can be seen as a local reducer.
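A classic sketch (assuming word-count-style <Text, IntWritable> tuples): a sum Reducer that can safely be reused as a combiner, because addition is commutative and associative:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  private final IntWritable result = new IntWritable();

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();
    }
    result.set(sum);
    context.write(key, result);
  }
}

// job.setReducerClass(SumReducer.class);
// job.setCombinerClass(SumReducer.class); // the "local reducer" on map output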

MapTask: end of EXECUTION

At the end of the EXECUTION phase, the SPILLING thread is triggered for the last time. In more detail, we:

  1. sort and spill the remaining unspilled tuples
  2. start the SHUFFLE phase

Notice that each time the buffer became almost full, we got one spill file (SpillRecord + output file). Each spill file contains several partitions (segments).

MapTask: SHUFFLE

Reduce Phase

[…]

YARN and MapReduce interaction
