In the previous post, we've illustrated how Hadoop MapReduce prepares input for Mappers. Long story short, InputSplit convert physical storaged data into many logical unit, and each one will be processed by a RecordReader, who will generate input (K,V) pairs for Mapper. I used to be confused about how (K,V) pairs are generated, but actually it just breaks a 128M file into single lines (just an example), and each line is a (K,V) pair. A mapper process these pairs one by one untill the end of the file.

A user-defined mapper, takes input (K,V) pairs from RecordReader, generate new key/value pair set at the output side.Usually we call the new (K,V) pairs as 'immediate (K,V) pairs'. For example: in the post (Using MapReduce on Azure), we define a Mapper as following:

#!/usr/bin/env python
"""mapper.py""" import sys # input comes from STDIN (standard input)
for line in sys.stdin:
# remove leading and trailing whitespace
line = line.strip()
# split the line into words
words = line.split()
# increase counters
for word in words:
# write the results to STDOUT (standard output);
# what we output here will be the input for the
# Reduce step, i.e. the input for reducer.py
#
# tab-delimited; the trivial word count is 1
print '%s\t%s' % (word, 1)

We can see, this mapper just breaks a line into words set, and ouput immediate (K,V) pairs, in which key is the word and value is 1.

A funny but intuitive illustration for this process is cutting a car into pieces:

MapReduce(2): How does Mapper work的更多相关文章

  1. Hadoop(十七)之MapReduce作业配置与Mapper和Reducer类

    前言 前面一篇博文写的是Combiner优化MapReduce执行,也就是使用Combiner在map端执行减少reduce端的计算量. 一.作业的默认配置 MapReduce程序的默认配置 1)概述 ...

  2. MapReduce从输入文件到Mapper处理之间的过程

    1.MapReduce代码入口 FileInputFormat.setInputPaths(job, new Path(input)); //设置MapReduce输入格式 job.waitForCo ...

  3. 027_编写MapReduce的模板类Mapper、Reducer和Driver

    模板类编写好后写MapReduce程序,的模板类编写好以后只需要改参数就行了,代码如下: package org.dragon.hadoop.mr.module; import java.io.IOE ...

  4. MapReduce :基于 FileInputFormat 的 mapper 数量控制

    本篇分两部分,第一部分分析使用 java 提交 mapreduce 任务时对 mapper 数量的控制,第二部分分析使用 streaming 形式提交 mapreduce 任务时对 mapper 数量 ...

  5. Mapreduce报错:java.lang.ClassNotFoundException: Class Mapper not found

    mapreduce找不到mapper类 解决方法: 开始自己用的是mapreduce自己打包的一种方法: job.setJarByClass(StandardJob.class); 但这样一直在报错: ...

  6. MapReduce在Shuffle阶段按Mapper输出的Value进行排序

    ZKe ----------------- 在MapReduce框架中,Mapper的输出在Shuffle阶段,根据Key值分组之后,还将会根据Key值进行排序,因此Reducer的输出我们看到的结果 ...

  7. [Hadoop in Action] 第4章 编写MapReduce基础程序

    基于hadoop的专利数据处理示例 MapReduce程序框架 用于计数统计的MapReduce基础程序 支持用脚本语言编写MapReduce程序的hadoop流式API 用于提升性能的Combine ...

  8. MapReduce实例浅析

    在文章<MapReduce原理与设计思想>中,详细剖析了MapReduce的原理,这篇文章则通过实例重点剖析MapReduce 本文地址:http://www.cnblogs.com/ar ...

  9. MapReduce的模式、算法和用例

    英文原文:<MapReduce Patterns, Algorithms, and Use Cases> https://highlyscalable.wordpress.com/2012 ...

随机推荐

  1. HTMLTestRunner_PY3脚本代码

    HTMLTestRunner_PY3.py文件代码如下: # -*- coding: utf-8 -*- """ A TestRunner for use with th ...

  2. 【CF321E】+【bzoj5311】

    决策单调性 + WQS二分 贴个代码先... //by Judge #pragma GCC optimize("Ofast") #include<bits/stdc++.h& ...

  3. Flutter 初探 -

    flutter 安装 经过许久的关注,及最近google算是真正地推行flutter时,加上掘金小册也有相应的教程,我知道自己得跟着这一波潮流学习了,不然迟早会面临着小程序的危(大家都会了就你不会), ...

  4. JS中类或对象的定义说明

    本篇文章主要是对JS中类或对象的定义进行说明介绍.我们知道,JS是面向对象的.谈到面向对象,就不可避免的要涉及类的概念.一般像c#,java这些强类型语言都有固定的定义类的语法.而JS的不同之处在于它 ...

  5. Oracle 数字转为字符串 to_char()

    格式:TO_CHAR(number,'format_model') 9 -->Represents a number 0 --> Forces a zero to be displayed ...

  6. VPS建站

    参考腾讯云的教程 选择了 LAMP的方案,即Linux + Apache + MySQL + Php 参考链接 https://cloud.tencent.com/edu/learning/cours ...

  7. php函数漏洞

    1.ereg — 正则表达式匹配 此函数遇 %00 截断. <?php $a = $_GET['pwd']; var_dump(ereg ("^[0-9]+$", $a)); ...

  8. 八个JS中你见过的类型。

    1.布尔类型 布尔值只能为  true 或者 false ,其他的会报错 let bool: boolean = false; bool = true; // bool = 123; // error ...

  9. windowserver 常用命令

    1.查看端口占用: netstat -ano | findstr "服务端口号"2.查看程序运行id: tasklist | findstr  nginx 3.杀死进程  tskk ...

  10. docker设置proxy

    该方法是持久化的,修改后会一直生效.该方法覆盖了默认的docker.service文件. 1. 为docker服务创建一个内嵌的systemd目录 mkdir -p /etc/systemd/syst ...