A Mapper maps input key/value pairs into intermediate key/value pairs.

E.g.

Input: (docID, doc)

Output: (term, 1)
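For instance, mapping a document that reads "to be or not to be" emits (to, 1), (be, 1), (or, 1), (not, 1), (to, 1), (be, 1): each occurrence of a term produces its own pair, and the counts are combined later in the reduce phase.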

Mapper Class Prototype:

Mapper<Object, Text, Text, IntWritable>
// Object:: INPUT_KEY
// Text:: INPUT_VALUE
// Text:: OUTPUT_KEY
// IntWritable:: OUTPUT_VALUE

Special Data Types for Mapper

IntWritable

A serializable and comparable wrapper for an integer.

Example:

private final static IntWritable one = new IntWritable(1);
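A short illustrative snippet (not part of the original example) showing that IntWritable is a mutable wrapper around an int:

IntWritable one = new IntWritable(1);
int v = one.get();   // returns 1
one.set(5);          // Writables are mutable, so a single instance can be reused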

Text

A serializable and comparable object for strings; comparison is performed at the byte level. It stores text in UTF-8 encoding.

Example:

private Text word = new Text();
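Likewise, a minimal sketch of a Text round trip, assuming org.apache.hadoop.io.Text:

Text word = new Text();
word.set("hadoop");              // stored internally as UTF-8 bytes
String back = word.toString();   // decodes back to a java.lang.String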

Hadoop defines its own classes for general data types:

-- All "values" must implement the Writable interface;

-- All "keys" must implement the WritableComparable interface, since keys are sorted between the map and reduce phases (a sketch of a custom key type follows).

Map Method for Mapper

Method header

public void map(Object key, Text value, Context context
                ) throws IOException, InterruptedException
// Object key:: declares the data type of the input key;
// Text value:: declares the data type of the input value;
// Context context:: the output collector; the map method writes its (key, value) pairs to it.

Tokenization

// Use Java built-in StringTokenizer to split input value (document) into words:
StringTokenizer itr = new StringTokenizer(value.toString());
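By default StringTokenizer splits on whitespace only; a delimiter set can be passed explicitly if punctuation should also separate words (illustrative, not in the original example):

StringTokenizer itr = new StringTokenizer(value.toString(), " \t\n\r\f.,;:!?");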

Building (key, value) pairs

// Loop over all words:
while (itr.hasMoreTokens()) {
    // convert the built-in String back to Text:
    word.set(itr.nextToken());
    // build the (key, value) pair and emit it through the Context:
    context.write(word, one);
}
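Note that word and one are reused across iterations: context.write serializes the pair immediately, so mutating and rewriting the same Writable instances avoids allocating a new object per token.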

Map Method Summary

The map method writes its output into a Mapper.Context object, which collects the emitted series of (key, value) pairs:

public void map(Object key, Text value, Context context
                ) throws IOException, InterruptedException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
    }
}
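The framework calls map once for each input record. With the default TextInputFormat, each record is one line of the input file: the key is the line's byte offset and the value is the line's content, which is why the key goes unused here.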

Overview of Mapper Class

public static class TokenizerMapper
        extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}
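For context, a minimal driver sketch showing how TokenizerMapper could be wired into a job; the reducer is omitted here, and the input/output paths are assumptions taken from the command line:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        // Must match the Mapper's OUTPUT_KEY / OUTPUT_VALUE types:
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}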
