Mapper maps input key/value pairs into intermediate key/value pairs.

E.g.

Input: (docID, doc)

Output: (term, 1)

Mapper Class Prototype:

Mapper<Object, Text, Text, IntWritable>
// Object:: INPUT_KEY
// Text:: INPUT_VALUE
// Text:: OUTPUT_KEY
// IntWritable:: OUTPUT_VALUE

Special Data Type for Mapper

IntWritable

A serializable and comparable object for integer.

Example:

private final static IntWritable one = new IntWritable(1);

Text

A serializable, deserializable and comparable object for string at byte level. It stores text in UTF-8 encoding.

Example:

private Text word = new Text();

Hadoop defines its own classes for general data types.

-- All "values" must have Writable interface;

-- All "keys" must have WritableComparable interface;

Map Method for Mapper

Method header

public void map(Object key, Text value, Context context
) throws IOException, InterruptedException
// Object key:: Declare data type of input key;
// Text value:: Declare data type of input value;
// Context context:: Declare data type of output. Context is often used for output data collection.

Tokenization

// Use Java built-in StringTokenizer to split input value (document) into words:
StringTokenizer itr = new StringTokenizer(value.toString());

Building (key, value) pairs

// Loop over all words:
while (itr.hasMoreTokens()) {
// convert built-in String back to Text:
word.set(itr.nextToken());
// build (key, value) pairs into Context and emit:
context.write(word, one);
}

Map Method Summary

Mapper class produces Mapper.Context object, which comprise a series of (key, value) pairs

  public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}

Overview of Mapper Class

public static class TokenizerMapper
extends Mapper<Object, Text, Text, IntWritable>{ private final static IntWritable one = new IntWritable(1);
private Text word = new Text(); public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}

Wordcount -- MapReduce example -- Mapper的更多相关文章

  1. MapReduce之Mapper类,Reducer类中的函数(转载)

    Mapper类4个函数的解析 Mapper有setup(),map(),cleanup()和run()四个方法.其中setup()一般是用来进行一些map()前的准备工作,map()则一般承担主要的处 ...

  2. hadoop中mapreduce的mapper抽象类和reduce抽象类

    mapreduce过程key 和value分别存什么值 https://blog.csdn.net/csdnliuxin123524/article/details/80191199 Mapper抽象 ...

  3. Wordcount -- MapReduce example -- Reducer

    Reducer receives (key, values) pairs and aggregate values to a desired format, then write produced ( ...

  4. MapReduce数据流-Mapper

  5. mapreduce程序编写(WordCount)

    折腾了半天.终于编写成功了第一个自己的mapreduce程序,并通过打jar包的方式运行起来了. 运行环境: windows 64bit eclipse 64bit jdk6.0 64bit 一.工程 ...

  6. Java编程MapReduce实现WordCount

    Java编程MapReduce实现WordCount 1.编写Mapper package net.toocruel.yarn.mapreduce.wordcount; import org.apac ...

  7. Kettle实现MapReduce之WordCount

    作者:Syn良子 出处:http://www.cnblogs.com/cssdongl 欢迎转载 抽空用kettle配置了一个Mapreduce的Word count,发现还是很方便快捷的,废话不多说 ...

  8. Hadoop(十七)之MapReduce作业配置与Mapper和Reducer类

    前言 前面一篇博文写的是Combiner优化MapReduce执行,也就是使用Combiner在map端执行减少reduce端的计算量. 一.作业的默认配置 MapReduce程序的默认配置 1)概述 ...

  9. hadoop2.7之Mapper/reducer源码分析

    一切从示例程序开始: 示例程序 Hadoop2.7 提供的示例程序WordCount.java package org.apache.hadoop.examples; import java.io.I ...

随机推荐

  1. hadoop-1.2.1运行过程中遇到的问题

    在hadoop-1.2.1中运行所遇到的问题: 2014-11-14   22:43:42  在服务器上运行hadoop-1.2.1中的datanode,出现了内存占用过大,导致ssh登陆出现如下问题 ...

  2. js和jquery对象的相互转换

    在使用jquery的过程中发现很多需要将jquery对象转成js对象的例子. Query 对象是通过 jQuery 包装DOM 对象后产生的对象.jQuery 对象是 jQuery 独有的,其可以使用 ...

  3. jQuery获取所有父级元素及同级元素及子元素的方法

    jQuery获取所有父级元素及同级元素及子元素的方法 1.获取父级元素 $("#id").parent() 获取其父级元素 $("#id").parents() ...

  4. 【Oracle】Oracle安装配置、创建数据库实例及用户和连接

    https://blog.csdn.net/wudiyong22/article/details/78904361 参考资料:https://www.cnblogs.com/hoobey/p/6010 ...

  5. C++笔记005:用面向过程和面向对象方法求解圆形面积

    原创笔记,转载请注明出处! 点击[关注],关注也是一种美德~ 结束了第一个hello world程序后,我们来用面向过程和面向对象两个方法来求解圆的面积这个问题,以能够更清晰的体会面向对象和面向过程. ...

  6. MySQL 5.7修改root密码的4种方法

            sometimes we will forget our password of root in MySQL DB server.so,there're several methods ...

  7. Linux 学习第四天

    Linux学习第四天 一.常用命令 1.tar  (压缩.解压) A.添加压缩包  tar czvf 压缩包名称.tar.gz 源文件 B.添加压缩包  tar cjvf 压缩包名称.tar.bz2 ...

  8. systemd的新特性及常见的systemd unit类型分析

    systemd概述 )systemd是一种新的linux系统服务管理器,用于替换init系统,能够管理系统启动过程和系统服务,一旦启动起来,就将监管整个系统.在centos7系统中,PID1被syst ...

  9. jQuery关于复选框的基本小功能

    这里是我初步学习jquery后中巨做的一个关于复选框的小功能: 点击右边选项如果勾上,对应的左边三个小项全部选中,反之全不选, 左边只要有一个没选中,右边大项就取消选中,反之左边全部选中的话,左边大项 ...

  10. 【转载】jquery实现勾选复选框触发事件给input赋值+回显复选框

    引用:https://blog.csdn.net/rui276933335/article/details/45717461 JSP: <td class="as1"> ...