Wordcount -- MapReduce example -- Mapper
Mapper maps input key/value pairs into intermediate key/value pairs.
E.g.
Input: (docID, doc)
Output: (term, 1)
Mapper Class Prototype:
Mapper<Object, Text, Text, IntWritable>
// Object:: INPUT_KEY
// Text:: INPUT_VALUE
// Text:: OUTPUT_KEY
// IntWritable:: OUTPUT_VALUE
Special Data Type for Mapper
IntWritable
A serializable and comparable object for integer.
Example:
private final static IntWritable one = new IntWritable(1);
Text
A serializable, deserializable and comparable object for string at byte level. It stores text in UTF-8 encoding.
Example:
private Text word = new Text();
Hadoop defines its own classes for general data types.
-- All "values" must have Writable interface;
-- All "keys" must have WritableComparable interface;
Map Method for Mapper
Method header
public void map(Object key, Text value, Context context
) throws IOException, InterruptedException
// Object key:: Declare data type of input key;
// Text value:: Declare data type of input value;
// Context context:: Declare data type of output. Context is often used for output data collection.
Tokenization
// Use Java built-in StringTokenizer to split input value (document) into words:
StringTokenizer itr = new StringTokenizer(value.toString());
Building (key, value) pairs
// Loop over all words:
while (itr.hasMoreTokens()) {
// convert built-in String back to Text:
word.set(itr.nextToken());
// build (key, value) pairs into Context and emit:
context.write(word, one);
}
Map Method Summary
Mapper class produces Mapper.Context object, which comprise a series of (key, value) pairs
public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
Overview of Mapper Class
public static class TokenizerMapper
extends Mapper<Object, Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}
Wordcount -- MapReduce example -- Mapper的更多相关文章
- MapReduce之Mapper类,Reducer类中的函数(转载)
Mapper类4个函数的解析 Mapper有setup(),map(),cleanup()和run()四个方法.其中setup()一般是用来进行一些map()前的准备工作,map()则一般承担主要的处 ...
- hadoop中mapreduce的mapper抽象类和reduce抽象类
mapreduce过程key 和value分别存什么值 https://blog.csdn.net/csdnliuxin123524/article/details/80191199 Mapper抽象 ...
- Wordcount -- MapReduce example -- Reducer
Reducer receives (key, values) pairs and aggregate values to a desired format, then write produced ( ...
- MapReduce数据流-Mapper
- mapreduce程序编写(WordCount)
折腾了半天.终于编写成功了第一个自己的mapreduce程序,并通过打jar包的方式运行起来了. 运行环境: windows 64bit eclipse 64bit jdk6.0 64bit 一.工程 ...
- Java编程MapReduce实现WordCount
Java编程MapReduce实现WordCount 1.编写Mapper package net.toocruel.yarn.mapreduce.wordcount; import org.apac ...
- Kettle实现MapReduce之WordCount
作者:Syn良子 出处:http://www.cnblogs.com/cssdongl 欢迎转载 抽空用kettle配置了一个Mapreduce的Word count,发现还是很方便快捷的,废话不多说 ...
- Hadoop(十七)之MapReduce作业配置与Mapper和Reducer类
前言 前面一篇博文写的是Combiner优化MapReduce执行,也就是使用Combiner在map端执行减少reduce端的计算量. 一.作业的默认配置 MapReduce程序的默认配置 1)概述 ...
- hadoop2.7之Mapper/reducer源码分析
一切从示例程序开始: 示例程序 Hadoop2.7 提供的示例程序WordCount.java package org.apache.hadoop.examples; import java.io.I ...
随机推荐
- hadoop-1.2.1运行过程中遇到的问题
在hadoop-1.2.1中运行所遇到的问题: 2014-11-14 22:43:42 在服务器上运行hadoop-1.2.1中的datanode,出现了内存占用过大,导致ssh登陆出现如下问题 ...
- js和jquery对象的相互转换
在使用jquery的过程中发现很多需要将jquery对象转成js对象的例子. Query 对象是通过 jQuery 包装DOM 对象后产生的对象.jQuery 对象是 jQuery 独有的,其可以使用 ...
- jQuery获取所有父级元素及同级元素及子元素的方法
jQuery获取所有父级元素及同级元素及子元素的方法 1.获取父级元素 $("#id").parent() 获取其父级元素 $("#id").parents() ...
- 【Oracle】Oracle安装配置、创建数据库实例及用户和连接
https://blog.csdn.net/wudiyong22/article/details/78904361 参考资料:https://www.cnblogs.com/hoobey/p/6010 ...
- C++笔记005:用面向过程和面向对象方法求解圆形面积
原创笔记,转载请注明出处! 点击[关注],关注也是一种美德~ 结束了第一个hello world程序后,我们来用面向过程和面向对象两个方法来求解圆的面积这个问题,以能够更清晰的体会面向对象和面向过程. ...
- MySQL 5.7修改root密码的4种方法
sometimes we will forget our password of root in MySQL DB server.so,there're several methods ...
- Linux 学习第四天
Linux学习第四天 一.常用命令 1.tar (压缩.解压) A.添加压缩包 tar czvf 压缩包名称.tar.gz 源文件 B.添加压缩包 tar cjvf 压缩包名称.tar.bz2 ...
- systemd的新特性及常见的systemd unit类型分析
systemd概述 )systemd是一种新的linux系统服务管理器,用于替换init系统,能够管理系统启动过程和系统服务,一旦启动起来,就将监管整个系统.在centos7系统中,PID1被syst ...
- jQuery关于复选框的基本小功能
这里是我初步学习jquery后中巨做的一个关于复选框的小功能: 点击右边选项如果勾上,对应的左边三个小项全部选中,反之全不选, 左边只要有一个没选中,右边大项就取消选中,反之左边全部选中的话,左边大项 ...
- 【转载】jquery实现勾选复选框触发事件给input赋值+回显复选框
引用:https://blog.csdn.net/rui276933335/article/details/45717461 JSP: <td class="as1"> ...