mapreduce (6): Deduplication with MapReduce, using NullWritable

Exercise source: http://www.cnblogs.com/xia520pi/archive/2012/06/04/2534533.html
file1

2012-3-1 a

2012-3-2 b

2012-3-3 c

2012-3-4 d

2012-3-5 a

2012-3-6 b

2012-3-7 c

2012-3-3 c

file2

2012-3-1 b

2012-3-2 a

2012-3-3 b

2012-3-4 d

2012-3-5 a

2012-3-6 c

2012-3-7 d

2012-3-3 c

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyDedup {

    // Emits each input line as the key; NullWritable is a placeholder value that carries no data.
    public static class LineNullMapper extends Mapper<Object, Text, Text, NullWritable> {
        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            context.write(value, NullWritable.get());
        }
    }

    // Called once per distinct key, so writing the key once drops the duplicates.
    public static class SortReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
        @Override
        public void reduce(Text key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
            context.write(key, NullWritable.get());
        }
    }

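What happens if Iterable<NullWritable> values is replaced with NullWritable values? Without the Iterable, does grouping not take place?
Result: the output is sorted but not deduplicated. Grouping still happens, but reduce(Text, NullWritable, Context) is an overload rather than an override of Reducer.reduce, so the framework never calls it. The default reduce runs instead and writes the key once for every value in the group: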
2012-3-1 a

2012-3-1 b

2012-3-2 a

2012-3-2 b

2012-3-3 b

2012-3-3 c

2012-3-3 c

2012-3-3 c

2012-3-4 d

2012-3-4 d

2012-3-5 a

2012-3-5 a

2012-3-6 b

2012-3-6 c

2012-3-7 c

2012-3-7 d
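The overload-versus-override behaviour can be reproduced without Hadoop. The following is a minimal sketch, not Hadoop code: BaseReducer is a hypothetical stand-in whose default reduce writes the key once per value (as Hadoop's default Reducer.reduce does), and OverloadSketch, Overloading, Overriding and emit are names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class OverloadSketch {

    // Hypothetical stand-in for a framework reducer base class.
    public static class BaseReducer {
        public final List<String> out = new ArrayList<>();

        // Default behaviour: identity, one output record per value.
        public void reduce(String key, Iterable<Object> values) {
            for (Object ignored : values) {
                out.add(key);
            }
        }

        // The "framework" only ever calls the Iterable version.
        public void run(String key, List<Object> values) {
            reduce(key, values);
        }
    }

    // Different parameter type: this is an overload, so run() never reaches it.
    public static class Overloading extends BaseReducer {
        public void reduce(String key, Object value) {
            out.add(key);
        }
    }

    // Same signature as the base method: a real override, one record per key.
    public static class Overriding extends BaseReducer {
        @Override
        public void reduce(String key, Iterable<Object> values) {
            out.add(key);
        }
    }

    // Feeds one key with the given number of duplicate values through a reducer.
    public static List<String> emit(BaseReducer reducer, String key, int copies) {
        List<Object> values = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            values.add(new Object());
        }
        reducer.run(key, values);
        return reducer.out;
    }

    public static void main(String[] args) {
        System.out.println(emit(new Overloading(), "2012-3-3 c", 3).size()); // prints 3
        System.out.println(emit(new Overriding(), "2012-3-3 c", 3).size());  // prints 1
    }
}
```

Annotating the method with @Override is a cheap safeguard here: with the wrong parameter type the compiler rejects the method instead of silently treating it as an overload.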

    public static void main(String[] args) throws Exception {
        String dir_in = "hdfs://localhost:9000/in_dedup";
        String dir_out = "hdfs://localhost:9000/out_dedup";
        Path in = new Path(dir_in);
        Path out = new Path(dir_out);

        Configuration conf = new Configuration();
        // This constructor is deprecated in Hadoop 2.x, where Job.getInstance(conf, "my_dedup") replaces it.
        Job sortJob = new Job(conf, "my_dedup");
        sortJob.setJarByClass(MyDedup.class);

        sortJob.setInputFormatClass(TextInputFormat.class);
        sortJob.setMapperClass(LineNullMapper.class);
        // The reducer doubles as the combiner, so duplicates are already dropped on the map side.
        sortJob.setCombinerClass(SortReducer.class);
        //sortJob.setPartitionerClass(HashPartitioner.class);
        sortJob.setMapOutputKeyClass(Text.class);
        sortJob.setMapOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(sortJob, in);

        sortJob.setReducerClass(SortReducer.class);
        // sortJob.setNumReduceTasks(1);
        sortJob.setOutputKeyClass(Text.class);
        sortJob.setOutputValueClass(NullWritable.class);
        //sortJob.setOutputFormatClass(SequenceFileOutputFormat.class);
        FileOutputFormat.setOutputPath(sortJob, out);

        // Exit with a non-zero status if the job fails.
        System.exit(sortJob.waitForCompletion(true) ? 0 : 1);
    }
}

Output:
2012-3-1 a

2012-3-1 b

2012-3-2 a

2012-3-2 b

2012-3-3 b

2012-3-3 c

2012-3-4 d

2012-3-5 a

2012-3-6 b

2012-3-6 c

2012-3-7 c

2012-3-7 d
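The job works because the shuffle sorts the map output and groups identical keys, so the reducer sees each distinct line exactly once. As a rough local illustration of that idea (a sketch only, with no Hadoop involved), a TreeMap can play the role of the shuffle. DedupSketch and dedup are hypothetical names, and String ordering is assumed to match Text ordering, which holds for ASCII input like the sample files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DedupSketch {

    // Simulates map -> shuffle -> reduce for the dedup job, entirely in memory.
    public static List<String> dedup(List<String> lines) {
        // "Map" and "shuffle": every line becomes a key; the TreeMap groups
        // equal keys together and keeps the keys in sorted order.
        Map<String, Integer> groups = new TreeMap<>();
        for (String line : lines) {
            groups.merge(line, 1, Integer::sum);
        }
        // "Reduce": emit each key once, however many times it occurred.
        return new ArrayList<>(groups.keySet());
    }

    public static void main(String[] args) {
        List<String> input = List.of("2012-3-3 c", "2012-3-1 a", "2012-3-3 c", "2012-3-1 b");
        for (String line : dedup(input)) {
            System.out.println(line);
        }
    }
}
```

The per-key counts are kept only to mirror the group of values a reducer would receive; like the NullWritable values in the job, they are never read when the keys are written out.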
