Hadoop: Phone Call Records

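1. Requirements

We have a batch of phone call records. Each record shows a user A (the caller) dialing a special number (such as 120, 10086, or 13800138000). The task is to produce a summary that lists, for each called number B, every caller A who dialed it.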

2. Test Sample

Sample input:

file.txt:

　　13599999999 10086
　　13899999999 120
　　13944444444 1380013800
　　13722222222 1380013800
　　18800000000 120
　　13722222222 10086
　　18944444444 10086

Sample output:

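3. Approach

Source file -> Mapper (splits each raw record and emits the callee as the key and the caller as the value) -> Reducer (concatenates all callers that share the same callee, separated by |) -> output to HDFS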
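The flow above can be tried without a cluster. The following is a minimal plain-Java sketch, assuming the input lines are already in memory; the class and method names (CallGroupingSketch, groupByCallee, joinCallers) are hypothetical and are not part of the Hadoop job. A TreeMap stands in for the shuffle phase, which delivers keys to the reducer in sorted order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the map -> shuffle -> reduce flow; no Hadoop required.
// All names here are hypothetical and exist only for this illustration.
final class CallGroupingSketch {

    // "Map" + "shuffle": split each record and group callers by callee.
    // TreeMap keeps the callee keys in sorted order, as a reducer would see them.
    static Map<String, List<String>> groupByCallee(List<String> lines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 2) {
                continue; // skip blank or malformed records
            }
            grouped.computeIfAbsent(fields[1], k -> new ArrayList<>()).add(fields[0]);
        }
        return grouped;
    }

    // "Reduce": concatenate the callers of one callee, each followed by "|".
    static String joinCallers(List<String> callers) {
        StringBuilder out = new StringBuilder();
        for (String caller : callers) {
            out.append(caller).append("|");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "13599999999 10086",
                "13899999999 120",
                "13944444444 1380013800",
                "13722222222 1380013800",
                "18800000000 120",
                "13722222222 10086",
                "18944444444 10086");
        for (Map.Entry<String, List<String>> e : groupByCallee(lines).entrySet()) {
            System.out.println(e.getKey() + "\t" + joinCallers(e.getValue()));
        }
    }
}
```

Run on the sample input, this sketch prints three lines, one each for 10086, 120, and 1380013800, with callers listed in input order. In a real MapReduce job the order of values within one key is not guaranteed, so the caller order there may differ.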

4. Code

The program is as follows:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class Tel {

    // Mapper: splits each "caller callee" record and emits (callee, caller).
    public static class Map extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().trim().split("\\s+");
            // Skip blank or malformed lines instead of failing the task.
            if (fields.length < 2) {
                return;
            }
            String caller = fields[0];
            String callee = fields[1];
            context.write(new Text(callee), new Text(caller));
        }
    }

    // Reducer: concatenates all callers of the same callee, each followed by "|".
    public static class Reduce extends Reducer<Text, Text, Text, Text> {

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            StringBuilder out = new StringBuilder();
            for (Text value : values) {
                out.append(value.toString()).append("|");
            }
            context.write(key, new Text(out.toString()));
        }
    }

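    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        // Strip generic Hadoop options (such as -D key=value) and keep the job's own arguments.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: Tel <in> <out>");
            System.exit(2);
        }
        // Job.getInstance replaces the deprecated Job constructor (Hadoop 2.x and later).
        Job job = Job.getInstance(conf, "Tel");
        job.setJarByClass(Tel.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Use the parsed arguments rather than the raw args, so generic options are not read as paths.
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        // Report job success or failure through the process exit code.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}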
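
The reducer appends "|" after every caller, so each output line ends with a trailing separator. If that is not wanted, one option is to place the separator only between elements. The following is a minimal sketch using java.util.StringJoiner; the class CallerJoinSketch and its method are hypothetical and not part of the original program. Inside the reducer, the same pattern would be applied to value.toString() for each Text value.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper for illustration; not part of the original Tel job.
final class CallerJoinSketch {

    // Joins callers with "|" between elements only, so no trailing separator is produced.
    static String joinCallers(Iterable<String> callers) {
        StringJoiner joiner = new StringJoiner("|");
        for (String caller : callers) {
            joiner.add(caller);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        List<String> callers = Arrays.asList("13899999999", "18800000000");
        System.out.println(joinCallers(callers)); // prints 13899999999|18800000000
    }
}
```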
