Hadoop Learning Path (25): Using the MapReduce API (Part 2)

Student Scores: Enhanced Version

Input Data

 computer,huangxiaoming,85,86,41,75,93,42,85
 computer,xuzheng,54,52,86,91,42
 computer,huangbo,85,42,96,38
 english,zhaobenshan,54,52,86,91,42,85,75
 english,liuyifei,85,41,75,21,85,96,14
 algorithm,liuyifei,75,85,62,48,54,96,15
 computer,huangjiaju,85,75,86,85,85
 english,liuyifei,76,95,86,74,68,74,48
 english,huangdatou,48,58,67,86,15,33,85
 algorithm,huanglei,76,95,86,74,68,74,48
 algorithm,huangjiaju,85,75,86,85,85,74,86
 computer,huangdatou,48,58,67,86,15,33,85
 english,zhouqi,85,86,41,75,93,42,85,75,55,47,22
 english,huangbo,85,42,96,38,55,47,22
 algorithm,liutao,85,75,85,99,66
 computer,huangzitao,85,86,41,75,93,42,85
 math,wangbaoqiang,85,86,41,75,93,42,85
 computer,liujialing,85,41,75,21,85,96,14,74,86
 computer,liuyifei,75,85,62,48,54,96,15
 computer,liutao,85,75,85,99,66,88,75,91
 computer,huanglei,76,95,86,74,68,74,48
 english,liujialing,75,85,62,48,54,96,15
 math,huanglei,76,95,86,74,68,74,48
 math,huangjiaju,85,75,86,85,85,74,86
 math,liutao,48,58,67,86,15,33,85
 english,huanglei,85,75,85,99,66,88,75,91
 math,xuzheng,54,52,86,91,42,85,75
 math,huangxiaoming,85,75,85,99,66,88,75,91
 math,liujialing,85,86,41,75,93,42,85,75
 english,huangxiaoming,85,86,41,75,93,42,85
 algorithm,huangdatou,48,58,67,86,15,33,85
 algorithm,huangzitao,85,86,41,75,93,42,85,75

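Data Format

The number of fields per record is not fixed:
The first field is the course name; there are four courses in total: computer, math, english, and algorithm.
The second field is the student's name, and the remaining fields are that student's scores on each exam.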
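
Because the number of score fields varies from record to record, every job below starts by splitting a line on commas and treating everything after the second field as a score. A minimal plain-Java sketch of that parsing step follows; the class name `ScoreRecord` is hypothetical and is not part of the jobs in this article.

```java
import java.util.Arrays;

// Hypothetical helper (not part of the article's jobs): one parsed input record.
class ScoreRecord {
    final String course;
    final String name;
    final int[] scores;

    ScoreRecord(String course, String name, int[] scores) {
        this.course = course;
        this.name = name;
        this.scores = scores;
    }

    // Parse one line of the form: course,name,score1,score2,...
    static ScoreRecord parse(String line) {
        String[] fields = line.split(",");
        int[] scores = new int[fields.length - 2];
        for (int i = 2; i < fields.length; i++) {
            scores[i - 2] = Integer.parseInt(fields[i].trim());
        }
        return new ScoreRecord(fields[0].trim(), fields[1].trim(), scores);
    }

    // Average of all scores as a double; NaN if the record has no scores.
    double average() {
        return Arrays.stream(scores).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        ScoreRecord r = ScoreRecord.parse("computer,huangbo,85,42,96,38");
        System.out.println(r.course + "\t" + r.name + "\t" + r.average());
    }
}
```

The average is computed in floating point, so fractional parts are kept.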

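Requirements

1. For each course, count the students who took the exams and compute the course average.

2. Compute each student's average score per course and write the results to separate files, one file per course, sorted by average score from highest to lowest, with scores kept to one decimal place.

3. For each course, find the student with the highest score and output the course, name, and average score.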

Problem 1

MRAvgScore1.java

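 import java.io.IOException;
 
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.DoubleWritable;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.Job;
 import org.apache.hadoop.mapreduce.Mapper;
 import org.apache.hadoop.mapreduce.Reducer;
 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
 import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
 import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;
 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
 
 /**
  * Requirement: for each course, count the students who took the exams and compute the course average.
  */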
 public class MRAvgScore1 {
 
     public static void main(String[] args) throws Exception {
 
         Configuration conf1 = new Configuration();
         Configuration conf2 = new Configuration();
 
         Job job1 = Job.getInstance(conf1);
         Job job2 = Job.getInstance(conf2);
 
         job1.setJarByClass(MRAvgScore1.class);
         job1.setMapperClass(AvgScoreMapper1.class);
         // No reducer is set, so the default identity reducer writes the mapper output sorted by key
 
         job1.setOutputKeyClass(Text.class);
         job1.setOutputValueClass(DoubleWritable.class);
 
         Path inputPath1 = new Path("D:\\MR\\hw\\work3\\input");
         Path outputPath1 = new Path("D:\\MR\\hw\\work3\\output_hw1_1");
 
         FileInputFormat.setInputPaths(job1, inputPath1);
         FileOutputFormat.setOutputPath(job1, outputPath1);
 
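         job2.setJarByClass(MRAvgScore1.class);
         job2.setMapperClass(AvgScoreMapper2.class);
         job2.setReducerClass(AvgScoreReducer2.class);
 
         // The mapper emits <Text, DoubleWritable>, while the reducer emits <Text, Text>
         job2.setMapOutputKeyClass(Text.class);
         job2.setMapOutputValueClass(DoubleWritable.class);
         job2.setOutputKeyClass(Text.class);
         job2.setOutputValueClass(Text.class);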
 
         Path inputPath2 = new Path("D:\\MR\\hw\\work3\\output_hw1_1");
         Path outputPath2 = new Path("D:\\MR\\hw\\work3\\output_hw1_end");
 
         FileInputFormat.setInputPaths(job2, inputPath2);
         FileOutputFormat.setOutputPath(job2, outputPath2);
 
         JobControl control = new JobControl("AvgScore");
 
         ControlledJob aJob = new ControlledJob(job1.getConfiguration());
         ControlledJob bJob = new ControlledJob(job2.getConfiguration());
 
         bJob.addDependingJob(aJob);
 
         control.addJob(aJob);
         control.addJob(bJob);
 
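         Thread thread = new Thread(control);
         thread.start();
 
         // Poll until both jobs have finished, whether they succeeded or failed
         while(!control.allFinished()) {
             Thread.sleep(1000);
         }
         // Exit with a non-zero code if any job failed
         int exitCode = control.getFailedJobList().isEmpty() ? 0 : 1;
         control.stop();
         System.exit(exitCode);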
 
     }
 
     /**
      * Input record: computer,huangxiaoming,85,86,41,75,93,42,85
      *
      * Requirement: for each course, count the students who took the exams and compute the course average.
      *
      * Approach: use course name + student name as the key and the student's average score as the value.
      */
     public static class AvgScoreMapper1 extends Mapper<LongWritable, Text, Text, DoubleWritable>{
 
         @Override
         protected void map(LongWritable key, Text value,Context context)
                 throws IOException, InterruptedException {
 
             String[] splits = value.toString().split(",");
             // Build the output key: course<TAB>name
             String outKey = splits[0]+"\t"+splits[1];
             int length = splits.length;
             int sum = 0;
             // Sum all of the student's scores
             for(int i=2;i<length;i++) {
                 sum += Integer.parseInt(splits[i]);
             }
             // Compute the average; the cast keeps integer division from truncating the result
             double outValue = (double) sum / (length - 2);
 
             context.write(new Text(outKey), new DoubleWritable(outValue));
 
         }
 
     }
 
     /**
      * Processes the output of the first MapReduce job, whose lines have the form
      * course<TAB>name<TAB>average, for example:
      *  math    huangxiaoming    83.0
      *  math    liutao    56.0
      *
      * Requirement: for each course, count the students who took the exams and compute the course average.
      * Approach: use the course name as the key and the student's average score as the value.
      */
     public static class AvgScoreMapper2 extends Mapper<LongWritable, Text, Text, DoubleWritable>{
 
         @Override
         protected void map(LongWritable key, Text value,Context context)
                 throws IOException, InterruptedException {
 
             String[] splits = value.toString().split("\t");
             String outKey = splits[0];
             String outValue = splits[2];
 
             context.write(new Text(outKey), new DoubleWritable(Double.parseDouble(outValue)));
         }
 
     }
 
     /**
      * For each course, iterate over the values to count how many students took the exams
      * and compute the average of their scores.
      */
     public static class AvgScoreReducer2 extends Reducer<Text, DoubleWritable, Text, Text>{
 
         @Override
         protected void reduce(Text key, Iterable<DoubleWritable> values,
                 Context context) throws IOException, InterruptedException {
 
             int count = 0;
             double sum = 0;
             for(DoubleWritable value : values) {
                 count++;
                 sum += value.get();
             }
 
             double avg = sum / count;
             String outValue = count + "\t" + avg;
             context.write(key, new Text(outValue));
         }
 
     }
 
 }
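
To check the two chained jobs against a small sample without a cluster, the same computation can be sketched in plain Java. This is only an illustration under stated assumptions: the class `CourseStats` and its method are hypothetical names, and, like the jobs above, the course average is taken as the mean of the per-student averages rather than the mean of all individual scores.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory equivalent of the two chained jobs in Problem 1.
class CourseStats {

    // Returns course -> {studentCount, courseAverage}.
    static Map<String, double[]> summarize(List<String> lines) {
        // course -> {count, sum of per-student averages}
        Map<String, double[]> acc = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            double sum = 0;
            for (int i = 2; i < f.length; i++) {
                sum += Integer.parseInt(f[i]);
            }
            double studentAvg = sum / (f.length - 2);   // what job 1 emits
            double[] a = acc.computeIfAbsent(f[0], k -> new double[2]);
            a[0] += 1;                                   // what job 2 counts
            a[1] += studentAvg;                          // what job 2 sums
        }
        Map<String, double[]> result = new TreeMap<>();
        for (Map.Entry<String, double[]> e : acc.entrySet()) {
            double[] a = e.getValue();
            result.put(e.getKey(), new double[] { a[0], a[1] / a[0] });
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "math,liutao,48,58,67,86,15,33,85",
                "math,huangxiaoming,85,75,85,99,66,88,75,91",
                "algorithm,liutao,85,75,85,99,66");
        summarize(sample).forEach((course, s) ->
                System.out.println(course + "\t" + (int) s[0] + "\t" + s[1]));
    }
}
```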

Problem 2

MRAvgScore2.java

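 import java.io.IOException;
 
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.NullWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.Job;
 import org.apache.hadoop.mapreduce.Mapper;
 import org.apache.hadoop.mapreduce.Reducer;
 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
 
 public class MRAvgScore2 {
 
     public static void main(String[] args) throws Exception {
 
         Configuration conf = new Configuration();
 
         Job job = Job.getInstance(conf);
 
         job.setJarByClass(MRAvgScore2.class);
         job.setMapperClass(ScoreMapper3.class);
         job.setReducerClass(ScoreReducer3.class);
 
         job.setOutputKeyClass(StudentBean.class);
         job.setOutputValueClass(NullWritable.class);
 
         // One reduce task per course, so each course ends up in its own result file
         job.setPartitionerClass(CoursePartitioner.class);
         job.setNumReduceTasks(4);
 
         // The input is the output of the first job in Problem 1: course<TAB>name<TAB>average
         Path inputPath = new Path("D:\\MR\\hw\\work3\\output_hw1_1");
         Path outputPath = new Path("D:\\MR\\hw\\work3\\output_hw2_1");
 
         FileInputFormat.setInputPaths(job, inputPath);
         FileOutputFormat.setOutputPath(job, outputPath);
         boolean isDone = job.waitForCompletion(true);
         System.exit(isDone ? 0 : 1);
     }
 
     public static class ScoreMapper3 extends Mapper<LongWritable, Text, StudentBean, NullWritable>{
 
         @Override
         protected void map(LongWritable key, Text value,Context context)
                 throws IOException, InterruptedException {
 
             String[] splits = value.toString().split("\t");
 
             // Round the average to one decimal place, as the requirement asks
             double score = Math.round(Double.parseDouble(splits[2]) * 10) / 10.0;
 
             // The bean is the key, so the shuffle sorts records with StudentBean.compareTo
             StudentBean student = new StudentBean(splits[0],splits[1],score);
 
             context.write(student, NullWritable.get());
 
         }
 
     }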
 
     public static class ScoreReducer3 extends Reducer<StudentBean, NullWritable, StudentBean, NullWritable>{
 
         @Override
         protected void reduce(StudentBean key, Iterable<NullWritable> values,Context context)
                 throws IOException, InterruptedException {
 
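             // Students with equal averages compare as equal and arrive in a single call.
             // Hadoop refreshes the key object on each iteration, so writing the key
             // once per value emits every one of those students.
             for(NullWritable nvl : values){
                 context.write(key, nvl);
             }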
 
         }
     }
 }
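
The driver above references `CoursePartitioner`, but its source is not listed in this article. As a hedged sketch of what such a partitioner could compute, the plain-Java method below maps each of the four course names to a fixed partition number. The class name, the course order, and the fallback for unknown courses are all assumptions; in the real job this logic would sit in the `getPartition(key, value, numPartitions)` method of a class extending Hadoop's `Partitioner<StudentBean, NullWritable>`, using `key.getCourse()` as the course.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the partitioning rule; the article's actual
// CoursePartitioner is not shown and may differ.
class CoursePartitionSketch {

    // Assumed fixed order of the four courses named in the data description.
    static final List<String> COURSES =
            Arrays.asList("computer", "math", "english", "algorithm");

    // Returns the reduce task (and thus output file) index for a course.
    static int partitionFor(String course, int numPartitions) {
        int index = COURSES.indexOf(course);
        if (index < 0) {
            index = 0; // assumption: unknown courses fall back to partition 0
        }
        return index % numPartitions;
    }

    public static void main(String[] args) {
        for (String course : COURSES) {
            System.out.println(course + " -> partition " + partitionFor(course, 4));
        }
    }
}
```

With four reduce tasks, each course maps to a distinct partition, which is what produces one result file per course.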

StudentBean.java

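 import java.io.DataInput;
 import java.io.DataOutput;
 import java.io.IOException;
 
 import org.apache.hadoop.io.WritableComparable;
 
 public class StudentBean implements WritableComparable<StudentBean>{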
     private String course;
     private String name;
     private double avgScore;
 
     public String getCourse() {
         return course;
     }
     public void setCourse(String course) {
         this.course = course;
     }
     public String getName() {
         return name;
     }
     public void setName(String name) {
         this.name = name;
     }
     public double getavgScore() {
         return avgScore;
     }
     public void setavgScore(double avgScore) {
         this.avgScore = avgScore;
     }
     public StudentBean(String course, String name, double avgScore) {
         super();
         this.course = course;
         this.name = name;
         this.avgScore = avgScore;
     }
     public StudentBean() {
         super();
     }
 
     @Override
     public String toString() {
         return course + "\t" + name + "\t" + avgScore;
     }
     @Override
     public void readFields(DataInput in) throws IOException {
         course = in.readUTF();
         name = in.readUTF();
         avgScore = in.readDouble();
     }
     @Override
     public void write(DataOutput out) throws IOException {
         out.writeUTF(course);
         out.writeUTF(name);
         out.writeDouble(avgScore);
     }
     @Override
     public int compareTo(StudentBean stu) {
         // Sort by average score in descending order
         return Double.compare(stu.avgScore, this.avgScore);
     }
 
 }
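
The bean relies on two contracts: `readFields` must read the fields in exactly the order `write` wrote them, and `compareTo` decides the sort order during the shuffle. Both can be exercised without Hadoop, because `DataOutputStream` and `DataInputStream` from `java.io` implement the same `DataOutput` and `DataInput` interfaces. The class `BeanSketch` below is a hypothetical stand-in for `StudentBean`, written only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for StudentBean that needs no Hadoop classes.
class BeanSketch implements Comparable<BeanSketch> {
    String course;
    String name;
    double avgScore;

    BeanSketch() {
    }

    BeanSketch(String course, String name, double avgScore) {
        this.course = course;
        this.name = name;
        this.avgScore = avgScore;
    }

    // Serialize the fields in a fixed order.
    void write(DataOutput out) throws IOException {
        out.writeUTF(course);
        out.writeUTF(name);
        out.writeDouble(avgScore);
    }

    // Deserialize the fields in the same order they were written.
    void readFields(DataInput in) throws IOException {
        course = in.readUTF();
        name = in.readUTF();
        avgScore = in.readDouble();
    }

    // Descending order by average score.
    @Override
    public int compareTo(BeanSketch other) {
        return Double.compare(other.avgScore, this.avgScore);
    }

    // Write a bean to bytes and read it back into a fresh instance.
    static BeanSketch roundTrip(BeanSketch bean) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            bean.write(new DataOutputStream(bytes));
            BeanSketch copy = new BeanSketch();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Sorting a list of such beans with `Collections.sort` puts the highest average first, which mirrors the order in which the reducer in Problem 2 receives its keys.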

Problem 3

MRScore3.java

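 import java.io.IOException;
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.List;
 import java.util.Map;
 import java.util.TreeMap;
 
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.IntWritable;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.NullWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.Job;
 import org.apache.hadoop.mapreduce.Mapper;
 import org.apache.hadoop.mapreduce.Reducer;
 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
 import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
 import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;
 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
 
 public class MRScore3 {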
 
     public static void main(String[] args) throws Exception {
 
         Configuration conf1 = new Configuration();
         Configuration conf2 = new Configuration();
 
         Job job1 = Job.getInstance(conf1);
         Job job2 = Job.getInstance(conf2);
 
         job1.setJarByClass(MRScore3.class);
         job1.setMapperClass(MRMapper3_1.class);
         // No reducer is set, so the default identity reducer writes each partition sorted by key (the top score)
 
         job1.setMapOutputKeyClass(IntWritable.class);
         job1.setMapOutputValueClass(StudentBean.class);
         job1.setOutputKeyClass(IntWritable.class);
         job1.setOutputValueClass(StudentBean.class);
 
         job1.setPartitionerClass(CoursePartitioner2.class);
 
         job1.setNumReduceTasks(4);
 
         Path inputPath = new Path("D:\\MR\\hw\\work3\\input");
         Path outputPath = new Path("D:\\MR\\hw\\work3\\output_hw3_1");
 
         FileInputFormat.setInputPaths(job1, inputPath);
         FileOutputFormat.setOutputPath(job1, outputPath);
 
         job2.setJarByClass(MRScore3.class);
         job2.setMapperClass(MRMapper3_2.class);
         job2.setReducerClass(MRReducer3_2.class);
 
         job2.setMapOutputKeyClass(IntWritable.class);
         job2.setMapOutputValueClass(StudentBean.class);
         job2.setOutputKeyClass(StudentBean.class);
         job2.setOutputValueClass(NullWritable.class);
 
         Path inputPath2 = new Path("D:\\MR\\hw\\work3\\output_hw3_1");
         Path outputPath2 = new Path("D:\\MR\\hw\\work3\\output_hw3_end");
 
         FileInputFormat.setInputPaths(job2, inputPath2);
         FileOutputFormat.setOutputPath(job2, outputPath2);
 
         JobControl control = new JobControl("Score3");
 
         ControlledJob aJob = new ControlledJob(job1.getConfiguration());
         ControlledJob bJob = new ControlledJob(job2.getConfiguration());
 
         bJob.addDependingJob(aJob);
 
         control.addJob(aJob);
         control.addJob(bJob);
 
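         Thread thread = new Thread(control);
         thread.start();
 
         // Poll until both jobs have finished, whether they succeeded or failed
         while(!control.allFinished()) {
             Thread.sleep(1000);
         }
         // Exit with a non-zero code if any job failed
         int exitCode = control.getFailedJobList().isEmpty() ? 0 : 1;
         control.stop();
         System.exit(exitCode);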
 
     }
 
     public static class MRMapper3_1 extends Mapper<LongWritable, Text, IntWritable, StudentBean>{
 
         // Key: the student's highest single score; value: course, name and average
         IntWritable outKey = new IntWritable();
         StudentBean outValue = new StudentBean();
         List<Integer> scoreList = new ArrayList<>();
 
         @Override
         protected void map(LongWritable key, Text value, Context context) throws java.io.IOException, InterruptedException {
 
             scoreList.clear();
             String[] splits = value.toString().split(",");
             long sum = 0;
 
             for(int i=2;i<splits.length;i++) {
                 int score = Integer.parseInt(splits[i]);
                 scoreList.add(score);
                 sum += score;
             }
 
             // Compare scores as numbers; sorting them as strings would rank "9" above "85"
             outKey.set(Collections.max(scoreList));
 
             double avg = sum * 1.0/(splits.length-2);
             outValue.setCourse(splits[0]);
             outValue.setName(splits[1]);
             outValue.setavgScore(avg);
 
             context.write(outKey, outValue);
 
         }
     }
 
     public static class MRMapper3_2 extends Mapper<LongWritable, Text,IntWritable, StudentBean >{
 
         StudentBean outValue = new StudentBean();
         IntWritable outKey = new IntWritable();
 
         @Override
         protected void map(LongWritable key, Text value, Context context) throws java.io.IOException, InterruptedException {
 
             // Each line written by the first job: topScore<TAB>course<TAB>name<TAB>average
             String[] splits = value.toString().split("\t");
             outKey.set(Integer.parseInt(splits[0]));
 
             outValue.setCourse(splits[1]);
             outValue.setName(splits[2]);
             outValue.setavgScore(Double.parseDouble(splits[3]));
 
             context.write(outKey, outValue);
 
         };
     }
 
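     public static class MRReducer3_2 extends Reducer<IntWritable, StudentBean, StudentBean, NullWritable>{
 
         // Best student seen so far for each course
         Map<String, StudentBean> bestByCourse = new TreeMap<>();
 
         @Override
         protected void reduce(IntWritable key, Iterable<StudentBean> values,Context context)
                 throws IOException, InterruptedException {
 
             // With the default single reduce task, keys (top scores) arrive in ascending order,
             // so a later entry for a course always carries a higher or equal top score.
             // Hadoop reuses the value object, so each bean is copied before it is stored.
             for(StudentBean value : values) {
                 bestByCourse.put(value.getCourse(),
                         new StudentBean(value.getCourse(), value.getName(), value.getavgScore()));
             }
 
         }
 
         @Override
         protected void cleanup(Context context) throws IOException, InterruptedException {
             // Emit one student per course once all keys have been processed
             for(StudentBean best : bestByCourse.values()) {
                 context.write(best, NullWritable.get());
             }
         }
     }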
 
 }
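
For comparison, the result Problem 3 asks for can be sketched in plain Java on a small sample. The sketch reads "highest score" as the highest single exam score, which is how the mapper above builds its key; the class `TopStudentSketch`, its nested `Top` type, and the rule that the first student wins a tie are assumptions made for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory version of Problem 3: per course, the student
// with the highest single score, together with that student's average.
class TopStudentSketch {

    static final class Top {
        final String name;
        final int topScore;
        final double average;

        Top(String name, int topScore, double average) {
            this.name = name;
            this.topScore = topScore;
            this.average = average;
        }
    }

    static Map<String, Top> topByCourse(List<String> lines) {
        Map<String, Top> best = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            int top = Integer.MIN_VALUE;
            long sum = 0;
            for (int i = 2; i < f.length; i++) {
                int score = Integer.parseInt(f[i]);
                top = Math.max(top, score);   // numeric comparison, not string order
                sum += score;
            }
            double average = (double) sum / (f.length - 2);
            Top current = best.get(f[0]);
            // Strict comparison: on a tie the student seen first is kept
            if (current == null || top > current.topScore) {
                best.put(f[0], new Top(f[1], top, average));
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "computer,huangbo,85,42,96,38",
                "computer,xuzheng,54,52,86,91,42",
                "math,liutao,48,58,67,86,15,33,85");
        topByCourse(sample).forEach((course, t) ->
                System.out.println(course + "\t" + t.name + "\t" + t.average));
    }
}
```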
