Hadoop Learning Path (25): Using the MapReduce API, Part 2
Student Scores: Enhanced Version
Data
- computer,huangxiaoming,85,86,41,75,93,42,85
- computer,xuzheng,54,52,86,91,42
- computer,huangbo,85,42,96,38
- english,zhaobenshan,54,52,86,91,42,85,75
- english,liuyifei,85,41,75,21,85,96,14
- algorithm,liuyifei,75,85,62,48,54,96,15
- computer,huangjiaju,85,75,86,85,85
- english,liuyifei,76,95,86,74,68,74,48
- english,huangdatou,48,58,67,86,15,33,85
- algorithm,huanglei,76,95,86,74,68,74,48
- algorithm,huangjiaju,85,75,86,85,85,74,86
- computer,huangdatou,48,58,67,86,15,33,85
- english,zhouqi,85,86,41,75,93,42,85,75,55,47,22
- english,huangbo,85,42,96,38,55,47,22
- algorithm,liutao,85,75,85,99,66
- computer,huangzitao,85,86,41,75,93,42,85
- math,wangbaoqiang,85,86,41,75,93,42,85
- computer,liujialing,85,41,75,21,85,96,14,74,86
- computer,liuyifei,75,85,62,48,54,96,15
- computer,liutao,85,75,85,99,66,88,75,91
- computer,huanglei,76,95,86,74,68,74,48
- english,liujialing,75,85,62,48,54,96,15
- math,huanglei,76,95,86,74,68,74,48
- math,huangjiaju,85,75,86,85,85,74,86
- math,liutao,48,58,67,86,15,33,85
- english,huanglei,85,75,85,99,66,88,75,91
- math,xuzheng,54,52,86,91,42,85,75
- math,huangxiaoming,85,75,85,99,66,88,75,91
- math,liujialing,85,86,41,75,93,42,85,75
- english,huangxiaoming,85,86,41,75,93,42,85
- algorithm,huangdatou,48,58,67,86,15,33,85
- algorithm,huangzitao,85,86,41,75,93,42,85,75
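Data description
Records do not have a fixed number of fields:
the first field is the course name; there are four courses in total: computer, math, english, algorithm;
the second field is the student's name, and the remaining fields are that student's scores, one per exam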
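Because the number of scores varies from line to line, every job below starts the same way: split on commas, take fields 0 and 1 as the course and name, and average whatever follows. A minimal plain-Java sketch of that parsing step, with no Hadoop dependency; the `ScoreRecord` class and its methods are hypothetical and only illustrate the idea:

```java
import java.util.Arrays;

// Hypothetical helper illustrating the parsing shared by the mappers; not part of the original jobs.
class ScoreRecord {
    final String course;
    final String name;
    final int[] scores;

    ScoreRecord(String course, String name, int[] scores) {
        this.course = course;
        this.name = name;
        this.scores = scores;
    }

    /** Parses "course,name,s1,s2,...": every field after the second one is a score. */
    static ScoreRecord parse(String line) {
        String[] fields = line.split(",");
        int[] scores = new int[fields.length - 2];
        for (int i = 2; i < fields.length; i++) {
            scores[i - 2] = Integer.parseInt(fields[i].trim());
        }
        return new ScoreRecord(fields[0], fields[1], scores);
    }

    /** Floating-point average; dividing two ints instead would silently drop the fraction. */
    double average() {
        return Arrays.stream(scores).average().orElse(0.0);
    }

    public static void main(String[] args) {
        ScoreRecord r = parse("algorithm,huangjiaju,85,75,86,85,85,74,86");
        System.out.println(r.course + "\t" + r.name + "\t" + r.scores.length + " scores, average " + r.average());
    }
}
```

The same split works for a 4-score line and an 11-score line, which is why the jobs index scores from position 2 to `splits.length - 1` rather than assuming a fixed count.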
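Requirements
1. For each course, count the number of students who took the exam and compute the course average.
2. For each course, compute the average score of every student who took it and write the results to separate files, one file per course, sorted by average score from highest to lowest, with averages kept to one decimal place.
3. For each course, find the student with the highest score and output that student's course, name, and average score.
Problem 1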
MRAvgScore1.java
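- import java.io.IOException;
- import org.apache.hadoop.conf.Configuration;
- import org.apache.hadoop.fs.Path;
- import org.apache.hadoop.io.DoubleWritable;
- import org.apache.hadoop.io.LongWritable;
- import org.apache.hadoop.io.Text;
- import org.apache.hadoop.mapreduce.Job;
- import org.apache.hadoop.mapreduce.Mapper;
- import org.apache.hadoop.mapreduce.Reducer;
- import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
- import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
- import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;
- import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
- /**
- * Requirement: for each course, count the students who took the exam and compute the course average
- * */
- public class MRAvgScore1 {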
- public static void main(String[] args) throws Exception {
- Configuration conf1 = new Configuration();
- Configuration conf2 = new Configuration();
- Job job1 = Job.getInstance(conf1);
- Job job2 = Job.getInstance(conf2);
- job1.setJarByClass(MRAvgScore1.class);
- job1.setMapperClass(AvgScoreMapper1.class);
- // no reducer is set, so the default identity reducer writes the map output unchanged: course \t name \t average
- job1.setOutputKeyClass(Text.class);
- job1.setOutputValueClass(DoubleWritable.class);
- Path inputPath1 = new Path("D:\\MR\\hw\\work3\\input");
- Path outputPath1 = new Path("D:\\MR\\hw\\work3\\output_hw1_1");
- FileInputFormat.setInputPaths(job1, inputPath1);
- FileOutputFormat.setOutputPath(job1, outputPath1);
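- job2.setJarByClass(MRAvgScore1.class);
- job2.setMapperClass(AvgScoreMapper2.class);
- job2.setReducerClass(AvgScoreReducer2.class);
- job2.setMapOutputKeyClass(Text.class);
- job2.setMapOutputValueClass(DoubleWritable.class);
- job2.setOutputKeyClass(Text.class);
- // AvgScoreReducer2 emits "count \t average" as a Text value
- job2.setOutputValueClass(Text.class);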
- Path inputPath2 = new Path("D:\\MR\\hw\\work3\\output_hw1_1");
- Path outputPath2 = new Path("D:\\MR\\hw\\work3\\output_hw1_end");
- FileInputFormat.setInputPaths(job2, inputPath2);
- FileOutputFormat.setOutputPath(job2, outputPath2);
- JobControl control = new JobControl("AvgScore");
- ControlledJob aJob = new ControlledJob(job1.getConfiguration());
- ControlledJob bJob = new ControlledJob(job2.getConfiguration());
- bJob.addDependingJob(aJob);
- control.addJob(aJob);
- control.addJob(bJob);
- Thread thread = new Thread(control);
- thread.start();
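- while(!control.allFinished()) {
- Thread.sleep(1000);
- }
- // JobControl keeps polling until stopped; exit with 1 if either job failed
- control.stop();
- System.exit(control.getFailedJobList().isEmpty() ? 0 : 1);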
- }
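- /**
- * Input record: computer,huangxiaoming,85,86,41,75,93,42,85
- *
- * Requirement: for each course, count the students who took the exam and compute the course average
- *
- * Approach: emit course + name as the key and the student's average score as the value
- * */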
- public static class AvgScoreMapper1 extends Mapper<LongWritable, Text, Text, DoubleWritable>{
- @Override
- protected void map(LongWritable key, Text value,Context context)
- throws IOException, InterruptedException {
- String[] splits = value.toString().split(",");
- // build the output key: course \t name
- String outKey = splits[0]+"\t"+splits[1];
- int length = splits.length;
- int sum = 0;
- // sum all of the student's exam scores
- for(int i=2;i<length;i++) {
- sum += Integer.parseInt(splits[i]);
- }
- // average score: multiplying by 1.0 makes this a floating-point division instead of an integer one
- double outValue = sum * 1.0 / (length - 2);
- context.write(new Text(outKey), new DoubleWritable(outValue));
- }
- }
- /**
- * Further processes the output of the first MapReduce job. Each input line has the form
- * course \t name \t average, for example:
- * math huangxiaoming 83.0
- * math liutao 56.0
- *
- * Requirement: for each course, count the students who took the exam and compute the course average
- * Approach: emit the course name as the key and the student's average score as the value
- * */
- public static class AvgScoreMapper2 extends Mapper<LongWritable, Text, Text, DoubleWritable>{
- @Override
- protected void map(LongWritable key, Text value,Context context)
- throws IOException, InterruptedException {
- String[] splits = value.toString().split("\t");
- String outKey = splits[0];
- String outValue = splits[2];
- context.write(new Text(outKey), new DoubleWritable(Double.parseDouble(outValue)));
- }
- }
- /**
- * For one course, iterate over the values to count how many students took the exam, and compute the course average
- * */
- public static class AvgScoreReducer2 extends Reducer<Text, DoubleWritable, Text, Text>{
- @Override
- protected void reduce(Text key, Iterable<DoubleWritable> values,
- Context context) throws IOException, InterruptedException {
- int count = 0;
- double sum = 0;
- for(DoubleWritable value : values) {
- count++;
- sum += value.get();
- }
- double avg = sum / count;
- String outValue = count + "\t" + avg;
- context.write(key, new Text(outValue));
- }
- }
- }
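The two chained jobs can be checked locally without a cluster. The sketch below is a hypothetical in-memory stand-in written in plain Java with no Hadoop classes: stage 1 computes one floating-point average per record, as `AvgScoreMapper1` does, and stage 2 groups those averages by course and reports the count and mean, as `AvgScoreMapper2` and `AvgScoreReducer2` do together. The input rows are copied from the data above; `CourseStatsSketch` and `run` are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the two chained jobs; it uses no Hadoop classes.
class CourseStatsSketch {
    /** Stage 2 result for one course: number of records and mean of their averages. */
    static final class CourseStats {
        final int count;
        final double average;

        CourseStats(int count, double average) {
            this.count = count;
            this.average = average;
        }
    }

    static Map<String, CourseStats> run(List<String> lines) {
        // Stage 1 (like AvgScoreMapper1): one floating-point average per input record,
        // then (like AvgScoreMapper2 plus the shuffle) group those averages by course.
        Map<String, List<Double>> byCourse = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            int sum = 0;
            for (int i = 2; i < f.length; i++) {
                sum += Integer.parseInt(f[i]);
            }
            double avg = sum * 1.0 / (f.length - 2);
            byCourse.computeIfAbsent(f[0], k -> new ArrayList<>()).add(avg);
        }
        // Stage 2 reduce (like AvgScoreReducer2): count the records and average them per course.
        Map<String, CourseStats> result = new TreeMap<>();
        for (Map.Entry<String, List<Double>> e : byCourse.entrySet()) {
            double total = 0;
            for (double a : e.getValue()) {
                total += a;
            }
            int count = e.getValue().size();
            result.put(e.getKey(), new CourseStats(count, total / count));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "math,huanglei,76,95,86,74,68,74,48",
                "math,liutao,48,58,67,86,15,33,85",
                "english,huangbo,85,42,96,38,55,47,22");
        run(lines).forEach((course, s) ->
                System.out.println(course + "\t" + s.count + "\t" + s.average));
    }
}
```

Like the MapReduce version, this counts records rather than distinct names, so a student who appears twice for the same course (english, liuyifei in the data) is counted twice.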
Problem 2
MRAvgScore2.java
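- import java.io.IOException;
- import org.apache.hadoop.conf.Configuration;
- import org.apache.hadoop.fs.Path;
- import org.apache.hadoop.io.LongWritable;
- import org.apache.hadoop.io.NullWritable;
- import org.apache.hadoop.io.Text;
- import org.apache.hadoop.mapreduce.Job;
- import org.apache.hadoop.mapreduce.Mapper;
- import org.apache.hadoop.mapreduce.Reducer;
- import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
- import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
- /**
- * Requirement: each student's average per course, one result file per course,
- * sorted from highest to lowest average, with one decimal place
- * */
- public class MRAvgScore2 {
- public static void main(String[] args) throws Exception {
- Configuration conf = new Configuration();
- Job job = Job.getInstance(conf);
- job.setJarByClass(MRAvgScore2.class);
- job.setMapperClass(ScoreMapper3.class);
- job.setReducerClass(ScoreReducer3.class);
- job.setOutputKeyClass(StudentBean.class);
- job.setOutputValueClass(NullWritable.class);
- // CoursePartitioner sends each course to its own reduce task, giving one part-r-* file per course;
- // within each file the records come out in StudentBean.compareTo order
- job.setPartitionerClass(CoursePartitioner.class);
- job.setNumReduceTasks(4);
- // input: the per-student averages ("course \t name \t average") written by job 1 of Problem 1
- Path inputPath = new Path("D:\\MR\\hw\\work3\\output_hw1_1");
- Path outputPath = new Path("D:\\MR\\hw\\work3\\output_hw2_1");
- FileInputFormat.setInputPaths(job, inputPath);
- FileOutputFormat.setOutputPath(job, outputPath);
- boolean isDone = job.waitForCompletion(true);
- System.exit(isDone ? 0 : 1);
- }
- public static class ScoreMapper3 extends Mapper<LongWritable, Text, StudentBean, NullWritable>{
- @Override
- protected void map(LongWritable key, Text value,Context context)
- throws IOException, InterruptedException {
- String[] splits = value.toString().split("\t");
- double score = Double.parseDouble(splits[2]);
- // keep one decimal place by rounding to the nearest tenth, e.g. 82.2857 -> 82.3
- score = Math.round(score * 10) / 10.0;
- StudentBean student = new StudentBean(splits[0],splits[1],score);
- context.write(student, NullWritable.get());
- }
- }
- public static class ScoreReducer3 extends Reducer<StudentBean, NullWritable, StudentBean, NullWritable>{
- @Override
- protected void reduce(StudentBean key, Iterable<NullWritable> values,Context context)
- throws IOException, InterruptedException {
- // Hadoop refills key as the values iterator advances, so writing key inside the loop
- // emits every record of the group, not just the first one
- for(NullWritable nvl : values){
- context.write(key, nvl);
- }
- }
- }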
- }
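The job relies on `CoursePartitioner`, whose code the article does not include. One way it could route records is sketched below in plain Java, assuming the four course names from the data description map to partitions 0 to 3 in that order; inside Hadoop the same logic would sit in `getPartition(StudentBean key, NullWritable value, int numPartitions)` of a subclass of `org.apache.hadoop.mapreduce.Partitioner`. `CourseRouting` and `partitionFor` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the routing a CoursePartitioner could perform; the article does not show its code.
class CourseRouting {
    // Assumed fixed order: one reduce task (and one part-r-0000N file) per course
    private static final List<String> COURSES = Arrays.asList("computer", "math", "english", "algorithm");

    /** Returns the partition for a course, mirroring the shape of Partitioner.getPartition. */
    static int partitionFor(String course, int numPartitions) {
        int index = COURSES.indexOf(course);
        if (index < 0) {
            // Unknown course: fall back to the formula Hadoop's HashPartitioner uses
            return (course.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
        return index % numPartitions;
    }

    public static void main(String[] args) {
        for (String c : COURSES) {
            System.out.println(c + " -> part-r-0000" + partitionFor(c, 4));
        }
    }
}
```

A fixed lookup table is used rather than hashing because four course names hashed into four partitions can collide, which would put two courses in one file and break the one-file-per-course requirement.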
StudentBean.java
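- import java.io.DataInput;
- import java.io.DataOutput;
- import java.io.IOException;
- import org.apache.hadoop.io.WritableComparable;
- public class StudentBean implements WritableComparable<StudentBean>{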
- private String course;
- private String name;
- private double avgScore;
- public String getCourse() {
- return course;
- }
- public void setCourse(String course) {
- this.course = course;
- }
- public String getName() {
- return name;
- }
- public void setName(String name) {
- this.name = name;
- }
- public double getavgScore() {
- return avgScore;
- }
- public void setavgScore(double avgScore) {
- this.avgScore = avgScore;
- }
- public StudentBean(String course, String name, double avgScore) {
- super();
- this.course = course;
- this.name = name;
- this.avgScore = avgScore;
- }
- public StudentBean() {
- super();
- }
- @Override
- public String toString() {
- return course + "\t" + name + "\t" + avgScore;
- }
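- // Hadoop calls write() to serialize the bean and readFields() to rebuild it; both must handle the fields in the same order
- @Override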
- public void readFields(DataInput in) throws IOException {
- course = in.readUTF();
- name = in.readUTF();
- avgScore = in.readDouble();
- }
- @Override
- public void write(DataOutput out) throws IOException {
- out.writeUTF(course);
- out.writeUTF(name);
- out.writeDouble(avgScore);
- }
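- @Override
- public int compareTo(StudentBean stu) {
- // highest average first
- int byScore = Double.compare(stu.avgScore, this.avgScore);
- if(byScore != 0) {
- return byScore;
- }
- // equal averages: order by course, then name, so the output order is deterministic
- int byCourse = this.course.compareTo(stu.course);
- return byCourse != 0 ? byCourse : this.name.compareTo(stu.name);
- }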
- }
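Requirement 2 asks for averages sorted from highest to lowest with one decimal place. The plain-Java sketch below applies that ordering to four math students from the data, rounding to the nearest tenth with `Math.round(x * 10) / 10.0` and breaking ties by course, then name, so the order is deterministic. `RankSketch` and its nested `Student` class are hypothetical stand-ins for `StudentBean`, useful for checking the expected file contents before running the job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for StudentBean, used only to check ordering and rounding locally.
class RankSketch {
    static final class Student {
        final String course;
        final String name;
        final double avg;

        Student(String course, String name, double avg) {
            this.course = course;
            this.name = name;
            this.avg = avg;
        }

        @Override
        public String toString() {
            return course + "\t" + name + "\t" + avg;
        }
    }

    /** Rounds to the nearest tenth, e.g. 82.2857 -> 82.3. */
    static double roundToTenth(double x) {
        return Math.round(x * 10) / 10.0;
    }

    /** Highest average first; equal averages ordered by course, then name. */
    static final Comparator<Student> ORDER =
            Comparator.comparingDouble((Student s) -> s.avg).reversed()
                    .thenComparing((Student s) -> s.course)
                    .thenComparing((Student s) -> s.name);

    static List<Student> rank(List<Student> input) {
        List<Student> out = new ArrayList<>();
        for (Student s : input) {
            out.add(new Student(s.course, s.name, roundToTenth(s.avg)));
        }
        out.sort(ORDER);
        return out;
    }

    public static void main(String[] args) {
        // Raw averages from the math rows of the data above (sum of scores / number of exams)
        List<Student> math = Arrays.asList(
                new Student("math", "liutao", 392.0 / 7),
                new Student("math", "huangjiaju", 576.0 / 7),
                new Student("math", "huanglei", 521.0 / 7),
                new Student("math", "huangxiaoming", 664.0 / 8));
        // prints, one tab-separated line each: huangxiaoming 83.0, huangjiaju 82.3, huanglei 74.4, liutao 56.0
        rank(math).forEach(System.out::println);
    }
}
```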
Problem 3
MRScore3.java
- import java.io.IOException;
- import org.apache.hadoop.conf.Configuration;
- import org.apache.hadoop.fs.Path;
- import org.apache.hadoop.io.IntWritable;
- import org.apache.hadoop.io.LongWritable;
- import org.apache.hadoop.io.NullWritable;
- import org.apache.hadoop.io.Text;
- import org.apache.hadoop.mapreduce.Job;
- import org.apache.hadoop.mapreduce.Mapper;
- import org.apache.hadoop.mapreduce.Reducer;
- import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
- import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
- import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;
- import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
- /**
- * Requirement: for each course, find the student with the highest score and output course, name and average score.
- * "Highest score" is read here as the student's best single exam score, the value job 1 uses as its key.
- * */
- public class MRScore3 {
- public static void main(String[] args) throws Exception {
- Configuration conf1 = new Configuration();
- Configuration conf2 = new Configuration();
- Job job1 = Job.getInstance(conf1);
- Job job2 = Job.getInstance(conf2);
- job1.setJarByClass(MRScore3.class);
- job1.setMapperClass(MRMapper3_1.class);
- // no reducer is set: the default identity reducer writes "maxScore \t course \t name \t average"
- job1.setMapOutputKeyClass(IntWritable.class);
- job1.setMapOutputValueClass(StudentBean.class);
- job1.setOutputKeyClass(IntWritable.class);
- job1.setOutputValueClass(StudentBean.class);
- job1.setPartitionerClass(CoursePartitioner2.class);
- job1.setNumReduceTasks(4);
- Path inputPath = new Path("D:\\MR\\hw\\work3\\input");
- Path outputPath = new Path("D:\\MR\\hw\\work3\\output_hw3_1");
- FileInputFormat.setInputPaths(job1, inputPath);
- FileOutputFormat.setOutputPath(job1, outputPath);
- job2.setJarByClass(MRScore3.class);
- job2.setMapperClass(MRMapper3_2.class);
- job2.setReducerClass(MRReducer3_2.class);
- // job 2 groups by course, so each reduce() call sees every student of one course
- job2.setMapOutputKeyClass(Text.class);
- job2.setMapOutputValueClass(Text.class);
- job2.setOutputKeyClass(Text.class);
- job2.setOutputValueClass(NullWritable.class);
- Path inputPath2 = new Path("D:\\MR\\hw\\work3\\output_hw3_1");
- Path outputPath2 = new Path("D:\\MR\\hw\\work3\\output_hw3_end");
- FileInputFormat.setInputPaths(job2, inputPath2);
- FileOutputFormat.setOutputPath(job2, outputPath2);
- JobControl control = new JobControl("Score3");
- ControlledJob aJob = new ControlledJob(job1.getConfiguration());
- ControlledJob bJob = new ControlledJob(job2.getConfiguration());
- bJob.addDependingJob(aJob);
- control.addJob(aJob);
- control.addJob(bJob);
- Thread thread = new Thread(control);
- thread.start();
- while(!control.allFinished()) {
- Thread.sleep(1000);
- }
- // JobControl keeps polling until stopped; exit with 1 if either job failed
- control.stop();
- System.exit(control.getFailedJobList().isEmpty() ? 0 : 1);
- }
- /**
- * Input: computer,huangxiaoming,85,86,41,75,93,42,85
- * Output: key = the student's best single score, value = course, name and average score
- * */
- public static class MRMapper3_1 extends Mapper<LongWritable, Text, IntWritable, StudentBean>{
- StudentBean outValue = new StudentBean();
- IntWritable outKey = new IntWritable();
- @Override
- protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
- String[] splits = value.toString().split(",");
- long sum = 0;
- int max = Integer.MIN_VALUE;
- for(int i=2;i<splits.length;i++) {
- int score = Integer.parseInt(splits[i]);
- sum += score;
- // compare as numbers; sorting the scores as strings would rank "100" below "99"
- max = Math.max(max, score);
- }
- double avg = sum * 1.0/(splits.length-2);
- outKey.set(max);
- outValue.setCourse(splits[0]);
- outValue.setName(splits[1]);
- outValue.setavgScore(avg);
- context.write(outKey, outValue);
- }
- }
- /**
- * Input line from job 1: maxScore \t course \t name \t average
- * Output: key = course, value = "maxScore \t name \t average"
- * */
- public static class MRMapper3_2 extends Mapper<LongWritable, Text, Text, Text>{
- Text outKey = new Text();
- Text outValue = new Text();
- @Override
- protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
- String[] splits = value.toString().split("\t");
- outKey.set(splits[1]);
- outValue.set(splits[0] + "\t" + splits[2] + "\t" + splits[3]);
- context.write(outKey, outValue);
- }
- }
- /**
- * For one course, keep the student with the highest single score (on a tie, the first one seen)
- * and output course \t name \t average
- * */
- public static class MRReducer3_2 extends Reducer<Text, Text, Text, NullWritable>{
- @Override
- protected void reduce(Text key, Iterable<Text> values, Context context)
- throws IOException, InterruptedException {
- int bestScore = Integer.MIN_VALUE;
- String best = null;
- for(Text value : values) {
- String[] splits = value.toString().split("\t");
- int maxScore = Integer.parseInt(splits[0]);
- if(maxScore > bestScore) {
- bestScore = maxScore;
- // keep plain Strings: Hadoop reuses the same Text object for every value
- best = splits[1] + "\t" + splits[2];
- }
- }
- context.write(new Text(key.toString() + "\t" + best), NullWritable.get());
- }
- }
- }
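Reading "highest score" as a student's best single exam score (the value job 1 uses as its key), the expected end result for each course can be checked locally with the plain-Java sketch below. `TopStudentSketch` and `topByCourse` are hypothetical names; the sketch compares scores as integers, and on a tie it keeps the first record seen.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical local check of the per-course "best student" result; not part of the MapReduce jobs.
class TopStudentSketch {
    /** Returns course -> "name\taverage" for the student with the highest single score in that course. */
    static Map<String, String> topByCourse(List<String> lines) {
        Map<String, Integer> bestScore = new LinkedHashMap<>();
        Map<String, String> best = new LinkedHashMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            int sum = 0;
            int max = Integer.MIN_VALUE;
            for (int i = 2; i < f.length; i++) {
                int score = Integer.parseInt(f[i]); // numeric comparison, so 100 ranks above 99
                sum += score;
                max = Math.max(max, score);
            }
            double avg = sum * 1.0 / (f.length - 2);
            Integer current = bestScore.get(f[0]);
            if (current == null || max > current) { // strict '>' keeps the first record on ties
                bestScore.put(f[0], max);
                best.put(f[0], f[1] + "\t" + avg);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "computer,huangbo,85,42,96,38",
                "computer,liutao,85,75,85,99,66,88,75,91",
                "algorithm,huanglei,76,95,86,74,68,74,48");
        topByCourse(lines).forEach((course, v) -> System.out.println(course + "\t" + v));
    }
}
```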