[Hadoop] MapReduce exercise: grade students per subject, then use partitions to list and count the students in each grade

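Requirements

Background: a school's students generate a large amount of data, for example the scores from every exam.

We have one month of exam scores for the students of a single class.

Each record has three fields: subject, name, score.

Task: for every subject, list the students who fall into Grade A (甲级) and count them,

then do the same for Grade B (乙级)

and for Grade C (丙级).

Grade A is 90 and above, Grade B is 80 to 89, Grade C is 0 to 79.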
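
The thresholds above can be sketched as a small standalone function. `GradeLevels` and `levelOf` are hypothetical names used only for illustration, and rejecting negative scores is an assumption based on the stated lower bound of 0; the post's own mapper does no such check.

```java
// Hypothetical helper, not part of the original post.
final class GradeLevels {

    // Maps a score to the grade label used in the job's output keys.
    static String levelOf(int score) {
        if (score < 0) {
            // Assumption: the requirement gives 0 as the lowest valid score.
            throw new IllegalArgumentException("negative score: " + score);
        }
        if (score >= 90) {
            return "甲级"; // Grade A: 90 and above
        }
        if (score >= 80) {
            return "乙级"; // Grade B: 80 to 89
        }
        return "丙级";     // Grade C: 0 to 79
    }

    public static void main(String[] args) {
        // Print the label for each boundary value.
        for (int s : new int[] {100, 90, 89, 80, 79, 0}) {
            System.out.println(s + " -> " + levelOf(s));
        }
    }
}
```

Checking the boundaries 90, 89, 80 and 79 is the quickest way to confirm the ranges do not overlap or leave a gap.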

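Format of the processed results:

Grade A partition

course\t甲级\tstudent1,student2,...\tnumber of students

Grade B partition

course\t乙级\tstudent1,student2,...\tnumber of students

Grade C partition

course\t丙级\tstudent1,student2,...\tnumber of students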

Input file format

English,liudehua,80

English,lijing,79

English,nezha,85

English,jinzha,60

English,muzha,71

English,houzi,99

English,libai,88

English,hanxin,66

English,zhugeliang,95

Math,liudehua,74

Math,lijing,72

Math,nezha,95

Math,jinzha,61

Math,muzha,37

Math,houzi,37

Math,libai,84

Math,hanxin,89

Math,zhugeliang,93

Computer,liudehua,54

Computer,lijing,73

Computer,nezha,86

Computer,jinzha,96

Computer,muzha,76

Computer,houzi,92

Computer,libai,73

Computer,hanxin,82

Computer,zhugeliang,100
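
Each line of the file is a comma-separated triple. As a rough sketch of how such a line could be parsed defensively, the following uses a hypothetical `ScoreLine` class (not part of the original code); the field-count check and the trimming are assumptions added for illustration.

```java
// Hypothetical value class for one input line: course,name,score.
final class ScoreLine {
    final String course;
    final String student;
    final int score;

    ScoreLine(String course, String student, int score) {
        this.course = course;
        this.student = student;
        this.score = score;
    }

    // Parses "course,name,score"; throws IllegalArgumentException on bad input
    // (NumberFormatException is a subclass of IllegalArgumentException).
    static ScoreLine parse(String line) {
        String[] parts = line.trim().split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected course,name,score but got: " + line);
        }
        return new ScoreLine(parts[0].trim(), parts[1].trim(), Integer.parseInt(parts[2].trim()));
    }

    public static void main(String[] args) {
        ScoreLine r = parse("English,liudehua,80");
        System.out.println(r.course + " / " + r.student + " / " + r.score);
    }
}
```

A mapper that wants to skip malformed lines instead of failing the task could catch the exception and return early.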

Code

StuDriver

import org.apache.hadoop.io.Text;

import stuScore.JobUtils;

public class StuDriver {

    public static void main(String[] args) {

        String[] paths = {"F:/stu_score.txt", "F:/output"};

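        // Classes are positional: jar class, mapper, map output key, map output value,
        // grouping comparator (unused here, so null), partitioner, reducer,
        // final output key, final output value.
        JobUtils.commit(paths, true, 3, false, StuDriver.class,

                StuMapper.class, Text.class, Text.class, null, StuPartitioner.class, StuReduce.class,

                Text.class, Text.class);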

    }

}

JobUtils

import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.fs.Path;

import org.apache.hadoop.mapreduce.Job;

import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.File;

import java.io.IOException;

public class JobUtils {

    private static Configuration conf;

    static {

        conf = new Configuration();

    }

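    /**

     * Submit a job.

     *

     * @param paths        input and output paths: paths[0] is the input, paths[1] is the output

     * @param isPartition  whether a custom partitioner class is supplied

     * @param reduceNumber number of reduce tasks (if isPartition is true, this must be >= the number of custom partitions)

     * @param isGroup      whether a grouping comparator is supplied

     * @param params       classes in positional order: [0] jar class, [1] mapper, [2] map output key,
     *                     [3] map output value, [4] grouping comparator, [5] partitioner, [6] reducer,
     *                     [7] output key, [8] output value

     */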

    public static void commit(String[] paths, boolean isPartition, int reduceNumber, boolean isGroup, Class... params) {

        try {

            Job job = Job.getInstance(conf);

            job.setJarByClass(params[0]);

            job.setMapperClass(params[1]);

            job.setMapOutputKeyClass(params[2]);

            job.setMapOutputValueClass(params[3]);

            if(isGroup) {

                job.setGroupingComparatorClass(params[4]);

            }

            if (isPartition) {

                job.setPartitionerClass(params[5]); // use the custom partitioner

            }

            if (reduceNumber > 0) {

                job.setNumReduceTasks(reduceNumber);

                job.setReducerClass(params[6]);

                job.setOutputKeyClass(params[7]);

                job.setOutputValueClass(params[8]);

            } else {

                job.setNumReduceTasks(0);

            }

            FileInputFormat.setInputPaths(job, new Path(paths[0]));

            FileOutputFormat.setOutputPath(job, new Path(paths[1]));

            job.waitForCompletion(true);

        } catch (InterruptedException | ClassNotFoundException | IOException e) {

            e.printStackTrace();

        }

    }

}
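
The Javadoc above notes that the reducer count must be at least the number of custom partitions. A minimal sketch of checking that up front, before submitting the job, might look like this; `ReducerCountCheck` and its `partitionCount` parameter are hypothetical and not part of the original `JobUtils`.

```java
// Hypothetical pre-submission check, not part of the original JobUtils.
final class ReducerCountCheck {

    // Fails fast when a custom partitioner could return an index
    // that has no matching reduce task.
    static void requireEnoughReducers(boolean isPartition, int reduceNumber, int partitionCount) {
        if (isPartition && reduceNumber < partitionCount) {
            throw new IllegalArgumentException(
                    "need at least " + partitionCount + " reduce tasks, got " + reduceNumber);
        }
    }

    public static void main(String[] args) {
        // This exercise uses three partitions (one per grade) and three reducers.
        requireEnoughReducers(true, 3, 3);
        System.out.println("reducer count is sufficient");
    }
}
```

Asking for more reducers than partitions is harmless for this job; the extra reduce tasks simply receive no keys.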

StuMapper

import org.apache.hadoop.io.LongWritable;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class StuMapper extends Mapper<LongWritable, Text, Text, Text> {

    Text k = new Text();

    Text v = new Text();

    @Override

    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

        // Each input line has the form course,name,score.
        String line = value.toString();

        String[] splits = line.split(",");

        int score = Integer.parseInt(splits[2]);

        String level;

        if (score >= 90) {

            level = "甲级";

        } else if (score >= 80) {

            level = "乙级";

        } else {

            level = "丙级";

        }

        // Key: course + tab + grade label. Value: student name.
        k.set(splits[0] + "\t" + level);

        v.set(splits[1]);

        context.write(k, v);

    }

}

StuReduce

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class StuReduce extends Reducer<Text,Text,Text, Text> {

    @Override

    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {

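        // Collect the student names for this course/grade key and count them.
        StringBuilder builder = new StringBuilder();

        int count = 0;

        for (Text v : values) {

            builder.append(v).append(",");

            count++;

        }

        // Replace the trailing comma with a tab, then append the count.
        builder.replace(builder.length() - 1, builder.length(), "\t");

        builder.append(count);

        context.write(key, new Text(builder.toString()));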

    }

}

StuPartitioner

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapreduce.Partitioner;

public class StuPartitioner extends Partitioner<Text, Text> {

    @Override

    public int getPartition(Text key, Text value, int numPartitions) {

        // The key has the form course + tab + grade label.
        String line = key.toString();

        if(line.contains("甲级")){

            return 0;

        }else if(line.contains("乙级")){

            return 1;

        }else{

            return 2;

        }

    }

}
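
The partitioner's routing rule can be tried outside Hadoop with a plain function. The sketch below is a variation, not the post's code: `GradePartitions` is a hypothetical name, it matches the grade label after the last tab instead of using `contains`, and it wraps the index with `% numPartitions` as an assumed safeguard for runs with fewer than three reducers.

```java
// Hypothetical standalone version of the routing rule, for local experiments.
final class GradePartitions {

    // key is expected to look like "course<TAB>grade"; numPartitions must be > 0.
    static int partitionFor(String key, int numPartitions) {
        // Look only at the text after the last tab so that a course name
        // can never be mistaken for a grade label.
        String level = key.substring(key.lastIndexOf('\t') + 1);
        int index;
        if (level.equals("甲级")) {
            index = 0; // Grade A
        } else if (level.equals("乙级")) {
            index = 1; // Grade B
        } else {
            index = 2; // Grade C and anything unrecognized
        }
        // Assumption: wrap around rather than return an out-of-range index.
        return index % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("English\t甲级", 3));
        System.out.println(partitionFor("Math\t乙级", 3));
        System.out.println(partitionFor("Computer\t丙级", 3));
    }
}
```

With three partitions the three calls in `main` yield 0, 1 and 2, matching the original partitioner. With fewer partitions the wrap-around mixes grades in one output file, so it is only a guard against job failure, not a substitute for configuring three reducers.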

Output
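
As a way to preview the shape of the three output files without a Hadoop installation, the following plain-Java sketch replays the map, partition and reduce steps in memory on the sample data. `LocalGradeJob` is a hypothetical class written for this illustration. It keeps names in input order, whereas a real job does not guarantee the order of values within a key, so the name order in actual part files may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory replay of the job; no Hadoop classes are used.
final class LocalGradeJob {
    static final String[] LEVELS = {"甲级", "乙级", "丙级"};

    // Same thresholds as the mapper; the result doubles as the partition index.
    static int levelIndex(int score) {
        if (score >= 90) return 0;
        if (score >= 80) return 1;
        return 2;
    }

    // Returns one key-sorted map per partition: "course\tlevel" -> student names.
    static List<Map<String, List<String>>> run(String[] lines) {
        List<Map<String, List<String>>> partitions = new ArrayList<>();
        for (int i = 0; i < LEVELS.length; i++) {
            partitions.add(new TreeMap<>());
        }
        for (String line : lines) {
            String[] f = line.split(",");
            int idx = levelIndex(Integer.parseInt(f[2]));
            String key = f[0] + "\t" + LEVELS[idx];
            partitions.get(idx).computeIfAbsent(key, k -> new ArrayList<>()).add(f[1]);
        }
        return partitions;
    }

    // Formats one reducer output line: key, comma-joined names, count.
    static String format(String key, List<String> names) {
        return key + "\t" + String.join(",", names) + "\t" + names.size();
    }

    public static void main(String[] args) {
        String[] sample = {
            "English,liudehua,80", "English,lijing,79", "English,nezha,85",
            "English,jinzha,60", "English,muzha,71", "English,houzi,99",
            "English,libai,88", "English,hanxin,66", "English,zhugeliang,95",
            "Math,liudehua,74", "Math,lijing,72", "Math,nezha,95",
            "Math,jinzha,61", "Math,muzha,37", "Math,houzi,37",
            "Math,libai,84", "Math,hanxin,89", "Math,zhugeliang,93",
            "Computer,liudehua,54", "Computer,lijing,73", "Computer,nezha,86",
            "Computer,jinzha,96", "Computer,muzha,76", "Computer,houzi,92",
            "Computer,libai,73", "Computer,hanxin,82", "Computer,zhugeliang,100"
        };
        List<Map<String, List<String>>> parts = run(sample);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("partition " + i + ":");
            for (Map.Entry<String, List<String>> e : parts.get(i).entrySet()) {
                System.out.println(format(e.getKey(), e.getValue()));
            }
        }
    }
}
```

For example, the Grade A partition of this sketch contains the tab-separated line `English`, `甲级`, `houzi,zhugeliang`, `2`, since only houzi (99) and zhugeliang (95) scored 90 or above in English.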
