Hadoop学习之路(7)MapReduce自定义排序

本文测试文本：

tom 20 8000

nancy 22 8000

ketty 22 9000

stone 19 10000

green 19 11000

white 39 29000

socrates 30 40000

MapReduce中，根据key进行分区、排序、分组

MapReduce会按照基本类型对应的key进行排序，如int类型的IntWritable，long类型的LongWritable,Text类型，默认升序排序

为什么要自定义排序规则？现有需求，需要自定义key类型，并自定义key的排序规则，如按照人的salary降序排序，若相同，则再按age升序排序

以Text类型为例：

Text类实现了WritableComparable接口，并且有write()、readFields()和compare()方法

readFields()方法：用来反序列化操作

write()方法：用来序列化操作

所以要想自定义类型用来排序需要有以上的方法

自定义类代码：

import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;

import java.io.DataOutput;

import java.io.IOException;

public class Person implements WritableComparable<Person> {

    private String name;

    private int age;

    private int salary;

    public Person() {

    }

    public Person(String name, int age, int salary) {

        //super();

        this.name = name;

        this.age = age;

        this.salary = salary;

    }

    public String getName() {

        return name;

    }

    public void setName(String name) {

        this.name = name;

    }

    public int getAge() {

        return age;

    }

    public void setAge(int age) {

        this.age = age;

    }

    public int getSalary() {

        return salary;

    }

    public void setSalary(int salary) {

        this.salary = salary;

    }

    @Override

    public String toString() {

        return this.salary + "  " + this.age + "    " + this.name;

    }

    //先比较salary，高的排序在前；若相同，age小的在前

    public int compareTo(Person o) {

        int compareResult1= this.salary - o.salary;

        if(compareResult1 != 0) {

            return -compareResult1;

        } else {

            return this.age - o.age;

        }

    }

    //序列化，将NewKey转化成使用流传送的二进制

    public void write(DataOutput dataOutput) throws IOException {

        dataOutput.writeUTF(name);

        dataOutput.writeInt(age);

        dataOutput.writeInt(salary);

    }

    //使用in读字段的顺序，要与write方法中写的顺序保持一致

    public void readFields(DataInput dataInput) throws IOException {

        //read string

        this.name = dataInput.readUTF();

        this.age = dataInput.readInt();

        this.salary = dataInput.readInt();

    }

}

MapReuduce程序：

import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.fs.FileSystem;

import org.apache.hadoop.fs.Path;

import org.apache.hadoop.io.LongWritable;

import org.apache.hadoop.io.NullWritable;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapreduce.Job;

import org.apache.hadoop.mapreduce.Mapper;

import org.apache.hadoop.mapreduce.Reducer;

import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

import java.net.URI;

public class  SecondarySort {

	public static void main(String[] args) throws Exception {

		System.setProperty("HADOOP_USER_NAME","hadoop2.7");

		Configuration configuration = new Configuration();

        //设置本地运行的mapreduce程序 jar包

        configuration.set("mapreduce.job.jar","C:\\Users\\tanglei1\\IdeaProjects\\Hadooptang\\target\\com.kaikeba.hadoop-1.0-SNAPSHOT.jar");

		Job job = Job.getInstance(configuration, SecondarySort.class.getSimpleName());

		FileSystem fileSystem = FileSystem.get(URI.create(args[1]), configuration);

		if (fileSystem.exists(new Path(args[1]))) {

			fileSystem.delete(new Path(args[1]), true);

		}

		FileInputFormat.setInputPaths(job, new Path(args[0]));

		job.setMapperClass(MyMap.class);

		job.setMapOutputKeyClass(Person.class);

		job.setMapOutputValueClass(NullWritable.class);

		//设置reduce的个数

		job.setNumReduceTasks(1);

		job.setReducerClass(MyReduce.class);

		job.setOutputKeyClass(Person.class);

		job.setOutputValueClass(NullWritable.class);

		FileOutputFormat.setOutputPath(job, new Path(args[1]));

		job.waitForCompletion(true);

	}

	public static class MyMap extends

			Mapper<LongWritable, Text, Person, NullWritable> {

		//LongWritable：输入参数键类型，Text：输入参数值类型

		//Persion:输出参数键类型，NullWritable：输出参数值类型

		@Override

		//map的输出值是键值对<K,V>，NullWritable说关心V的值

		protected void map(LongWritable key, Text value,

				Context context)

				throws IOException, InterruptedException {

			//LongWritable key：输入参数键值对的键，Text value：输入参数键值对的值

			//获得一行数据，输入参数的键（距首行的位置）,Hadoop读取数据的时候逐行读取文本

			//fields:代表着文本一行的的数据

			String[] fields = value.toString().split(" ");

			// 本列中文本一行数据：nancy 22 8000

			String name = fields[0];

			//字符串转换成int

			int age = Integer.parseInt(fields[1]);

			int salary = Integer.parseInt(fields[2]);

			//在自定义类中进行比较

			Person person = new Person(name, age, salary);

			context.write(person, NullWritable.get());

		}

	}

	public static class MyReduce extends

			Reducer<Person, NullWritable, Person, NullWritable> {

		@Override

		protected void reduce(Person key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {

			context.write(key, NullWritable.get());

		}

	}

}

运行结果：

40000  30    socrates

29000  39    white

11000  19    green

10000  19    stone

9000  22    ketty

8000  20    tom

8000  22    nancy

Hadoop学习之路(7)MapReduce自定义排序的更多相关文章

Hadoop学习之路(6)MapReduce自定义分区实现
MapReduce自带的分区器是HashPartitioner 原理:先对map输出的key求hash值,再模上reduce task个数,根据结果,决定此输出kv对,被匹配的reduce任务取走. ...
Hadoop学习之路(5)Mapreduce程序完成wordcount
程序使用的测试文本数据: Dear River Dear River Bear Spark Car Dear Car Bear Car Dear Car River Car Spark Spark D ...
阿里封神谈hadoop学习之路
阿里封神谈hadoop学习之路封神 2016-04-14 16:03:51 浏览3283 评论3 发表于: 阿里云E-MapReduce >> 开源大数据周刊 hadoop 学生 s ...
《Hadoop学习之路》学习实践
(实践机器:blog-bench) 本文用作博文<Hadoop学习之路>实践过程中遇到的问题记录. 本文所学习的博文为博主“扎心了,老铁” 博文记录.参考链接https://www.cnb ...
Hadoop学习之路（十三）MapReduce的初识
MapReduce是什么首先让我们来重温一下 hadoop 的四大组件: HDFS:分布式存储系统 MapReduce:分布式计算系统 YARN:hadoop 的资源调度系统 Common:以上三大 ...
Hadoop学习（4）-- MapReduce
MapReduce是一种用于大规模数据集的并行计算编程模型,由Google提出,主要用于搜索领域,解决海量数据的计算问题.其主要思想Map(映射)和Reduce(规约)都是从函数是编程语言中借鉴而来的 ...
小强的Hadoop学习之路
本人一直在做NET开发,接触这行有6年了吧.毕业也快四年了(6年是因为大学就开始在一家小公司做门户网站,哈哈哈),之前一直秉承着学要精,就一直一门心思的在做NET(也是懒吧).最近的工作一直都和大数据 ...
我的hadoop学习之路
Hadoop实现了一个分布式文件系统(Hadoop Distributed File System),简称HDFS.HDFS有高容错性的特点,并且设计用来部署在低廉的(low-cost)硬件上. Ha ...
Hadoop学习基础之三：MapReduce
现在是讨论这个问题的不错的时机,因为最近媒体上到处充斥着新的革命所谓“云计算”的信息.这种模式需要利用大量的(低端)处理器并行工作来解决计算问题.实际上,这建议利用大量的低端处理器来构建数据中心,而不 ...

随机推荐

codeforces 1020 C Elections(枚举+贪心)
题意: 有 n个人,m个党派,第i个人开始想把票投给党派pi,而如果想让他改变他的想法需要花费ci元.你现在是党派1,问你最少花多少钱使得你的党派得票数大于其它任意党派. n,m<3000 思路 ...
CentOS 7 GNOME桌面系统网络配置
问题概述:在学习Linux系统的过程中,在WORKSTATION 14 PRO上安装了CentOS 7 Linux虚拟机,安装过程一切正常,但在应用过程中无法连接网络: 具体问题:1. 通过 ip a ...
python xlrd操作
python里面的xlrd模块详解(一) 那我就一下面积个问题对xlrd模块进行学习一下: 1.什么是xlrd模块? 2.为什么使用xlrd模块? 3.怎样使用xlrd模块? 1.什么是xlrd模 ...
使用helm安装jenkin和gitlab
一.使用服务介绍存储: 阿里云NAS k8s网络插件: calico k8s版本: 1.15.2 二.helm安装 https://www.cnblogs.com/zhangb8042/p/1020 ...
[转载]理解weight decay
http://blog.sina.com.cn/s/blog_a89e19440102x1el.html
python-nmap 使用
安装 [root@localhost ~]# yum -y install nmap [root@localhost ~]# pip install python-nmap 使用 import nma ...
Mybatis随记（一）update动态SQL
<update id="updateUser"> UPDATE user_info SET <if test="gzhOpenId != null an ...
【全集】大数据Java基础
课程介绍本课程是由猎豹移动大数据架构师,根据Java在公司大数据开发中的实际应用,精心设计和打磨的大数据必备Java课程.通过本课程的学习大数据新手能够少走弯路,以较短的时间系统掌握大数据开发必备语 ...
backgroud图片充满元素的方法
background-image: url("img/headimg.png"); height: 219px; background-size: 100% 100%; backg ...
利用url地址获取你需要的参数，window.location系列
这是我要获取url中一个code的参数值所以用了如下的方法GetQueryString(name) { let reg = new RegExp('(^|&)' + name + '=([^& ...

Hadoop学习之路(7)MapReduce自定义排序

Hadoop学习之路(7)MapReduce自定义排序的更多相关文章

随机推荐

热门专题