MapReduce data processing: counting and sorting

Continued from the previous post: https://www.cnblogs.com/sengzhao666/p/11850849.html

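2. Data processing:

· Top 10 most popular videos/articles by number of visits (id)

· Top 10 most popular courses by city (ip)

· Top 10 most popular courses by traffic (traffic)

Each task takes two steps: first count, then sort.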
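The two steps can be sketched in memory before moving to MapReduce. This is a plain-Java illustration only, not the author's code: the class name CountThenSortSketch and its methods are hypothetical, and ties are broken by id purely to make the output deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CountThenSortSketch {
    // Step 1: count how often each id occurs.
    static Map<String, Integer> count(List<String> ids) {
        Map<String, Integer> counts = new HashMap<>();
        for (String id : ids) {
            counts.merge(id, 1, Integer::sum);
        }
        return counts;
    }

    // Step 2: sort by count, highest first (ties by id), and keep the first n entries.
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        // The id column of the eight sample records shown in this post.
        List<String> ids = Arrays.asList(
            "5551", "3589", "2212", "7361", "7361", "7361", "3639", "1412");
        for (Map.Entry<String, Integer> e : topN(count(ids), 3)) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

The MapReduce version that follows splits the same work into two jobs because the counts are only complete once the first job's reduce phase has finished.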

Sample of the input file:

1.192.25.84    2016-11-10-00:01:14    10    54    video    5551
1.194.144.222    2016-11-10-00:01:20    10    54    video    3589
1.194.187.2    2016-11-10-00:01:05    10    54    video    2212
1.203.177.243    2016-11-10-00:01:18    10    6050    video    7361
1.203.177.243    2016-11-10-00:01:19    10    72    video    7361
1.203.177.243    2016-11-10-00:01:22    10    6050    video    7361
1.30.162.63    2016-11-10-00:01:46    10    54    video    3639
1.84.205.195    2016-11-10-00:01:12    10    54    video    1412
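Each record has six columns, and the mapper further down reads the last one (index 5) as the id. A minimal sketch of that parsing step in plain Java, assuming the columns are tab-separated as the mapper's split("\t") call implies; the class RecordParseSketch and the method extractId are hypothetical names used only for illustration.

```java
import java.util.Optional;

class RecordParseSketch {
    // Returns the id column (index 5) of a tab-separated record,
    // or an empty Optional if the line is blank or has fewer than six fields.
    static Optional<String> extractId(String line) {
        if (line == null || line.trim().isEmpty()) {
            return Optional.empty();
        }
        String[] fields = line.split("\t");
        if (fields.length < 6) {
            return Optional.empty();
        }
        return Optional.of(fields[5].trim());
    }

    public static void main(String[] args) {
        String sample = "1.192.25.84\t2016-11-10-00:01:14\t10\t54\tvideo\t5551";
        System.out.println(extractId(sample).orElse("(malformed)"));
        System.out.println(extractId("").orElse("(malformed)"));
    }
}
```

Checking the field count before indexing keeps one malformed line from failing a whole map task with an ArrayIndexOutOfBoundsException.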

Counting:

package priv.tzk.mapreduce.dataProcess.visits;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class DataVisits {
    public static String INPUT_PATH="/home/hadoop/out";
    public static String OUTPUT_PATH="hdfs://localhost:9000/mapReduce/mymapreduce1/out";

    // Each input line arrives as Text; the mapper emits (id, 1) as (Text, IntWritable)
    public static class Map extends Mapper<Object,Text,Text,IntWritable>{
        private static Text newKey=new Text();    // output key: the id column of the current line

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException{
            String line=value.toString();// convert the line to a String
            //System.out.println(line);
            if(!("".equals(line)))// skip empty lines so they never produce a record for the reducer
            {
                String arr[]=line.split("\t");// split the line into fields on tabs
                if(arr.length>5)// skip malformed lines that lack the id column
                {
                    newKey.set(arr[5]);
                    int click=1;
                    context.write(newKey,new IntWritable(click));
                    //System.out.println(newKey+"  "+new IntWritable(click));
                }
            }
        }
    }

    public static class Reduce extends Reducer<Text,IntWritable,Text,IntWritable>{
        public void reduce(Text key,Iterable<IntWritable> values,Context context) throws IOException,InterruptedException{
            int count=0;
            for(IntWritable val:values) {
                // add up the values for this id (every value is 1 here, so the sum is the visit count)
                count+=val.get();
            }
            context.write(key,new IntWritable(count));
            //System.out.println("reduceStart");
        }
    }

    public static void main(String[] args) throws IOException,ClassNotFoundException,InterruptedException{
        Configuration conf=new Configuration();
        System.out.println("start");
        Job job=Job.getInstance(conf);
        job.setJobName("MyAverage");
        //Job job =new Job(conf,"MyAverage");
        job.setJarByClass(DataVisits.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        // output key/value types; the map output uses the same types because no separate map output types are set
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        Path outputpath=new Path(OUTPUT_PATH);
        Path inputpath=new Path(INPUT_PATH);
        FileInputFormat.addInputPath(job,inputpath );
        FileOutputFormat.setOutputPath(job,outputpath);
        boolean flag = job.waitForCompletion(true);
        System.out.println(flag);
        System.exit(flag? 0 : 1);
    }
}
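A reducer that sums the incoming values, rather than counting how many values arrived, stays correct if the same logic is later registered as a combiner with job.setCombinerClass(...). The job above does not set a combiner, so this is a hypothetical scenario; the plain-Java simulation below (class and method names are illustrative only) shows where the two approaches diverge.

```java
import java.util.Arrays;
import java.util.List;

class CombinerSketch {
    // Variant 1: count how many values arrived for the key.
    static int countValues(List<Integer> values) {
        return values.size();
    }

    // Variant 2: add up the values for the key.
    static int sumValues(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Three visits to one id, with no combiner: the reducer sees 1, 1, 1.
        List<Integer> withoutCombiner = Arrays.asList(1, 1, 1);
        // Same visits split over two map tasks, each pre-summed by a combiner: 2 and 1.
        List<Integer> withCombiner = Arrays.asList(2, 1);

        System.out.println(countValues(withoutCombiner)); // 3
        System.out.println(sumValues(withoutCombiner));   // 3
        System.out.println(countValues(withCombiner));    // 2, one visit lost
        System.out.println(sumValues(withCombiner));      // 3
    }
}
```

Without a combiner both variants agree, which is why counting works for the job as configured.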

Sample of the counting results:

Sorting the counting results:

package priv.tzk.mapreduce.dataProcess.visits;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class visitsSort {
    public static String INPUT_PATH="/home/hadoop/visits_out";
    public static String OUTPUT_PATH="hdfs://localhost:9000/mapReduce/mymapreduce1/out1";

    public static class Sort extends WritableComparator {
        public Sort(){
            // pass the type of the map output key here (IntWritable in this job)
            super(IntWritable.class,true);
        }

        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            return -a.compareTo(b);// the minus sign gives descending order; remove it for ascending order
        }
    }

    // Reads the (id, count) lines written by the counting job and emits (count, id),
    // so that the framework sorts the records by count
    public static class Map extends Mapper<Object,Text,IntWritable,Text>{
        private static Text mid=new Text();
        private static IntWritable num=new IntWritable();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException{
            String line=value.toString();// convert the line to a String
            if(!("".equals(line)))// skip empty lines so they never produce a record for the reducer
            {
                String arr[]=line.split("\t");// split the line into fields on tabs
                mid.set(arr[0]);
                num.set(Integer.parseInt(arr[1]));
                context.write(num,mid);
            }
        }
    }

    // The MapReduce framework sorts map output by key; the Sort comparator above reverses that order
    public static class Reduce extends Reducer<IntWritable,Text,IntWritable,Text>{
        // number of records written so far; this yields a global Top 10 only with a single reduce task
        private static int i=0;

        public void reduce(IntWritable key,Iterable<Text> values,Context context) throws IOException,InterruptedException{
            for(Text val:values) {
                // write each id for this count until ten records have been written
                if(i<10) {
                    i++;
                    context.write(key, val);
                }
            }
            //System.out.println("reduceStart");
        }
    }

    public static void main(String[] args) throws IOException,ClassNotFoundException,InterruptedException{
        Configuration conf=new Configuration();
        System.out.println("start");
        Job job=Job.getInstance(conf);
        //Job job =new Job(conf,"");
        job.setJarByClass(visitsSort.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setSortComparatorClass(Sort.class);
        job.setNumReduceTasks(1);// the Top 10 counter in Reduce assumes exactly one reduce task
        // output key/value types; the map output uses the same types because no separate map output types are set
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Text.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        Path outputpath=new Path(OUTPUT_PATH);
        Path inputpath=new Path(INPUT_PATH);
        FileInputFormat.addInputPath(job,inputpath );
        FileOutputFormat.setOutputPath(job,outputpath);
        boolean flag = job.waitForCompletion(true);
        System.out.println(flag);
        System.exit(flag? 0 : 1);
    }
}
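The static counter in Reduce produces a global Top 10 only when all records pass through one reduce task in sorted order. A common alternative, which is not what the code above does, is to keep a bounded min-heap of the N largest entries seen so far; it does not depend on the order in which records arrive. A plain-Java sketch under that assumption, with hypothetical names (TopNHeapSketch, Entry, topN):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopNHeapSketch {
    // One (count, id) pair, mirroring a line of the sorted output.
    static final class Entry {
        final int count;
        final String id;

        Entry(int count, String id) {
            this.count = count;
            this.id = id;
        }
    }

    // Keeps at most n entries in a min-heap ordered by count,
    // then returns them with the highest count first.
    static List<Entry> topN(Iterable<Entry> entries, int n) {
        PriorityQueue<Entry> heap =
            new PriorityQueue<>(Comparator.comparingInt((Entry e) -> e.count));
        for (Entry e : entries) {
            heap.offer(e);
            if (heap.size() > n) {
                heap.poll(); // discard the entry with the smallest count
            }
        }
        List<Entry> result = new ArrayList<>(heap);
        result.sort((a, b) -> Integer.compare(b.count, a.count));
        return result;
    }

    public static void main(String[] args) {
        List<Entry> input = Arrays.asList(
            new Entry(15, "11239"), new Entry(31, "2402"),
            new Entry(18, "3078"), new Entry(19, "1309"));
        for (Entry e : topN(input, 2)) {
            System.out.println(e.count + "\t" + e.id);
        }
    }
}
```

In a Hadoop job this kind of heap would typically be filled in reduce() and written out in the reducer's cleanup() method; that wiring is omitted here.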

Sorting results (visit count, then id):

31    2402
19    1309
18    3078
18    2801
16    5683
16    3369
16    1336
16    4018
15    11239
15    13098
