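I. Using Counters

Hadoop counters give developers a global view of how a job is running and of its key metrics, so that errors can be diagnosed and handled promptly.

Hadoop provides built-in counters in three groups: MapReduce-related, file-system-related, and job-scheduling-related.

Counter values can also be viewed in the JobTracker web UI at http://master:50030/jobdetails.jsp.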

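import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Counters are metrics collected while a job runs. They let you observe the
 * whole computation and track the key runtime indicators of the program.
 * This example uses a custom counter to count the input lines that contain a sensitive word.
 */
public class WordCountApp {
    private static final String INPUT_PATH = "hdfs://hadoop1:9000/abd";
    private static final String OUT_PATH = "hdfs://hadoop1:9000/out";

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        try {
            // Remove the output directory first; the job fails if it already exists.
            FileSystem fileSystem = FileSystem.get(new URI(OUT_PATH), conf);
            fileSystem.delete(new Path(OUT_PATH), true);

            Job job = new Job(conf, WordCountApp.class.getSimpleName());
            job.setJarByClass(WordCountApp.class);
            FileInputFormat.setInputPaths(job, INPUT_PATH);
            job.setMapperClass(MyMapper.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(LongWritable.class);
            job.setReducerClass(MyReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
            FileOutputFormat.setOutputPath(job, new Path(OUT_PATH));
            job.waitForCompletion(true);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static class MyMapper extends
            Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Get the counter: first argument is the group name, second is the counter name.
            Counter counter = context.getCounter("Sensitive Words", "hello");
            String line = value.toString();
            // Treat "hello" as the sensitive word; each line containing it is counted once.
            if (line.contains("hello")) {
                counter.increment(1L);
            }
            String[] splited = line.split("\t");
            for (String word : splited) {
                context.write(new Text(word), new LongWritable(1));
            }
        }
    }

    public static class MyReducer extends
            Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values,
                Context context) throws IOException, InterruptedException {
            // Sum the occurrences of each word.
            long count = 0L;
            for (LongWritable times : values) {
                count += times.get();
            }
            context.write(key, new LongWritable(count));
        }
    }
}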
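To make the aggregation idea concrete, here is a minimal plain-Java sketch with no Hadoop dependency. It models what a counter does conceptually: each task keeps its own local count, and the per-task values are summed into one job-level total. The class name `CounterSketch`, its method names, and the sample lines are hypothetical and exist only for illustration; this is not how Hadoop implements counters internally.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of a job-level counter; not Hadoop code. */
final class CounterSketch {

    /** Counts the lines of one input split that contain the word, as one map task would. */
    static long countInSplit(List<String> lines, String word) {
        long local = 0L;
        for (String line : lines) {
            if (line.contains(word)) {
                local++; // plays the role of counter.increment(1L) in the mapper
            }
        }
        return local;
    }

    /** Sums the per-split counts into a single global value. */
    static long aggregate(List<List<String>> splits, String word) {
        long total = 0L;
        for (List<String> split : splits) {
            total += countInSplit(split, word);
        }
        return total;
    }

    public static void main(String[] args) {
        // Two hypothetical input splits, each handled by a different map task.
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("hello\tworld", "foo\tbar"),
                Arrays.asList("hello\thello", "say\thello"));
        System.out.println("Sensitive Words/hello = " + aggregate(splits, "hello"));
    }
}
```

Note that a line containing the word twice is still counted once, which matches the `line.contains("hello")` check in the mapper.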


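II. Using a Combiner

Each map task can produce a large amount of output. A combiner merges that output on the map side first, which reduces the amount of data transferred to the reducers.

At its most basic, a combiner merges the values for the same key locally; it acts like a local reduce.

Without a combiner, all of the aggregation is done by the reducers, which is comparatively inefficient. With a combiner, each map task aggregates its own output locally as it finishes, which speeds up the job.

Note: the combiner's output is the reducer's input, and a combiner must never change the final result. In my view, a combiner should only be used when the reducer's input key/value types are exactly the same as its output key/value types and combining does not affect the final result, for example summation or taking the maximum.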
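The restriction can be checked with a small plain-Java sketch that does not use Hadoop at all. It compares summation, where combining per map task first gives the same answer, with averaging, where it does not. The class name `CombinerSketch` and the sample values are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java illustration of which reduce functions are safe as combiners; not Hadoop code. */
final class CombinerSketch {

    static long sum(List<Long> values) {
        long total = 0L;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    static double average(List<Long> values) {
        return (double) sum(values) / values.size();
    }

    public static void main(String[] args) {
        // Values for one key, as emitted by two different map tasks.
        List<Long> map1 = Arrays.asList(1L, 2L, 3L);
        List<Long> map2 = Arrays.asList(10L);
        List<Long> all = Arrays.asList(1L, 2L, 3L, 10L);

        // Sum: combining per map task first gives the same answer as summing everything.
        long combinedSum = sum(Arrays.asList(sum(map1), sum(map2)));
        System.out.println(combinedSum == sum(all)); // true

        // Average: averaging the per-task averages gives a different answer.
        double avgOfAvgs = (average(map1) + average(map2)) / 2;
        System.out.println(avgOfAvgs + " vs " + average(all)); // 6.0 vs 4.0
    }
}
```

A common way around this, if an average is needed, is to have the map side emit (sum, count) pairs, combine those pairs by adding them, and divide only in the reducer.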

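import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * A combiner sits between map and reduce. Without one, records go straight
 * from map to reduce. With one, the map output is grouped and merged on the
 * map side first, and only the merged records are shuffled to the reducers.
 *
 * Why use a combiner?
 *   It shrinks the map output, so less data crosses the network during the shuffle.
 *
 * When is a combiner not suitable?
 *   When the result depends on seeing all of the data at once. Averaging is the
 *   classic case: the average of partial averages is not the overall average.
 *   A combiner is safe when the operation is associative and commutative
 *   (sum, max, min), so that merging partial results gives the same answer.
 *
 * The combiner code is usually the same as the reducer code, but it does not
 * replace the reducer: a combiner only sees the output of a single map task,
 * while the reducer merges the output of all map tasks.
 */
public class WordCountApp {
    private static final String INPUT_PATH = "hdfs://hadoop1:9000/files";
    private static final String OUT_PATH = "hdfs://hadoop1:9000/out";

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        try {
            // Remove the output directory first; the job fails if it already exists.
            FileSystem fileSystem = FileSystem.get(new URI(OUT_PATH), conf);
            fileSystem.delete(new Path(OUT_PATH), true);

            Job job = new Job(conf, WordCountApp.class.getSimpleName());
            job.setJarByClass(WordCountApp.class);
            FileInputFormat.setInputPaths(job, INPUT_PATH);
            job.setMapperClass(MyMapper.class);
            job.setCombinerClass(MyReducer.class); // set the combiner; it reuses the reducer class
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(LongWritable.class);
            // The reducer is still needed even though the combiner runs the same code:
            // the combiner only merges the output of a single map task.
            job.setReducerClass(MyReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
            FileOutputFormat.setOutputPath(job, new Path(OUT_PATH));
            job.waitForCompletion(true);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static class MyMapper extends
            Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] splited = line.split("\t");
            // Emit (word, 1) for every tab-separated word on the line.
            for (String word : splited) {
                context.write(new Text(word), new LongWritable(1));
            }
        }
    }

    public static class MyReducer extends
            Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values,
                Context context) throws IOException, InterruptedException {
            // Summation works both as combiner and as reducer.
            long count = 0L;
            for (LongWritable times : values) {
                count += times.get();
            }
            context.write(key, new LongWritable(count));
        }
    }
}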


III. Using a Custom Partitioner

1. Partitioner is the base class of all partitioners; a custom partitioner must extend it.

2. HashPartitioner is the default partitioner in MapReduce. It computes the target reducer as (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks.

3. (Run this example as a jar.)
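The HashPartitioner formula quoted above can be tried out in plain Java without Hadoop. The sketch below applies the same expression to a few hash codes and shows why the `& Integer.MAX_VALUE` mask matters. The class name `HashPartitionSketch` and the sample keys are hypothetical.

```java
/** Plain-Java sketch of the HashPartitioner formula; not Hadoop code. */
final class HashPartitionSketch {

    /** Maps a hash code to a reducer index in [0, numReduceTasks). */
    static int partitionFor(int hashCode, int numReduceTasks) {
        // The mask clears the sign bit, so the left operand of % is never negative.
        return (hashCode & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // Hypothetical keys, only for illustration.
        String[] keys = {"13726230503", "hadoop", "hello"};
        for (String key : keys) {
            System.out.println(key + " -> reducer " + partitionFor(key.hashCode(), 2));
        }
        // A negative hash code still yields a valid index.
        System.out.println(partitionFor(-1, 4)); // 3
        // Without the mask, Java's % keeps the sign of the dividend.
        System.out.println(-1 % 4);              // -1
    }
}
```

Because Java's `%` can return a negative number for a negative dividend, skipping the mask could produce an invalid partition index.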

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * A partitioner divides the map output into partitions, one per reduce task.
 * In this example the keys are either mobile numbers or something else, and
 * two reducers handle them separately: one gets the mobile numbers, the other gets the rest.
 * Reducers fetch their data from the map side during the shuffle, so the map side
 * must already record which records belong to which reducer. The partition number
 * is in effect an index that identifies the target reducer.
 * The default partitioner is HashPartitioner.
 * The values returned by the partitioner must lie in [0, numReduceTasks).
 */
public class KpiApp {
    public static final String INPUT_PATH = "hdfs://hadoop1:9000/kpi";
    public static final String OUT_PATH = "hdfs://hadoop1:9000/kpi_out";

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fileSystem = FileSystem.get(new URI(OUT_PATH), conf);
        Path outPath = new Path(OUT_PATH);
        if (fileSystem.exists(outPath)) {
            fileSystem.delete(outPath, true); // recursive delete
        }
        Job job = new Job(conf, KpiApp.class.getSimpleName());
        job.setJarByClass(KpiApp.class);
        FileInputFormat.setInputPaths(job, new Path(INPUT_PATH));
        job.setMapperClass(MyMapper.class);
        job.setPartitionerClass(MyPartitioner.class);
        job.setNumReduceTasks(2);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(KpiWritable.class);
        FileOutputFormat.setOutputPath(job, outPath);
        job.waitForCompletion(true);
    }

    public static class MyMapper extends Mapper<LongWritable, Text, Text, KpiWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();      // value is one line of input
            String[] splited = line.split("\t"); // fields are tab-separated
            String mobileNumber = splited[1];    // mobile number
            Text k2 = new Text(mobileNumber);
            KpiWritable v2 = new KpiWritable(Long.parseLong(splited[6]), Long.parseLong(splited[7]),
                    Long.parseLong(splited[8]), Long.parseLong(splited[9]));
            context.write(k2, v2);
        }
    }

    public static class MyReducer extends Reducer<Text, KpiWritable, Text, KpiWritable> {
        @Override
        protected void reduce(Text k2, Iterable<KpiWritable> v2s, Context context)
                throws IOException, InterruptedException {
            long upPackNum = 0L;   // number of upstream packets
            long downPackNum = 0L; // number of downstream packets
            long upPayLoad = 0L;   // total upstream traffic
            long downPayLoad = 0L; // total downstream traffic
            for (KpiWritable kpiWritable : v2s) {
                upPackNum += kpiWritable.upPackNum;
                downPackNum += kpiWritable.downPackNum;
                upPayLoad += kpiWritable.upPayLoad;
                downPayLoad += kpiWritable.downPayLoad;
            }
            KpiWritable v3 = new KpiWritable(upPackNum, downPackNum, upPayLoad, downPayLoad);
            context.write(k2, v3);
        }
    }

    // With a single partition, the only valid return value is 0.
    // The number of reduce tasks must be greater than or equal to the number of partitions.
    public static class MyPartitioner extends Partitioner<Text, KpiWritable> {
        @Override
        public int getPartition(Text key, KpiWritable value, int numPartitions) {
            int length = key.toString().length();
            // 11-character keys (mobile numbers) go to partition 0, everything else to partition 1.
            // Taking the result modulo numPartitions, rather than returning the bare comparison,
            // keeps the index valid when there is only one reducer.
            return (length == 11 ? 0 : 1) % numPartitions;
        }
    }
}

class KpiWritable implements Writable {
    long upPackNum;   // number of upstream packets
    long downPackNum; // number of downstream packets
    long upPayLoad;   // total upstream traffic
    long downPayLoad; // total downstream traffic

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(upPackNum);
        out.writeLong(downPackNum);
        out.writeLong(upPayLoad);
        out.writeLong(downPayLoad);
    }

    // Fields must be read back in exactly the order they were written, because the
    // serialized data is a one-dimensional stream that is consumed from start to end.
    @Override
    public void readFields(DataInput in) throws IOException {
        this.upPackNum = in.readLong();
        this.downPackNum = in.readLong();
        this.upPayLoad = in.readLong();
        this.downPayLoad = in.readLong();
    }

    // A no-argument constructor is required so the framework can instantiate the class.
    public KpiWritable() {
    }

    public KpiWritable(long upPackNum, long downPackNum, long upPayLoad,
            long downPayLoad) {
        set(upPackNum, downPackNum, upPayLoad, downPayLoad);
    }

    public void set(long upPackNum, long downPackNum, long upPayLoad,
            long downPayLoad) {
        this.upPackNum = upPackNum;
        this.downPackNum = downPackNum;
        this.upPayLoad = upPayLoad;
        this.downPayLoad = downPayLoad;
    }

    @Override
    public String toString() {
        return upPackNum + "\t" + downPackNum + "\t" + upPayLoad + "\t" + downPayLoad;
    }
}
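The rule that fields must be read in the order they were written can be demonstrated with the JDK's own data streams, without Hadoop. The sketch below writes four longs into a byte array and reads them back, in the same way `write` and `readFields` do in `KpiWritable`. The class name `RoundTripSketch` and the sample values are hypothetical, and the sketch only mimics the Writable contract rather than implementing the Hadoop interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Plain-Java sketch of the write/readFields contract; it mimics Writable without Hadoop. */
final class RoundTripSketch {

    /** Serializes four longs in a fixed order. */
    static byte[] write(long upPackNum, long downPackNum, long upPayLoad, long downPayLoad) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(upPackNum);
            out.writeLong(downPackNum);
            out.writeLong(upPayLoad);
            out.writeLong(downPayLoad);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the four longs back in exactly the order they were written. */
    static long[] read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return new long[] {in.readLong(), in.readLong(), in.readLong(), in.readLong()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = write(3L, 4L, 500L, 600L);
        System.out.println(data.length);                  // 32: four 8-byte longs
        System.out.println(Arrays.toString(read(data)));  // [3, 4, 500, 600]
    }
}
```

The stream carries no field names, only bytes in sequence, so swapping two reads would silently assign values to the wrong fields.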


IV. Using a Custom Sort

1. Sorting in the map and reduce phases compares k2 only; v2 does not take part. To sort by v2 as well, wrap k2 and v2 in a new class and use that class as k2.

2. Grouping also compares k2.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Custom sort.
 * By default, sorting compares k2 only; v2 does not take part.
 * To make the second column take part in sorting, it has to become part of k2,
 * because only k2 is compared. This example therefore uses a custom serializable key type.
 */
public class SortApp {
    private static final String INPUT_PATH = "hdfs://hadoop1:9000/data"; // input path
    private static final String OUT_PATH = "hdfs://hadoop1:9000/out";    // output path; the job writes its results into a directory

    public static void main(String[] args) {
        Configuration conf = new Configuration(); // configuration object
        try {
            FileSystem fileSystem = FileSystem.get(new URI(OUT_PATH), conf);
            fileSystem.delete(new Path(OUT_PATH), true);

            Job job = new Job(conf, SortApp.class.getSimpleName()); // second argument is the job name
            job.setJarByClass(SortApp.class);
            FileInputFormat.setInputPaths(job, INPUT_PATH);  // input data
            job.setMapperClass(MyMapper.class);              // custom map class
            job.setMapOutputKeyClass(NewK2.class);           // map output key type
            job.setMapOutputValueClass(LongWritable.class);  // map output value type
            job.setReducerClass(MyReducer.class);            // custom reduce class
            job.setOutputKeyClass(LongWritable.class);       // reduce output key type
            job.setOutputValueClass(LongWritable.class);     // reduce output value type
            FileOutputFormat.setOutputPath(job, new Path(OUT_PATH)); // location of the final output
            job.waitForCompletion(true); // submit to the JobTracker and wait for completion
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static class MyMapper extends
            Mapper<LongWritable, Text, NewK2, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] splited = line.split("\t");
            // Both columns go into the key so that both take part in sorting; the value is unused.
            context.write(new NewK2(Long.parseLong(splited[0]), Long.parseLong(splited[1])), new LongWritable());
        }
    }

    public static class MyReducer extends
            Reducer<NewK2, LongWritable, LongWritable, LongWritable> {
        @Override
        protected void reduce(NewK2 key, Iterable<LongWritable> values,
                Context context) throws IOException, InterruptedException {
            // Keys arrive already sorted; write the two columns back out.
            context.write(new LongWritable(key.first), new LongWritable(key.second));
        }
    }

    public static class NewK2 implements WritableComparable<NewK2> {
        long first;
        long second;

        public NewK2(long first, long second) {
            this.first = first;
            this.second = second;
        }

        // A no-argument constructor is required.
        public NewK2() {
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeLong(this.first);
            out.writeLong(this.second);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            this.first = in.readLong();
            this.second = in.readLong();
        }

        @Override
        public int compareTo(NewK2 o) {
            // Compare by the first column, then by the second. Long.compare avoids the
            // truncation that casting a long difference to int can cause.
            int byFirst = Long.compare(this.first, o.first);
            if (byFirst != 0) {
                return byFirst;
            }
            return Long.compare(this.second, o.second);
        }
    }
}
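The ordering that a two-column key produces can be tried in plain Java without Hadoop. The sketch below sorts rows by the first column and then the second, and also shows why computing a comparison result as a long difference cast to int is risky. The class name `CompositeSortSketch`, its members, and the sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Plain-Java sketch of sorting by a two-column key; not Hadoop code. */
final class CompositeSortSketch {

    /** Orders rows by the first column, then by the second. */
    static final Comparator<long[]> BY_FIRST_THEN_SECOND = new Comparator<long[]>() {
        @Override
        public int compare(long[] a, long[] b) {
            int byFirst = Long.compare(a[0], b[0]);
            return byFirst != 0 ? byFirst : Long.compare(a[1], b[1]);
        }
    };

    /** The subtract-and-cast idiom, included only to demonstrate how it can fail. */
    static int compareBySubtraction(long a, long b) {
        return (int) (a - b);
    }

    /** Returns a sorted copy of the rows; the input list is left untouched. */
    static List<long[]> sorted(List<long[]> rows) {
        List<long[]> copy = new ArrayList<>(rows);
        copy.sort(BY_FIRST_THEN_SECOND);
        return copy;
    }

    public static void main(String[] args) {
        List<long[]> rows = Arrays.asList(
                new long[] {3, 3}, new long[] {1, 1}, new long[] {3, 1},
                new long[] {2, 2}, new long[] {3, 2}, new long[] {2, 1});
        for (long[] row : sorted(rows)) {
            System.out.println(row[0] + "\t" + row[1]);
        }
        // 2^32 - 0 truncates to 0 when cast to int, so the two values look equal.
        System.out.println(compareBySubtraction(1L << 32, 0L)); // 0
        System.out.println(Long.compare(1L << 32, 0L) > 0);     // true
    }
}
```

For small values such as the sample data the two approaches agree; the difference only shows up once the gap between two values no longer fits in an int.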


V. Using a Custom Grouping Comparator

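import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.RawComparator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Custom grouping.
 * Goal: for rows that share the same first column, find the minimum of the second column,
 * which is what the reducer below computes.
 * Because the second column is part of the key, the default grouping after sorting
 * yields 6 groups for the sample data below (one per distinct key) rather than 3.
 * Grouping on the first column only yields 3 groups, and the reducer can then
 * pick the minimum of the second column within each group.
 *
 *     3    3
 *     3    2
 *     3    1
 *     2    2
 *     2    1
 *     1    1
 */
public class GroupApp {
    private static final String INPUT_PATH = "hdfs://hadoop1:9000/data";
    private static final String OUT_PATH = "hdfs://hadoop1:9000/out";

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        try {
            FileSystem fileSystem = FileSystem.get(new URI(OUT_PATH), conf);
            fileSystem.delete(new Path(OUT_PATH), true);

            Job job = new Job(conf, GroupApp.class.getSimpleName());
            job.setJarByClass(GroupApp.class);
            FileInputFormat.setInputPaths(job, INPUT_PATH);
            job.setMapperClass(MyMapper.class);
            job.setMapOutputKeyClass(NewK2.class);
            job.setMapOutputValueClass(LongWritable.class);
            job.setGroupingComparatorClass(MyGroupComparator.class); // comparator used to group keys
            job.setReducerClass(MyReducer.class);
            job.setOutputKeyClass(LongWritable.class);
            job.setOutputValueClass(LongWritable.class);
            FileOutputFormat.setOutputPath(job, new Path(OUT_PATH));
            job.waitForCompletion(true);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static class MyMapper extends
            Mapper<LongWritable, Text, NewK2, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] splited = line.split("\t");
            // Key: both columns, so both take part in sorting. Value: the second column.
            context.write(new NewK2(Long.parseLong(splited[0]), Long.parseLong(splited[1])),
                    new LongWritable(Long.parseLong(splited[1])));
        }
    }

    public static class MyReducer extends
            Reducer<NewK2, LongWritable, LongWritable, LongWritable> {
        @Override
        protected void reduce(NewK2 key, Iterable<LongWritable> values,
                Context context) throws IOException, InterruptedException {
            // Find the minimum of the second column within the group.
            long min = Long.MAX_VALUE;
            for (LongWritable longWritable : values) {
                if (longWritable.get() < min) {
                    min = longWritable.get();
                }
            }
            context.write(new LongWritable(key.first), new LongWritable(min));
        }
    }

    public static class NewK2 implements WritableComparable<NewK2> {
        long first;
        long second;

        public NewK2(long first, long second) {
            this.first = first;
            this.second = second;
        }

        // A no-argument constructor is required.
        public NewK2() {
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeLong(this.first);
            out.writeLong(this.second);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            this.first = in.readLong();
            this.second = in.readLong();
        }

        @Override
        public int compareTo(NewK2 o) {
            // Compare by the first column, then by the second. Long.compare avoids the
            // truncation that casting a long difference to int can cause.
            int byFirst = Long.compare(this.first, o.first);
            if (byFirst != 0) {
                return byFirst;
            }
            return Long.compare(this.second, o.second);
        }
    }

    public static class MyGroupComparator implements RawComparator<NewK2> {
        // Object-level comparison, kept consistent with the byte-level one: first column only.
        @Override
        public int compare(NewK2 o1, NewK2 o2) {
            return Long.compare(o1.first, o2.first);
        }

        // Grouping only uses the byte-level method below.
        /**
         * b1 and b2 are the byte arrays holding the two serialized keys
         * (b1 plays the role of "this", b2 the key being compared against).
         * s1 and s2 are the offsets in those arrays at which each key starts.
         * l1 and l2 are the lengths of the serialized keys in bytes.
         */
        @Override
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            // Only the first column is compared; a long occupies 8 bytes.
            return WritableComparator.compareBytes(b1, s1, 8, b2, s2, 8);
        }
    }
}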
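The effect of grouping on the first column can be simulated in plain Java without Hadoop. The sketch below walks over rows that are already sorted by (first, second), starts a new group whenever the first column changes, and keeps the minimum of the second column per group, as the reducer in the listing does. The class name `GroupingSketch` and its method are hypothetical, and the loop is only a model of the behaviour, not the framework's actual grouping code.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch of grouping sorted composite keys by their first column; not Hadoop code. */
final class GroupingSketch {

    /**
     * Takes rows sorted by (first, second) and returns one {first, min(second)}
     * pair per run of rows that share the same first column.
     */
    static List<long[]> minPerGroup(long[][] sortedRows) {
        List<long[]> result = new ArrayList<>();
        for (long[] row : sortedRows) {
            long[] last = result.isEmpty() ? null : result.get(result.size() - 1);
            if (last == null || last[0] != row[0]) {
                // The first column changed, so a new group starts here.
                result.add(new long[] {row[0], row[1]});
            } else if (row[1] < last[1]) {
                last[1] = row[1];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The sample data from the listing, in ascending sort order.
        long[][] sortedRows = {{1, 1}, {2, 1}, {2, 2}, {3, 1}, {3, 2}, {3, 3}};
        for (long[] group : minPerGroup(sortedRows)) {
            System.out.println(group[0] + "\t" + group[1]);
        }
    }
}
```

Six distinct keys collapse into three groups here, one per distinct first column, and each group reports 1 as its minimum.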

