Hadoop Basics: A MapReduce Partitioner Use Case
Author: 尹正杰 (Yin Zhengjie)
Copyright notice: original work, please do not repost; violators will be held legally responsible.
I. Dissecting the key Partitioner code
1>. The returned partition number
2>. The default partitioner hashes the key
getPartition returns a single int: the zero-based number of the partition the record is routed to.
3>. HashPartitioner
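For reference, the stock HashPartitioner in Hadoop 2.x is essentially the following (paraphrased from the distribution's source):

package org.apache.hadoop.mapreduce.lib.partition;

import org.apache.hadoop.mapreduce.Partitioner;

public class HashPartitioner<K, V> extends Partitioner<K, V> {
    //Mask off the sign bit so the result is non-negative, then take the
    //remainder modulo the number of reduce tasks. The result is a single
    //int: the zero-based partition this record is routed to.
    public int getPartition(K key, V value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}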
With that in hand, let's look at what the Partitioner actually does inside a MapReduce job.
II. Where the Partitioner sits in MapReduce
1>. What is a Partitioner?
If you looked through the Partitioner source you probably noticed it is tied to hashing. So what exactly is it, and what can it do for us? Put plainly, it drives the distribution of keys and values: it is responsible for delivering every record with a given key to the same reducer. Remember the Combiner we covered earlier? A Combiner is essentially a map-side reduce, used to cut down the traffic sent across the network. In plain terms it folds keys together: shipping a key with the value 1 to the reduce side a million times, and shipping that key once with a folded value of one million, produce the same result, but the folded version saves an enormous amount of bandwidth. The routing of each key to its reducer is the job the Partitioner performs, and if we never define one the job still runs, because Hadoop falls back to a default.
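As a quick sketch of that folding step: because the reducer we write below (KVReduce) just sums its values, it can double as the combiner, so partial sums are computed map-side before anything crosses the network. One extra line in the driver is enough (an optional tweak, not used in the runs below):

//Optional: reuse the summing reducer as a map-side combiner, folding
//duplicate keys into partial sums before the shuffle. Safe here because
//addition is associative and commutative.
job.setCombinerClass(KVReduce.class);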
2>. The Partitioner's position in the pipeline
Roughly speaking, the Partitioner sits between the map output and the shuffle: map → partition → sort/spill → shuffle/merge → reduce.
Next, let's see the concrete difference between running with and without a custom Partitioner.
III. Without a custom Partitioner
1>. Test data (partitioner.txt)
yinzhengjie 1
golang 2
python 3
shell 4
java 5
linux 6
vbs 7
c++ 8
css 9
html 10
javascript 11
尹正杰 12
yinzhengjie 13
golang 14
python 15
shell 16
java 17
linux 18
vbs 19
c++ 20
css 21
html 22
javascript 23
尹正杰 24
2>. KVMapper.java
/*
@author :yinzhengjie
Blog:http://www.cnblogs.com/yinzhengjie/tag/Hadoop%E8%BF%9B%E9%98%B6%E4%B9%8B%E8%B7%AF/
EMAIL:y1053419035@qq.com
*/
package cn.org.yinzhengjie.mapreduce.partition;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class KVMapper extends Mapper<Text,Text,Text,IntWritable> {
    @Override
    protected void map(Text key, Text value, Context context) throws IOException, InterruptedException {
        //Parse the value into an int
        int val = Integer.parseInt(value.toString());
        context.write(key, new IntWritable(val));
    }
}
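One thing to watch with this Mapper: KeyValueTextInputFormat splits each line at the first tab by default. If your copy of partitioner.txt separates key and value with a space instead, the whole line arrives as the key, the value is empty, and the parseInt above throws. A sketch of the fix, assuming a single-space delimiter (the property name is the Hadoop 2.x one, read by KeyValueLineRecordReader):

//Assumption: partitioner.txt uses a space, not a tab, between key and value.
conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", " ");

Add this to the driver (KVApp) before the job is submitted.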
3>. KVReduce.java
/*
@author :yinzhengjie
Blog:http://www.cnblogs.com/yinzhengjie/tag/Hadoop%E8%BF%9B%E9%98%B6%E4%B9%8B%E8%B7%AF/
EMAIL:y1053419035@qq.com
*/
package cn.org.yinzhengjie.mapreduce.partition;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class KVReduce extends Reducer<Text,IntWritable,Text,IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        //Sum every value that arrived for this key
        int count = 0;
        for (IntWritable value : values) {
            count += value.get();
        }
        context.write(key, new IntWritable(count));
    }
}
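A brief aside on the loop above: Hadoop reuses a single IntWritable instance while iterating over values, so copying the primitive out with value.get() on each pass, as done here, is the right pattern. Caching the writable objects themselves would silently give you N references to the last value:

//Correct: copy the primitive out of the reused writable each iteration.
count += value.get();
//Broken sketch (do not do this): every element of the list would end up
//pointing at the same reused IntWritable instance.
//List<IntWritable> saved = new ArrayList<>(); ... saved.add(value);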
4>. KVApp.java
/*
@author :yinzhengjie
Blog:http://www.cnblogs.com/yinzhengjie/tag/Hadoop%E8%BF%9B%E9%98%B6%E4%B9%8B%E8%B7%AF/
EMAIL:y1053419035@qq.com
*/
package cn.org.yinzhengjie.mapreduce.partition;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class KVApp {
    public static void main(String[] args) throws Exception {
        //Instantiate a Configuration; it automatically loads the fs.defaultFS property from a core-site.xml on the classpath (place the file in the project's resources directory).
        Configuration conf = new Configuration();
        //Write to the local filesystem instead of HDFS by overriding whatever fs.defaultFS was read from core-site.xml.
        conf.set("fs.defaultFS","file:///");
        //Create the job object; don't forget to pass conf in!
        Job job = Job.getInstance(conf);
        //Give the job a name
        job.setJobName("WordCount");
        //Use KeyValueTextInputFormat so each line is split into a key and a value, which become the Mapper's input key and value types
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        //Point the job at the class containing main, i.e. this class
        job.setJarByClass(KVApp.class);
        //Register our custom Mapper
        job.setMapperClass(KVMapper.class);
        //Register our custom Reducer
        job.setReducerClass(KVReduce.class);
        //Output key type
        job.setOutputKeyClass(Text.class);
        //Output value type
        job.setOutputValueClass(IntWritable.class);
        //Set the input path; takes two arguments, the job object and the path
        FileInputFormat.addInputPath(job,new Path("D:\\10.Java\\IDE\\yhinzhengjieData\\MyHadoop\\Partitioner\\partitioner.txt"));
        //Get a FileSystem handle from the fs.defaultFS we set above, so we can delete a pre-existing output directory
        FileSystem fs = FileSystem.get(conf);
        //fs.delete removes a path; the second argument means recursive, which is what you want for directories
        Path outPath = new Path("D:\\10.Java\\IDE\\yhinzhengjieData\\MyHadoop\\Partitioner\\out");
        if (fs.exists(outPath)){
            fs.delete(outPath,true);
        }
        //Set the output path; takes two arguments, the job object and the path
        FileOutputFormat.setOutputPath(job,outPath);
        //The number of reduce tasks is ours to choose
        job.setNumReduceTasks(2);
        //Submit and wait for completion; true means print progress to the console
        job.waitForCompletion(true);
    }
}
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/C:/Users/Administrator/.m2/repository/org/apache/hadoop/hadoop-auth/2.7.3/hadoop-auth-2.7.3.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
18/06/18 08:51:19 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/06/18 08:51:19 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
18/06/18 08:51:19 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
18/06/18 08:51:19 WARN mapreduce.JobResourceUploader: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
18/06/18 08:51:19 INFO input.FileInputFormat: Total input paths to process : 1
18/06/18 08:51:19 INFO mapreduce.JobSubmitter: number of splits:1
18/06/18 08:51:19 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1080098995_0001
18/06/18 08:51:19 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
18/06/18 08:51:19 INFO mapreduce.Job: Running job: job_local1080098995_0001
18/06/18 08:51:19 INFO mapred.LocalJobRunner: OutputCommitter set in config null
18/06/18 08:51:19 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/06/18 08:51:19 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
18/06/18 08:51:19 INFO mapred.LocalJobRunner: Waiting for map tasks
18/06/18 08:51:19 INFO mapred.LocalJobRunner: Starting task: attempt_local1080098995_0001_m_000000_0
18/06/18 08:51:19 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/06/18 08:51:19 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
18/06/18 08:51:19 INFO mapred.Task: Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@59eafdde
18/06/18 08:51:19 INFO mapred.MapTask: Processing split: file:/D:/10.Java/IDE/yhinzhengjieData/MyHadoop/Partitioner/partitioner.txt:0+241
18/06/18 08:51:19 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/06/18 08:51:19 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/06/18 08:51:19 INFO mapred.MapTask: soft limit at 83886080
18/06/18 08:51:19 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/06/18 08:51:19 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/06/18 08:51:19 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/06/18 08:51:19 INFO mapred.LocalJobRunner:
18/06/18 08:51:19 INFO mapred.MapTask: Starting flush of map output
18/06/18 08:51:19 INFO mapred.MapTask: Spilling map output
18/06/18 08:51:19 INFO mapred.MapTask: bufstart = 0; bufend = 252; bufvoid = 104857600
18/06/18 08:51:19 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214304(104857216); length = 93/6553600
18/06/18 08:51:19 INFO mapred.MapTask: Finished spill 0
18/06/18 08:51:19 INFO mapred.Task: Task:attempt_local1080098995_0001_m_000000_0 is done. And is in the process of committing
18/06/18 08:51:19 INFO mapred.LocalJobRunner: file:/D:/10.Java/IDE/yhinzhengjieData/MyHadoop/Partitioner/partitioner.txt:0+241
18/06/18 08:51:19 INFO mapred.Task: Task 'attempt_local1080098995_0001_m_000000_0' done.
18/06/18 08:51:19 INFO mapred.LocalJobRunner: Finishing task: attempt_local1080098995_0001_m_000000_0
18/06/18 08:51:19 INFO mapred.LocalJobRunner: map task executor complete.
18/06/18 08:51:19 INFO mapred.LocalJobRunner: Waiting for reduce tasks
18/06/18 08:51:19 INFO mapred.LocalJobRunner: Starting task: attempt_local1080098995_0001_r_000000_0
18/06/18 08:51:19 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/06/18 08:51:19 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
18/06/18 08:51:19 INFO mapred.Task: Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@5d916cac
18/06/18 08:51:19 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@6ef6c4ad
18/06/18 08:51:19 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=2996200960, maxSingleShuffleLimit=749050240, mergeThreshold=1977492736, ioSortFactor=10, memToMemMergeOutputsThreshold=10
18/06/18 08:51:19 INFO reduce.EventFetcher: attempt_local1080098995_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
18/06/18 08:51:19 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1080098995_0001_m_000000_0 decomp: 144 len: 148 to MEMORY
18/06/18 08:51:19 INFO reduce.InMemoryMapOutput: Read 144 bytes from map-output for attempt_local1080098995_0001_m_000000_0
18/06/18 08:51:19 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 144, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->144
18/06/18 08:51:19 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
18/06/18 08:51:19 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/06/18 08:51:19 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
18/06/18 08:51:19 INFO mapred.Merger: Merging 1 sorted segments
18/06/18 08:51:19 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 138 bytes
18/06/18 08:51:19 INFO reduce.MergeManagerImpl: Merged 1 segments, 144 bytes to disk to satisfy reduce memory limit
18/06/18 08:51:19 INFO reduce.MergeManagerImpl: Merging 1 files, 148 bytes from disk
18/06/18 08:51:19 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
18/06/18 08:51:19 INFO mapred.Merger: Merging 1 sorted segments
18/06/18 08:51:19 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 138 bytes
18/06/18 08:51:19 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/06/18 08:51:19 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
18/06/18 08:51:19 INFO mapred.Task: Task:attempt_local1080098995_0001_r_000000_0 is done. And is in the process of committing
18/06/18 08:51:19 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/06/18 08:51:19 INFO mapred.Task: Task attempt_local1080098995_0001_r_000000_0 is allowed to commit now
18/06/18 08:51:19 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1080098995_0001_r_000000_0' to file:/D:/10.Java/IDE/yhinzhengjieData/MyHadoop/Partitioner/out/_temporary/0/task_local1080098995_0001_r_000000
18/06/18 08:51:19 INFO mapred.LocalJobRunner: reduce > reduce
18/06/18 08:51:19 INFO mapred.Task: Task 'attempt_local1080098995_0001_r_000000_0' done.
18/06/18 08:51:19 INFO mapred.LocalJobRunner: Finishing task: attempt_local1080098995_0001_r_000000_0
18/06/18 08:51:19 INFO mapred.LocalJobRunner: Starting task: attempt_local1080098995_0001_r_000001_0
18/06/18 08:51:19 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/06/18 08:51:19 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
18/06/18 08:51:19 INFO mapred.Task: Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@378ab11d
18/06/18 08:51:19 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@10246d23
18/06/18 08:51:19 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=2996200960, maxSingleShuffleLimit=749050240, mergeThreshold=1977492736, ioSortFactor=10, memToMemMergeOutputsThreshold=10
18/06/18 08:51:19 INFO reduce.EventFetcher: attempt_local1080098995_0001_r_000001_0 Thread started: EventFetcher for fetching Map Completion Events
18/06/18 08:51:19 INFO reduce.LocalFetcher: localfetcher#2 about to shuffle output of map attempt_local1080098995_0001_m_000000_0 decomp: 160 len: 164 to MEMORY
18/06/18 08:51:19 INFO reduce.InMemoryMapOutput: Read 160 bytes from map-output for attempt_local1080098995_0001_m_000000_0
18/06/18 08:51:19 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 160, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->160
18/06/18 08:51:19 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
18/06/18 08:51:19 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/06/18 08:51:19 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
18/06/18 08:51:19 INFO mapred.Merger: Merging 1 sorted segments
18/06/18 08:51:19 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 151 bytes
18/06/18 08:51:19 INFO reduce.MergeManagerImpl: Merged 1 segments, 160 bytes to disk to satisfy reduce memory limit
18/06/18 08:51:19 INFO reduce.MergeManagerImpl: Merging 1 files, 164 bytes from disk
18/06/18 08:51:19 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
18/06/18 08:51:19 INFO mapred.Merger: Merging 1 sorted segments
18/06/18 08:51:19 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 151 bytes
18/06/18 08:51:19 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/06/18 08:51:19 INFO mapred.Task: Task:attempt_local1080098995_0001_r_000001_0 is done. And is in the process of committing
18/06/18 08:51:19 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/06/18 08:51:19 INFO mapred.Task: Task attempt_local1080098995_0001_r_000001_0 is allowed to commit now
18/06/18 08:51:19 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1080098995_0001_r_000001_0' to file:/D:/10.Java/IDE/yhinzhengjieData/MyHadoop/Partitioner/out/_temporary/0/task_local1080098995_0001_r_000001
18/06/18 08:51:19 INFO mapred.LocalJobRunner: reduce > reduce
18/06/18 08:51:19 INFO mapred.Task: Task 'attempt_local1080098995_0001_r_000001_0' done.
18/06/18 08:51:19 INFO mapred.LocalJobRunner: Finishing task: attempt_local1080098995_0001_r_000001_0
18/06/18 08:51:19 INFO mapred.LocalJobRunner: reduce task executor complete.
18/06/18 08:51:20 INFO mapreduce.Job: Job job_local1080098995_0001 running in uber mode : false
18/06/18 08:51:20 INFO mapreduce.Job: map 100% reduce 100%
18/06/18 08:51:20 INFO mapreduce.Job: Job job_local1080098995_0001 completed successfully
18/06/18 08:51:20 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=2721
FILE: Number of bytes written=886698
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=24
Map output records=24
Map output bytes=252
Map output materialized bytes=312
Input split bytes=139
Combine input records=0
Combine output records=0
Reduce input groups=12
Reduce shuffle bytes=312
Reduce input records=24
Reduce output records=12
Spilled Records=48
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=0
Total committed heap usage (bytes)=805306368
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=241
File Output Format Counters
Bytes Written=138
Output from running the code above
5>. Inspecting the generated files
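With two reduce tasks the out directory holds part-r-00000, part-r-00001 and an empty _SUCCESS marker, and the default HashPartitioner spreads the keys across the two part files by hash. You can predict which file a given key lands in without running the job; note that the hash applied is Text's hashCode, not String's. A sanity-check sketch (hypothetical helper class, assuming hadoop-common on the classpath):

import org.apache.hadoop.io.Text;

public class PredictPartition {
    public static void main(String[] args) {
        String[] keys = {"yinzhengjie","golang","python","shell","java","linux",
                         "vbs","c++","css","html","javascript","尹正杰"};
        int numReduceTasks = 2;  //matches job.setNumReduceTasks(2)
        for (String k : keys) {
            //Same formula the stock HashPartitioner applies to each map output record
            int p = (new Text(k).hashCode() & Integer.MAX_VALUE) % numReduceTasks;
            System.out.println(k + " -> part-r-0000" + p);
        }
    }
}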
IV. With a custom Partitioner
1>. The Mapper and Reducer code is unchanged
KVMapper.java and KVReduce.java are identical to the versions shown in Section III above, so they are not repeated here.
2>. Partition.java (the custom Partitioner)
/*
@author :yinzhengjie
Blog:http://www.cnblogs.com/yinzhengjie/tag/Hadoop%E8%BF%9B%E9%98%B6%E4%B9%8B%E8%B7%AF/
EMAIL:y1053419035@qq.com
*/
package cn.org.yinzhengjie.mapreduce.partition;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class Partition extends Partitioner<Text,IntWritable> {
    @Override
    public int getPartition(Text text, IntWritable intWritable, int numPartitions) {
        //A small trick: if the key parses as a number it is sent to partition 0;
        //if parsing throws, the key is non-numeric and goes to partition 1.
        try {
            Integer.parseInt(text.toString());
            return 0;
        } catch (Exception e) {
            return 1;
        }
    }
}
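A design note on the implementation above: getPartition must return a value in the range [0, numPartitions). Hard-coding 0 and 1 works while setNumReduceTasks(2) is in force, but would return an illegal partition number if the job were ever run with a single reducer. A defensive variant (a hypothetical rewrite, not from the original post) clamps against numPartitions:

@Override
public int getPartition(Text text, IntWritable intWritable, int numPartitions) {
    //Numeric keys go to bucket 0, everything else to bucket 1, then clamp so
    //the result is always a legal partition number even with one reducer.
    int bucket = text.toString().matches("-?\\d+") ? 0 : 1;
    return bucket % numPartitions;
}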
3>. KVApp.java
/*
@author :yinzhengjie
Blog:http://www.cnblogs.com/yinzhengjie/tag/Hadoop%E8%BF%9B%E9%98%B6%E4%B9%8B%E8%B7%AF/
EMAIL:y1053419035@qq.com
*/
package cn.org.yinzhengjie.mapreduce.partition;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class KVApp {
    public static void main(String[] args) throws Exception {
        //Instantiate a Configuration; it automatically loads the fs.defaultFS property from a core-site.xml on the classpath (place the file in the project's resources directory).
        Configuration conf = new Configuration();
        //Write to the local filesystem instead of HDFS by overriding whatever fs.defaultFS was read from core-site.xml.
        conf.set("fs.defaultFS","file:///");
        //Create the job object; don't forget to pass conf in!
        Job job = Job.getInstance(conf);
        //Give the job a name
        job.setJobName("WordCount");
        //Use KeyValueTextInputFormat so each line is split into a key and a value, which become the Mapper's input key and value types
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        //Point the job at the class containing main, i.e. this class
        job.setJarByClass(KVApp.class);
        //Register our custom Mapper
        job.setMapperClass(KVMapper.class);
        //Register our custom Reducer
        job.setReducerClass(KVReduce.class);
        //Register our custom Partitioner
        job.setPartitionerClass(Partition.class);
        //Output key type
        job.setOutputKeyClass(Text.class);
        //Output value type
        job.setOutputValueClass(IntWritable.class);
        //Set the input path; takes two arguments, the job object and the path
        FileInputFormat.addInputPath(job,new Path("D:\\10.Java\\IDE\\yhinzhengjieData\\MyHadoop\\Partitioner\\partitioner.txt"));
        //Get a FileSystem handle from the fs.defaultFS we set above, so we can delete a pre-existing output directory
        FileSystem fs = FileSystem.get(conf);
        //fs.delete removes a path; the second argument means recursive, which is what you want for directories
        Path outPath = new Path("D:\\10.Java\\IDE\\yhinzhengjieData\\MyHadoop\\Partitioner\\out");
        if (fs.exists(outPath)){
            fs.delete(outPath,true);
        }
        //Set the output path; takes two arguments, the job object and the path
        FileOutputFormat.setOutputPath(job,outPath);
        //The number of reduce tasks is ours to choose
        job.setNumReduceTasks(2);
        //Submit and wait for completion; true means print progress to the console
        job.waitForCompletion(true);
    }
}
4>. Test results
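Reasoning out the expected result from the code and the sample data: none of the twelve distinct keys in partitioner.txt parses as an integer, so getPartition returns 1 for every record. We would therefore expect part-r-00000 to come out empty and part-r-00001 to hold all twelve summed keys, e.g. yinzhengjie 14 (1+13), golang 16 (2+14) and java 22 (5+17). That skew is the visible contrast with the hash-partitioned run in Section III, where the keys were spread across both output files.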