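Hadoop Diary Day 17: Counters, Map-Side Combining, and Partitioning
1. Hadoop Counters
1.1 What Are Hadoop Counters
Hadoop is built for large data sets and is a poor fit for small ones. Some big-data problems cannot be handled by programs written for small data, and a Hadoop job is a high-latency task: it is normal for a single job over a large data set to take several hours. Hadoop counters play a role similar to logs. Just as logs let us inspect a program's state while it runs, counters report what a job actually did. We start with Hadoop's built-in counters. The program is shown in Code 1.1; it is the word-count example again, run on a file whose two lines are "hello you" and "hello me".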
package counter;

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class WordCountApp {
    static final String INPUT_PATH = "hdfs://hadoop:9000/input";
    static final String OUT_PATH = "hdfs://hadoop:9000/output";

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        final FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), conf);
        final Path outPath = new Path(OUT_PATH);
        // Remove the output directory if it is left over from an earlier run
        if (fileSystem.exists(outPath)) {
            fileSystem.delete(outPath, true);
        }
        final Job job = new Job(conf, WordCountApp.class.getSimpleName());

        // 1.1 Specify where the input files are
        FileInputFormat.setInputPaths(job, INPUT_PATH);
        // Specify how the input is parsed: each line becomes one key/value pair
        job.setInputFormatClass(TextInputFormat.class);

        // 1.2 Specify the custom mapper class
        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class); // types of the map output <k2,v2>
        job.setMapOutputValueClass(LongWritable.class); // may be omitted when <k3,v3> has the same types as <k2,v2>

        // 1.3 Partitioning
        job.setPartitionerClass(HashPartitioner.class);
        job.setNumReduceTasks(1); // run one reduce task

        // 2.2 Specify the custom reducer class
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class); // types of the reduce output
        job.setOutputValueClass(LongWritable.class);

        // 2.3 Specify where the output is written
        FileOutputFormat.setOutputPath(job, outPath);
        job.setOutputFormatClass(TextOutputFormat.class); // class that formats the output files

        job.waitForCompletion(true); // submit the job to the JobTracker
    }

    /**
     * KEYIN    is k1, the byte offset of the line
     * VALUEIN  is v1, the text of the line
     * KEYOUT   is k2, a word that occurs in the line
     * VALUEOUT is v2, the count for that occurrence, always 1
     */
    static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable k1, Text v1, Context context) throws java.io.IOException, InterruptedException {
            final String line = v1.toString();
            final String[] splited = line.split("\t");
            for (String word : splited) {
                context.write(new Text(word), new LongWritable(1));
            }
        }
    }

    /**
     * KEYIN    is k2, a word that occurs in a line
     * VALUEIN  is v2, the count for one occurrence of the word
     * KEYOUT   is k3, a distinct word in the text
     * VALUEOUT is v3, the total number of occurrences of that word
     */
    static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text k2, java.lang.Iterable<LongWritable> v2s, Context ctx) throws java.io.IOException, InterruptedException {
            long times = 0L;
            for (LongWritable count : v2s) {
                times += count.get();
            }
            ctx.write(k2, new LongWritable(times));
        }
    }
}
Code 1.1
The output is shown in Figure 1.1.
Counters: 19 // 19 counters in total, in the 4 counter groups below
File Output Format Counters // file output format counter group
Bytes Written=19 // bytes written to HDFS by reduce: 19
FileSystemCounters // file system counter group
FILE_BYTES_READ=481
HDFS_BYTES_READ=38
FILE_BYTES_WRITTEN=81316
HDFS_BYTES_WRITTEN=19
File Input Format Counters // file input format counter group
Bytes Read=19 // bytes read from HDFS by map
Map-Reduce Framework // MapReduce framework counter group
Map output materialized bytes=49
Map input records=2 // records (lines) read by map: "hello you" and "hello me"
Reduce shuffle bytes=0 // bytes of map output copied to the reducers during the shuffle
Spilled Records=8
Map output bytes=35
Total committed heap usage (bytes)=266469376
SPLIT_RAW_BYTES=105
Combine input records=0 // records fed to the combiner
Reduce input records=4 // records reduce received from the map side
Reduce input groups=3 // number of distinct keys passed to reduce, i.e. the number of k2 groups after merging
Combine output records=0 // records emitted by the combiner
Reduce output records=3 // records written by reduce, one per group: <hello,{1,1}>, <you,{1}>, <me,{1}>
Map output records=4 // records emitted by map: 4
Figure 1.1
From this walk-through we can see that counters let us analyze how a MapReduce program behaved at run time.
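One practical way to read these numbers is to check them against each other. The sketch below is plain Java with no Hadoop dependency; the class name CounterDump and its helper parse are hypothetical names introduced here for illustration. It parses a few "name=value" lines copied from Figure 1.1 and checks two relationships that hold for this particular run: with no combiner, every map output record reaches the reducer, and this reducer writes one record per input group.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CounterDump {
    // Parse lines of the form "Name=value" into a name -> value map.
    public static Map<String, Long> parse(String... lines) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : lines) {
            int eq = line.lastIndexOf('=');
            if (eq < 0) {
                continue; // group headers such as "FileSystemCounters" carry no value
            }
            String name = line.substring(0, eq).trim();
            long value = Long.parseLong(line.substring(eq + 1).trim());
            counters.put(name, value);
        }
        return counters;
    }

    public static void main(String[] args) {
        Map<String, Long> c = parse(
            "Map-Reduce Framework",
            "Map input records=2",
            "Map output records=4",
            "Combine input records=0",
            "Reduce input records=4",
            "Reduce input groups=3",
            "Reduce output records=3");
        // No combiner ran, so the reducer sees every map output record.
        System.out.println(c.get("Map output records").equals(c.get("Reduce input records")));
        // The word-count reducer writes exactly one record per group of equal keys.
        System.out.println(c.get("Reduce input groups").equals(c.get("Reduce output records")));
    }
}
```

Both checks print true for the values in Figure 1.1. They are properties of this job, not general laws: a reducer that emits several records per key would break the second one.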
1.2 Custom Counters
Now that we know what counters are for, we can define one of our own. As an example, we define a counter for a sensitive word that records how many input lines contain that word. In the mapper, context.getCounter("Sensitive Words", "hello") returns the counter named hello in the group Sensitive Words, and helloCounter.increment(1L) is called for every line that contains "hello"; the complete mapper appears as MyMapper in Code 4.1. The input file again consists of the two lines "hello you" and "hello me".
The output is shown in Figure 2.1.
Counters: 20
Sensitive Words
hello=2
File Output Format Counters
Bytes Written=21
FileSystemCounters
FILE_BYTES_READ=359
HDFS_BYTES_READ=42
FILE_BYTES_WRITTEN=129080
HDFS_BYTES_WRITTEN=21
File Input Format Counters
Bytes Read=21
Map-Reduce Framework
Map output materialized bytes=67
Map input records=2
Reduce shuffle bytes=0
Spilled Records=8
Map output bytes=53
Total committed heap usage (bytes)=391774208
SPLIT_RAW_BYTES=95
Combine input records=0
Reduce input records=4
Reduce input groups=3
Combine output records=0
Reduce output records=3
Map output records=4
Figure 2.1
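The mechanism can be imitated without a cluster. The sketch below is plain Java and does not use the Hadoop API: a nested map stands in for the job's counters (group name, then counter name, then value), and the class SensitiveWordCounter with its methods increment, get, and scan are hypothetical names chosen for this illustration. It applies the same rule as the mapper, adding 1 for every line that contains the word.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

public class SensitiveWordCounter {
    // group name -> (counter name -> value); a stand-in for a job's counters
    private final Map<String, Map<String, Long>> groups = new TreeMap<>();

    // Add delta to the named counter, creating the group and counter on first use.
    public void increment(String group, String name, long delta) {
        groups.computeIfAbsent(group, g -> new TreeMap<>()).merge(name, delta, Long::sum);
    }

    // Read a counter; counters that were never incremented read as 0.
    public long get(String group, String name) {
        return groups.getOrDefault(group, Collections.<String, Long>emptyMap())
                     .getOrDefault(name, 0L);
    }

    // Same rule as the mapper: one increment per line that contains the word.
    public void scan(String[] lines, String word) {
        for (String line : lines) {
            if (line.contains(word)) {
                increment("Sensitive Words", word, 1L);
            }
        }
    }

    public static void main(String[] args) {
        SensitiveWordCounter counters = new SensitiveWordCounter();
        counters.scan(new String[] {"hello\tyou", "hello\tme"}, "hello");
        System.out.println("hello=" + counters.get("Sensitive Words", "hello")); // hello=2
    }
}
```

Because the test is line.contains(word), a line with the word twice still adds only 1, so the counter measures matching lines rather than occurrences.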
2. Programming with Combiners
2.1 What Is a Combiner
In the output above, the Combine input records field in the Map-Reduce Framework group is zero. So how is combining used? It is the fifth step of the mapper task in a MapReduce program, and it is optional. Enabling it is simple; for the word-count example, one added line in the job setup is enough:
job.setCombinerClass(MyReducer.class);
Combining is optional and has to be enabled explicitly. Here MyReducer is registered as the combiner, so the combiner performs the same aggregation as the reducer.
Running the word count with this line added gives the output in Figure 3.1.
Counters: 20
Sensitive Words
hello=2
File Output Format Counters
Bytes Written=21
FileSystemCounters
FILE_BYTES_READ=359
HDFS_BYTES_READ=42
FILE_BYTES_WRITTEN=129080
HDFS_BYTES_WRITTEN=21
File Input Format Counters
Bytes Read=21
Map-Reduce Framework
Map output materialized bytes=67
Map input records=2
Reduce shuffle bytes=0
Spilled Records=8
Map output bytes=53
Total committed heap usage (bytes)=391774208
SPLIT_RAW_BYTES=95
Combine input records=4
Reduce input records=3
Reduce input groups=3
Combine output records=3
Reduce output records=3
Map output records=4
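Figure 3.1
The output now shows Combine input records=4, Combine output records=3, and Reduce input records=3. The combine phase sits between the end of the mapper and the start of the reducer, so the combiner receives exactly the records the reducer would have received without it, which is why its input is 4. The combiner's output then becomes the reducer's input, so Reduce input records drops from 4 to 3. Note again that combining is optional and must be enabled explicitly. In this program MyReducer serves as the combiner, so the combine step runs the reduce method itself. This shows that the aggregation logic is reusable: a method written for the reducer stage can also be applied on the mapper side.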
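The 4-in, 3-out arithmetic can be reproduced with a small simulation. This is a sketch in plain Java, not Hadoop code: the class CombinerCounts and its methods map and combine are hypothetical stand-ins for the mapper and the combiner, and the input is assumed to be the two tab-separated lines "hello you" and "hello me".

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerCounts {
    // Map step: emit <word, 1> for every tab-separated word.
    public static List<Map.Entry<String, Long>> map(String[] lines) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split("\t")) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1L));
            }
        }
        return out;
    }

    // Combine step: sum the values of equal keys, the same logic as the reducer.
    public static Map<String, Long> combine(List<Map.Entry<String, Long>> records) {
        Map<String, Long> sums = new TreeMap<>();
        for (Map.Entry<String, Long> r : records) {
            sums.merge(r.getKey(), r.getValue(), Long::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> mapOutput = map(new String[] {"hello\tyou", "hello\tme"});
        Map<String, Long> combined = combine(mapOutput);
        System.out.println("Combine input records=" + mapOutput.size());  // 4
        System.out.println("Combine output records=" + combined.size()); // 3
        System.out.println(combined); // {hello=2, me=1, you=1}
    }
}
```

The two <hello,1> records collapse into one <hello,2>, so one fewer record has to travel to the reducer. On real data with many repeated words the saving is proportionally larger.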
2.2 Custom Combiners
To see more clearly how a combiner works, we define a Combiner class of our own instead of reusing MyReducer, as shown in Code 3.2.
package combine;

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Q: Why use a combiner?
 * A: It runs on the map side and pre-aggregates the data, so less data is sent to the reducers, the transfer is shorter, and the whole job finishes sooner.
 *
 * Q: Why is the combiner an optional step rather than a standard part of every MR job?
 * A: Because not every algorithm is suited to a combiner; computing an average is one example.
 *
 * Q: The combiner already performs a reduce operation, so why must the reducer stage reduce again?
 * A: The combiner runs on the map side and only sees the data of a single map task; it cannot work across map tasks. Only the reducer receives the output of multiple map tasks.
 */
public class WordCountApp2 {
    static final String INPUT_PATH = "hdfs://hadoop:9000/hello";
    static final String OUT_PATH = "hdfs://hadoop:9000/out";

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        final FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), conf);
        final Path outPath = new Path(OUT_PATH);
        if (fileSystem.exists(outPath)) {
            fileSystem.delete(outPath, true);
        }
        final Job job = new Job(conf, WordCountApp2.class.getSimpleName());
        job.setJarByClass(WordCountApp2.class);

        // 1.1 Specify where the input files are
        FileInputFormat.setInputPaths(job, INPUT_PATH);
        // Specify how the input is parsed: each line becomes one key/value pair
        job.setInputFormatClass(TextInputFormat.class);

        // 1.2 Specify the custom mapper class
        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class); // types of the map output <k2,v2>
        job.setMapOutputValueClass(LongWritable.class); // may be omitted when <k3,v3> has the same types as <k2,v2>

        // 1.3 Partitioning
        job.setPartitionerClass(MyPartitioner.class);
        // Number of reduce tasks to run
        job.setNumReduceTasks(2);

        // 1.4 TODO sorting and grouping

        // 1.5 Combining
        job.setCombinerClass(MyCombiner.class);

        // 2.2 Specify the custom reducer class
        job.setReducerClass(MyReducer.class);
        // Types of the reduce output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // 2.3 Specify where the output is written
        FileOutputFormat.setOutputPath(job, outPath);
        // Class that formats the output files
        //job.setOutputFormatClass(TextOutputFormat.class);

        // Submit the job to the JobTracker
        job.waitForCompletion(true);
    }

    static class MyPartitioner extends Partitioner<Text, LongWritable> {
        @Override
        public int getPartition(Text key, LongWritable value, int numReduceTasks) {
            return (key.toString().equals("hello")) ? 0 : 1;
        }
    }

    /**
     * KEYIN    is k1, the byte offset of the line
     * VALUEIN  is v1, the text of the line
     * KEYOUT   is k2, a word that occurs in the line
     * VALUEOUT is v2, the count for that occurrence, always 1
     */
    static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable k1, Text v1, Context context) throws java.io.IOException, InterruptedException {
            final String[] splited = v1.toString().split("\t");
            for (String word : splited) {
                context.write(new Text(word), new LongWritable(1));
                System.out.println("Mapper output <" + word + "," + 1 + ">");
            }
        }
    }

    /**
     * KEYIN    is k2, a word that occurs in a line
     * VALUEIN  is v2, the count for one occurrence of the word
     * KEYOUT   is k3, a distinct word in the text
     * VALUEOUT is v3, the total number of occurrences of that word
     */
    static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text k2, java.lang.Iterable<LongWritable> v2s, Context ctx) throws java.io.IOException, InterruptedException {
            // Printed once per call to reduce(), i.e. once per k2 group
            System.out.println("MyReducer input group <" + k2.toString() + ",...>");
            long times = 0L;
            for (LongWritable count : v2s) {
                times += count.get();
                // Printed once per incoming <k2,v2> pair
                System.out.println("MyReducer input pair <" + k2.toString() + "," + count.get() + ">");
            }
            ctx.write(k2, new LongWritable(times));
        }
    }

    static class MyCombiner extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text k2, java.lang.Iterable<LongWritable> v2s, Context ctx) throws java.io.IOException, InterruptedException {
            // Printed once per call to reduce(), i.e. once per k2 group
            System.out.println("Combiner input group <" + k2.toString() + ",...>");
            long times = 0L;
            for (LongWritable count : v2s) {
                times += count.get();
                // Printed once per incoming <k2,v2> pair
                System.out.println("Combiner input pair <" + k2.toString() + "," + count.get() + ">");
            }
            ctx.write(k2, new LongWritable(times));
            // Printed once per outgoing <k2,v2> pair
            System.out.println("Combiner output pair <" + k2.toString() + "," + times + ">");
        }
    }
}
Code 3.2
The output is shown in Figure 3.2.
14/10/07 18:56:32 INFO mapred.MapTask: record buffer = 262144/327680
Mapper output <hello,1>
14/10/07 18:56:32 INFO mapred.MapTask: Starting flush of map output
Mapper output <world,1>
Mapper output <hello,1>
Mapper output <me,1>
Combiner input group <hello,...>
Combiner input pair <hello,1>
Combiner input pair <hello,1>
Combiner output pair <hello,2>
Combiner input group <me,...>
Combiner input pair <me,1>
Combiner output pair <me,1>
Combiner input group <world,...>
Combiner input pair <world,1>
Combiner output pair <world,1>
14/10/07 18:56:32 INFO mapred.MapTask: Finished spill 0
14/10/07 18:56:32 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
14/10/07 18:56:32 INFO mapred.LocalJobRunner:
14/10/07 18:56:32 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
14/10/07 18:56:32 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/10/07 18:56:32 INFO mapred.LocalJobRunner:
14/10/07 18:56:32 INFO mapred.Merger: Merging 1 sorted segments
14/10/07 18:56:32 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 47 bytes
14/10/07 18:56:32 INFO mapred.LocalJobRunner:
MyReducer input group <hello,...>
MyReducer input pair <hello,2>
MyReducer input group <me,...>
MyReducer input pair <me,1>
MyReducer input group <world,...>
MyReducer input pair <world,1>
14/10/07 18:56:33 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
14/10/07 18:56:33 INFO mapred.LocalJobRunner:
14/10/07 18:56:33 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
14/10/07 18:56:33 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://hadoop:9000/output
14/10/07 18:56:33 INFO mapred.LocalJobRunner: reduce > reduce
14/10/07 18:56:33 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
14/10/07 18:56:33 INFO mapred.JobClient: map 100% reduce 100%
14/10/07 18:56:33 INFO mapred.JobClient: Job complete: job_local_0001
14/10/07 18:56:33 INFO mapred.JobClient: Counters: 19
14/10/07 18:56:33 INFO mapred.JobClient: File Output Format Counters
14/10/07 18:56:33 INFO mapred.JobClient: Bytes Written=21
14/10/07 18:56:33 INFO mapred.JobClient: FileSystemCounters
14/10/07 18:56:33 INFO mapred.JobClient: FILE_BYTES_READ=343
14/10/07 18:56:33 INFO mapred.JobClient: HDFS_BYTES_READ=42
14/10/07 18:56:33 INFO mapred.JobClient: FILE_BYTES_WRITTEN=129572
14/10/07 18:56:33 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=21
14/10/07 18:56:33 INFO mapred.JobClient: File Input Format Counters
14/10/07 18:56:33 INFO mapred.JobClient: Bytes Read=21
14/10/07 18:56:33 INFO mapred.JobClient: Map-Reduce Framework
14/10/07 18:56:33 INFO mapred.JobClient: Map output materialized bytes=51
14/10/07 18:56:33 INFO mapred.JobClient: Map input records=2
14/10/07 18:56:33 INFO mapred.JobClient: Reduce shuffle bytes=0
14/10/07 18:56:33 INFO mapred.JobClient: Spilled Records=6
14/10/07 18:56:33 INFO mapred.JobClient: Map output bytes=53
14/10/07 18:56:33 INFO mapred.JobClient: Total committed heap usage (bytes)=391774208
14/10/07 18:56:33 INFO mapred.JobClient: SPLIT_RAW_BYTES=95
14/10/07 18:56:33 INFO mapred.JobClient: Combine input records=4
14/10/07 18:56:33 INFO mapred.JobClient: Reduce input records=3
14/10/07 18:56:33 INFO mapred.JobClient: Reduce input groups=3
14/10/07 18:56:33 INFO mapred.JobClient: Combine output records=3
14/10/07 18:56:33 INFO mapred.JobClient: Reduce output records=3
14/10/07 18:56:33 INFO mapred.JobClient: Map output records=4
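Figure 3.2
From this output we can summarize what a combiner does:
- Each map task may produce a large amount of output; the combiner merges that output on the map side first, which reduces the amount of data transferred to the reducers.
- At its core a combiner merges values by key locally; it acts like a local reduce.
- Without a combiner, all aggregation is done by the reducers, which is comparatively inefficient. With a combiner, map tasks that finish early aggregate their output locally, which speeds up the job.
Note: the combiner's output is the reducer's input, and a combiner must never change the final result. In my view, a combiner should therefore only be used where the reducer's input key/value types are identical to its output key/value types and where applying it does not affect the final result, for example summation or taking a maximum.
Some questions and answers:
* Q: Why use a combiner?
A: It runs on the map side and pre-aggregates the data, so less data is sent to the reducers, the transfer is shorter, and the whole job finishes sooner.
* Q: Why is the combiner an optional step rather than a standard part of every MR job?
A: Because not every algorithm is suited to a combiner; computing an average is one example.
* Q: The combiner already performs a reduce operation, so why must the reducer stage reduce again?
A: The combiner runs on the map side and can only process the data of a single map task; it cannot work across map tasks. Only the reducer receives the output of multiple map tasks.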
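The averaging case is worth a worked example. The sketch below is plain Java with made-up numbers: two hypothetical map tasks hold the values {1, 2, 3} and {10}. If each task's combiner emitted its own average and the reducer averaged those, the result would be wrong; the usual workaround, shown second, is to have the map side emit a partial sum and count and divide only once at the end. The class and method names are assumptions for illustration.

```java
public class AverageCombiner {
    static double mean(long[] values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return (double) sum / values.length;
    }

    // Wrong: every map task pre-averages, then the reducer averages the averages.
    public static double averageOfAverages(long[][] tasks) {
        double sum = 0;
        for (long[] task : tasks) {
            sum += mean(task);
        }
        return sum / tasks.length;
    }

    // Safe: every map task contributes a partial (sum, count); divide once at the end.
    public static double averageFromPartials(long[][] tasks) {
        long sum = 0;
        long count = 0;
        for (long[] task : tasks) {
            for (long v : task) {
                sum += v;
            }
            count += task.length;
        }
        return (double) sum / count;
    }

    public static void main(String[] args) {
        long[][] tasks = {{1, 2, 3}, {10}};
        System.out.println(averageOfAverages(tasks));   // 6.0, not the true average
        System.out.println(averageFromPartials(tasks)); // 4.0 = (1 + 2 + 3 + 10) / 4
    }
}
```

Sums and maximums do not have this problem because combining partial results gives the same answer as processing everything at once, which is why the note above lists them as safe uses.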
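3. Programming with Partitioners
4.1 What Is Partitioning
In a MapReduce program, partitioning is the third step of the mapper task. What is it for? Partitioning splits the data by some attribute so that each part can be handled separately: records are assigned to different partitions according to their properties, and each partition is then processed by its own task. By default a MapReduce job has a single partition. Code 4.1 shows the default setup, again using word count.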
package counter;

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class WordCountApp {
    static final String INPUT_PATH = "hdfs://hadoop:9000/input";
    static final String OUT_PATH = "hdfs://hadoop:9000/output";

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        final FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), conf);
        final Path outPath = new Path(OUT_PATH);
        if (fileSystem.exists(outPath)) {
            fileSystem.delete(outPath, true);
        }
        final Job job = new Job(conf, WordCountApp.class.getSimpleName());

        // 1.1 Specify where the input files are
        FileInputFormat.setInputPaths(job, INPUT_PATH);
        // Specify how the input is parsed: each line becomes one key/value pair
        job.setInputFormatClass(TextInputFormat.class);

        // 1.2 Specify the custom mapper class
        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class); // types of the map output <k2,v2>
        job.setMapOutputValueClass(LongWritable.class); // may be omitted when <k3,v3> has the same types as <k2,v2>

        // 1.3 Partitioning
        job.setPartitionerClass(HashPartitioner.class);
        job.setNumReduceTasks(1); // run one reduce task

        job.setCombinerClass(MyReducer.class);

        // 2.2 Specify the custom reducer class
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class); // types of the reduce output
        job.setOutputValueClass(LongWritable.class);

        // 2.3 Specify where the output is written
        FileOutputFormat.setOutputPath(job, outPath);
        job.setOutputFormatClass(TextOutputFormat.class); // class that formats the output files

        job.waitForCompletion(true); // submit the job to the JobTracker
    }

    /**
     * KEYIN    is k1, the byte offset of the line
     * VALUEIN  is v1, the text of the line
     * KEYOUT   is k2, a word that occurs in the line
     * VALUEOUT is v2, the count for that occurrence, always 1
     */
    static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable k1, Text v1, Context context) throws java.io.IOException, InterruptedException {
            final Counter helloCounter = context.getCounter("Sensitive Words", "hello");

            final String line = v1.toString();
            if (line.contains("hello")) {
                // Record that the sensitive word occurs in this line
                helloCounter.increment(1L);
            }
            final String[] splited = line.split("\t");
            for (String word : splited) {
                context.write(new Text(word), new LongWritable(1));
            }
        }
    }

    /**
     * KEYIN    is k2, a word that occurs in a line
     * VALUEIN  is v2, the count for one occurrence of the word
     * KEYOUT   is k3, a distinct word in the text
     * VALUEOUT is v3, the total number of occurrences of that word
     */
    static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text k2, java.lang.Iterable<LongWritable> v2s, Context ctx) throws java.io.IOException, InterruptedException {
            long times = 0L;
            for (LongWritable count : v2s) {
                times += count.get();
            }
            ctx.write(k2, new LongWritable(times));
        }
    }
}
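Code 4.1
The default partitioner in MapReduce is HashPartitioner. The call job.setNumReduceTasks(1) sets the number of reduce tasks, and that value is what the partitioner receives as numReduceTasks, here 1. HashPartitioner extends Partitioner, which is the base class of all partitioners; a custom partitioner must extend it as well. HashPartitioner computes the partition as shown in Code 4.2.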
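public class HashPartitioner<K, V> extends Partitioner<K, V> {

    /** Use {@link Object#hashCode()} to partition. */
    public int getPartition(K key, V value,
                            int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

}
Code 4.2
In this code K and V stand for k2 and v2. The class has a single method, getPartition(), which returns (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. Here key.hashCode() is the hash code of the key; the bitwise AND with Integer.MAX_VALUE clears the sign bit so the value is never negative, and the modulo maps it into the range 0 to numReduceTasks - 1. Since numReduceTasks is set to 1 above, the remainder can only be 0. The return value of getPartition() is a label that says which partition a record goes to: returning 0 sends it to partition 0. We can think of it as the index of the partition.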
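The expression is easy to try outside Hadoop. The sketch below copies it into a hypothetical helper named partition and applies it to ordinary Java objects; note that it uses String.hashCode() and Integer.hashCode(), whereas a real job would call hashCode() on the Writable key (for example Text), so the partition numbers for a given word are not guaranteed to match a cluster run.

```java
public class HashPartitionDemo {
    // Same expression as HashPartitioner.getPartition in Code 4.2.
    public static int partition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // With one reduce task every key lands in partition 0.
        System.out.println(partition("hello", 1)); // 0
        System.out.println(partition("you", 1));   // 0

        // Integer.valueOf(-7).hashCode() is -7, a negative hash code.
        Integer key = -7;
        System.out.println(key.hashCode() % 3); // -1: not a valid partition index
        System.out.println(partition(key, 3));  // 1: the mask keeps the result in 0..2
    }
}
```

The third line shows why the mask is there: in Java the remainder of a negative number is negative, and a negative partition index would be rejected by the framework.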
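4.2 Custom Partitioning
Next we define a custom partitioner to process the mobile phone traffic log used in an earlier entry. The log data is shown in Figure 4.1.
Figure 4.1
The figure shows that not every value in the second column is a mobile phone number. Our task is to write phone numbers and non-phone values to different files while summing the traffic. Because the partitions separate phone numbers from everything else, we can partition on the length of that field (a mobile number has 11 digits), as shown in Code 4.3.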
package partition;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class KpiApp {
    static final String INPUT_PATH = "hdfs://hadoop:9000/wlan";
    static final String OUT_PATH = "hdfs://hadoop:9000/out";

    public static void main(String[] args) throws Exception {
        final Job job = new Job(new Configuration(), KpiApp.class.getSimpleName());
        job.setJarByClass(KpiApp.class);

        // 1.1 Specify the input path
        FileInputFormat.setInputPaths(job, INPUT_PATH);
        job.setInputFormatClass(TextInputFormat.class); // class that parses the input files

        // 1.2 Specify the custom mapper class
        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class); // types of the map output <k2,v2>
        job.setMapOutputValueClass(KpiWritable.class);

        // 1.3 Specify the partitioner class
        job.setPartitionerClass(KpiPartitioner.class);
        job.setNumReduceTasks(2);

        // 2.2 Specify the custom reducer class
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class); // types of the reduce output <k3,v3>
        job.setOutputValueClass(KpiWritable.class);

        // 2.3 Specify where the output is written
        FileOutputFormat.setOutputPath(job, new Path(OUT_PATH));
        job.setOutputFormatClass(TextOutputFormat.class); // class that formats the output files
        job.waitForCompletion(true); // submit the job to the JobTracker
    }

    static class MyMapper extends Mapper<LongWritable, Text, Text, KpiWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            final String[] splited = value.toString().split("\t");
            final String msisdn = splited[1]; // second column: phone number or other identifier
            final Text k2 = new Text(msisdn);
            final KpiWritable v2 = new KpiWritable(splited[6], splited[7], splited[8], splited[9]);
            context.write(k2, v2);
        }
    }

    static class MyReducer extends Reducer<Text, KpiWritable, Text, KpiWritable> {
        /**
         * @param k2  a distinct phone number from the whole file
         * @param v2s the traffic records of that number across different time periods
         */
        @Override
        protected void reduce(Text k2, java.lang.Iterable<KpiWritable> v2s, Context context) throws IOException, InterruptedException {
            long upPackNum = 0L;
            long downPackNum = 0L;
            long upPayLoad = 0L;
            long downPayLoad = 0L;

            for (KpiWritable kpiWritable : v2s) {
                upPackNum += kpiWritable.upPackNum;
                downPackNum += kpiWritable.downPackNum;
                upPayLoad += kpiWritable.upPayLoad;
                downPayLoad += kpiWritable.downPayLoad;
            }

            final KpiWritable v3 = new KpiWritable(upPackNum + "", downPackNum + "", upPayLoad + "", downPayLoad + "");
            context.write(k2, v3);
        }
    }

    static class KpiPartitioner extends HashPartitioner<Text, KpiWritable> {
        @Override
        public int getPartition(Text key, KpiWritable value, int numReduceTasks) {
            // 11-character keys are treated as mobile numbers (partition 0), everything else goes to partition 1
            return (key.toString().length() == 11) ? 0 : 1;
        }
    }
}

class KpiWritable implements Writable {
    long upPackNum;
    long downPackNum;
    long upPayLoad;
    long downPayLoad;

    public KpiWritable() {}

    public KpiWritable(String upPackNum, String downPackNum, String upPayLoad, String downPayLoad) {
        this.upPackNum = Long.parseLong(upPackNum);
        this.downPackNum = Long.parseLong(downPackNum);
        this.upPayLoad = Long.parseLong(upPayLoad);
        this.downPayLoad = Long.parseLong(downPayLoad);
    }

    // Fields must be read in the same order in which write() emits them
    @Override
    public void readFields(DataInput in) throws IOException {
        this.upPackNum = in.readLong();
        this.downPackNum = in.readLong();
        this.upPayLoad = in.readLong();
        this.downPayLoad = in.readLong();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(upPackNum);
        out.writeLong(downPackNum);
        out.writeLong(upPayLoad);
        out.writeLong(downPayLoad);
    }

    @Override
    public String toString() {
        return upPackNum + "\t" + downPackNum + "\t" + upPayLoad + "\t" + downPayLoad;
    }
}
Code 4.3
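The length rule in KpiPartitioner can be tried on its own. The sketch below is plain Java; the class LengthPartitionDemo, its methods, and the two sample keys are hypothetical and are not taken from the log file. It also shows how a partition index maps to an output file name under the default part-r-NNNNN naming used by the reduce side of the new MapReduce API.

```java
public class LengthPartitionDemo {
    // Same rule as KpiPartitioner: 11-character keys go to partition 0, the rest to 1.
    public static int partition(String key) {
        return key.length() == 11 ? 0 : 1;
    }

    // Reduce task N writes its results to a file named part-r-0000N by default.
    public static String outputFile(int partition) {
        return String.format("part-r-%05d", partition);
    }

    public static void main(String[] args) {
        String[] keys = {"13800138000", "84138413"}; // made-up sample values
        for (String key : keys) {
            int p = partition(key);
            System.out.println(key + " -> partition " + p + " -> " + outputFile(p));
        }
    }
}
```

Because this rule can return 1, the job must run with at least two reduce tasks, which is why Code 4.3 calls job.setNumReduceTasks(2).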
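Note: the partitioning example must be packaged as a jar to run. The results are shown in Figures 4.3 and 4.4: Figure 4.3 is the traffic for mobile numbers and Figure 4.4 is the traffic for non-mobile values.
Figure 4.3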
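Figure 4.4
We said that each partition corresponds to one reduce task. Is that really so? We can check through the MapReduce web interface on port 50030: entering "http://hadoop:50030" in a browser opens the MapReduce page, as shown in Figures 4.5 and 4.6.
Figure 4.5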
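Figure 4.6
The figures show that this MapReduce job has one map task and two reduce tasks. Let us look at the two reduce tasks in detail, as shown in Figures 4.7, 4.8, and 4.9. task_201410070239_0002_r_000000 is the output of the first partition, with 20 records; task_201410070239_0002_r_000001 is the second partition, with one output record. This matches the result of our program.
Figure 4.7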
MWAHgIT7qfb4+MLF0st3N7DfyTBQA6ppd7+PZIx/KYhQuYZbjpMgCAWnq5S2+PZSzPWw6vK67b/7MAQFv0clveHrxY/uHC5fqfFwC4k15uwttDFcs/XLhQWcICAD+ll7vuk4HG+9sjHYuytHpefnpxsjS8AMDv6OWW22PE7IcEP9fGl9e+7vYwxxItnBeW9xcxF/Ewi/UhYozvyl8A+B293HJRmJi9CDTTMAzDVB9iJlrBJhfOC8sby+QGN+3/79eSiA9u2Le5tADA7+jllovCRKROlsE9S51MPnryu93Oj5bmzwtLy0sUBOZh3LpP/CgUyZR2q5xcAOB39HLLRWEi7Ts5FXEabwVFYG3Czs+Wxs8LS9uLkwM34uJ3cuR3HsZTTzLpAgC/o5dbLgoTUeqJMcacDPeEMWm39fquW76eq6Z0WNqyY623lnwsvHBcN0Ss24Ttw7drtD1rx1pCidz0siNTec7Wg1Vey35+XlI7H0gXYPlfy+y3az6TI+5Mmlq2AMDv6OWWS6PP2mK5rMmMRny2BlKOQyf/h5Zb7jv7qHGV61X71iKPNfvj477iaWZ2bapfJRV9RWftlF0N+iB0hlvlterHKq9l3zovyyQU1Tw41Mm3Fv3+6nuZh9GJ/1EnAH3Syy0XhYnQIk4uaj79OCSEbLg9Qk3HKsEKYZZ9M0rG+sCPL7MX1Ins/7hsp7BMThlYKdSnUj92eS371nlhlOfk0sqNef+SXISoE4Bu6eWWi8LE3hLHwchKj72gTo7tHC5ZQozsM2hQneQDQOfqwS5vlf1EkWxjQAzrbMsD+HIVadc8eScAndLLLReFCdkSyyZQPqwnEeoj6sSyXxkrG+87ydVDVX1WlNeyX1In4lz/736U/8y5upIjrbN/XVq8swPQKb3ccmmLFb2zI3MdRPTxIh9CqhmpTqQiUfMnlmXwTsm3WCp+21TaT/JO1OMuS/R4N8nfb9ByPs7aKbuaq4dCfar1Y5XXsl84L1FvfF8vH31mgQJKjfkxumbCNcnvnQB0Si+33B4jondAtqATJVduyOf+kELhvRgvmIeQWRGtt+1Y661l3152ANjHlbtEwyJ+32W6akdd5ji7JNn4bD3k5U1+2zexr56XZJc3EgV6XKAe5UbIbzd+KxagR3q85W5vP369qB0P34mw9xf2/y3wKThTAM+hx1vu9ubkp8sbT35UcssL3AKnDKAPerzlbm9X2lw4I+0v0BqcPoBG6fGWu72NaWHpkXtrg5MFZTiPAA3R4y13uzKgefsgt9cD5w4AoDl6jJ63awXas/8DZxAA4AZ6jKS3qwfEBwAAwBd5TGOJaAAAAHgItMoAAADQFqgTAAAAaAvUCQAAALQF6gQAAADaAnXycWbvp7t9qKEXPwEA4N/RqTqRU+QOw+jnin1mPyZbhul5EwPW+qpDDMPgSq1+Mifw+YN8gBo/AfpiDwrKLTX5399mAPAGPaqT2Y8i/EyuspnN1Ylm7nj9AZM7dCeyXO3/h6nwE6AX5OUc3VLb08ANDwEA8AYdqpPZj3Gz+uYQxc3q5C6ZgDqB5xBfzdmte/FeBoD76FOdfDTS3N93ckvgRJ3Ac5ic7B1BnQD0T4fqZOusTdrWbWUYfU5Gf8ze3VPqRGSNRMcPR3WuSp1YaSeW/cPjBjthSzeFr+NxJM1PmchDHIeuyXpXUScA/dGlOlmWJbTCMuhMTjTeWZ/E2T6SfH06IrN9kOPcNWkkm520+8KyLzNm5DbyWLMfE7Wx6Y/9G9PPye27zt4Rx6Ffcm2COgHokH7VyUqU6loefH5bnWRv2+yNfqILakd2YgVVsF+TzXuYa2v7ySgPPAPjZkadAHRH7+pkkU3rl9WJ1YZfVyeZoqiwL1en5SvrjaKfxcEvgA6wNQjqBKA7OlQn6Ss6P1Mn1oZvqJNIUtTZN905VDpVfiYjRACdsI+ELlmQQJ0AdEeH6iTuNJCtqUzcV3977f28k+jIfj9ulJN69p0dmTti2B
c298QQWRFK3knmheWnTDtJojxADyTJJmnyFJoboDt6VCdxfkbyyyE+fGm8kTLELfqZ9bEluVr8uqx35fGR6J2aYDL6oIyxHK+XEidLXSn7aVYoQBdYKVvZV0gUgF7oUZ2YkNwJAADwAJ6jTi5PjgMAAABN8Rx1AgAAAM8AdQIAAABtgToBAACAtkCdAAAAQFugTgAAAKAtUCcAAADQFqgTAAAAaAvUCQAAALQF6gQAAADaAnUCtVycHjCdkQ0AAOAA1AlUkUwCe3ZfphcAAIB6ulUn0bTDblKmKU3m3321ret+7zSWsx//3Vw+72iTZVku97sAJMTzjSvTfmZamBm4AHqkT3UyORGWJreFnWiO4tmPr09JuJq9f6uh/X/q5BN9H2/rG4BlWZbJFy/F2Y+xDNljRRQ1AKBxelQnWZTZ9EakTvaPF9XJQRjsltPl+szADMM78AHKd+/snfdOqJP9GWVZ5IMMALROh+rEfgT6pDp5amN6vlyfqomn1ij8ktLdu+ZfRxIk1iOxVgGAlulPnRQiTKROdhFzVp0kI9uJzWQAe+1Jdm4chmFwfh3ijseXzBHyg2PLxBk3hW+iJj63v22fj7bb5SoSV99BeXP/DTsAV5i9N26FrVNQCpI0WNB5AtANT1MnWuP74b6T9JstD0X8fX0fb1mRGDo5qWtcJCu2b0SAtexPbo/caXW933dil9fy/+qRAVJkz2mSfrbnvaNOAPrnaepE++b76mT0c/43f4vosLfCKEC8+th+2of0cXWil9f0/+qRAYrssUDoYtQJwCPoT50UIkxj6qTYVlsog0eWOjHtf1ad5PVnqhPD/6sHBijzurhznb5dgeSdAPRKh+okf6PXeGdH7HBX38n19lgGUrvvRLf/YXWSZ54U1Inm/9XjAiTEKfHqRTXxzg7AE+hQnawhSqblHw0sJH29VQFKhLVpmpJvKkc6Yh11rIpk2kaaRyJyTdT/pf0DdWKUq4T8tRK7vJb/6QeAi6SXtnJVJWv5vROAPulSnSz2uypGcsfZV2eWReTY6m+kvATDy/L2Fkv4KxNUjbGOkpdDrL/C2zFpAXL7UT0Y40Qna+Jldns92yyv5T/aBD6HuOTTKzi6AKWcrr0DAaAZelUn/4pLCSzfcONSeOdXYgEA4CSok9bhyQ8AAP4bqBMAAABoC9QJAAAAtAXqBAAAANoCdQIAAABtgToBAACAtkCdAAAAQFugTgAAAKAtUCcAAADQFqgTAAAAaAvUCcBnuHk2odk7fk0YAJ4C6gQSjudSPjW98T8hnk3onvqpnX8bAKB5OlQn8Uy452fbPXGEpmL97MdvO/QqeLE6122Y1y9CaJN76+ft/pvJJ/vH83JXFGzfIXbFWn/udkvcEQ4dTQeOdgPoiv7USejA3qPN5L7RGLQWzX6gTpalakLkRuZMbob0Srmzfi7PCL1phPQSy+TKoRWp047+X0XF+mn/r2hfbLIPZkU3h2Zo9mNjTxsAUKQ/dRI4qx7Ohdn21ImFVa6z5d12Q52cJLtQrtXPxfN16M17O9eM85n7h4/W+lypHPgeVZJ4KomrPLa6rDLGHxoHgIb4N+rkfNDuQ51YXl72HnVykrymr9TP5662dyy9q07Skm+frfWJHslURdlXF+0Z25Fm1i2PpQ8ANMTT1Ik2uJ0OVe+BS6awZLZORXlhKQ3Dsf21h9m5NcHDh0QPa31kxRrIjw9sl7fCT+cq1YmeLaDb1+p5y88IRz6o7CifY3LJDrLM+3rz/Ka5S+FLq37KjmnqJKufzUU3JcMob52vvOI+r05qT1HW+RHGW6z1qRw5oSDS4dz9+kgHt179LagTgK54lDqR481p3FNidvRAlz/v1kd5Oegt99L92bYWf0Ok1taX/TnTdxKv2xMopZ81g/9SGyQJBZp9s54nF2uqo+qW+0bHki3V/khdOq52usz6KaNfh1KUyAtCOJGMPdSfL6O8BUu1lO+pimsj3nwctxvDWH9dneS5Zr
MfnXOZiAobok4AuuJB6qT8udCKGw/L1VHeCHuWP2f/lv05oU7yl51C77p8GfbkyM7+wbKffmknCxwlHdjqxPTZOK7swYnftrF6MUyORnYSXbS1lf5QU9j+FM/RZ9VJ5lFF3mrwVq9dsf6qOslyf6Xj8mEhTk1BnQD0w4PUSRp94shnRX/xiPuGOtFCtuXPberEatM+pU6sPc16/pg6WdTBr8L53ZvKqHPiWjpN5nihXFuzmqXAnjhf25f6YMtbCSxHO5+sIksQ7Ouv5Z1kbmaDOS4aQItAogB0wYPUydm+kziifV6dtNh3YrUVn+o7MfqQDnsSSu7pxy31hYURNPO41YNhVShKyazQ2Y9uUn7W9cT5Sjc6GsE8gSa0yhK/xLE4Sdyv7d7ItjOzbiu8AYAmeZA6ieOPErXDM/60DWjEKRBuil5QqA/FaQgXiQ+KPx9XJ0m57PWyy3tZ9sImrcVVdWLYt+t5cnGSbEXeiegKkekc8ehAnuqQnl87iVarn2MyJWTmPy2zH8cxt3zufGnlzT+cJ/M2PdXRdaIdKuoWUfdNslf2j2m1mdeDcq1ERvUuGNQJQFf0qk6s1wj2Zkd7dsrzD8LWXrwSknQI18Q0YUodcAhHfdne3s45/BsaWtOfvFyF9YafocSj9+nrMAlhUzcp4wuqfaueJxfeTtJOWOHY237rMeLzdXjc5ME6UhbWeaxxbb+uvJ+0oaNwDGuArfJ8WeV9S5vERhP5YHhmDGiqJ9S+M8XFl9w5xiWhK9nCqYvKdmn0DgB+Ta/qBB7A1USP91AyMT/jRaU2OPcLIvVkmaJf5ju/0Pw7+wDQNKgTuAf9ifnHx/7p8bduswcML9Smr7ZqHwCaB3UCAAAAbYE6AQAAgLZAnQAAAEBboE4AAACgLVAnAAAA0BaoEwAAAGgL1AkAAAC0BeoEAAAA2gJ1AgAAAG2BOoFa3pxh7l2UiX0BAOCZoE6gimQW3umOnxmvnzX6bu6pHwCAx1ekMqwAAApqSURBVNCdOommyF22mUnT+XZ/24jdOWXMbxDaZBaTDP+em/tvKri3firQ5xyOp8FO7y999uIj89rmk8/P31yY0FnBuN2MuZTlF61fOwCw0506WbJH6HQu09+rE9WtB5GW7J65hYMvDbf8K3fWzwHyVMobJa7WfRDNWm+jyY/taJpCEBP+Ta5C1u0byc2tckXnoso+ALRBj+pkmZyIcW9PSG+F07M8Vp1kBbvW+n6onjuo5lvrp0w0+a+oyujgQu9b6wsHKN2O+ck7WVfx5MV7IDDLFZnv4NoBgBddqhMZc94VJ58LWE8NfW+3KIaVz/nTGvfWz5lDam5aHSR1Wckn1cnZqooeTBKtItZG3SXySab1SwcANvpUJyKoJQ93+fhylKjy2mD9Ph1SXy1GeQOTG5LoFrbOwlxd6Jv9OAyjc+MwDIPzq8E9woojCHfs7fMxdbO826ZusrvZDY81dZKN/pv29Xo2ypuVK3PyVBOT29nOr5a9oJ3fbftgKR3kyv3U6yfZfl9dqp/vkCdrye/049d0nCyrOtGrav02PnervNir/egQqRyJxUqxXEsfo4IA8KJTdRKinNIdrrZekxucc6HljORAtrV8oLN6ifOHvtpWcxsWF3/37mlhYUsAtbdPBt6TJEelvPEgfGWkVtVJJEqMQf7IvlY7enmTfbNH9hPqxLAj/TdqJ8tXCJ/i8p6xXzpfN/Wd5HLgTXESlTHP89DUSSwEy5VwoE70jfbVaBOAfuhUnWxhTouZljrRH8xOqJPts/GMe0KdjH7O/0ZP7uII1vaKZ/HzuFJeo8+pyl/NSvrJtq/UjlHe7ACH/pgYdkoyUzu/ZvLCKfvF83XPoEN2UKsFv9iyZzpB7zspOHRgUFUnmhnGdAB6o1d1svWeK4PcZt+JGl5PqJN1kMVsmN5WJ4aP1vZpZI4it1XerZk5k4J5kCwQfW3aV2qnqEG0QTrLUAnNjqFOzPNbSq2st188Xzc1n7
k0U5247Fx6it9VJ1V5J9lxkSYAHdKtOlkmN4yjloH3LXUSPz9+QZ0YBq72nZjlddPJn11Nj2T3ndj29b6T4wr71AhIMsKi920c9gjZHlTZv7/vJO5TU06s6oS1/qx9Veja15JCfDnsfhWOG/1Ezttv+AHAj+hXnZiD1KfVSRiMmPbWRTxCqwFw/RDFurfVSVKkzbi9vWw0lLwTPQzPftRV3YHPsXEzf8Kyr9WzXt40xyFqXU79Gptlx+zbMM6vfPFD+nzOfvF8qfXzBSKdofXmGPpYSU1W00/Toiu7ZZko2oieZV+epiTFRS1XMiTFbAgA3dCvOtEiTfz2g5rNkbfZYafo2Wtb50XenjD/Wh+eivPjml6Pw7C/hRP+CjWkuX+8vZQLxfLW58Mmfu9Nt/dT4ec9DftKPSvlzYtwVZsYdsJKN6XjMtb5nVx4W8q4TOrsL9r5KtfPF9BeG9rLU6lOCplcshb1hB7zGkiGxwoPFIr/mhk7tQkAGqdjdQLXuNy5XakNvtV5fjEz8wMcjjf8Sy6J3IbsA0DToE7+D1s3zLf6tr9t/x6MJ/X/jZmO2ol9AGge1AkAAAC0BeoEAAAA2gJ1AgAAAG2BOgEAAIC2QJ0AAABAW6BOAAAAoC1QJwAAANAWqBMAAABoC9QJAAAAtAXqBAAAANrin6uTb04HC+dgcnsAAHjRrTqJpiO+MiXHLCahNTf45twq37bfEa+5bJhZBb4AMyUB9Eif6iSaXX1yV8NOee5Z1MlPYSJg+AZ7rIiiBgA0To/qJIsyV8cEnt0gTr4r6fPskwH3EE92fP1BBgB+TYfq5HOPQE9uEGc/9hWIn3wy4C5iPRJrFQBomf7USTnC7Okor6A0+3EYRufWxAYfJThMbnCTNiq9WZHt+5anoo9ih7VVyumM/W19KFmyfje4WYtScg4dCsdzU9h198sql7U+q/8D+2F75yJ1IsvQl8qChkiDBZ0nAN3wKHUie1X2zbb8DvH3FaHSNj0OXHnvg9xeuhFvObm6AFhvf3JCBAg/ZXdDau1k38nkgj6Iak4tl8yYkdvo9W/bl9snqUT7rrN3tCdwDdQJQLc8SJ1Y7bP1Nx1MSIcWVPUQjWG/Poh+hDPvEFXbTzzT/X9fnWReW+UyQnzRA82+Xf2M8sBHQJ0AdEt/6sSMMOn6LTLVqpPj9t1QDxfb0tbUSVYGq1zGeqv+K+0nW2iDXwAnIe8EoFc6VCf5m7jrOzt39p1caUObVyeWCUOdHPWdnFMnwgoNClyFd3YAeqVDdbI+V8uGT0sdybMnVHVi5UksZ9RDrJdq328+pU6iJFOlvHGVxOWp+UFcVRtY5YqHafbEEL3+bfti+9mPMgcl7kWhRYHL8HsnAH3SpTpZat4ZiZNCtrd2wl8/L8vk/RTsyCYwfutl/UZ792TfSexR0ZaetD+58LZRXN69Fl7fq4WofGdH3dgq1/F6Kd1M++IFJb+nBMe7oE3gLfitWIAe6VWd/CtIEgUAgH8F6qR1ePIDAID/BuoEAAAA2gJ1AgAAAG2BOgEAAIC2QJ0AAABAW6BOAAAAoC1QJwAAANAWqBMAAABoC9QJAAAAtAXqBAAAANoCdQIAAABt8Q/VSe0cwvAdqH8AADigP3WSzHl7av6Z176/nVJv9iNz5KzcUv/wT9jnx1Zut8lzCwJ0RX/qZFmW2Y97/Jncuebu5xP+ok4imHAZvoC8rKKQsD3NcAsC9EX/6uRsc/e41rGzp8LH1T80QHxVxQFCWwEArdO/Opncuceih7WO3cXdh9U/NMHkZO8I6gSgf3pVJ1baifgqagPDmLRzr9Zx3fK10eQSS5Yda73OdtTcspuCpdeX2/rgabI+93MfZ69xKBzPTWHXKJ6fKm8+xl+2n9d/aoe+d/gQsx+TewF1AtAdvaqT0c/5Y3g64iOb8W3D5H85PB32lZkiyXrVfp23O1JjzH6MfJMfhP+qn7r1IpML+mA/slUuqx5kHUr/LftW/S+T23
edvaP9gPfJtQnqBKBDulYnyahO9jbP3jgm7WdZnVhjRZb9Wm81F0x/5H6fVSeZ14V600wXPdDsm/XPKA98GON2QJ0AdEff6iRrGdW27oI6qbBzwVvV0M/VSVaGk+VNRUvUe3JsP9lCG/wCuIJ9L6BOALqjc3USNY1WCPqUOrkY4ppXJ3X1Zm5+qBCL6kRYoR8F3iAaaU1+8g91AtAdvauTqLWLf1lkD1DiaX/2o5Xase8aD0/sCRGW/RPepi6n6iRyVBlqit9NWKJWfZqq8nRzDVCoN7UeZO+JkneS2bfrP+5FoQGBiyTJJmkSE9oXoDv6UyfRuyHLkueSaq+AiBdkfJKS+jLm/TgkAkV7lcRarxO/VbPuoL3bMoQ0mpcfSVKL7ac8SOU7O+rGZ8u7r5fS0LSv13+8C9oErlJICYu/QqIA9EJ/6uTBkCQKAACwoE7aIf39EwAAgP8K6gQAAADaAnUCAAAAbYE6AQAAgLZAnQAAAEBboE4AAACgLVAnAAAA0BaoEwAAAGgL1AkAAAC0BeoEAAAA2gJ1AgAAAG2BOgEAAIC2+AN6M7yG4EVVDQAAAABJRU5ErkJggg==" alt="" width="624" height="247" />
Figure 4.8 First partition
Figure 4.9 Second partition
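To sum up the analysis above, partitioning is useful for two things:
1. Producing multiple output files, split according to business needs
2. Letting multiple reduce tasks run concurrently, which improves the overall efficiency of the job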
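The two points above can be sketched in plain Java, without any Hadoop dependency. This is only a rough illustration: the class name PartitionSketch, the sample keys, and the rule "keys of length 11 go to partition 0, everything else to partition 1" are assumptions made up for this example. The hashPartition method uses the same arithmetic as Hadoop's HashPartitioner, and the file names follow the part-r-NNNNN pattern that reduce tasks use in the new API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionSketch {

    // Same arithmetic as Hadoop's HashPartitioner:
    // clear the sign bit of the hash, then take the remainder by the reducer count.
    public static int hashPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hypothetical business rule (an assumption for illustration only):
    // 11-character keys go to partition 0, all other keys to partition 1.
    public static int businessPartition(String key) {
        return key.length() == 11 ? 0 : 1;
    }

    // Groups keys by partition number; each partition is handled by one reduce task.
    public static Map<Integer, List<String>> assign(List<String> keys) {
        Map<Integer, List<String>> partitions = new TreeMap<>();
        for (String key : keys) {
            partitions.computeIfAbsent(businessPartition(key), p -> new ArrayList<>()).add(key);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Made-up sample keys: two 11-character keys and two shorter ones.
        List<String> keys = Arrays.asList("13800000000", "hello", "13900000001", "me");
        for (Map.Entry<Integer, List<String>> e : assign(keys).entrySet()) {
            // Every reduce task writes its own output file.
            System.out.println(String.format("part-r-%05d", e.getKey()) + " <- " + e.getValue());
        }
    }
}
```

In a real job the rule would go into a subclass of org.apache.hadoop.mapreduce.Partitioner that overrides getPartition, registered with job.setPartitionerClass, and job.setNumReduceTasks would be set to the number of partitions the rule can return (2 in this sketch).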