Movie Review Case Study

Data and Requirements

Data Format

movies.dat: 3,884 records

1::Toy Story (1995)::Animation|Children's|Comedy
2::Jumanji (1995)::Adventure|Children's|Fantasy
3::Grumpier Old Men (1995)::Comedy|Romance
4::Waiting to Exhale (1995)::Comedy|Drama
5::Father of the Bride Part II (1995)::Comedy
6::Heat (1995)::Action|Crime|Thriller
7::Sabrina (1995)::Comedy|Romance
8::Tom and Huck (1995)::Adventure|Children's
9::Sudden Death (1995)::Action
10::GoldenEye (1995)::Action|Adventure|Thriller

users.dat: 6,041 records

1::F::1::10::48067
2::M::56::16::70072
3::M::25::15::55117
4::M::45::7::02460
5::M::25::20::55455
6::F::50::9::55117
7::M::35::1::06810
8::M::25::12::11413
9::M::25::17::61614
10::F::35::1::95370

ratings.dat: 1,000,210 records

1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275
1::2355::5::978824291
1::1197::3::978302268
1::1287::5::978302039
1::2804::5::978300719
1::594::4::978302268
1::919::4::978301368

Data Description

1. users.dat record format: 2::M::56::16::70072
Fields: UserID BigInt, Gender String, Age Int, Occupation String, Zipcode String
Meaning: user ID, gender, age, occupation, zip code

2. movies.dat record format: 2::Jumanji (1995)::Adventure|Children's|Fantasy
Fields: MovieID BigInt, Title String, Genres String
Meaning: movie ID, movie title, movie genres

3. ratings.dat record format: 1::1193::5::978300760
Fields: UserID BigInt, MovieID BigInt, Rating Double, Timestamp String
Meaning: user ID, movie ID, rating, rating timestamp

Fields of the combined (joined) record:
user ID, movie ID, rating, rating timestamp, gender, age, occupation, zip code, movie title, movie genres
userid, movieId, rate, ts, gender, age, occupation, zipcode, movieName, movieType
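
To make the three record layouts concrete, here is a small illustrative snippet (the class name RecordParseSketch is hypothetical and not part of the case study) showing how one line of each file splits on the :: delimiter into the fields listed above.

// Minimal sketch: how one record of each file splits on the "::" delimiter.
public class RecordParseSketch {
    public static void main(String[] args) {
        String userLine   = "2::M::56::16::70072";
        String movieLine  = "2::Jumanji (1995)::Adventure|Children's|Fantasy";
        String ratingLine = "1::1193::5::978300760";

        String[] u = userLine.split("::");   // [UserID, Gender, Age, Occupation, Zipcode]
        String[] m = movieLine.split("::");  // [MovieID, Title, Genres]
        String[] r = ratingLine.split("::"); // [UserID, MovieID, Rating, Timestamp]

        // Prints: M  Jumanji (1995)  5
        System.out.println(u[1] + "  " + m[1] + "  " + r[2]);
    }
}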

Required Statistics

(1) Find the 10 most-rated movies and their rating counts (movie title, rating count)
(2) Find the 10 highest-rated movies among male and among female users, respectively (gender, movie title, rating)
(3) For the movie with movieid = 2116, compute the average rating for each age group (the data uses only 7 age codes, so group by those) (age group, rating)
(4) For the female user who watches the most movies (submits the most ratings), compute the average rating of the 10 movies she rated highest (person, movie title, rating)
(5) Find the year with the most good movies (rating >= 4.0) and list that year's 10 best movies
(6) Among movies released in 1997, find the 10 highest-rated Comedy movies
(7) For each genre in the dataset, find the 5 highest-rated movies (genre, movie title, average rating)
(8) For each year, the highest-rated movie genre (year, genre, rating)
(9) The highest-rated movie in each region, with the results stored in HDFS (region, movie title, rating)

Code Implementation

1. Find the 10 most-rated movies and their rating counts (movie title, rating count)

Analysis: this problem involves two files, ratings.dat and movies.dat, whose sizes differ greatly. A map-side join is the right tool: preload the smaller file (movies.dat) into memory before scanning the ratings.

MovieMR1_1.java

public class MovieMR1_1 {

    public static void main(String[] args) throws Exception {
        if (args.length < 4) {
            args = new String[4];
            args[0] = "/movie/input/";
            args[1] = "/movie/output/";
            args[2] = "/movie/cache/movies.dat";
            args[3] = "/movie/output_last/";
        }

        Configuration conf1 = new Configuration();
        conf1.set("fs.defaultFS", "hdfs://hadoop1:9000/");
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        FileSystem fs1 = FileSystem.get(conf1);

        Job job1 = Job.getInstance(conf1);
        job1.setJarByClass(MovieMR1_1.class);
        job1.setMapperClass(MoviesMapJoinRatingsMapper1.class);
        job1.setReducerClass(MovieMR1Reducer1.class);
        job1.setMapOutputKeyClass(Text.class);
        job1.setMapOutputValueClass(IntWritable.class);
        job1.setOutputKeyClass(Text.class);
        job1.setOutputValueClass(IntWritable.class);

        // Cache the small file to the working directory of each task node
        URI uri = new URI("hdfs://hadoop1:9000" + args[2]);
        System.out.println(uri);
        job1.addCacheFile(uri);

        Path inputPath1 = new Path(args[0]);
        Path outputPath1 = new Path(args[1]);
        if (fs1.exists(outputPath1)) {
            fs1.delete(outputPath1, true);
        }
        FileInputFormat.setInputPaths(job1, inputPath1);
        FileOutputFormat.setOutputPath(job1, outputPath1);

        boolean isDone = job1.waitForCompletion(true);
        System.exit(isDone ? 0 : 1);
    }

    public static class MoviesMapJoinRatingsMapper1 extends Mapper<LongWritable, Text, Text, IntWritable> {
        // Holds the movies.dat data loaded into memory (movie ID -> title + genres)
        private static Map<String, String> movieMap = new HashMap<>();
        // map output key: movie title
        Text outKey = new Text();
        // map output value: rating
        IntWritable outValue = new IntWritable();

        /**
         * movies.dat: 1::Toy Story (1995)::Animation|Children's|Comedy
         *
         * Preload the small table (movies.dat) into memory.
         */
        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            Path[] localCacheFiles = context.getLocalCacheFiles();
            String strPath = localCacheFiles[0].toUri().toString();
            BufferedReader br = new BufferedReader(new FileReader(strPath));
            String readLine;
            while ((readLine = br.readLine()) != null) {
                String[] split = readLine.split("::");
                String movieId = split[0];
                String movieName = split[1];
                String movieType = split[2];
                movieMap.put(movieId, movieName + "\t" + movieType);
            }
            br.close();
        }

        /**
         * movies.dat:  1 :: Toy Story (1995) :: Animation|Children's|Comedy
         *              (MovieID, Title, Genres)
         *
         * ratings.dat: 1 :: 1193 :: 5 :: 978300760
         *              (UserID, MovieID, Rating, Timestamp)
         *
         * value: one line read from ratings.dat
         */
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] split = value.toString().split("::");
            String userId = split[0];
            String movieId = split[1];
            String movieRate = split[2];
            // Look up the movie title and genres in memory by movieId
            String movieNameAndType = movieMap.get(movieId);
            String movieName = movieNameAndType.split("\t")[0];
            String movieType = movieNameAndType.split("\t")[1];
            outKey.set(movieName);
            outValue.set(Integer.parseInt(movieRate));
            context.write(outKey, outValue);
        }
    }

    public static class MovieMR1Reducer1 extends Reducer<Text, IntWritable, Text, IntWritable> {
        // Number of ratings for each movie
        int count;
        IntWritable outValue = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            count = 0;
            for (IntWritable value : values) {
                count++;
            }
            outValue.set(count);
            context.write(key, outValue);
        }
    }
}

MovieMR1_2.java (sorts the first job's output in descending order and keeps the top 10; run it after MovieMR1_1 completes)

public class MovieMR1_2 {

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            args = new String[2];
            args[0] = "/movie/output/";
            args[1] = "/movie/output_last/";
        }

        Configuration conf1 = new Configuration();
        conf1.set("fs.defaultFS", "hdfs://hadoop1:9000/");
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        FileSystem fs1 = FileSystem.get(conf1);

        Job job = Job.getInstance(conf1);
        job.setJarByClass(MovieMR1_2.class);
        job.setMapperClass(MoviesMapJoinRatingsMapper2.class);
        job.setReducerClass(MovieMR1Reducer2.class);
        job.setMapOutputKeyClass(MovieRating.class);
        job.setMapOutputValueClass(NullWritable.class);
        job.setOutputKeyClass(MovieRating.class);
        job.setOutputValueClass(NullWritable.class);

        Path inputPath1 = new Path(args[0]);
        Path outputPath1 = new Path(args[1]);
        if (fs1.exists(outputPath1)) {
            fs1.delete(outputPath1, true);
        }
        // Sort the output of the first job in descending order
        FileInputFormat.setInputPaths(job, inputPath1);
        FileOutputFormat.setOutputPath(job, outputPath1);

        boolean isDone = job.waitForCompletion(true);
        System.exit(isDone ? 0 : 1);
    }

    // Note the map output key is the custom MovieRating object, which sorts by count in descending order
    public static class MoviesMapJoinRatingsMapper2 extends Mapper<LongWritable, Text, MovieRating, NullWritable> {
        MovieRating outKey = new MovieRating();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Input line, e.g.: 'Night Mother (1986)    70
            String[] split = value.toString().split("\t");
            outKey.setCount(Integer.parseInt(split[1]));
            outKey.setMovieName(split[0]);
            context.write(outKey, NullWritable.get());
        }
    }

    // Keys arrive already sorted; simply emit the first 10 movies
    public static class MovieMR1Reducer2 extends Reducer<MovieRating, NullWritable, MovieRating, NullWritable> {
        int count = 0;

        @Override
        protected void reduce(MovieRating key, Iterable<NullWritable> values, Context context)
                throws IOException, InterruptedException {
            for (NullWritable value : values) {
                count++;
                if (count > 10) {
                    return;
                }
                context.write(key, value);
            }
        }
    }
}

MovieRating.java

public class MovieRating implements WritableComparable<MovieRating> {

    private String movieName;
    private int count;

    public MovieRating() {
    }

    public MovieRating(String movieName, int count) {
        super();
        this.movieName = movieName;
        this.count = count;
    }

    public String getMovieName() {
        return movieName;
    }

    public void setMovieName(String movieName) {
        this.movieName = movieName;
    }

    public int getCount() {
        return count;
    }

    public void setCount(int count) {
        this.count = count;
    }

    @Override
    public String toString() {
        return movieName + "\t" + count;
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        movieName = in.readUTF();
        count = in.readInt();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(movieName);
        out.writeInt(count);
    }

    @Override
    public int compareTo(MovieRating o) {
        return o.count - this.count;
    }
}

2. Find the 10 highest-rated movies among male and among female users, respectively (gender, movie title, rating)

Analysis: this problem requires joining three tables, so the two small tables (movies.dat and users.dat) are preloaded into memory before scanning the ratings.

Join the three tables

MoviesThreeTableJoin.java

/**
 * Join query across the three tables (ratings, movies, users).
 */
public class MoviesThreeTableJoin {

    public static void main(String[] args) throws Exception {
        if (args.length < 4) {
            args = new String[4];
            args[0] = "/movie/input/";
            args[1] = "/movie/output2/";
            args[2] = "/movie/cache/movies.dat";
            args[3] = "/movie/cache/users.dat";
        }

        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://hadoop1:9000/");
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        FileSystem fs = FileSystem.get(conf);

        Job job = Job.getInstance(conf);
        job.setJarByClass(MoviesThreeTableJoin.class);
        job.setMapperClass(ThreeTableMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);

        // Cache the two small tables on every task node
        URI uriUsers = new URI("hdfs://hadoop1:9000" + args[3]);
        URI uriMovies = new URI("hdfs://hadoop1:9000" + args[2]);
        job.addCacheFile(uriUsers);
        job.addCacheFile(uriMovies);

        Path inputPath = new Path(args[0]);
        Path outputPath = new Path(args[1]);
        if (fs.exists(outputPath)) {
            fs.delete(outputPath, true);
        }
        FileInputFormat.setInputPaths(job, inputPath);
        FileOutputFormat.setOutputPath(job, outputPath);

        boolean isDone = job.waitForCompletion(true);
        System.exit(isDone ? 0 : 1);
    }

    public static class ThreeTableMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        // In-memory copies of movies.dat and users.dat
        private Map<String, String> moviesMap = new HashMap<>();
        private Map<String, String> usersMap = new HashMap<>();
        // Holds one parsed line of ratings.dat
        String[] ratings;
        Text outKey = new Text();

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            BufferedReader br = null;
            Path[] paths = context.getLocalCacheFiles();
            String usersLine = null;
            String moviesLine = null;

            for (Path path : paths) {
                String name = path.toUri().getPath();
                if (name.contains("movies.dat")) {
                    // movies.dat, e.g.: 2::Jumanji (1995)::Adventure|Children's|Fantasy
                    // key: movie ID; value: title + genres
                    br = new BufferedReader(new FileReader(name));
                    while ((moviesLine = br.readLine()) != null) {
                        String[] split = moviesLine.split("::");
                        moviesMap.put(split[0], split[1] + "::" + split[2]);
                    }
                } else if (name.contains("users.dat")) {
                    // users.dat, e.g.: 2::M::56::16::70072
                    // key: user ID; value: gender + age + occupation + zipcode
                    br = new BufferedReader(new FileReader(name));
                    while ((usersLine = br.readLine()) != null) {
                        String[] split = usersLine.split("::");
                        usersMap.put(split[0], split[1] + "::" + split[2] + "::" + split[3] + "::" + split[4]);
                    }
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            ratings = value.toString().split("::");
            // Look up the movie and user details by movie ID and user ID
            String movies = moviesMap.get(ratings[1]);
            String users = usersMap.get(ratings[0]);
            // Concatenate the information from the three tables
            String threeTables = value.toString() + "::" + movies + "::" + users;
            outKey.set(threeTables);
            context.write(outKey, NullWritable.get());
        }
    }
}

Sample output after joining the three tables:

::::::::Winnie the Pooh and the Blustery Day ()::Animation|Children's::F::25::6::90027
::::::::Dumbo ()::Animation|Children's|Musical::F::25::6::90027
::::::::Die Hard ()::Action|Thriller::F::::::
::::::::Streetcar Named Desire, A ()::Drama::F::::::
::::::::Braveheart ()::Action|Drama|War::F::::::
::::::::Star Wars: Episode V - The Empire Strikes Back ()::Action|Adventure|Drama|Sci-Fi|War::F::::::
::::::::Raiders of the Lost Ark ()::Action|Adventure::F::::::
::::::::Aliens ()::Action|Sci-Fi|Thriller|War::F::::::
::::::::Good, The Bad and The Ugly, The ()::Action|Western::F::::::
::::::::Star Wars: Episode VI - Return of the Jedi ()::Action|Adventure|Romance|Sci-Fi|War::F::::::

Field breakdown of a joined record:

1000::1036::4::975040964::Die Hard (1988)::Action|Thriller::F::25::6::90027

Index 0: user ID, 1: movie ID, 2: rating, 3: rating timestamp, 4: movie title, 5: movie genres, 6: gender, 7: age, 8: occupation, 9: zip code

To find the 10 highest-rated movies for male and for female users separately (gender, movie title, rating):

1. Group by movie title and gender: use movie title + gender as the key and the rating as the value to compute each movie's average score per gender;

2. Wrap gender + movie title + score in a custom key object, group by gender, sort by score in descending order, and output the top 10 of each group.

Business logic: MoviesDemo2.java

public class MoviesDemo2 {

    public static void main(String[] args) throws Exception {
        Configuration conf1 = new Configuration();
        Configuration conf2 = new Configuration();
        FileSystem fs1 = FileSystem.get(conf1);
        FileSystem fs2 = FileSystem.get(conf2);

        Job job1 = Job.getInstance(conf1);
        Job job2 = Job.getInstance(conf2);

        job1.setJarByClass(MoviesDemo2.class);
        job1.setMapperClass(MoviesDemo2Mapper1.class);
        job2.setMapperClass(MoviesDemo2Mapper2.class);
        job1.setReducerClass(MoviesDemo2Reducer1.class);
        job2.setReducerClass(MoviesDemo2Reducer2.class);

        job1.setOutputKeyClass(Text.class);
        job1.setOutputValueClass(DoubleWritable.class);
        job2.setOutputKeyClass(MoviesSexBean.class);
        job2.setOutputValueClass(NullWritable.class);
        job2.setGroupingComparatorClass(MoviesSexGC.class);

        Path inputPath1 = new Path("D:\\MR\\hw\\movie\\output3he1");
        Path outputPath1 = new Path("D:\\MR\\hw\\movie\\output2_1");
        Path inputPath2 = new Path("D:\\MR\\hw\\movie\\output2_1");
        Path outputPath2 = new Path("D:\\MR\\hw\\movie\\output2_end");
        if (fs1.exists(outputPath1)) {
            fs1.delete(outputPath1, true);
        }
        if (fs2.exists(outputPath2)) {
            fs2.delete(outputPath2, true);
        }
        FileInputFormat.setInputPaths(job1, inputPath1);
        FileOutputFormat.setOutputPath(job1, outputPath1);
        FileInputFormat.setInputPaths(job2, inputPath2);
        FileOutputFormat.setOutputPath(job2, outputPath2);

        // Chain the two jobs: job2 depends on job1
        JobControl control = new JobControl("MoviesDemo2");
        ControlledJob aJob = new ControlledJob(job1.getConfiguration());
        ControlledJob bJob = new ControlledJob(job2.getConfiguration());
        bJob.addDependingJob(aJob);
        control.addJob(aJob);
        control.addJob(bJob);

        Thread thread = new Thread(control);
        thread.start();
        while (!control.allFinished()) {
            Thread.sleep(1000);
        }
        System.exit(0);
    }

    /**
     * Input: the output file of the three-table join, e.g.
     * 1000::1036::4::975040964::Die Hard (1988)::Action|Thriller::F::25::6::90027
     * (UserID::MovieID::Rating::Timestamp::Title::Genres::Gender::Age::Occupation::Zipcode)
     *
     * Emits movie title + gender as the key and the rating as the value.
     */
    public static class MoviesDemo2Mapper1 extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        Text outKey = new Text();
        DoubleWritable outValue = new DoubleWritable();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] split = value.toString().split("::");
            String strKey = split[4] + "\t" + split[6];
            String strValue = split[2];
            outKey.set(strKey);
            outValue.set(Double.parseDouble(strValue));
            context.write(outKey, outValue);
        }
    }

    /**
     * Computes the average score for each movie title + gender key.
     */
    public static class MoviesDemo2Reducer1 extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        DoubleWritable outValue = new DoubleWritable();

        @Override
        protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
                throws IOException, InterruptedException {
            int count = 0;
            double sum = 0;
            for (DoubleWritable value : values) {
                count++;
                sum += value.get();
            }
            double avg = sum / count;
            outValue.set(avg);
            context.write(key, outValue);
        }
    }

    /**
     * Wraps movie title + gender + score in a MoviesSexBean, which groups by gender
     * and sorts by score in descending order.
     */
    public static class MoviesDemo2Mapper2 extends Mapper<LongWritable, Text, MoviesSexBean, NullWritable> {
        MoviesSexBean outKey = new MoviesSexBean();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] split = value.toString().split("\t");
            outKey.setMovieName(split[0]);
            outKey.setSex(split[1]);
            outKey.setScore(Double.parseDouble(split[2]));
            context.write(outKey, NullWritable.get());
        }
    }

    /**
     * Takes the 10 best-rated movies for each gender.
     */
    public static class MoviesDemo2Reducer2 extends Reducer<MoviesSexBean, NullWritable, MoviesSexBean, NullWritable> {

        @Override
        protected void reduce(MoviesSexBean key, Iterable<NullWritable> values, Context context)
                throws IOException, InterruptedException {
            // Reset the counter for each gender group so both genders get their top 10
            int count = 0;
            for (NullWritable nvl : values) {
                count++;
                context.write(key, NullWritable.get());
                if (count == 10) {
                    return;
                }
            }
        }
    }
}

The key object: MoviesSexBean.java

public class MoviesSexBean implements WritableComparable<MoviesSexBean> {

    private String movieName;
    private String sex;
    private double score;

    public MoviesSexBean() {
        super();
    }

    public MoviesSexBean(String movieName, String sex, double score) {
        super();
        this.movieName = movieName;
        this.sex = sex;
        this.score = score;
    }

    public String getMovieName() {
        return movieName;
    }

    public void setMovieName(String movieName) {
        this.movieName = movieName;
    }

    public String getSex() {
        return sex;
    }

    public void setSex(String sex) {
        this.sex = sex;
    }

    public double getScore() {
        return score;
    }

    public void setScore(double score) {
        this.score = score;
    }

    @Override
    public String toString() {
        return movieName + "\t" + sex + "\t" + score;
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        movieName = in.readUTF();
        sex = in.readUTF();
        score = in.readDouble();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(movieName);
        out.writeUTF(sex);
        out.writeDouble(score);
    }

    @Override
    public int compareTo(MoviesSexBean o) {
        int result = this.getSex().compareTo(o.getSex());
        if (result == 0) {
            double diff = this.getScore() - o.getScore();
            if (diff == 0) {
                return 0;
            } else {
                return diff > 0 ? -1 : 1;
            }
        } else {
            return result > 0 ? -1 : 1;
        }
    }
}

Grouping comparator: MoviesSexGC.java. It compares keys by gender only, so each reduce() call receives all movies of one gender, already sorted by score in descending order, and the reducer simply emits the first 10.

public class MoviesSexGC extends WritableComparator {

    public MoviesSexGC() {
        super(MoviesSexBean.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        MoviesSexBean msb1 = (MoviesSexBean) a;
        MoviesSexBean msb2 = (MoviesSexBean) b;
        return msb1.getSex().compareTo(msb2.getSex());
    }
}

3. For the movie with movieid = 2116, compute the average rating for each age group (the data uses only 7 age codes, so group by those) (age group, rating)

This works on the joined three-table file produced in part 2.

public class MovieDemo3 {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Job job = Job.getInstance(conf);
        job.setJarByClass(MovieDemo3.class);
        job.setMapperClass(MovieDemo3Mapper.class);
        job.setReducerClass(MovieDemo3Reducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);

        Path inputPath = new Path("D:\\MR\\hw\\movie\\3he1");
        Path outputPath = new Path("D:\\MR\\hw\\movie\\outpu3");
        if (fs.exists(outputPath)) {
            fs.delete(outputPath, true);
        }
        FileInputFormat.setInputPaths(job, inputPath);
        FileOutputFormat.setOutputPath(job, outputPath);

        boolean isDone = job.waitForCompletion(true);
        System.exit(isDone ? 0 : 1);
    }

    /**
     * Input: 1000::1036::4::975040964::Die Hard (1988)::Action|Thriller::F::25::6::90027
     * (UserID::MovieID::Rating::Timestamp::Title::Genres::Gender::Age::Occupation::Zipcode, indices 0-9)
     *
     * Keeps only records where movieid = 2116.
     * key: movie ID + movie title + age group
     * value: rating
     */
    public static class MovieDemo3Mapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        Text outKey = new Text();
        DoubleWritable outValue = new DoubleWritable();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] split = value.toString().split("::");
            int movieID = Integer.parseInt(split[1]);
            if (movieID == 2116) {
                String strKey = split[1] + "\t" + split[4] + "\t" + split[7];
                String strValue = split[2];
                outKey.set(strKey);
                outValue.set(Double.parseDouble(strValue));
                context.write(outKey, outValue);
            }
        }
    }

    /**
     * Averages the ratings emitted by the mapper for each key.
     */
    public static class MovieDemo3Reducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        DoubleWritable outValue = new DoubleWritable();

        @Override
        protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
                throws IOException, InterruptedException {
            int count = 0;
            double sum = 0;
            for (DoubleWritable value : values) {
                count++;
                sum += value.get();
            }
            double avg = sum / count;
            outValue.set(avg);
            context.write(key, outValue);
        }
    }
}

4. For the female user who watches the most movies (submits the most ratings), compute the average rating of the 10 movies she rated highest (person, movie title, rating)

Recall the joined record layout:

1000::1036::4::975040964::Die Hard (1988)::Action|Thriller::F::25::6::90027

Index 0: user ID, 1: movie ID, 2: rating, 3: rating timestamp, 4: movie title, 5: movie genres, 6: gender, 7: age, 8: occupation, 9: zip code

(1) Find the ID of the female user who has submitted the most ratings

MoviesDemo4.java

public class MoviesDemo4 {

    public static void main(String[] args) throws Exception {
        Configuration conf1 = new Configuration();
        FileSystem fs1 = FileSystem.get(conf1);
        Job job1 = Job.getInstance(conf1);
        job1.setJarByClass(MoviesDemo4.class);
        job1.setMapperClass(MoviesDemo4Mapper1.class);
        job1.setReducerClass(MoviesDemo4Reducer1.class);
        job1.setMapOutputKeyClass(Text.class);
        job1.setMapOutputValueClass(Text.class);
        job1.setOutputKeyClass(Text.class);
        job1.setOutputValueClass(IntWritable.class);

        Configuration conf2 = new Configuration();
        FileSystem fs2 = FileSystem.get(conf2);
        Job job2 = Job.getInstance(conf2);
        job2.setJarByClass(MoviesDemo4.class);
        job2.setMapperClass(MoviesDemo4Mapper2.class);
        job2.setReducerClass(MoviesDemo4Reducer2.class);
        job2.setMapOutputKeyClass(Moviegoers.class);
        job2.setMapOutputValueClass(NullWritable.class);
        job2.setOutputKeyClass(Moviegoers.class);
        job2.setOutputValueClass(NullWritable.class);

        Path inputPath1 = new Path("D:\\MR\\hw\\movie\\3he1");
        Path outputPath1 = new Path("D:\\MR\\hw\\movie\\outpu4_1");
        if (fs1.exists(outputPath1)) {
            fs1.delete(outputPath1, true);
        }
        FileInputFormat.setInputPaths(job1, inputPath1);
        FileOutputFormat.setOutputPath(job1, outputPath1);

        Path inputPath2 = new Path("D:\\MR\\hw\\movie\\outpu4_1");
        Path outputPath2 = new Path("D:\\MR\\hw\\movie\\outpu4_2");
        if (fs2.exists(outputPath2)) {
            fs2.delete(outputPath2, true);
        }
        FileInputFormat.setInputPaths(job2, inputPath2);
        FileOutputFormat.setOutputPath(job2, outputPath2);

        // Chain the two jobs: job2 depends on job1
        JobControl control = new JobControl("MoviesDemo4");
        ControlledJob ajob = new ControlledJob(job1.getConfiguration());
        ControlledJob bjob = new ControlledJob(job2.getConfiguration());
        bjob.addDependingJob(ajob);
        control.addJob(ajob);
        control.addJob(bjob);

        Thread thread = new Thread(control);
        thread.start();
        while (!control.allFinished()) {
            Thread.sleep(1000);
        }
        System.exit(0);
    }

    /**
     * Input: 1000::1036::4::975040964::Die Hard (1988)::Action|Thriller::F::25::6::90027
     * (UserID::MovieID::Rating::Timestamp::Title::Genres::Gender::Age::Occupation::Zipcode, indices 0-9)
     *
     * Keeps only female users.
     * key: user ID
     * value: movie title + rating
     */
    public static class MoviesDemo4Mapper1 extends Mapper<LongWritable, Text, Text, Text> {
        Text outKey = new Text();
        Text outValue = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] split = value.toString().split("::");
            String strKey = split[0];
            String strValue = split[4] + "\t" + split[2];
            if (split[6].equals("F")) {
                outKey.set(strKey);
                outValue.set(strValue);
                context.write(outKey, outValue);
            }
        }
    }

    // Counts the total number of ratings submitted by each female user
    public static class MoviesDemo4Reducer1 extends Reducer<Text, Text, Text, IntWritable> {
        IntWritable outValue = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            int count = 0;
            for (Text value : values) {
                count++;
            }
            outValue.set(count);
            context.write(key, outValue);
        }
    }

    // Sorts the output of the first job in descending order of rating count
    public static class MoviesDemo4Mapper2 extends Mapper<LongWritable, Text, Moviegoers, NullWritable> {
        Moviegoers outKey = new Moviegoers();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] split = value.toString().split("\t");
            outKey.setName(split[0]);
            outKey.setCount(Integer.parseInt(split[1]));
            context.write(outKey, NullWritable.get());
        }
    }

    // After sorting, keep only the first record (the ID and rating count of the most active female user)
    public static class MoviesDemo4Reducer2 extends Reducer<Moviegoers, NullWritable, Moviegoers, NullWritable> {
        int count = 0;

        @Override
        protected void reduce(Moviegoers key, Iterable<NullWritable> values, Context context)
                throws IOException, InterruptedException {
            for (NullWritable nvl : values) {
                count++;
                if (count > 1) {
                    return;
                }
                context.write(key, nvl);
            }
        }
    }
}

(2) For that user, compute the average rating of the 10 movies she rated highest
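
Below is a minimal sketch of step (2), not part of the original implementation, under two assumptions: the user ID found in step (1) is known, and a hypothetical cache file /movie/cache/top10_female.txt lists the titles of the 10 movies she rated highest, one per line (it can be produced by filtering the joined file on her ID and sorting by her rating). The job averages each of those movies' ratings over all users; her ID could be prepended to the output key to form the full (person, movie title, average rating) tuple. The class names and output path here are illustrative only.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MoviesDemo4Step2 {

    public static class Top10AvgMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        private final Set<String> top10 = new HashSet<>();
        private final Text outKey = new Text();
        private final DoubleWritable outValue = new DoubleWritable();

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            // Load the 10 movie titles produced by the previous step (one title per line)
            Path[] cacheFiles = context.getLocalCacheFiles();
            BufferedReader br = new BufferedReader(new FileReader(cacheFiles[0].toUri().getPath()));
            String line;
            while ((line = br.readLine()) != null) {
                top10.add(line.trim());
            }
            br.close();
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Joined record: UserID::MovieID::Rating::Timestamp::Title::Genres::Gender::Age::Occupation::Zipcode
            String[] split = value.toString().split("::");
            if (top10.contains(split[4])) {
                outKey.set(split[4]);
                outValue.set(Double.parseDouble(split[2]));
                context.write(outKey, outValue);
            }
        }
    }

    public static class Top10AvgReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        private final DoubleWritable outValue = new DoubleWritable();

        @Override
        protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
                throws IOException, InterruptedException {
            int count = 0;
            double sum = 0;
            for (DoubleWritable value : values) {
                count++;
                sum += value.get();
            }
            outValue.set(sum / count);
            context.write(key, outValue);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Job job = Job.getInstance(conf);
        job.setJarByClass(MoviesDemo4Step2.class);
        job.setMapperClass(Top10AvgMapper.class);
        job.setReducerClass(Top10AvgReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        // Hypothetical cache file holding her 10 highest-rated movie titles
        job.addCacheFile(new URI("/movie/cache/top10_female.txt"));

        Path inputPath = new Path("D:\\MR\\hw\\movie\\3he1");      // joined file, same as the other jobs
        Path outputPath = new Path("D:\\MR\\hw\\movie\\outpu4_3"); // hypothetical output directory
        if (fs.exists(outputPath)) {
            fs.delete(outputPath, true);
        }
        FileInputFormat.setInputPaths(job, inputPath);
        FileOutputFormat.setOutputPath(job, outputPath);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}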
