MapReduce: after the Map phase, the sort does not order Chinese keys

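Today I wrote a MapReduce program that computes average scores. It produced results, but they were not sorted by student name the way I expected. When the names are English, the results do come out sorted.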

The code is as follows:

package com.pro.bq;
 
import java.io.IOException;
import java.util.StringTokenizer;
 
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.fs.Path;
 
public class AverageScore {
    public static class MapAvg extends Mapper<Object, Text, Text, IntWritable>
    {
 
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
//            String[] lineData=value.toString().split(" ");// split(" ") yields extra empty fields when several spaces appear in a row, so lineData.length grows; not flexible enough
//            if(lineData.length==2)
//            {        
//                name.set(lineData[0]);
//                score.set(Integer.parseInt(lineData[1]));
//                context.write(name,score);
//            }
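            // TextInputFormat already passes one line per map() call, so splitting on "\n" is only a safeguard
            String line=value.toString();
            StringTokenizer tokenizer=new StringTokenizer(line,"\n");
            while(tokenizer.hasMoreTokens())
            {
                // default delimiters (space, tab, newline, ...) separate the name from the score
                StringTokenizer token=new StringTokenizer(tokenizer.nextToken());
                Text name=new Text(token.nextToken());
                IntWritable score=new IntWritable(Integer.parseInt(token.nextToken()));
                context.write(name,score);
            }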
        }
    }
    public static class ReduceAvg extends Reducer<Text, IntWritable, Text, IntWritable>
    {
 
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum=0;
            int cnt=0;
            for(IntWritable val:values)
            {
                sum+=val.get();
                cnt++;
            }
            int avg=sum/cnt; // integer division: the fractional part is truncated
            context.write(key, new IntWritable(avg));
        }
    }
 
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf=new Configuration();
        // The input and output paths are hard-coded here; the command-line args are not used
        String[] hdfsPath=new String[]{"hdfs://localhost:9000/user/haduser/input/averageTest/","hdfs://localhost:9000/user/haduser/output/outAvgScore/"};
        String[] otherArgs=new GenericOptionsParser(conf, hdfsPath).getRemainingArgs();
 
        if(otherArgs.length!=2)
        {
            System.err.println("Usage: AverageScore <in> <out>");
            System.exit(2);
        }
        Job job=Job.getInstance(conf, "AverageScore"); // pass conf so the parsed options are actually used
        job.setJarByClass(AverageScore.class);
 
        job.setMapperClass(MapAvg.class);
        job.setReducerClass(ReduceAvg.class);
 
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
 
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job,new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true)?0:1);
 
    }
 
}
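You can try the job's data flow without a cluster. The minimal sketch below is plain Java, not Hadoop code. It assumes whitespace-separated "name score" lines and groups the scores in a TreeMap. That TreeMap's comparator imitates Hadoop's default byte-wise ordering of Text keys. The sketch then computes a truncated integer average the way ReduceAvg does. The class name LocalAverageSketch and the sample scores are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalAverageSketch {
    // Imitates Hadoop's default Text ordering: unsigned byte-by-byte comparison
    // of the UTF-8 encodings, with a shorter prefix sorting first.
    static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int d = (x[i] & 0xff) - (y[i] & 0xff);
            if (d != 0) return d;
        }
        return x.length - y.length;
    }

    // "map": split each line into name and score;
    // "shuffle/sort": group by name in a TreeMap ordered like Text keys;
    // "reduce": integer average per name.
    static Map<String, Integer> averages(List<String> lines) {
        TreeMap<String, List<Integer>> groups = new TreeMap<>(LocalAverageSketch::compareUtf8);
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 2) continue; // skip malformed lines
            groups.computeIfAbsent(parts[0], k -> new ArrayList<>())
                  .add(Integer.parseInt(parts[1]));
        }
        Map<String, Integer> result = new TreeMap<>(LocalAverageSketch::compareUtf8);
        for (Map.Entry<String, List<Integer>> e : groups.entrySet()) {
            int sum = 0;
            for (int s : e.getValue()) sum += s;
            result.put(e.getKey(), sum / e.getValue().size()); // truncates, like ReduceAvg
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input: these scores are invented for illustration.
        List<String> lines = List.of("zhangsan 80", "李四 90", "张三 70", "lisi 60", "张三 75");
        System.out.println(averages(lines));
    }
}
```

MapAvg throws on a line without a numeric score. This sketch skips such lines instead. Its result lists lisi and zhangsan before 张三 and 李四, which is the same ordering pattern as the real job output.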

file1:
zhangsan
lisi
wangwu
zhaoliu 
 
file2:
张三
李四
王五
赵六    
 
file3:
zhangsan
lisi
wangwu
zhaoliu 
 
file4:
李四
张三
王五
赵六

The output is:

lisi    38
wangwu    49
zhangsan    27
zhaoliu    60
张三    2
李四    1
王五    2
赵六    3
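The Chinese rows are in fact sorted, just not in pinyin order. Hadoop's default comparator for Text keys compares the serialized UTF-8 bytes as unsigned values, one byte at a time. For UTF-8, that byte order matches Unicode code-point order. Every ASCII byte is below 0x80, so the English names come first. The UTF-8 encodings of 张, 李, 王 and 赵 start with E5, E6, E7 and E8, which is exactly the order printed above. The small stand-alone program below prints each surname's code point and UTF-8 bytes, so you can check the ordering directly. Utf8OrderDemo is a made-up name.

```java
import java.nio.charset.StandardCharsets;

public class Utf8OrderDemo {
    // Formats each byte of the UTF-8 encoding as two uppercase hex digits.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02X ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // The surnames from file2/file4, in the order the job printed them.
        String[] names = {"张", "李", "王", "赵"};
        for (String n : names) {
            System.out.printf("%s U+%04X %s%n", n, n.codePointAt(0), utf8Hex(n));
        }
    }
}
```

The code points are U+5F20, U+674E, U+738B and U+8D75, and both the code points and the leading bytes increase in that order.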

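Does it really not support sorting Chinese? Once I learn to write my own Partitioner, will I be able to write my own sorting logic? I'll come back to this later...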
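On the Partitioner question: a Partitioner only decides which reducer receives each key. The order of keys inside a reducer comes from the sort comparator, which can be replaced with Job.setSortComparatorClass. One way to get pinyin-like order is java.text.Collator with Locale.CHINA. The sketch below shows only that comparison logic in plain Java; PinyinOrderSketch and sortByCollator are hypothetical names. It assumes the JDK's Chinese collation data is installed. For common characters that data typically follows pinyin order, but rare characters may not sort as expected. On a full JDK it should order the four names as 李四, 王五, 张三, 赵六 (li, wang, zhang, zhao).

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class PinyinOrderSketch {
    // Collator for simplified Chinese. Assumption: the JDK ships zh collation
    // data, which approximates pinyin order for common characters.
    private static final Collator ZH = Collator.getInstance(Locale.CHINA);

    // Returns a sorted copy; the input list is left unchanged.
    static List<String> sortByCollator(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(ZH); // Collator implements Comparator<Object>
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("张三", "李四", "王五", "赵六");
        System.out.println(sortByCollator(names));
    }
}
```

To use this in the job, the same call could go in a hypothetical WritableComparator subclass. That subclass would be constructed with super(Text.class, true) and override compare(WritableComparable a, WritableComparable b) to return the collator's result for a.toString() and b.toString(). It would then be registered with job.setSortComparatorClass(...). When no grouping comparator is set, Hadoop falls back to the sort comparator for grouping. So two keys the collator treats as equal would also be merged into one reduce call. With more than one reducer, each output file is sorted on its own.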
