Hadoop MapReduce Programs: SingletonTableJoin

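Requirement: a single-table join (self-join). Given a file of child-parent relationships, derive the grandchild-grandparent relationships.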

Sample input: child-parent.txt (each line is "child parent")

xiaoming daxiong

daxiong alice

daxiong jack

Output (each line is "grandchild grandparent"):

xiaoming alice

xiaoming jack

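Analysis and design:

Mapper design:

1. <k1,v1>: k1 is the position of a line in the input (the byte offset under the default TextInputFormat); v1 is the content of that line.

2. Left table: <k2,v2>: k2 is the parent's name; v2 is (1, child's name), where 1 marks the record as belonging to the left table.

3. Right table: <k3,v3>: k3 is the child's name; v3 is (2, parent's name), where 2 marks the record as belonging to the right table.

Reducer design:

4. <k4,v4>: k4 is the key shared by a group of records (a person's name); v4 is a list<String> of all tagged values emitted for that key.

5. Compute the Cartesian product <k5,v5>: k5 is the grandchild's name; v5 is the grandparent's name. For a given key, the values tagged 1 are that person's children and the values tagged 2 are that person's parents, so every (child, parent) pair within the group is a (grandchild, grandparent) pair.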
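The five steps above can be traced without a cluster. The following is a minimal sketch in plain Java, not the Hadoop job itself: the class name SelfJoinSketch and the method grandPairs are hypothetical, and a TreeMap stands in for the shuffle that groups values by key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class: simulates map, shuffle and reduce in memory.
public class SelfJoinSketch {
    /** Returns "grandchild grandparent" pairs derived from "child parent" lines. */
    public static List<String> grandPairs(List<String> lines) {
        // Map phase plus shuffle: emit two tagged records per line, grouped by key.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] t = line.trim().split("\\s+");
            if (t.length != 2 || "child".equals(t[0])) {
                continue; // skip blank or malformed lines and a "child parent" header
            }
            String child = t[0];
            String parent = t[1];
            // Left table: key = parent, value = (1, child)
            grouped.computeIfAbsent(parent, k -> new ArrayList<>()).add("1 " + child);
            // Right table: key = child, value = (2, parent)
            grouped.computeIfAbsent(child, k -> new ArrayList<>()).add("2 " + parent);
        }
        // Reduce phase: per key, cross the tag-1 names with the tag-2 names.
        List<String> out = new ArrayList<>();
        for (List<String> values : grouped.values()) {
            List<String> grandChildren = new ArrayList<>();
            List<String> grandParents = new ArrayList<>();
            for (String v : values) {
                String[] rec = v.split(" ");
                if ("1".equals(rec[0])) {
                    grandChildren.add(rec[1]);
                } else {
                    grandParents.add(rec[1]);
                }
            }
            for (String gc : grandChildren) {
                for (String gp : grandParents) {
                    out.add(gc + " " + gp);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("xiaoming daxiong", "daxiong alice", "daxiong jack");
        // Prints "xiaoming alice" then "xiaoming jack"
        grandPairs(input).forEach(System.out::println);
    }
}
```

Only the key daxiong receives values from both tables (1 xiaoming, 2 alice, 2 jack), so it is the only group that produces output.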

Program:

SingletonTableJoinMapper class:

package com.cn.singletonTableJoin;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SingletonTableJoinMapper extends Mapper<Object, Text, Text, Text> {
    // Tags that tell the reducer which side of the self-join a value came from.
    private static final String LEFT_TABLE = "1";
    private static final String RIGHT_TABLE = "2";

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line on whitespace; a valid record is exactly "child parent".
        StringTokenizer itr = new StringTokenizer(value.toString());
        if (itr.countTokens() != 2) {
            return; // skip blank or malformed lines instead of failing the task
        }
        String childName = itr.nextToken();
        String parentName = itr.nextToken();
        // Skip a header line whose first column is the literal "child".
        if ("child".equals(childName)) {
            return;
        }
        // Left table: keyed by parent, carries the child.
        context.write(new Text(parentName), new Text(LEFT_TABLE + " " + childName));
        // Right table: keyed by child, carries the parent.
        context.write(new Text(childName), new Text(RIGHT_TABLE + " " + parentName));
    }
}

SingletonTableJoinReduce class:

package com.cn.singletonTableJoin;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SingletonTableJoinReduce extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // For the person named by key: values tagged "1" are that person's children,
        // values tagged "2" are that person's parents.
        List<String> grandChildren = new ArrayList<String>();
        List<String> grandParents = new ArrayList<String>();
        for (Text value : values) {
            // Each value has the form "<tag> <name>".
            String[] record = value.toString().split(" ");
            if (record.length < 2) {
                continue; // ignore malformed values
            }
            if ("1".equals(record[0])) {
                grandChildren.add(record[1]);
            } else if ("2".equals(record[0])) {
                grandParents.add(record[1]);
            }
        }
        // Cartesian product: pair every child of key with every parent of key.
        // If either list is empty, the loops emit nothing.
        for (String grandChild : grandChildren) {
            for (String grandParent : grandParents) {
                context.write(new Text(grandChild), new Text(grandParent));
            }
        }
    }
}
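The reducer buffers both sides in lists before joining because Hadoop does not guarantee the order in which the values of one key arrive. The sketch below illustrates this on a single key group in plain Java. ReduceGroupSketch and joinGroup are hypothetical names, and the final sort is an addition of this sketch (the job itself does not sort) so that different arrival orders can be compared.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class: the reducer's join logic applied to one key group.
public class ReduceGroupSketch {
    /** Joins one key's tagged values ("1 child" or "2 parent") into sorted pairs. */
    public static List<String> joinGroup(List<String> taggedValues) {
        List<String> grandChildren = new ArrayList<>();
        List<String> grandParents = new ArrayList<>();
        for (String v : taggedValues) {
            String[] rec = v.split(" ");
            if (rec.length < 2) {
                continue; // ignore malformed values
            }
            if ("1".equals(rec[0])) {
                grandChildren.add(rec[1]);
            } else if ("2".equals(rec[0])) {
                grandParents.add(rec[1]);
            }
        }
        List<String> pairs = new ArrayList<>();
        for (String gc : grandChildren) {
            for (String gp : grandParents) {
                pairs.add(gc + " " + gp);
            }
        }
        Collections.sort(pairs); // only so that arrival order does not affect comparison
        return pairs;
    }

    public static void main(String[] args) {
        // Key "daxiong", values in an arbitrary arrival order.
        System.out.println(joinGroup(Arrays.asList("2 jack", "1 xiaoming", "2 alice")));
        // Key "alice": only a left-table record, so nothing is emitted.
        System.out.println(joinGroup(Arrays.asList("1 daxiong")));
    }
}
```

A key that appears on only one side, such as alice or xiaoming in the sample data, yields an empty result, which is why the job outputs rows only for daxiong's group.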

SingletonTableJoin class:

package com.cn.singletonTableJoin;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

/**
 * Single-table join (self-join).
 * @author root
 *
 */
public class SingletonTableJoin {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: SingletonTableJoin <in> <out>");
            System.exit(2);
        }
        // Create the job (Job.getInstance replaces the Job constructor deprecated in Hadoop 2.x)
        Job job = Job.getInstance(conf, "SingletonTableJoin");
        job.setJarByClass(SingletonTableJoin.class);

        // Set the input and output paths
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

        // Set the mapper and reducer classes
        job.setMapperClass(SingletonTableJoinMapper.class);
        job.setReducerClass(SingletonTableJoinReduce.class);

        // Set the output key/value types (also used for the map output, since no
        // separate map output classes are set)
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Submit the job and wait for it to finish
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Make summarizing a habit.
