mapreduce-实现单表关联

//map类

package hadoop3;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class danbiaomap extends Mapper <LongWritable,Text,Text,Text>{

String childname=new String();
String parientname=new String();
String flag=new String();

protected void map(LongWritable key,Text value,Context context) throws IOException, InterruptedException

{
String [] str=value.toString().split("\t");
if (str[0].compareTo("child")!=0)

{ //left table
flag="1";
childname=str[0];
parientname=str[1];
context.write(new Text(parientname), new Text(flag+"+"+childname+"+"+parientname));

//right table
flag="2";
context.write(new Text(childname), new Text(flag+"+"+childname+"+"+parientname));

}

//reduce 类

package hadoop3;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class danbiaoreduce extends Reducer<Text,Text,Text,Text>{

private int num=0;
protected void reduce(Text key,Iterable<Text> value, Context context) throws IOException, InterruptedException

{
if (num==0)

{

context.write(new Text("grandchild"),new Text( "grandparient"));
num++;
}

Iterator <Text> itr=value.iterator();
int grandchildnum=0;
String [] grandchild=new String[100];
int grandparientnum=0;
String [] grandparient=new String[100];

while (itr.hasNext())

{
String [] record=itr.next().toString().split("\\+");
if (record[0].compareTo("1")==0)
{
grandchild[grandchildnum]=record[1];
grandchildnum++;

}
else if (record[0].compareTo("2")==0)
{
grandparient[grandparientnum]=record[2];
grandparientnum++;

}
else
{}
}
if(grandchildnum !=0 && grandparientnum !=0)
{
for (int i=0;i<grandparientnum;i++)
{
for (int j=0;j<grandchildnum;j++)
{
context.write(new Text(grandchild[i]), new Text(grandparient[j]));

}

//主类

package hadoop3;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

//import com.sun.jersey.core.impl.provider.entity.XMLJAXBElementProvider.Text;

public class danbiao extends Configured implements Tool{

public static void main(String[] args) throws Exception {
// TODO Auto-generated method stub
ToolRunner.run(new danbiao(), args);
}

@Override
public int run(String[] arg0) throws Exception {
// TODO Auto-generated method stub

Configuration conf=getConf();
Job job=new Job();
job.setJarByClass(getClass());
FileSystem fs=FileSystem.get(conf);
fs.delete(new Path("/outfile1104"),true);
FileInputFormat.addInputPath(job,new Path("/luo/danbiao.txt"));
FileOutputFormat.setOutputPath(job, new Path("/outfile1104"));
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);

job.setMapperClass(danbiaomap.class);
job.setReducerClass(danbiaoreduce.class);

job.waitForCompletion(true);

return 0;
}

}

mapreduce-实现单表关联的更多相关文章

MapReduce应用案例--单表关联
1. 实例描述单表关联这个实例要求从给出的数据中寻找出所关心的数据,它是对原始数据所包含信息的挖掘. 实例中给出child-parent 表, 求出grandchild-grandparent表. ...
Hadoop on Mac with IntelliJ IDEA - 8 单表关联NullPointerException
简化陆喜恒. Hadoop实战(第2版)5.4单表关联的代码时遇到空指向异常,经分析是逻辑问题,在此做个记录. 环境:Mac OS X 10.9.5, IntelliJ IDEA 13.1.5, Ha ...
Hadoop 单表关联
前面的实例都是在数据上进行一些简单的处理,为进一步的操作打基础.单表关联这个实例要求从给出的数据中寻找到所关心的数据,它是对原始数据所包含信息的挖掘.下面进入这个实例. 1.实例描述实例中给出chi ...
MapRedece(单表关联)
源数据:Child--Parent表 Tom Lucy Tom Jack Jone Lucy Jone Jack Lucy Marry Lucy Ben Jack Alice Jack Jesse T ...
MR案例：单表关联查询
"单表关联"这个实例要求从给出的数据中寻找所关心的数据,它是对原始数据所包含信息的挖掘. 需求:实例中给出 child-parent(孩子—父母)表,要求输出 grandchild ...
Hadoop工程师面试题(1)--MapReduce实现单表汇总统计
数据源格式描述: 输入t1.txt源数据,数据文件分隔符"*&*",字段说明如下: 字段序号字段英文名称字段中文名称字段类型字段长度 1 TIME_ID 时间(到时 ...
MapReduce编程系列 — 5：单表关联
1.项目名称: 2.项目数据: chile parentTom LucyTom JackJone LucyJone JackLucy MaryLucy Ben ...
MapReduce单表关联学习~
首先考虑表的自连接,其次是列的设置,最后是结果的整理. 文件内容: import org.apache.hadoop.conf.Configuration; import org.apache.had ...
利用hadoop来解决“单表关联”的问题
已知 child parent a b a c d b d c b e b f c g c h x g x h m x m n o x o n 则 c 2+c+g 2+c+h 1+a+c 1+d+c ...

随机推荐

iOS 事件响应者链的学习(也有叫 UI连锁链)
当发生事件响应的时候,必须知道由谁来响应事件.在iOS中,由响应链来对事件进行响应,所有的事件响应的类都是继承于UIResponder的子类,响应链是一个由不同对象组成的层次结构,其中每个对象将依次获 ...
[转] - 学习ASP.NET比较完整的流程！
如果你已经有较多的面向对象开发经验,跳过以下这两步: 第一步掌握一门.NET面向对象语言,C#或VB.NET 我强烈反对在没系统学过一门面向对象(OO)语言的前提下去学ASP.NET. ASP.N ...
Ubuntu系统常用操作命令
1.基本命令: sudo 提升用户权限为root用户 ls 显示文件内容 cd 进入指定路径,后接路径参数如cd /进入根目录 cd -进入用户目录 cd ..返回上一级目录 mv xx.txt x ...
SwfUpload文件上传
SWFUpload是一个flash和js相结合而成的文件上传插件,其功能非常强大.以前在项目中用过几次,但它的配置参数太多了,用过后就忘记怎么用了,到以后要用时又得到官网上看它的文档,真是太烦了.所以 ...
linux下bwa和samtools的安装与使用
bwa的安装流程安装本软体总共需要完成以下两个软体的安装工作:1) BWA2) Samtools 1.BWA的安装a.下载BWA (download from BWA Source Forge ) h ...
java中的特殊有用类
1.MessageDigest:类似与md5加密算法应用的功能类
字符在内存中最终的表示形式是什么？是某种字符编码还是码位(Code Point)？
字符在内存中最终的表示形式是什么?是某种字符编码还是码位(Code Point)? 根据我的了解,编码中有三个核心概念:1. 字符集(Character Set),可以说是一个抽象概念,字符的合集2. ...
C# 反射通过GetCustomAttributes方法，获得自定义特性
http://blog.csdn.net/litao2/article/details/17633107 使用反射访问: 自定义属性的信息和对其进行操作的方法. 一.实例1 1.代码: 如:Syste ...
Android Studio2.3中简单配置，释放C盘空间
重新安装了一下android studio,由于占用了太多的C盘空间.记录一下,在网上收集到的studio中两个主要占用C盘空间的文件,我们将它移除C盘. 原博地址: http://blog.csdn ...
ANT+JMETER集成1（生成报告）
配置build.xml文件时,网上找了各种版本的代码都会报错, 终于找到个可以生成报告的build源码了链接: http://www.cnblogs.com/hanxiaomin/p/6731810 ...

mapreduce-实现单表关联

mapreduce-实现单表关联的更多相关文章

随机推荐

热门专题