MapReduce Implementation of Matrix Multiplication

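For any matrices M and N, the product P = M*N is defined when the number of columns of M equals the number of rows of N. Let m_ik denote the element in row i, column k of M, let n_kj denote the element in row k, column j of N, and let K be the number of columns of M (equivalently, the number of rows of N). The element in row i, column j of P is then given by Formula (1-1):

p_ij = (M*N)_ij = ∑_{k=1}^{K} m_ik*n_kj = m_i1*n_1j + m_i2*n_2j + … + m_iK*n_Kj  (Formula 1-1)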

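Formula (1-1) shows that p_ij is identified by the index pair (i,j), so (i,j) can serve as the key of the Reducer input. Computing p_ij requires every m_ik and n_kj that share the same k. For each m_ik, the Reducer needs to know that it comes from matrix M, its row i, its column k, and its value m_ik; likewise, for each n_kj it needs to know that it comes from matrix N, its row k, its column j, and its value n_kj. The Mapper produces these attributes.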

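Map function: for each element m_ik of matrix M, emit a series of key-value pairs <(i,j),(M,k,m_ik)>, where j = 1, 2, … up to the number of columns of N. For each element n_kj of matrix N, emit a series of key-value pairs <(i,j),(N,k,n_kj)>, where i = 1, 2, … up to the number of rows of M.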
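The fan-out described for the Map function can be sketched in plain Java without any Hadoop dependency. This is only an illustration: the class name MapSketch, the method names emitForM and emitForN, and the sample values are assumptions made for this sketch and are not part of the job code.

```java
import java.util.ArrayList;
import java.util.List;

public class MapSketch {

    // For one element m_ik of M, emit one "key<TAB>value" pair per column j of N.
    public static List<String> emitForM(int i, int k, int value, int columnsOfN) {
        List<String> pairs = new ArrayList<>();
        for (int j = 1; j <= columnsOfN; j++) {
            pairs.add(i + "," + j + "\tM," + k + "," + value);
        }
        return pairs;
    }

    // For one element n_kj of N, emit one "key<TAB>value" pair per row i of M.
    public static List<String> emitForN(int k, int j, int value, int rowsOfM) {
        List<String> pairs = new ArrayList<>();
        for (int i = 1; i <= rowsOfM; i++) {
            pairs.add(i + "," + j + "\tN," + k + "," + value);
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical element m_12 = 84, with N assumed to have 3 columns
        emitForM(1, 2, 84, 3).forEach(System.out::println);
        // Hypothetical element n_21 = 93, with M assumed to have 2 rows
        emitForN(2, 1, 93, 2).forEach(System.out::println);
    }
}
```

Each element of M is therefore copied once per column of N, and each element of N once per row of M, which is what lets every key (i,j) collect a full row of M and a full column of N.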

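Reduce function: for each key (i,j), take the associated values (M,k,m_ik) and (N,k,n_kj) and store m_ik and n_kj at index k of two separate arrays. Then multiply the k-th elements of the two arrays pairwise and sum the products to obtain p_ij.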
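The Reduce step for a single key can likewise be sketched in plain Java. The class name ReduceSketch, the "M,k,value" / "N,k,value" string encoding of the values, and the sample numbers are assumptions chosen for illustration; values may arrive in any order, which is why they are placed by index k before multiplying.

```java
import java.util.Arrays;
import java.util.List;

public class ReduceSketch {

    // values: all strings "M,k,m_ik" or "N,k,n_kj" collected for one key (i,j).
    // sharedDim: number of columns of M, which equals the number of rows of N.
    public static int reduce(List<String> values, int sharedDim) {
        int[] m = new int[sharedDim + 1]; // index 0 is unused because k is 1-based
        int[] n = new int[sharedDim + 1];
        for (String v : values) {
            String[] t = v.split(",");
            int k = Integer.parseInt(t[1]);
            int x = Integer.parseInt(t[2]);
            if ("M".equals(t[0])) {
                m[k] = x;
            } else {
                n[k] = x;
            }
        }
        int sum = 0;
        for (int k = 1; k <= sharedDim; k++) {
            sum += m[k] * n[k]; // pair up the entries that share the same k
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical values for one key, deliberately out of order
        List<String> values = Arrays.asList("M,1,1", "N,2,4", "M,2,2", "N,1,3");
        System.out.println(reduce(values, 2)); // 1*3 + 2*4 = 11
    }
}
```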

Two files, M and N, hold the two matrices. Each line of a file has the form "row,column\tvalue". In this example the data is generated with a shell script.
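As a small illustration of the line format, the following sketch splits one line into its row, column, and value. The class name LineFormatSketch and the sample line are assumptions made for this example only.

```java
public class LineFormatSketch {

    // Parse a line of the form "row,column<TAB>value" into {row, column, value}.
    public static int[] parse(String line) {
        String[] keyAndValue = line.split("\t");
        if (keyAndValue.length != 2) {
            throw new IllegalArgumentException("expected row,column<TAB>value: " + line);
        }
        String[] rowAndColumn = keyAndValue[0].split(",");
        if (rowAndColumn.length != 2) {
            throw new IllegalArgumentException("expected row,column before the tab: " + line);
        }
        return new int[] {
            Integer.parseInt(rowAndColumn[0]),
            Integer.parseInt(rowAndColumn[1]),
            Integer.parseInt(keyAndValue[1])
        };
    }

    public static void main(String[] args) {
        int[] parsed = parse("1,2\t84"); // hypothetical sample line
        System.out.println("row=" + parsed[0] + " column=" + parsed[1] + " value=" + parsed[2]);
    }
}
```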

Listing 1-2

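root@lejian:/data# cat matrix
#!/bin/bash
# Usage: ./matrix <rows of M> <columns of M = rows of N> <columns of N>
# Appends one "row,column<TAB>value" line per element to M_<rows>_<cols> and N_<rows>_<cols>
for i in $(seq 1 "$1")
do
        for j in $(seq 1 "$2")
        do
                s=$((RANDOM % 100))
                echo -e "$i,$j\t$s" >> "M_$1_$2"
        done
done
for i in $(seq 1 "$2")
do
        for j in $(seq 1 "$3")
        do
                s=$((RANDOM % 100))
                echo -e "$i,$j\t$s" >> "N_$2_$3"
        done
done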

Listing 1-3 runs the matrix script to generate a 2×3 matrix and a 3×3 matrix, creates a data directory in HDFS, and uploads the two generated matrix files into it.

Listing 1-3

root@lejian:/data# ./matrix 2 3 3
root@lejian:/data# cat M_2_3
1,1     6
1,2     84
1,3     40
2,1     51
2,2     37
2,3     97
root@lejian:/data# cat N_3_3
1,1     97
1,2     34
1,3     95
2,1     93
2,2     10
2,3     70
3,1     71
3,2     24
3,3     47
root@lejian:/data# hadoop fs -mkdir /data
root@lejian:/data# hadoop fs -put /data/M_2_3 /data/
root@lejian:/data# hadoop fs -put /data/N_3_3 /data/
root@lejian:/data# hadoop fs -ls -R /data
-rw-r--r--   1 root supergroup         41 2017-01-07 11:57 /data/M_2_3
-rw-r--r--   1 root supergroup         63 2017-01-07 11:57 /data/N_3_3

The Mapper class for matrix multiplication is shown in Listing 1-4.

Listing 1-4

package com.hadoop.mapreduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class MatrixMapper extends Mapper<LongWritable, Text, Text, Text> {

	// Number of columns of N and number of rows of M, read from the job configuration
	private int columnN = 0;
	private int rowM = 0;
	private Text mapKey = new Text();
	private Text mapValue = new Text();

	@Override
	protected void setup(Context context) throws IOException, InterruptedException {
		Configuration conf = context.getConfiguration();
		columnN = Integer.parseInt(conf.get("columnN"));
		rowM = Integer.parseInt(conf.get("rowM"));
	}

	@Override
	protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
		// The input file name tells which matrix the current line belongs to
		FileSplit file = (FileSplit) context.getInputSplit();
		String fileName = file.getPath().getName();
		// Each line has the form "row,column\tvalue"
		String line = value.toString();
		String[] tuple = line.split(",");
		if (tuple.length != 2) {
			throw new RuntimeException("MatrixMapper tuple error");
		}
		int row = Integer.parseInt(tuple[0]);
		String[] tuples = tuple[1].split("\t");
		if (tuples.length != 2) {
			throw new RuntimeException("MatrixMapper tuples error");
		}
		// Assumes that only the file holding matrix M has "M" in its name (M_2_3 vs. N_3_3)
		if (fileName.contains("M")) {
			matrixM(row, Integer.parseInt(tuples[0]), Integer.parseInt(tuples[1]), context);
		} else {
			matrixN(row, Integer.parseInt(tuples[0]), Integer.parseInt(tuples[1]), context);
		}
	}

	// m_ik is needed by every p_ij in row i: emit <(i,j),(M,k,m_ik)> for each column j of N
	private void matrixM(int row, int column, int value, Context context) throws IOException, InterruptedException {
		for (int i = 1; i < columnN + 1; i++) {
			mapKey.set(row + "," + i);
			mapValue.set("M," + column + "," + value);
			context.write(mapKey, mapValue);
		}
	}

	// n_kj is needed by every p_ij in column j: emit <(i,j),(N,k,n_kj)> for each row i of M
	private void matrixN(int row, int column, int value, Context context) throws IOException, InterruptedException {
		for (int i = 1; i < rowM + 1; i++) {
			mapKey.set(i + "," + column);
			mapValue.set("N," + row + "," + value);
			context.write(mapKey, mapValue);
		}
	}

}

The Reducer class for matrix multiplication is shown in Listing 1-5.

Listing 1-5

package com.hadoop.mapreduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MatrixReducer extends Reducer<Text, Text, Text, Text> {

	// Number of columns of M (equal to the number of rows of N), read from the job configuration
	private int columnM = 0;

	@Override
	protected void setup(Context context) throws IOException, InterruptedException {
		Configuration conf = context.getConfiguration();
		columnM = Integer.parseInt(conf.get("columnM"));
	}

	@Override
	protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
		int sum = 0;
		// m[k] holds m_ik and n[k] holds n_kj; index 0 is unused because k is 1-based
		int[] m = new int[columnM + 1];
		int[] n = new int[columnM + 1];
		for (Text val : values) {
			// Each value has the form "M,k,m_ik" or "N,k,n_kj"
			String[] tuple = val.toString().split(",");
			if (tuple.length != 3) {
				throw new RuntimeException("MatrixReducer tuple error");
			}
			if ("M".equals(tuple[0])) {
				m[Integer.parseInt(tuple[1])] = Integer.parseInt(tuple[2]);
			} else {
				n[Integer.parseInt(tuple[1])] = Integer.parseInt(tuple[2]);
			}
		}
		// p_ij is the sum over k of m_ik * n_kj
		for (int i = 1; i < columnM + 1; i++) {
			sum += m[i] * n[i];
		}
		context.write(key, new Text(String.valueOf(sum)));
	}

}

The main (driver) class for matrix multiplication, Matrix, is shown below.

package com.hadoop.mapreduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Matrix {

	public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
		if (args == null || args.length != 5) {
			throw new RuntimeException("Usage: <input path> <output path> <rows of M> <columns of M> <columns of N>");
		}
		// Pass the matrix dimensions to the Mapper and Reducer through the job configuration
		Configuration conf = new Configuration();
		conf.set("rowM", args[2]);
		conf.set("columnM", args[3]);
		conf.set("columnN", args[4]);
		Job job = Job.getInstance(conf);
		job.setJobName("Matrix");
		job.setJarByClass(Matrix.class);
		job.setMapperClass(MatrixMapper.class);
		job.setReducerClass(MatrixReducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(Text.class);
		FileInputFormat.addInputPaths(job, args[0]);
		FileOutputFormat.setOutputPath(job, new Path(args[1]));
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}

}

Run the Matrix driver class; the result is shown in Listing 1-6 (note: Listing 1-6 omits part of the MapReduce execution output).

Listing 1-6

root@lejian:/data# hadoop jar matrix.jar com.hadoop.mapreduce.Matrix /data/ /output/ 2 3 3
…………
root@lejian:/data# hadoop fs -ls -R /output
-rw-r--r--   1 root supergroup          0 2017-01-07 12:04 /output/_SUCCESS
-rw-r--r--   1 root supergroup         57 2017-01-07 12:04 /output/part-r-00000
root@lejian:/data# hadoop fs -cat /output/part-r-00000
1,1     11234
1,2     2004
1,3     8330
2,1     15275
2,2     4432
2,3     11994
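The job output can be cross-checked locally by applying Formula (1-1) directly to the sample matrices M_2_3 and N_3_3 shown earlier. The sketch below is a plain triple loop rather than MapReduce; the class name LocalCheck and the method name multiply are assumptions made for this check.

```java
public class LocalCheck {

    // Direct evaluation of Formula (1-1): p_ij = sum over k of m_ik * n_kj.
    public static int[][] multiply(int[][] m, int[][] n) {
        int rows = m.length;
        int shared = n.length; // columns of M, which must equal rows of N
        int cols = n[0].length;
        int[][] p = new int[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                for (int k = 0; k < shared; k++) {
                    p[i][j] += m[i][k] * n[k][j];
                }
            }
        }
        return p;
    }

    public static void main(String[] args) {
        // Values copied from the generated files M_2_3 and N_3_3
        int[][] m = { { 6, 84, 40 }, { 51, 37, 97 } };
        int[][] n = { { 97, 34, 95 }, { 93, 10, 70 }, { 71, 24, 47 } };
        int[][] p = multiply(m, n);
        for (int i = 0; i < p.length; i++) {
            for (int j = 0; j < p[i].length; j++) {
                // Same "row,column<TAB>value" layout as part-r-00000, with 1-based indices
                System.out.println((i + 1) + "," + (j + 1) + "\t" + p[i][j]);
            }
        }
    }
}
```

For example, p_11 = 6*97 + 84*93 + 40*71 = 582 + 7812 + 2840 = 11234, which matches the first line of the job output.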
