MapReduce -- Shortest Path

Example:

Given the distance from each node to its adjacent nodes, compute the shortest path from the start node to every other node.

Data:

A    (B,10)    (D,5)

B    (C,1)    (D,2)

C    (E,)

D    (B,)    (C,)    (E,)

E    (A,)    (C,)

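Node A is the start node. The distance from A to B is 10, and the distance from A to D is 5.

The distance from B to C is 1 and from B to D is 2, and so on for the other nodes.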

Computing the shortest path with MapReduce

Map phase

For example:

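A    (B,10)    (D,5) #input record

A    0    (B,10)    (D,5) #the shortest distance from A to A is 0

B    10 #a path from A to B with distance 10 exists

D    5 #a path from A to D with distance 5 exists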

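Starting from the start node, the map phase lists the distance from each reached node to its adjacent nodes and passes these candidates to reduce, which picks the shortest one.

Note that the search starts from the start node A: first B and D are found, then the neighbors of B and D, and so on. This is a breadth-first search.

Any node that has not yet been reached from A gets the default distance inf, meaning infinity.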
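The map step described above can be sketched in plain Java without any Hadoop types. This is only an illustration: the class name `MapStepSketch` and its `map` helper are hypothetical, and the record layout (distance, then tab-separated `(node,weight)` entries) is assumed from the example records shown earlier.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch of the map step; not the Hadoop Mapper itself.
class MapStepSketch {

    // Takes a node key and its record "distance<TAB>(N,w)<TAB>(N,w)..." and
    // returns the key/value pairs that the map step would emit.
    static List<String[]> map(String key, String record) {
        List<String[]> out = new ArrayList<>();
        // Always pass the node's own record through so its adjacency list survives.
        out.add(new String[] {key, record});

        String[] fields = record.split("\t");
        String distance = fields[0];
        if (distance.equals("inf")) {
            return out; // node not reached yet: nothing to propagate
        }
        int base = Integer.parseInt(distance);
        for (int i = 1; i < fields.length; i++) {
            // Each adjacency entry looks like "(B,10)".
            String entry = fields[i];
            String neighbor = entry.substring(1, entry.indexOf(','));
            int weight = Integer.parseInt(
                    entry.substring(entry.indexOf(',') + 1, entry.indexOf(')')));
            // Candidate distance from the start node to this neighbor.
            out.add(new String[] {neighbor, String.valueOf(base + weight)});
        }
        return out;
    }

    public static void main(String[] args) {
        for (String[] kv : map("A", "0\t(B,10)\t(D,5)")) {
            System.out.println(kv[0] + " -> " + kv[1].replace('\t', ' '));
        }
    }
}
```

For the record of node A, the sketch emits A's own record plus one candidate distance each for B and D, which matches the three output lines of the example above.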

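Reduce phase

Find the smallest of all the candidate distances for a node and update the shortest distance stored in its record.

For example, for key B:

B    inf    (C,1)    (D,2) #B's own record; inf is the largest possible distance

B    10 #candidate distance emitted by the map phase

B    10    (C,1)    (D,2) #output: the shortest distance from A to B is 10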
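The reduce step can be sketched the same way in plain Java. The class name `ReduceStepSketch` and its `reduce` helper are hypothetical; the sketch assumes that a value containing a tab is the node's own record and that any other value is a bare candidate distance.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch of the reduce step; not the Hadoop Reducer itself.
class ReduceStepSketch {

    // values mixes the node's own record ("dist<TAB>(N,w)...") with bare
    // candidate distances ("10"). Returns the updated record.
    static String reduce(List<String> values) {
        String adjacency = ""; // adjacency list, including its leading tab
        String min = "inf";
        for (String v : values) {
            int tab = v.indexOf('\t');
            String dist = tab < 0 ? v : v.substring(0, tab);
            if (tab >= 0) {
                adjacency = v.substring(tab); // keep the neighbors for the next round
            }
            // "inf" never beats a number; any number beats "inf".
            if (!dist.equals("inf")
                    && (min.equals("inf") || Integer.parseInt(dist) < Integer.parseInt(min))) {
                min = dist;
            }
        }
        return min + adjacency;
    }

    public static void main(String[] args) {
        String updated = reduce(Arrays.asList("inf\t(C,1)\t(D,2)", "10"));
        System.out.println("B " + updated.replace('\t', ' '));
    }
}
```

With the two values from the example for key B, the sketch returns the record with distance 10 and the adjacency list unchanged.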

How the data changes across the MapReduce rounds:

Original data:
A    (B,10)    (D,5)

B    (C,1)    (D,2)

C    (E,)

D    (B,)    (C,)    (E,)

E    (A,)    (C,)

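Result of the first MR round:

A    0    (B,10)    (D,5)    #starting from the start node A, find the distances from A to B and D

B    10    (C,1)    (D,2)    #B is reached and updated with the current shortest distance from A to B

C    inf    (E,)

D    5    (B,)    (C,)    (E,)    #D is reached and updated with the current shortest distance from A to D

E    inf    (A,)    (C,)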

Result of the second MR round:

A        (B,)    (D,)

B        (C,)    (D,)

C        (E,)

D        (B,)    (C,)    (E,)

E        (A,)    (C,)

Result of the third MR round:

A        (B,)    (D,)

B        (C,)    (D,)

C        (E,)

D        (B,)    (C,)    (E,)

E        (A,)    (C,)

Result of the fourth MR round:

A        (B,)    (D,)

B        (C,)    (D,)

C        (E,)

D        (B,)    (C,)    (E,)

E        (A,)    (C,)

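One more question remains: when have the shortest distances of all nodes been computed?

My approach: if a round updates no node's distance, the shortest distances of all nodes are final and the computation stops.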
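The stopping rule can be tried out in memory before running it on a cluster. The sketch below is a simplified, hypothetical model (class `ConvergenceSketch`, with helper names of my own choosing): each call to `relaxOnce` stands in for one map plus reduce round and returns the number of updated distances, which plays the role of the job counter. The sample graph contains only the four edges whose weights are stated in the text; the other edges are left out.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of the iterate-until-no-update loop.
class ConvergenceSketch {
    static final int INF = Integer.MAX_VALUE;

    // One round: propagate distances from every reached node to its neighbors
    // and return how many nodes received a shorter distance.
    static int relaxOnce(Map<String, Map<String, Integer>> graph, Map<String, Integer> dist) {
        Map<String, Integer> next = new HashMap<>(dist);
        for (Map.Entry<String, Map<String, Integer>> node : graph.entrySet()) {
            int base = dist.get(node.getKey());
            if (base == INF) {
                continue; // not reached yet, nothing to propagate
            }
            for (Map.Entry<String, Integer> edge : node.getValue().entrySet()) {
                int candidate = base + edge.getValue();
                if (candidate < next.get(edge.getKey())) {
                    next.put(edge.getKey(), candidate);
                }
            }
        }
        int updates = 0;
        for (String k : dist.keySet()) {
            if (!next.get(k).equals(dist.get(k))) {
                updates++;
            }
        }
        dist.putAll(next);
        return updates;
    }

    // Repeats rounds until one of them updates nothing; returns the round count.
    static int run(Map<String, Map<String, Integer>> graph, Map<String, Integer> dist) {
        int rounds = 0;
        int updates;
        do {
            updates = relaxOnce(graph, dist);
            rounds++;
        } while (updates > 0);
        return rounds;
    }

    // Only the edges whose weights are given in the text: A->B 10, A->D 5, B->C 1, B->D 2.
    static Map<String, Map<String, Integer>> sampleGraph() {
        Map<String, Integer> a = new LinkedHashMap<>();
        a.put("B", 10);
        a.put("D", 5);
        Map<String, Integer> b = new LinkedHashMap<>();
        b.put("C", 1);
        b.put("D", 2);
        Map<String, Map<String, Integer>> graph = new LinkedHashMap<>();
        graph.put("A", a);
        graph.put("B", b);
        return graph;
    }

    public static void main(String[] args) {
        Map<String, Integer> dist = new LinkedHashMap<>();
        dist.put("A", 0); // start node
        dist.put("B", INF);
        dist.put("C", INF);
        dist.put("D", INF);
        int rounds = run(sampleGraph(), dist);
        System.out.println("rounds=" + rounds + " dist=" + dist);
    }
}
```

On this reduced graph the loop needs three rounds: the first reaches B and D, the second reaches C, and the third changes nothing and therefore ends the loop. The last round is always a "wasted" one, which is the price of detecting convergence by counting updates.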

Source code:

RunJob.java

 import java.io.IOException;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.Job;
 import org.apache.hadoop.mapreduce.Mapper;
 import org.apache.hadoop.mapreduce.Reducer;
 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
 import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
 import org.apache.hadoop.util.StringUtils;

 /**
  * Created by Edward on 2016/7/15.
  */
 public class RunJob {

     static enum eInf {
         COUNTER
     }

     public static void main(String[] args) {
         Configuration conf = new Configuration();
         conf.set("fs.defaultFS", "hdfs://node1:8020");
         try {
             FileSystem fs = FileSystem.get(conf);
             int i = 0;     // current round number
             long num = 1;  // number of distances updated in the last round
             // Keep launching jobs until a round updates no distance
             while (num > 0) {
                 i++;
                 conf.setInt("run.counter", i);
                 Job job = Job.getInstance(conf);
                 job.setJarByClass(RunJob.class);
                 job.setMapperClass(ShortestPathMapper.class);
                 job.setReducerClass(ShortestPathReducer.class);
                 job.setMapOutputKeyClass(Text.class);
                 job.setMapOutputValueClass(Text.class);
                 // Key/value text format: the first item of a line is the key, the remaining items are the value
                 job.setInputFormatClass(KeyValueTextInputFormat.class);
                 // Round 1 reads the original input; later rounds read the previous round's output
                 if (i == 1) {
                     FileInputFormat.addInputPath(job, new Path("/test/shortestpath/input/"));
                 } else {
                     FileInputFormat.addInputPath(job, new Path("/test/shortestpath/output/sp" + (i - 1)));
                 }
                 Path outPath = new Path("/test/shortestpath/output/sp" + i);
                 if (fs.exists(outPath)) {
                     fs.delete(outPath, true);
                 }
                 FileOutputFormat.setOutputPath(job, outPath);
                 boolean b = job.waitForCompletion(true);
                 if (!b) {
                     // Stop here; otherwise a failed job would keep the loop running forever
                     System.err.println("Job failed in round " + i);
                     break;
                 }
                 num = job.getCounters().findCounter(eInf.COUNTER).getValue();
                 if (num == 0) {
                     System.out.println("Finished computing the shortest paths after " + i + " rounds");
                 }
             }
         } catch (Exception e) {
             e.printStackTrace();
         }
     }

     /**
      * @author Edward
      *
      *         @1 A (B,10) (D,5) =>
      *            A 0 (B,10) (D,5)
      *            B 10
      *            D 5
      *         @2 B 10 (C,1) (D,2) =>
      *         B 10 (C,1) (D,2)
      *         C 11
      *         D 12
      */
     public static class ShortestPathMapper extends Mapper<Text, Text, Text, Text> {
         @Override
         protected void map(Text key, Text value, Context context) throws IOException, InterruptedException {
             int counter = context.getConfiguration().getInt("run.counter", 1);
             Node node = new Node();
             String distance = null;
             String str = null;
             // First round: fill in the default distance, 0 for the start node A and inf for all others
             if (counter == 1) {
                 if (key.toString().equals("A") || key.toString().equals("1")) {
                     distance = "0";
                 } else {
                     distance = "inf";
                 }
                 str = distance + "\t" + value.toString();
             } else {
                 str = value.toString();
             }
             // Pass the node's own record through so the adjacency list is kept
             context.write(key, new Text(str));
             node.FormatNode(str);
             // This node has not been reached yet, so there is nothing to propagate
             if (node.getDistance().equals("inf"))
                 return;
             // Recompute the distance from the source A to each adjacent node
             for (int i = 0; i < node.getNodeNum(); i++) {
                 String k = node.getNodeKey(i);
                 String v = String.valueOf(
                         Integer.parseInt(node.getNodeValue(i)) + Integer.parseInt(node.getDistance()));
                 context.write(new Text(k), new Text(v));
             }
         }
     }

     /**
      * @author Edward
      *
      *         B 10 (C,1) (D,2)
      *         B 8              =>
      *         B 8 (C,1) (D,2)
      *
      *         Assumes every node has an adjacency list; for a node without one,
      *         node.getDistance() stays null and the update check below fails.
      */
     public static class ShortestPathReducer extends Reducer<Text, Text, Text, Text> {
         @Override
         protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
             String min = null;
             int i = 0;
             Node node = new Node();
             for (Text t : values) {
                 i++;
                 String[] strs = StringUtils.split(t.toString(), '\t');
                 String dis = strs[0];
                 // If an inf record exists, some node has no computed distance yet.
                 // if(dis.equals("inf"))
                 // context.getCounter(eInf.COUNTER).increment(1L);
                 // A value with adjacent nodes is the node's own record: keep it so the
                 // adjacency list survives, then update it with the minimum distance.
                 if (strs.length > 1) {
                     node.FormatNode(t.toString());
                 }
                 // The first value is the initial minimum; afterwards inf never wins
                 if (i == 1) {
                     min = dis;
                 } else if (!dis.equals("inf")
                         && (min.equals("inf") || Integer.parseInt(min) > Integer.parseInt(dis))) {
                     min = dis;
                 }
             }
             // A new minimum means the distances are still improving, so another round is needed
             if (!min.equals("inf")) {
                 if (node.getDistance().equals("inf")) {
                     context.getCounter(eInf.COUNTER).increment(1L);
                 } else if (Integer.parseInt(node.getDistance()) > Integer.parseInt(min)) {
                     context.getCounter(eInf.COUNTER).increment(1L);
                 }
             }
             node.setDistance(min);
             context.write(key, new Text(node.toString()));
         }
     }

 }

Node.java

 import org.apache.hadoop.util.StringUtils;

 /**
  * Created by Edward on 2016/7/15.
  *
  * One record of the form "distance<TAB>(N,w)<TAB>(N,w)...".
  */
 public class Node {

     private String distance;   // shortest known distance from the start node, or "inf"
     private String[] adjs;     // adjacency entries such as "(B,10)"

     public String getDistance() {
         return distance;
     }

     public void setDistance(String distance) {
         this.distance = distance;
     }

     // "(B,10)" -> "B"
     public String getKey(String str) {
         return str.substring(1, str.indexOf(","));
     }

     // "(B,10)" -> "10"
     public String getValue(String str) {
         return str.substring(str.indexOf(",") + 1, str.indexOf(")"));
     }

     public String getNodeKey(int num) {
         return getKey(adjs[num]);
     }

     public String getNodeValue(int num) {
         return getValue(adjs[num]);
     }

     public int getNodeNum() {
         return adjs.length;
     }

     // Parses a tab-separated record: the first field is the distance, the rest are adjacency entries
     public void FormatNode(String str) {
         if (str.length() == 0)
             return;
         String[] strs = StringUtils.split(str, '\t');
         adjs = new String[strs.length - 1];
         setDistance(strs[0]);
         for (int i = 1; i < strs.length; i++) {
             this.adjs[i - 1] = strs[i];
         }
     }

     @Override
     public String toString() {
         StringBuilder sb = new StringBuilder(String.valueOf(this.distance));
         if (this.adjs == null)
             return sb.toString();
         for (String s : this.adjs) {
             sb.append("\t").append(s);
         }
         return sb.toString();
     }

     public static void main(String[] args) {
         Node node = new Node();
         // Fields must be separated by tabs, as in the job's input and output files
         node.FormatNode("1\t(A,20)\t(B,30)");
         System.out.println(node.distance + "|" + node.getNodeNum() + "|" + node.toString());
     }
 }
