Learning GraphX's Pregel (the BSP model and its message-passing mechanism)

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.graphx

import scala.reflect.ClassTag

import org.apache.spark.Logging

/**
 * Implements a Pregel-like bulk-synchronous message-passing API.
 *
 * Unlike the original Pregel API, the GraphX Pregel API factors the sendMessage computation over
 * edges, enables the message sending computation to read both vertex attributes, and constrains
 * messages to the graph structure.  These changes allow for substantially more efficient
 * distributed execution while also exposing greater flexibility for graph-based computation.
 *
 * @example We can use the Pregel abstraction to implement PageRank:
 * {{{
 * val pagerankGraph: Graph[Double, Double] = graph
 *   // Associate the degree with each vertex
 *   .outerJoinVertices(graph.outDegrees) {
 *     (vid, vdata, deg) => deg.getOrElse(0)
 *   }
 *   // Set the weight on the edges based on the degree
 *   .mapTriplets(e => 1.0 / e.srcAttr)
 *   // Set the vertex attributes to the initial pagerank values
 *   .mapVertices((id, attr) => 1.0)
 *
 * // resetProb and numIter are assumed to be defined by the caller.
 * def vertexProgram(id: VertexId, attr: Double, msgSum: Double): Double =
 *   resetProb + (1.0 - resetProb) * msgSum
 * // sendMsg takes a single EdgeTriplet argument, matching the signature of Pregel.apply.
 * def sendMessage(edge: EdgeTriplet[Double, Double]): Iterator[(VertexId, Double)] =
 *   Iterator((edge.dstId, edge.srcAttr * edge.attr))
 * def messageCombiner(a: Double, b: Double): Double = a + b
 * val initialMessage = 0.0
 * // Execute Pregel for a fixed number of iterations.
 * Pregel(pagerankGraph, initialMessage, numIter)(
 *   vertexProgram, sendMessage, messageCombiner)
 * }}}
 *
 */

object Pregel extends Logging {

  /**
   * Execute a Pregel-like iterative vertex-parallel abstraction.  The
   * user-defined vertex-program `vprog` is executed in parallel on
   * each vertex receiving any inbound messages and computes a new
   * value for the vertex.  The `sendMsg` function is then invoked on
   * the edges of those vertices (which edges is controlled by
   * `activeDirection`) and is used to compute optional messages to
   * neighboring vertices. The `mergeMsg` function is a commutative and
   * associative function used to combine messages destined to the
   * same vertex.
   *
   * On the first iteration all vertices receive the `initialMsg`, and
   * on subsequent iterations the vertex-program is not invoked on a
   * vertex that does not receive a message.
   *
   * This function iterates until there are no remaining messages, or
   * for at most `maxIterations` iterations.
   *
   * @tparam VD the vertex data type
   * @tparam ED the edge data type
   * @tparam A the Pregel message type
   *
   * @param graph the input graph.
   *
   * @param initialMsg the message each vertex will receive on the
   * first iteration
   *
   * @param maxIterations the maximum number of iterations to run for
   *
   * @param activeDirection the direction of edges incident to a vertex that received a message in
   * the previous round on which to run `sendMsg`. For example, if this is `EdgeDirection.Out`, only
   * out-edges of vertices that received a message in the previous round will run. The default is
   * `EdgeDirection.Either`, which will run `sendMsg` on edges where either side received a message
   * in the previous round. If this is `EdgeDirection.Both`, `sendMsg` will only run on edges where
   * *both* vertices received a message.
   *
   * @param vprog the user-defined vertex program which runs on each
   * vertex and receives the inbound message and computes a new vertex
   * value.  On the first iteration the vertex program is invoked on
   * all vertices and is passed the initial message.  On subsequent
   * iterations the vertex program is only invoked on those vertices
   * that receive messages.
   *
   * @param sendMsg a user-supplied function that is applied to the
   * edges (selected by `activeDirection`) of vertices that received
   * messages in the current iteration
   *
   * @param mergeMsg a user-supplied function that takes two incoming
   * messages of type A and merges them into a single message of type
   * A.  ''This function must be commutative and associative and
   * ideally the size of A should not increase.''
   *
   * @return the resulting graph at the end of the computation
   *
   */

  def apply[VD: ClassTag, ED: ClassTag, A: ClassTag]
     (graph: Graph[VD, ED],
      initialMsg: A,
      maxIterations: Int = Int.MaxValue,
      activeDirection: EdgeDirection = EdgeDirection.Either)
     (vprog: (VertexId, VD, A) => VD,
      sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)],
      mergeMsg: (A, A) => A)
    : Graph[VD, ED] =
  {
    var g = graph.mapVertices((vid, vdata) => vprog(vid, vdata, initialMsg)).cache()
    // compute the messages
    var messages = g.mapReduceTriplets(sendMsg, mergeMsg)
    var activeMessages = messages.count()
    // Loop
    var prevG: Graph[VD, ED] = null
    var i = 0
    while (activeMessages > 0 && i < maxIterations) {
      // Receive the messages. Vertices that didn't get any messages do not appear in newVerts.
      val newVerts = g.vertices.innerJoin(messages)(vprog).cache()
      // Update the graph with the new vertices.
      prevG = g
      g = g.outerJoinVertices(newVerts) { (vid, old, newOpt) => newOpt.getOrElse(old) }
      g.cache()

      val oldMessages = messages
      // Send new messages. Vertices that didn't get any messages don't appear in newVerts, so don't
      // get to send messages. We must cache messages so it can be materialized on the next line,
      // allowing us to uncache the previous iteration.
      messages = g.mapReduceTriplets(sendMsg, mergeMsg, Some((newVerts, activeDirection))).cache()
      // The call to count() materializes `messages`, `newVerts`, and the vertices of `g`. This
      // hides oldMessages (depended on by newVerts), newVerts (depended on by messages), and the
      // vertices of prevG (depended on by newVerts, oldMessages, and the vertices of g).
      activeMessages = messages.count()

      logInfo("Pregel finished iteration " + i)

      // Unpersist the RDDs hidden by newly-materialized RDDs
      oldMessages.unpersist(blocking=false)
      newVerts.unpersist(blocking=false)
      prevG.unpersistVertices(blocking=false)
      prevG.edges.unpersist(blocking=false)
      // count the iteration
      i += 1
    }

    g
  } // end of apply

} // end of object Pregel
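The scaladoc of `apply` describes the superstep loop in prose. The Scala below is a minimal, Spark-free sketch of that control flow, not the GraphX API. It runs the same steps on an in-memory map of vertices:

- In superstep 0, `vprog` is applied with `initialMsg` to every vertex.
- In each later superstep, `vprog` runs only on vertices that received a combined message.
- The loop stops when no messages remain or when `maxIterations` is reached.

`MiniPregel`, `Edge`, `Triplet`, `MiniPregelDemo`, and the 4-vertex graph are all hypothetical. The sketch also makes one simplifying assumption: it fixes the active direction to out-edges (like `EdgeDirection.Out`) instead of the `EdgeDirection.Either` default.

```scala
// Spark-free sketch of the superstep loop described for Pregel.apply.
// MiniPregel, Edge and Triplet are hypothetical names, not the GraphX API.
object MiniPregel {
  final case class Edge[ED](src: Long, dst: Long, attr: ED)
  // What sendMsg sees: both endpoint attributes plus the edge attribute.
  final case class Triplet[VD, ED](srcId: Long, srcAttr: VD, dstId: Long, dstAttr: VD, attr: ED)

  def run[VD, ED, A](
      vertices: Map[Long, VD],
      edges: Seq[Edge[ED]],
      initialMsg: A,
      maxIterations: Int = Int.MaxValue)(
      vprog: (Long, VD, A) => VD,
      sendMsg: Triplet[VD, ED] => Iterator[(Long, A)],
      mergeMsg: (A, A) => A): Map[Long, VD] = {

    // Run sendMsg on out-edges of active vertices and combine messages per target
    // (assumption: active direction fixed to out-edges, like EdgeDirection.Out).
    def messagesFrom(attrs: Map[Long, VD], active: Set[Long]): Map[Long, A] =
      edges.iterator
        .filter(e => active.contains(e.src))
        .flatMap(e => sendMsg(Triplet(e.src, attrs(e.src), e.dst, attrs(e.dst), e.attr)))
        .foldLeft(Map.empty[Long, A]) { case (acc, (id, m)) =>
          acc.updated(id, acc.get(id).map(mergeMsg(_, m)).getOrElse(m))
        }

    // Superstep 0: every vertex runs vprog on initialMsg, then every edge may send.
    var g = vertices.map { case (id, attr) => id -> vprog(id, attr, initialMsg) }
    var messages = messagesFrom(g, g.keySet)
    var i = 0
    while (messages.nonEmpty && i < maxIterations) {
      // Only vertices that received a message run vprog (cf. innerJoin in apply).
      val newVerts = messages.map { case (id, msg) => id -> vprog(id, g(id), msg) }
      // Other vertices keep their old value (cf. outerJoinVertices in apply).
      g = g ++ newVerts
      // Only the vertices updated in this superstep send in the next one.
      messages = messagesFrom(g, newVerts.keySet)
      i += 1
    }
    g
  }
}

object MiniPregelDemo {
  import MiniPregel._

  // Single-source shortest paths from vertex 1 on a hypothetical weighted graph.
  def shortestPaths(): Map[Long, Double] = {
    val inf = Double.PositiveInfinity
    val vertices = Map(1L -> 0.0, 2L -> inf, 3L -> inf, 4L -> inf)
    val edges = Seq(Edge(1L, 2L, 4.0), Edge(1L, 3L, 1.0), Edge(3L, 2L, 2.0), Edge(2L, 4L, 5.0))
    run(vertices, edges, inf)(
      (_, dist, msg) => math.min(dist, msg),
      t => if (t.srcAttr + t.attr < t.dstAttr) Iterator((t.dstId, t.srcAttr + t.attr)) else Iterator.empty,
      (a, b) => math.min(a, b))
  }

  def main(args: Array[String]): Unit =
    // prints (1,0.0), (2,3.0), (3,1.0), (4,8.0)
    println(shortestPaths().toSeq.sortBy(_._1).mkString(", "))
}
```

Single-source shortest paths fits this loop well for two reasons. First, `mergeMsg = min` is commutative and associative. Second, `vprog` only ever lowers a distance, so the run halts as soon as no edge can improve its destination. Here that happens after three supersteps following superstep 0.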
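The `mergeMsg` contract ("must be commutative and associative") matters because messages bound for one vertex may be combined in any order and grouping. For instance, they may be partially combined inside different partitions before those partial results meet. The small Spark-free check below uses hypothetical names (`MergeOrderDemo`, `twoGroupings`). It merges the same three messages under two groupings: a sum gives the same answer both ways, while a pairwise average, which is commutative but not associative, does not.

```scala
// Hypothetical demo: the same messages merged under two different groupings.
object MergeOrderDemo {
  // Two equally valid ways the runtime could combine messages a, b, c for one vertex.
  def twoGroupings(merge: (Double, Double) => Double, a: Double, b: Double, c: Double): (Double, Double) =
    (merge(merge(a, b), c), merge(a, merge(c, b)))

  // Commutative and associative: safe as mergeMsg.
  val sum: (Double, Double) => Double = _ + _
  // Commutative but NOT associative: unsafe as mergeMsg.
  val pairwiseAvg: (Double, Double) => Double = (x, y) => (x + y) / 2

  def main(args: Array[String]): Unit = {
    println(twoGroupings(sum, 1.0, 2.0, 6.0))         // prints (9.0,9.0)
    println(twoGroupings(pairwiseAvg, 1.0, 2.0, 6.0)) // prints (3.75,2.5)
  }
}
```

When an average is actually needed, a common pattern is the following. Send `(sum, count)` pairs, merge them component-wise (which is associative), and divide only inside `vprog`.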
