Understanding and Using the Spark GraphX Pregel API

A graph is inherently a recursive data structure, and the Pregel API in Spark GraphX can be used to run bulk iterative computations over graph data.




/**
 * Execute a Pregel-like iterative vertex-parallel abstraction. The
 * user-defined vertex-program `vprog` is executed in parallel on
 * each vertex receiving any inbound messages and computing a new
 * value for the vertex. The `sendMsg` function is then invoked on
 * all out-edges and is used to compute an optional message to the
 * destination vertex. The `mergeMsg` function is a commutative
 * associative function used to combine messages destined to the
 * same vertex.
 *
 * On the first iteration all vertices receive the `initialMsg` and
 * on subsequent iterations if a vertex does not receive a message
 * then the vertex-program is not invoked.
 *
 * This function iterates until there are no remaining messages, or
 * for `maxIterations` iterations.
 *
 * @tparam A the Pregel message type
 *
 * @param initialMsg the message each vertex will receive on
 * the first iteration
 *
 * @param maxIterations the maximum number of iterations to run for
 *
 * @param activeDirection the direction of edges incident to a vertex that received a message in
 * the previous round on which to run `sendMsg`. For example, if this is `EdgeDirection.Out`, only
 * out-edges of vertices that received a message in the previous round will run.
 *
 * @param vprog the user-defined vertex program which runs on each
 * vertex and receives the inbound message and computes a new vertex
 * value. On the first iteration the vertex program is invoked on
 * all vertices and is passed the default message. On subsequent
 * iterations the vertex program is only invoked on those vertices
 * that receive messages.
 *
 * @param sendMsg a user supplied function that is applied to out
 * edges of vertices that received messages in the current
 * iteration
 *
 * @param mergeMsg a user supplied function that takes two incoming
 * messages of type A and merges them into a single message of type
 * A. ''This function must be commutative and associative and
 * ideally the size of A should not increase.''
 *
 * @return the resulting graph at the end of the computation
 */
def pregel[A: ClassTag](
    initialMsg: A,
    maxIterations: Int = Int.MaxValue,
    activeDirection: EdgeDirection = EdgeDirection.Either)(
    vprog: (VertexId, VD, A) => VD,
    sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)],
    mergeMsg: (A, A) => A)
  : Graph[VD, ED] = {
  Pregel(graph, initialMsg, maxIterations, activeDirection)(vprog, sendMsg, mergeMsg)
}






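In each iteration, the vertex program vprog first runs on every vertex that received a message and computes that vertex's new attribute value; the sendMsg function is then called to send the next iteration's messages to the vertices at the other end of the out-edges.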
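The iteration described above can be sketched on plain Scala collections, without Spark. This is only an illustration of the vprog / sendMsg / mergeMsg cycle, not GraphX's implementation: the names `LocalPregel`, `Triplet`, `run` and `LocalPregelDemo` are hypothetical, and for simplicity `sendMsg` is applied only to the out-edges of vertices that received a message (roughly the `EdgeDirection.Out` behaviour).

```scala
object LocalPregel {
  // Hypothetical stand-in for GraphX's EdgeTriplet: an edge plus both endpoint attributes.
  final case class Triplet[VD, ED](srcId: Long, dstId: Long, srcAttr: VD, dstAttr: VD, attr: ED)

  def run[VD, ED, A](
      vertices: Map[Long, VD],
      edges: Seq[(Long, Long, ED)],
      initialMsg: A,
      maxIterations: Int = Int.MaxValue)(
      vprog: (Long, VD, A) => VD,
      sendMsg: Triplet[VD, ED] => Iterator[(Long, A)],
      mergeMsg: (A, A) => A): Map[Long, VD] = {

    // First iteration: every vertex receives initialMsg.
    var state: Map[Long, VD] = vertices.map { case (id, v) => id -> vprog(id, v, initialMsg) }
    var active: Set[Long] = state.keySet
    var i = 0
    while (active.nonEmpty && i < maxIterations) {
      // sendMsg runs on the out-edges of the vertices that were active in this round.
      val sent: Seq[(Long, A)] = edges.iterator
        .filter { case (src, _, _) => active(src) }
        .flatMap { case (src, dst, e) => sendMsg(Triplet(src, dst, state(src), state(dst), e)) }
        .toSeq
      // mergeMsg collapses all messages bound for the same vertex into one.
      val inbox: Map[Long, A] =
        sent.groupBy(_._1).map { case (id, ms) => id -> ms.map(_._2).reduce(mergeMsg) }
      // vprog runs only on vertices that actually received a message.
      state = state.map { case (id, v) => id -> inbox.get(id).map(m => vprog(id, v, m)).getOrElse(v) }
      active = inbox.keySet
      i += 1
    }
    state
  }
}

object LocalPregelDemo {
  private val inf = Double.PositiveInfinity

  // Shortest distances from vertex 1 on a three-vertex graph: 1->2 (2.0), 2->3 (3.0), 1->3 (10.0).
  val result: Map[Long, Double] = LocalPregel.run[Double, Double, Double](
    Map(1L -> 0.0, 2L -> inf, 3L -> inf),
    Seq((1L, 2L, 2.0), (2L, 3L, 3.0), (1L, 3L, 10.0)),
    inf)(
    (_, dist, msg) => math.min(dist, msg),
    t => if (t.srcAttr + t.attr < t.dstAttr) Iterator((t.dstId, t.srcAttr + t.attr)) else Iterator.empty,
    (a, b) => math.min(a, b))

  def main(args: Array[String]): Unit =
    println(result.toSeq.sortBy(_._1).mkString(", ")) // prints (1,0.0), (2,2.0), (3,5.0)
}
```

Vertex 3 first receives 10.0 over the direct edge and is later improved to 5.0 via vertex 2; the loop stops once a round produces no messages.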




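VD : the data type of the vertex attributes.

ED : the data type of the edge attributes.

VertexId : the type of a vertex ID (in GraphX, an alias for Long).

A : the type of the Pregel messages.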


initialMsg : the initial message that every vertex in the graph receives in the first iteration.
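Because vprog runs on every vertex in the first iteration, initialMsg is usually picked so that it does not disturb the starting attributes. A small plain-Scala sketch of that idea, assuming a shortest-path style vprog that keeps the minimum (the object name `InitialMsgDemo` and the sample values are made up for illustration):

```scala
object InitialMsgDemo {
  // Shortest-path style vertex program: keep the smaller of the current value and the message.
  val vprog: (Long, Double, Double) => Double = (_, dist, msg) => math.min(dist, msg)

  // Sample starting attributes (illustrative values only).
  val start: Seq[Double] = Seq(0.0, Double.PositiveInfinity, 3.5)

  // +Infinity is neutral for min, so the first iteration leaves every attribute unchanged.
  val withInfinity: Seq[Double] = start.map(d => vprog(1L, d, Double.PositiveInfinity))

  // 0.0 is not neutral: it would overwrite every attribute with 0.0.
  val withZero: Seq[Double] = start.map(d => vprog(1L, d, 0.0))

  def main(args: Array[String]): Unit = {
    println(withInfinity == start) // prints true
    println(withZero.forall(_ == 0.0)) // prints true
  }
}
```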






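vprog: (VertexId, VD, A) => VD
Input: the vertex ID, the current attribute value of that vertex, and the message it received in this iteration.
Output: the new attribute value of the vertex.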




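sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)]
Input: an EdgeTriplet, i.e. the triplet view of an edge whose source vertex received a message in the current iteration. (With the default activeDirection, EdgeDirection.Either, edges whose destination vertex received a message are visited as well.)
Output: the messages for the next iteration, as (target vertex ID, message) pairs.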


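mergeMsg: (A, A) => A
A user-supplied function that merges messages bound for the same destination into one: if a vertex receives two or more messages of type A, this function combines them into a single message of type A. The function must be commutative and associative. Ideally, the size of a message of type A should not grow as messages are merged.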
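Why commutativity and associativity matter can be checked in plain Scala: messages for one vertex may be combined in any order, so a valid mergeMsg has to give the same answer for every order. The sketch below is only an illustration (the object name `MergeMsgCheck` and the sample messages are made up); it merges three messages in every possible order with `min`, which is safe, and with subtraction, which is not.

```scala
object MergeMsgCheck {
  // Three messages addressed to the same vertex (illustrative values).
  val msgs: Seq[Double] = Seq(5.0, 2.0, 9.0)

  // All distinct results obtained by merging the messages in every possible order.
  def outcomes(merge: (Double, Double) => Double): Set[Double] =
    msgs.permutations.map(_.reduce(merge)).toSet

  val minOutcomes: Set[Double] = outcomes((a, b) => math.min(a, b))
  val diffOutcomes: Set[Double] = outcomes((a, b) => a - b)

  def main(args: Array[String]): Unit = {
    println(minOutcomes.size)  // prints 1: min gives 2.0 regardless of order
    println(diffOutcomes.size) // prints 3: subtraction depends on the order
  }
}
```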




package graphxTest

import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

/**
 * Created by Mtime on 2018/1/25.
 */
object GraphxPregelTest {

  // The builder settings did not survive in the original listing;
  // the application name and local master below are assumptions.
  val spark = SparkSession
    .builder()
    .appName("GraphxPregelTest")
    .master("local[*]")
    .getOrCreate()
  val sc = spark.sparkContext

  /**
   * Compute shortest paths.
   */
  def shortestPath(): Unit = {
    val graph: Graph[Long, Double] = genGraph()
    graph.triplets.foreach(t => {
      println(s"t.srcId=${t.srcId} t.dstId=${t.dstId} t.srcAttr=${t.srcAttr} t.dstAttr=${t.dstAttr}")
    })

    val sourceId: VertexId = 1 // compute the shortest path from vertex 1 to every vertex in the graph
    // Initialize the graph such that all vertices except the root have distance infinity.
    val initialGraph = graph.mapVertices((id, _) =>
      if (id == sourceId) 0.0 else Double.PositiveInfinity)

    println("------------------------------")
    initialGraph.triplets.foreach(t => {
      println(s"t.srcId=${t.srcId} t.dstId=${t.dstId} t.srcAttr=${t.srcAttr} t.dstAttr=${t.dstAttr}")
    })

    val sssp: Graph[Double, Double] = initialGraph.pregel(Double.PositiveInfinity)(
      (vid, vidAttr, message) => math.min(vidAttr, message), // Vertex Program
      triplet => {
        // Send Message
        if (triplet.srcAttr + triplet.attr < triplet.dstAttr) {
          Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
        } else {
          Iterator.empty
        }
      },
      (message_a, message_b) => math.min(message_a, message_b) // Merge Message
    )
    // Assumption: the original output step was lost; print the final distance of each vertex.
    println(sssp.vertices.collect().mkString("\n"))
  }

  /**
   * Build the graph object.
   *
   * @return the example graph
   */
  private def genGraph(): Graph[Long, Double] = {
    // Create an RDD for vertices
    val vertices: RDD[(VertexId, Long)] =
      sc.parallelize(Array(
        (1L, 0L),
        (2L, 0L),
        (3L, 0L),
        (4L, 0L),
        (5L, 0L),
        (6L, 0L)))
    // Create an RDD for edges
    val edges: RDD[Edge[Double]] =
      sc.parallelize(Array(
        Edge(1L, 2L, 1.0),
        Edge(1L, 4L, 1.0),
        Edge(1L, 5L, 1.0),
        Edge(2L, 3L, 1.0),
        Edge(4L, 3L, 1.0),
        Edge(5L, 4L, 1.0),
        Edge(3L, 6L, 1.0)))
    val graph: Graph[Long, Double] = Graph(vertices, edges, 0L)
    graph
  }

  def main(args: Array[String]): Unit = {
    // Assumption: the body of main was lost; running the example is the evident intent.
    shortestPath()
  }
}
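To know what distances to expect from the shortest-path example on this six-vertex graph, the same edges can be relaxed in plain Scala without Spark. The sketch below swaps Pregel for a simple Bellman-Ford style relaxation, purely as a cross-check; the object name `ExpectedDistances` and the helper `distancesFrom` are hypothetical.

```scala
object ExpectedDistances {
  // The edges of the example graph as (source, destination, weight).
  val edges: Seq[(Long, Long, Double)] = Seq(
    (1L, 2L, 1.0), (1L, 4L, 1.0), (1L, 5L, 1.0),
    (2L, 3L, 1.0), (4L, 3L, 1.0), (5L, 4L, 1.0), (3L, 6L, 1.0))

  def distancesFrom(source: Long): Map[Long, Double] = {
    val ids = edges.flatMap { case (s, d, _) => Seq(s, d) }.distinct
    var dist: Map[Long, Double] =
      ids.map(id => id -> (if (id == source) 0.0 else Double.PositiveInfinity)).toMap
    // Relaxing every edge (number of vertices - 1) times is enough for shortest paths.
    for (_ <- 1 until ids.size; (s, d, w) <- edges if dist(s) + w < dist(d)) {
      dist = dist.updated(d, dist(s) + w)
    }
    dist
  }

  def main(args: Array[String]): Unit =
    // prints 1 -> 0.0, 2 -> 1.0, 3 -> 2.0, 4 -> 1.0, 5 -> 1.0, 6 -> 3.0
    println(distancesFrom(1L).toSeq.sortBy(_._1).map { case (id, d) => s"$id -> $d" }.mkString(", "))
}
```

The Pregel run should end with the same vertex attributes: vertices 2, 4 and 5 at distance 1.0, vertex 3 at 2.0 and vertex 6 at 3.0.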

