所构建的图如下:

Scala程序代码如下:

import org.apache.spark._
import org.apache.spark.graphx._
// To make some of the examples work we will also need RDD
import org.apache.spark.rdd.RDD
object Test {
def main(args: Array[String]): Unit = {
// 初始化SparkContext
val sc: SparkContext = new SparkContext("local[2]", "Spark Graphx");
// 创造一个点的RDD
val users: RDD[(VertexId, (String, String))] =
sc.parallelize(Array((3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
(5L, ("franklin", "prof")), (2L, ("istoica", "prof"))))
// 创造一个边的RDD,包含各种关系
val relationships: RDD[Edge[String]] =
sc.parallelize(Array(Edge(3L, 7L, "collab"), Edge(5L, 3L, "advisor"),
Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi")))
// 定义一个缺省的用户,其主要作用就在于当描述一种关系中不存在的目标顶点时就会使用这个缺省的用户
val defaultUser = ("John Doe", "Missing")
// 构造图
val graph = Graph(users, relationships, defaultUser)
// 输出Graph的信息
graph.vertices.collect().foreach(println(_))
graph.triplets.map(triplet => triplet.srcAttr + "----->" + triplet.dstAttr + " attr:" + triplet.attr)
.collect().foreach(println(_))
// 统计所有用户当中postdoc的数量
val cnt1 = graph.vertices.filter { case (id, (name, pos)) => pos == "postdoc" }.count
System.out.println("所有用户当中postdoc的数量为:"+cnt1);
// 统计所有源顶点大于目标顶点src > dst的边的数量
val cnt2 = graph.edges.filter(e => e.srcId > e.dstId).count
System.out.println("所有源顶点大于目标顶点 src > dst的边的数量为:"+cnt2);
// 统计图各个顶点的入度
val inDegrees: VertexRDD[Int] = graph.inDegrees
inDegrees.collect().foreach(println(_))
}
}

相关内置的图操作方法有:

/** Summary of the functionality in the property graph */
class Graph[VD, ED] {
// Information about the Graph ===================================================================
val numEdges: Long
val numVertices: Long
val inDegrees: VertexRDD[Int]
val outDegrees: VertexRDD[Int]
val degrees: VertexRDD[Int]
// Views of the graph as collections =============================================================
val vertices: VertexRDD[VD]
val edges: EdgeRDD[ED]
val triplets: RDD[EdgeTriplet[VD, ED]]
// Functions for caching graphs ==================================================================
def persist(newLevel: StorageLevel = StorageLevel.MEMORY_ONLY): Graph[VD, ED]
def cache(): Graph[VD, ED]
def unpersistVertices(blocking: Boolean = true): Graph[VD, ED]
// Change the partitioning heuristic ============================================================
def partitionBy(partitionStrategy: PartitionStrategy): Graph[VD, ED]
// Transform vertex and edge attributes ==========================================================
def mapVertices[VD2](map: (VertexID, VD) => VD2): Graph[VD2, ED]
def mapEdges[ED2](map: Edge[ED] => ED2): Graph[VD, ED2]
def mapEdges[ED2](map: (PartitionID, Iterator[Edge[ED]]) => Iterator[ED2]): Graph[VD, ED2]
def mapTriplets[ED2](map: EdgeTriplet[VD, ED] => ED2): Graph[VD, ED2]
def mapTriplets[ED2](map: (PartitionID, Iterator[EdgeTriplet[VD, ED]]) => Iterator[ED2])
: Graph[VD, ED2]
// Modify the graph structure ====================================================================
def reverse: Graph[VD, ED]
def subgraph(
epred: EdgeTriplet[VD,ED] => Boolean = (x => true),
vpred: (VertexID, VD) => Boolean = ((v, d) => true))
: Graph[VD, ED]
def mask[VD2, ED2](other: Graph[VD2, ED2]): Graph[VD, ED]
def groupEdges(merge: (ED, ED) => ED): Graph[VD, ED]
// Join RDDs with the graph ======================================================================
def joinVertices[U](table: RDD[(VertexID, U)])(mapFunc: (VertexID, VD, U) => VD): Graph[VD, ED]
def outerJoinVertices[U, VD2](other: RDD[(VertexID, U)])
(mapFunc: (VertexID, VD, Option[U]) => VD2)
: Graph[VD2, ED]
// Aggregate information about adjacent triplets =================================================
def collectNeighborIds(edgeDirection: EdgeDirection): VertexRDD[Array[VertexID]]
def collectNeighbors(edgeDirection: EdgeDirection): VertexRDD[Array[(VertexID, VD)]]
def aggregateMessages[Msg: ClassTag](
sendMsg: EdgeContext[VD, ED, Msg] => Unit,
mergeMsg: (Msg, Msg) => Msg,
tripletFields: TripletFields = TripletFields.All)
: VertexRDD[A]
// Iterative graph-parallel computation ==========================================================
def pregel[A](initialMsg: A, maxIterations: Int, activeDirection: EdgeDirection)(
vprog: (VertexID, VD, A) => VD,
sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexID,A)],
mergeMsg: (A, A) => A)
: Graph[VD, ED]
// Basic graph algorithms ========================================================================
def pageRank(tol: Double, resetProb: Double = 0.15): Graph[Double, Double]
def connectedComponents(): Graph[VertexID, ED]
def triangleCount(): Graph[Int, ED]
def stronglyConnectedComponents(numIter: Int): Graph[VertexID, ED]
}

参考链接:

http://spark.apache.org/docs/latest/graphx-programming-guide.html

Spark GraphX图处理编程实例的更多相关文章

  1. Spark GraphX图计算核心源码分析【图构建器、顶点、边】

    一.图构建器 GraphX提供了几种从RDD或磁盘上的顶点和边的集合构建图形的方法.默认情况下,没有图构建器会重新划分图的边:相反,边保留在默认分区中.Graph.groupEdges要求对图进行重新 ...

  2. Spark GraphX图计算核心算子实战【AggreagteMessage】

    一.简介 参考博客:https://www.cnblogs.com/yszd/p/10186556.html 二.代码实现 package graphx import org.apache.log4j ...

  3. Spark GraphX图计算简单案例【代码实现,源码分析】

    一.简介 参考:https://www.cnblogs.com/yszd/p/10186556.html 二.代码实现 package big.data.analyse.graphx import o ...

  4. Spark GraphX实例(1)

    Spark GraphX是一个分布式的图处理框架.社交网络中,用户与用户之间会存在错综复杂的联系,如微信.QQ.微博的用户之间的好友.关注等关系,构成了一张巨大的图,单机无法处理,只能使用分布式图处理 ...

  5. Spark + GraphX + Pregel

    Spark+GraphX图 Q:什么是图?图的应用场景 A:图是由顶点集合(vertex)及顶点间的关系集合(边edge)组成的一种网状数据结构,表示为二元组:Gragh=(V,E),V\E分别是顶点 ...

  6. Spark GraphX企业运用

    ========== Spark GraphX 概述 ==========1.Spark GraphX是什么?  (1)Spark GraphX 是 Spark 的一个模块,主要用于进行以图为核心的计 ...

  7. Spark—GraphX编程指南

    Spark系列面试题 Spark面试题(一) Spark面试题(二) Spark面试题(三) Spark面试题(四) Spark面试题(五)--数据倾斜调优 Spark面试题(六)--Spark资源调 ...

  8. 明风:分布式图计算的平台Spark GraphX 在淘宝的实践

    快刀初试:Spark GraphX在淘宝的实践 作者:明风 (本文由团队中梧苇和我一起撰写,并由团队中的林岳,岩岫,世仪等多人Review,发表于程序员的8月刊,由于篇幅原因,略作删减,本文为完整版) ...

  9. Spark Graphx编程指南

    问题导读1.GraphX提供了几种方式从RDD或者磁盘上的顶点和边集合构造图?2.PageRank算法在图中发挥什么作用?3.三角形计数算法的作用是什么?Spark中文手册-编程指南Spark之一个快 ...

随机推荐

  1. hdoj1102 Constructing Roads(Prime || Kruskal)

    题目链接 http://acm.hdu.edu.cn/showproblem.php?pid=1102 题意 有n个村庄(编号1~n),给出n个村庄之间的距离,开始时n个村庄之间已经有了q条路,现在需 ...

  2. poj1251 Jungle Roads(Prime || Kruskal)

    题目链接 http://poj.org/problem?id=1251 题意 有n个村庄,村庄之间有道路连接,求一条最短的路径能够连接起所有村庄,输出这条最短路径的长度. 思路 最小生成树问题,使用普 ...

  3. 在windows下的CLI模式下如何运行php文件

    https://blog.csdn.net/evkj2013/article/details/52313728 https://jingyan.baidu.com/article/da1091fb09 ...

  4. ref:下一个项目为什么要用 SLF4J

    ref:http://blog.mayongfa.cn/267.html 阿里巴巴 Java 开发手册 前几天阿里巴巴在云栖社区首次公开阿里官方Java代码规范标准,就是一个PDF手册,有命名规范,让 ...

  5. The file will have its original line endings in your working directory.

    在空仓库的情况下,add,出现一下问题 The file will have its original line endings in your working directory. 当报这个警告时是 ...

  6. CSU - 2062 Z‘s Array

    Description Z likes to play with array. One day his teacher gave him an array of n elements, and ask ...

  7. date time insert

    DATE=`date '+%m/%d/%Y'`TIME=`date '+%H:%M:%S'` sed -i '1i1***** start*****' test.kshsed -i '2i\ REPO ...

  8. BZOJ.3105.[CQOI2013]新Nim游戏(线性基 贪心 博弈论)

    题目链接 如果后手想要胜利,那么在后手第一次取完石子后 可以使石子数异或和为0.那所有数异或和为0的线性基长啥样呢,不知道.. 往前想,后手可以取走某些石子使得剩下石子异或和为0,那不就是存在异或和为 ...

  9. 【贪心】Codeforces Round #480 (Div. 2) C. Posterized

    题意:让你对[0,255]这个序列任意划分成一些不重叠的子段,每个子段的大小不超过K.给你n个不超过255的数,让你将每个数替换成它所在子段的任意一个元素,使得最终这个n个数的序列的字典序最小. p[ ...

  10. 某gov的逻辑漏洞

    首先找一个号 在企业信息里面查看到大量的企业名称和组织机构代码 随后去找回密码那 可以看到是直接显示了用户名和密码 随后去登录 可以看到大量的工程信息个企业注册信息 ​