Loading a graph from a file on a cluster
Load a file from HDFS and create a graph
scala> var graphs = GraphLoader.edgeListFile(sc,"/tmp/dataTest/graphTest.txt")
graphs: org.apache.spark.graphx.Graph[Int,Int] = org.apache.spark.graphx.impl.GraphImpl@ab5670d
scala> val graphs = GraphLoader.edgeListFile(sc, "/tmp/dataTest/graphTest.txt",numEdgePartitions=)
graphs: org.apache.spark.graphx.Graph[Int,Int] = org.apache.spark.graphx.impl.GraphImpl@409ea4d1
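The loading step can also be tried outside the cluster. What follows is a minimal sketch, not the setup used in this post: it assumes a local-mode SparkContext, a made-up four-edge file written to a temp directory, and an arbitrary `numEdgePartitions = 2`. `GraphLoader.edgeListFile` reads one `srcId dstId` pair per line, skips lines starting with `#`, and sets every vertex and edge attribute to 1, which is why the result type is `Graph[Int,Int]`.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Graph, GraphLoader}

object EdgeListLoadSketch {
  // Writes a small, made-up edge list to a temp file and loads it as a graph.
  def loadSample(sc: SparkContext): Graph[Int, Int] = {
    val file = Files.createTempFile("graphTest", ".txt")
    val lines = "# srcId dstId\n1 2\n2 3\n3 1\n4 1\n"
    Files.write(file, lines.getBytes(StandardCharsets.UTF_8))
    // numEdgePartitions = 2 is an arbitrary illustrative value.
    GraphLoader.edgeListFile(sc, file.toUri.toString, numEdgePartitions = 2)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("edge-list-load-sketch")
    val sc = new SparkContext(conf)
    try {
      val g = loadSample(sc)
      println(s"vertices = ${g.vertices.count()}, edges = ${g.edges.count()}")
      // Every vertex attribute is the default value 1.
      println(g.vertices.map(_._2).distinct().collect().mkString(","))
    } finally sc.stop()
  }
}
```

On a cluster the same call takes an HDFS path, as in the transcript above.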
scala> var verttmp = graphs.mapVertices((id,attr) => attr*)
verttmp: org.apache.spark.graphx.Graph[Int,Int] = org.apache.spark.graphx.impl.GraphImpl@25d7eb44
scala> verttmp.vertices.take()
// :: WARN executor.Executor: block locks were not released by TID = :
[rdd_37_0]
// :: WARN executor.Executor: block locks were not released by TID = :
[rdd_37_1]
res4: Array[(org.apache.spark.graphx.VertexId, Int)] = Array((,), (,), (,), (,), (,), (,), (,), (,), (,), (,))
scala> var verttmp = graphs.mapVertices((_,attr) => attr*)
verttmp: org.apache.spark.graphx.Graph[Int,Int] = org.apache.spark.graphx.impl.GraphImpl@76828ce4
scala> var edgetmp=graphs.mapEdges(e => e.attr*)
edgetmp: org.apache.spark.graphx.Graph[Int,Int] = org.apache.spark.graphx.impl.GraphImpl@42ce3be7
scala> edgetmp.edges.take()
// :: WARN executor.Executor: block locks were not released by TID = :
[rdd_26_0]
// :: WARN executor.Executor: block locks were not released by TID = :
[rdd_26_1]
res6: Array[org.apache.spark.graphx.Edge[Int]] = Array(Edge(,,), Edge(,,), Edge(,,), Edge(,,), Edge(,,), Edge(,,), Edge(,,), Edge(,,), Edge(,,), Edge(,,))
scala> var triptmp = graphs.mapTriplets(t => t.srcAttr* + t.dstAttr*)
triptmp: org.apache.spark.graphx.Graph[Int,Int] = org.apache.spark.graphx.impl.GraphImpl@318ec664
scala> triptmp.triplets.take()
// :: WARN executor.Executor: block locks were not released by TID = :
[rdd_26_0]
// :: WARN executor.Executor: block locks were not released by TID = :
[rdd_26_1]
res7: Array[org.apache.spark.graphx.EdgeTriplet[Int,Int]] = Array(((,),(,),), ((,),(,),), ((,),(,),), ((,),(,),), ((,),(,),), ((,),(,),), ((,),(,),), ((,),(,),), ((,),(,),), ((,),(,),))
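The three map operators used above can be compared side by side on a toy graph. This is a sketch under stated assumptions: the graph, its attribute values (vertices 1, edges 10 and 20) and the multipliers are invented for illustration and are not the values from the transcript. Each operator rewrites attributes only and leaves the graph structure unchanged.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

object MapOperatorsSketch {
  // Toy graph: every vertex attribute is 1; edge attributes are 10 and 20 (made-up values).
  def toyGraph(sc: SparkContext): Graph[Int, Int] =
    Graph.fromEdges(sc.parallelize(Seq(Edge(1L, 2L, 10), Edge(2L, 3L, 20))), defaultValue = 1)

  // Sorted vertex attributes, for easy comparison.
  def vertexAttrs(g: Graph[Int, Int]): Seq[Int] =
    g.vertices.map(_._2).collect().sorted.toSeq

  // Sorted edge attributes, for easy comparison.
  def edgeAttrs(g: Graph[Int, Int]): Seq[Int] =
    g.edges.map(_.attr).collect().sorted.toSeq

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("map-operators-sketch")
    val sc = new SparkContext(conf)
    try {
      val g = toyGraph(sc)
      // mapVertices: new vertex attribute from (id, old attribute).
      val doubled = g.mapVertices((id, attr) => attr * 2)
      // mapEdges: new edge attribute from the edge alone.
      val tripled = g.mapEdges(e => e.attr * 3)
      // mapTriplets: new edge attribute that can also read both endpoint attributes.
      val summed = g.mapTriplets(t => t.srcAttr + t.dstAttr)
      println(vertexAttrs(doubled)) // vertex attributes 2, 2, 2
      println(edgeAttrs(tripled))   // edge attributes 30, 60
      println(edgeAttrs(summed))    // edge attributes 2, 2
    } finally sc.stop()
  }
}
```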
class Graph[VD, ED] {
def reverse: Graph[VD, ED]
def subgraph(epred: EdgeTriplet[VD,ED] => Boolean,
vpred: (VertexId, VD) => Boolean): Graph[VD, ED]
def mask[VD2, ED2](other: Graph[VD2, ED2]): Graph[VD, ED]
def groupEdges(merge: (ED, ED) => ED): Graph[VD,ED]
}
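The other structural operators in this summary can be sketched on a toy multigraph. The data below is invented for illustration. One assumption worth flagging: `groupEdges` only merges parallel edges that sit in the same partition, so the sketch calls `partitionBy` first, as the GraphX guide recommends.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy}

object StructuralOpsSketch {
  // Toy multigraph with a duplicated edge 1 -> 2 (made-up data).
  def toyGraph(sc: SparkContext): Graph[Int, Int] =
    Graph.fromEdges(
      sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(1L, 2L, 1), Edge(2L, 3L, 1))),
      defaultValue = 0)

  // Edges as sorted (src, dst, attr) tuples.
  def edgeTuples(g: Graph[Int, Int]): Seq[(Long, Long, Int)] =
    g.edges.map(e => (e.srcId, e.dstId, e.attr)).collect().sorted.toSeq

  // Merge parallel edges by summing their attributes.
  def merged(g: Graph[Int, Int]): Graph[Int, Int] =
    g.partitionBy(PartitionStrategy.RandomVertexCut).groupEdges(_ + _)

  // Restrict g to the vertices that also appear in a subgraph without vertex 3.
  def maskedVertexIds(g: Graph[Int, Int]): Seq[Long] =
    g.mask(g.subgraph(vpred = (id, _) => id != 3L)).vertices.map(_._1).collect().sorted.toSeq

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("structural-ops-sketch")
    val sc = new SparkContext(conf)
    try {
      val g = toyGraph(sc)
      println(edgeTuples(g.reverse)) // every edge direction flipped
      println(edgeTuples(merged(g))) // the two 1 -> 2 edges collapse into one with attr 2
      println(maskedVertexIds(g))    // vertices 1 and 2 remain
    } finally sc.stop()
  }
}
```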
def subgraph(epred: EdgeTriplet[VD,ED] => Boolean,
vpred: (VertexId, VD) => Boolean): Graph[VD, ED]
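// This function returns the subgraph whose vertices and edges satisfy the given boolean predicates.
// VD is the vertex attribute type; vpred receives (VertexId, VD), i.e. a vertex ID and its attribute.
// In the full API both predicates default to true, so passing only epred keeps every vertex.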
scala> var subg = graphs.subgraph(epred = e =>e.srcId>e.dstId)
subg: org.apache.spark.graphx.Graph[Int,Int] = org.apache.spark.graphx.impl.GraphImpl@51483f93
scala> subg.edges.take()
res12: Array[org.apache.spark.graphx.Edge[Int]] = Array(
Edge(,,),
Edge(,,),
Edge(,,),
Edge(,,),
Edge(,,),
Edge(,,),
Edge(,,),
Edge(,,),
Edge(,,),
Edge(,,))
scala> subg.vertices.count
res11: Long =
scala> subg.edges.count
res13: Long =
scala> graphs.vertices.count
res9: Long =
scala> graphs.edges.count
res10: Long =
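The counts above compare the subgraph with the original graph. A small sketch makes the behaviour concrete, assuming an invented four-edge graph rather than the file used in this post: filtering with `epred` alone drops edges but keeps every vertex, while filtering with `vpred` also drops any edge that touches a removed vertex.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

object SubgraphSketch {
  // Toy graph with edges 1->2, 2->1, 3->1, 2->3 (made-up data).
  def toyGraph(sc: SparkContext): Graph[Int, Int] =
    Graph.fromEdges(
      sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 1L, 1), Edge(3L, 1L, 1), Edge(2L, 3L, 1))),
      defaultValue = 1)

  // Keep only edges that point from a larger ID to a smaller ID.
  def byEdge(g: Graph[Int, Int]): Graph[Int, Int] =
    g.subgraph(epred = e => e.srcId > e.dstId)

  // Drop vertex 3; edges touching it disappear as well.
  def byVertex(g: Graph[Int, Int]): Graph[Int, Int] =
    g.subgraph(vpred = (id, attr) => id != 3L)

  // (vertex count, edge count) of a graph.
  def counts(g: Graph[Int, Int]): (Long, Long) =
    (g.vertices.count(), g.edges.count())

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("subgraph-sketch")
    val sc = new SparkContext(conf)
    try {
      val g = toyGraph(sc)
      println(counts(g))           // 3 vertices, 4 edges
      println(counts(byEdge(g)))   // 3 vertices, 2 edges (2->1 and 3->1)
      println(counts(byVertex(g))) // 2 vertices, 2 edges (1->2 and 2->1)
    } finally sc.stop()
  }
}
```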
scala> graphs.inDegrees
res15: Array[(org.apache.spark.graphx.VertexId, Int)] = Array((,),
(,), (,), (,), (,), (,),
(,))
scala> graphs.outDegrees.collect
res18: Array[(org.apache.spark.graphx.VertexId, Int)] = Array((,), (,),
(,), (,), (,), (,), (,),
(,), (,), (,), (,), (,))
scala> def max(a:(VertexId,Int),b:(VertexId,Int))={if(a._2>b._2) a else b }
max: (a: (org.apache.spark.graphx.VertexId, Int), b: (org.apache.spark.graphx.VertexId, Int))
(org.apache.spark.graphx.VertexId, Int)
scala> graphs.inDegrees.reduce(max)
// :: WARN executor.Executor: block locks were not released by TID = :
[rdd_14_0]
res35: (org.apache.spark.graphx.VertexId, Int) = (,)
scala> graphs.outDegrees.reduce(max)
// :: WARN executor.Executor: block locks were not released by TID = :
[rdd_14_0]
res36: (org.apache.spark.graphx.VertexId, Int) = (,)
scala> graphs.degrees.reduce(max)
// :: WARN executor.Executor: block locks were not released by TID = :
[rdd_14_0]
res38: (org.apache.spark.graphx.VertexId, Int) = (,)
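The `reduce(max)` pattern above can be reproduced on a toy graph. This is a sketch on invented data, not the graph from the transcript. Two details are easy to miss: vertices with no incoming edges do not appear in `inDegrees` at all, and when two vertices tie for the maximum, which one `reduce` returns is not guaranteed. The toy graph is chosen so that no tie occurs.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}

object MaxDegreeSketch {
  // Same comparison as in the transcript: keep the pair with the larger degree.
  def max(a: (VertexId, Int), b: (VertexId, Int)): (VertexId, Int) =
    if (a._2 > b._2) a else b

  // Toy graph with edges 1->2, 1->3, 2->3, 4->3 (made-up data).
  def toyGraph(sc: SparkContext): Graph[Int, Int] =
    Graph.fromEdges(
      sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(1L, 3L, 1), Edge(2L, 3L, 1), Edge(4L, 3L, 1))),
      defaultValue = 1)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("max-degree-sketch")
    val sc = new SparkContext(conf)
    try {
      val g = toyGraph(sc)
      println(g.inDegrees.reduce(max))  // (3,3): vertex 3 has three incoming edges
      println(g.outDegrees.reduce(max)) // (1,2): vertex 1 has two outgoing edges
      println(g.degrees.reduce(max))    // (3,3)
      println(g.inDegrees.count())      // 2: vertices 1 and 4 have no incoming edges
    } finally sc.stop()
  }
}
```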
scala> var rawG=graphs.mapVertices((id,attr) => )
rawG: org.apache.spark.graphx.Graph[Int,String] = org.apache.spark.graphx.impl.GraphImpl@43d06473
scala> rawG.vertices.collect
res47: Array[(org.apache.spark.graphx.VertexId, Int)] = Array((,), (,), (,), (,))
scala> var ind=rawG.inDegrees;
ind: org.apache.spark.graphx.VertexRDD[Int] = VertexRDDImpl[] at RDD at VertexRDD.scala:
scala> ind.collect
// :: WARN executor.Executor: block locks were not released by TID = :
[rdd_60_0]
res49: Array[(org.apache.spark.graphx.VertexId, Int)] = Array((,), (,), (,))
scala> var temp=rawG.joinVertices[Int](ind)((_,_,optdeg) => optdeg)
temp: org.apache.spark.graphx.Graph[Int,String] = org.apache.spark.graphx.impl.GraphImpl@af0e7ce
scala> temp.vertices.take();
// :: WARN executor.Executor: block locks were not released by TID = :
[rdd_60_0, rdd_77_0]
res51: Array[(org.apache.spark.graphx.VertexId, Int)] = Array((,), (,), (,), (,))
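The join above writes each vertex's in-degree into its attribute. The sketch below, on invented data with an arbitrary default attribute of -1, shows one consequence of `joinVertices`: a vertex with no entry in the joined RDD keeps its old attribute. `outerJoinVertices` is shown as an alternative that hands the function an `Option`, so a missing entry can be given an explicit value.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

object JoinDegreesSketch {
  // Toy graph with edges 1->2, 3->2, 2->3; every vertex starts at -1 (made-up data).
  def toyGraph(sc: SparkContext): Graph[Int, Int] =
    Graph.fromEdges(
      sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(3L, 2L, 1), Edge(2L, 3L, 1))),
      defaultValue = -1)

  // Vertices as a sorted sequence of (id, attribute) pairs.
  def vertexPairs(g: Graph[Int, Int]): Seq[(Long, Int)] =
    g.vertices.collect().sorted.toSeq

  // Inner join: only vertices that have an in-degree are updated.
  def joined(g: Graph[Int, Int]): Graph[Int, Int] =
    g.joinVertices(g.inDegrees)((_, _, deg) => deg)

  // Outer join: vertices without an in-degree are set to 0 explicitly.
  def outerJoined(g: Graph[Int, Int]): Graph[Int, Int] =
    g.outerJoinVertices(g.inDegrees)((_, _, degOpt) => degOpt.getOrElse(0))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("join-degrees-sketch")
    val sc = new SparkContext(conf)
    try {
      val g = toyGraph(sc)
      println(vertexPairs(joined(g)))      // vertex 1 keeps -1; vertex 2 -> 2; vertex 3 -> 1
      println(vertexPairs(outerJoined(g))) // vertex 1 -> 0; vertex 2 -> 2; vertex 3 -> 1
    } finally sc.stop()
  }
}
```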