Link Analysis_1_Basic Elements
1. Edge Attributes
1.1 Ways to categorize graphs
1.1.1 Three basic types, distinguished by edge direction and by whether parallel edges are allowed:
- import networkx as nx
- G = nx.DiGraph() # 1.directed
- G = nx.Graph() # 2.undirected
- G = nx.MultiGraph() # 3.multigraph: several parallel edges (layers of relationships) between the same two nodes
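The difference between the three types can be checked on a two-node example. This is a minimal sketch; the names D, U, M and the relation labels are made up for illustration.

```python
import networkx as nx

# Directed: the edge A -> B does not imply B -> A
D = nx.DiGraph()
D.add_edge('A', 'B')
directed_has_reverse = D.has_edge('B', 'A')

# Undirected: a single edge is visible from both endpoints
U = nx.Graph()
U.add_edge('A', 'B')
undirected_has_reverse = U.has_edge('B', 'A')

# Multigraph: parallel edges between the same pair are stored separately
M = nx.MultiGraph()
M.add_edge('A', 'B', relation='family')
M.add_edge('A', 'B', relation='coworker')
parallel_edges = M.number_of_edges('A', 'B')

print(directed_has_reverse, undirected_has_reverse, parallel_edges)
```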
1.1.2 Categories based on group structure, e.g., bipartite graphs (two node sets, with edges only between the sets):
- from networkx.algorithms import bipartite
- B = nx.Graph() # start from an empty graph; the two node sets are labeled below
- B.add_nodes_from(['H', 'I', 'J', 'K', 'L'], bipartite = 0) # label the first node set
- B.add_nodes_from([7, 8, 9, 10], bipartite = 1) # label the second node set
- # add a list of edges at one time
- B.add_edges_from([('H', 7), ('I', 7), ('J', 9),('K', 8), ('K', 10), ('L', 10)])
- # Check whether the graph is bipartite
- bipartite.is_bipartite(B)
A bipartite graph cannot contain a cycle with an odd number of nodes.
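The odd-cycle rule can be seen on the graph B from this section. As a sketch: adding the edge H-I (an edge not in the original example, added only for illustration) closes the 3-node cycle H-7-I, and the check then fails.

```python
import networkx as nx
from networkx.algorithms import bipartite

B = nx.Graph()
B.add_nodes_from(['H', 'I', 'J', 'K', 'L'], bipartite=0)
B.add_nodes_from([7, 8, 9, 10], bipartite=1)
B.add_edges_from([('H', 7), ('I', 7), ('J', 9), ('K', 8), ('K', 10), ('L', 10)])

before = bipartite.is_bipartite(B)

# Illustrative extra edge inside the first node set: creates the odd cycle H-7-I
B.add_edge('H', 'I')
after = bipartite.is_bipartite(B)

print(before, after)
```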
1.2 Edges can carry attributes:
- G.add_edge('A', 'B', weight = 6, relation = 'family', sign = '+')
- G.remove_edge('A', 'B') # remove edge
1.3 Access edges:
- G.edges() # list of all edges
- G.edges(data = True) # list of all with attributes
- G.edges(data = 'relation') # list with certain attribute
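A short sketch of what the three access forms return. The second edge (B-C, weight 13, relation 'friend') is an assumption added so the lists have more than one entry.

```python
import networkx as nx

G = nx.Graph()
G.add_edge('A', 'B', weight=6, relation='family', sign='+')
G.add_edge('B', 'C', weight=13, relation='friend', sign='+')  # illustrative extra edge

all_edges = list(G.edges())                 # node pairs only
with_attrs = list(G.edges(data=True))       # (u, v, attribute dict)
relations = list(G.edges(data='relation'))  # (u, v, value of one attribute)

# Attributes of a single edge can be read directly
weight_ab = G.edges['A', 'B']['weight']

print(all_edges)
print(relations)
print(weight_ab)
```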
2. Node Attributes
2.1 Nodes can be given attributes, such as a name.
- G.add_node('A', name = 'Sophie')
- G.add_node('B', name = 'Cumberbatch')
- G.add_node('C', name = 'Miko') # pet dog
2.2 Access nodes:
- G.nodes['A']['name'] # G.node was removed in NetworkX 2.4; use G.nodes
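Beyond reading one attribute of one node, the same data can be listed for all nodes at once. A minimal sketch using the three nodes defined above:

```python
import networkx as nx

G = nx.Graph()
G.add_node('A', name='Sophie')
G.add_node('B', name='Cumberbatch')
G.add_node('C', name='Miko')

name_of_a = G.nodes['A']['name']            # one attribute of one node
all_data = list(G.nodes(data=True))         # (node, attribute dict) pairs
names = nx.get_node_attributes(G, 'name')   # {node: name}

print(name_of_a)
print(names)
```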
3. Network Connectivity
3.1 Triadic Closure: the tendency for people who share connections to become connected themselves, i.e., to cluster.
3.1.1 Local Clustering Coefficient
- # local clustering, computed here on a simple undirected Graph (not a MultiGraph)
- G = nx.Graph()
- G.add_edges_from([('A', 'K'),
- ('A', 'B'),
- ('A', 'C'),
- ('B', 'C'),
- ('B', 'K'),
- ('C', 'E'),
- ('C', 'F'),
- ('D', 'E'),
- ('E', 'F'),
- ('E', 'H'),
- ('F', 'G'),
- ('I', 'J')])
- nx.clustering(G, 'A')
- 0.6666666666666666
Check: 2 / (3 × 2 ÷ 2) = 2/3, i.e., linked pairs among A's neighbors / C(3, 2) possible pairs
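The same calculation can be written out by hand and compared with nx.clustering. This is a sketch; local_clustering is a helper name introduced here, not a NetworkX function.

```python
from itertools import combinations
import networkx as nx

G = nx.Graph()
G.add_edges_from([('A', 'K'), ('A', 'B'), ('A', 'C'), ('B', 'C'), ('B', 'K'),
                  ('C', 'E'), ('C', 'F'), ('D', 'E'), ('E', 'F'), ('E', 'H'),
                  ('F', 'G'), ('I', 'J')])

def local_clustering(graph, node):
    """Fraction of pairs of the node's neighbors that are themselves linked."""
    neighbors = list(graph[node])
    k = len(neighbors)
    if k < 2:
        return 0.0  # no pairs of neighbors exist
    linked = sum(1 for u, v in combinations(neighbors, 2) if graph.has_edge(u, v))
    return linked / (k * (k - 1) / 2)

manual = local_clustering(G, 'A')
print(manual, nx.clustering(G, 'A'))
```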
3.1.2 Global Clustering Coefficient
- # Method 1: Take average of all local clustering coefficients.
- nx.average_clustering(G)
- 0.28787878787878785
- # Method 2: Transitivity, the fraction of connected triads that are closed into triangles
- # Triangle: 3 nodes connected by 3 edges
- # Open triad: 3 nodes connected by 2 edges
- # Transitivity = (3 * number of triangles) / number of connected triads (each triangle contains 3 of them)
- nx.transitivity(G)
- 0.4090909090909091
Method 2 puts a larger weight on high-degree nodes.
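The transitivity value above can be reproduced by counting triangles and connected triads directly. A sketch on the same example graph; a node of degree d is the middle of d(d-1)/2 triads.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([('A', 'K'), ('A', 'B'), ('A', 'C'), ('B', 'C'), ('B', 'K'),
                  ('C', 'E'), ('C', 'F'), ('D', 'E'), ('E', 'F'), ('E', 'H'),
                  ('F', 'G'), ('I', 'J')])

# nx.triangles counts each triangle once per corner, so divide by 3
triangles = sum(nx.triangles(G).values()) // 3

# Connected triads: pairs of edges meeting at a shared middle node
triads = sum(d * (d - 1) // 2 for _, d in G.degree())

manual_transitivity = 3 * triangles / triads
print(triangles, triads, manual_transitivity)
```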
3.2 Distances
3.2.1 Single-Pair Pattern:
Find path and length of the shortest path between two nodes.
- nx.shortest_path(G, 'A', 'H')
- ['A', 'C', 'E', 'H']
- nx.shortest_path_length(G, 'A', 'H')
- 3
3.2.2 One Node to All Others Pattern:
Breadth-first Search: discover nodes in layers step by step.
- T = nx.bfs_tree(G, 'A')
- T.edges() # to get the tree
- OutEdgeView([('A', 'K'), ('A', 'B'), ('A', 'C'), ('C', 'E'), ('C', 'F'), ('E', 'D'), ('E', 'H'), ('F', 'G')])
- nx.shortest_path_length(G, 'A') # get dictionary of distances from A to others
- {'A': 0, 'K': 1, 'B': 1, 'C': 1, 'E': 2, 'F': 2, 'D': 3, 'H': 3, 'G': 3}
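The layer-by-layer idea behind breadth-first search can be sketched in a few lines and checked against NetworkX. bfs_distances is a helper name introduced here for illustration.

```python
from collections import deque
import networkx as nx

G = nx.Graph()
G.add_edges_from([('A', 'K'), ('A', 'B'), ('A', 'C'), ('B', 'C'), ('B', 'K'),
                  ('C', 'E'), ('C', 'F'), ('D', 'E'), ('E', 'F'), ('E', 'H'),
                  ('F', 'G'), ('I', 'J')])

def bfs_distances(graph, source):
    """Distance from source to every reachable node, discovered layer by layer."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in dist:          # first visit gives the shortest distance
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    return dist

distances = bfs_distances(G, 'A')
print(distances)
```

Nodes I and J do not appear in the result because they cannot be reached from A.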
3.2.3 Measures of Distance Patterns
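- # These measures require a connected graph. G above also contains the separate
- # pair I-J, so restrict to the largest connected component first.
- Gc = G.subgraph(max(nx.connected_components(G), key = len))
- # Average distance over all pairs of nodes
- nx.average_shortest_path_length(Gc)
- # Maximum distance between any pair of nodes
- nx.diameter(Gc)
The eccentricity of a node n is the largest distance between n and any other node.
The radius is the minimum eccentricity.
The periphery is the set of nodes whose eccentricity equals the diameter.
The center is the set of nodes whose eccentricity equals the radius.
- nx.eccentricity(Gc)
- nx.radius(Gc)
- nx.periphery(Gc)
- nx.center(Gc)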
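These four definitions all follow from the per-node distances. The sketch below derives them by hand on the example graph; because that graph has the separate pair I-J and distances between components are undefined, it works on the largest connected component (an assumption made here so the measures exist).

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([('A', 'K'), ('A', 'B'), ('A', 'C'), ('B', 'C'), ('B', 'K'),
                  ('C', 'E'), ('C', 'F'), ('D', 'E'), ('E', 'F'), ('E', 'H'),
                  ('F', 'G'), ('I', 'J')])

# Keep only the largest connected component
Gc = G.subgraph(max(nx.connected_components(G), key=len))

# Eccentricity: largest shortest-path distance from each node
ecc = {n: max(nx.shortest_path_length(Gc, n).values()) for n in Gc}

radius = min(ecc.values())
diameter = max(ecc.values())
center = sorted(n for n, e in ecc.items() if e == radius)
periphery = sorted(n for n, e in ecc.items() if e == diameter)

print(radius, diameter, center, periphery)
```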
3.2.4 Application
- import numpy as np
- import pandas as pd
- %matplotlib notebook
- # Instantiate the graph
- G = nx.karate_club_graph()
- nx.draw_networkx(G)
4. Connectivity
4.1 Connectivity in Undirected Graphs
- # find the number of connected components
- nx.number_connected_components(G)
- # list the components
- sorted(nx.connected_components(G))
- # find the component to which 'M' belongs
- nx.node_connected_component(G, 'M')
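As a sketch of these calls on a concrete graph, the example graph from Section 3 is reused here; it has one large component plus the separate pair I-J.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([('A', 'K'), ('A', 'B'), ('A', 'C'), ('B', 'C'), ('B', 'K'),
                  ('C', 'E'), ('C', 'F'), ('D', 'E'), ('E', 'F'), ('E', 'H'),
                  ('F', 'G'), ('I', 'J')])

n_components = nx.number_connected_components(G)

# Each component is a set of nodes; order them from largest to smallest
components = sorted(nx.connected_components(G), key=len, reverse=True)

component_of_i = nx.node_connected_component(G, 'I')

print(n_components, [len(c) for c in components], component_of_i)
```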
4.2 Connectivity in Directed Graphs
- # find strongly connected components: subsets in which every node has a directed path
- # to every other node, and which cannot be enlarged without losing that property
- sorted(nx.strongly_connected_components(G))
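A small directed example makes the definition concrete. This is a sketch on a made-up graph: a directed cycle A -> B -> C -> A with one extra edge C -> D. D can be reached from the cycle but cannot reach back, so it forms its own component.

```python
import networkx as nx

D = nx.DiGraph()
D.add_edges_from([('A', 'B'), ('B', 'C'), ('C', 'A'), ('C', 'D')])

strongly = nx.is_strongly_connected(D)   # every node reaches every other node?
weakly = nx.is_weakly_connected(D)       # connected when directions are ignored?

strong_components = sorted(sorted(c) for c in nx.strongly_connected_components(D))
weak_components = sorted(sorted(c) for c in nx.weakly_connected_components(D))

print(strongly, weakly)
print(strong_components)
print(weak_components)
```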
5. Network Robustness
5.1 Definition: the ability of a network to maintain its general structural properties (e.g., connectivity) when faced with attacks (removal of edges or nodes).
- # smallest number of nodes whose removal disconnects the graph (G_un is an undirected graph)
- nx.node_connectivity(G_un)
- # which nodes
- nx.minimum_node_cut(G_un)
- # smallest number of edges needed to disconnect
- nx.edge_connectivity(G_un)
- # which edges
- nx.minimum_edge_cut(G_un)
5.2 Node Connectivity
- # all ways to deliver a message from 'G' to 'L'
- sorted(nx.all_simple_paths(G, 'G', 'L'))
- # to block every such path, how many nodes need to be removed?
- nx.node_connectivity(G, 'G', 'L')
- # which nodes
- nx.minimum_node_cut(G, 'G', 'L')
5.3 Edge Connectivity
- # how many edges need to be removed to block every path from 'G' to 'L'
- nx.edge_connectivity(G, 'G', 'L')
- # which edges
- nx.minimum_edge_cut(G, 'G', 'L')
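The notes do not show the graph these calls were run on, so the sketch below uses a small made-up graph (called R here) in which every path from 'G' to 'L' passes through node 'C' and the edge C-L.

```python
import networkx as nx

# Hypothetical graph: two routes from G to C, then a single link from C to L
R = nx.Graph()
R.add_edges_from([('G', 'A'), ('G', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'L')])

paths = sorted(nx.all_simple_paths(R, 'G', 'L'))

node_k = nx.node_connectivity(R, 'G', 'L')   # nodes to remove to separate G from L
cut_nodes = nx.minimum_node_cut(R, 'G', 'L')

edge_k = nx.edge_connectivity(R, 'G', 'L')   # edges to remove to separate G from L
cut_edges = nx.minimum_edge_cut(R, 'G', 'L')

print(paths)
print(node_k, cut_nodes)
print(edge_k, cut_edges)
```

Removing C (or the edge C-L) blocks both paths at once, which is why both connectivity values are 1 even though two distinct paths exist.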
6. Centrality
6.1 Degree Centrality
6.1.1 Undirected Network
- G = nx.karate_club_graph()
- G = nx.convert_node_labels_to_integers(G, first_label = 1)
- degCent = nx.degree_centrality(G)
- degCent[34]
- 0.5151515151515151
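Degree centrality is the node's degree divided by the number of other nodes. A sketch that reproduces the value for node 34 by hand:

```python
import networkx as nx

G = nx.karate_club_graph()
G = nx.convert_node_labels_to_integers(G, first_label=1)

degree_34 = G.degree(34)                 # number of neighbors of node 34
n_others = G.number_of_nodes() - 1       # every other node is a possible neighbor
manual = degree_34 / n_others

degCent = nx.degree_centrality(G)
print(degree_34, n_others, manual, degCent[34])
```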
6.1.2 Directed Network
- # G must be a directed graph (nx.DiGraph) here
- indegCent = nx.in_degree_centrality(G)
- outdegCent = nx.out_degree_centrality(G)
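The karate club graph is undirected, so the sketch below uses a small made-up directed graph to show how the in-degree and out-degree versions differ.

```python
import networkx as nx

# Hypothetical directed graph: A points to B and C, B points to C
D = nx.DiGraph()
D.add_edges_from([('A', 'B'), ('A', 'C'), ('B', 'C')])

indegCent = nx.in_degree_centrality(D)    # incoming edges / (n - 1)
outdegCent = nx.out_degree_centrality(D)  # outgoing edges / (n - 1)

print(indegCent)
print(outdegCent)
```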
6.2 Closeness Centrality
6.2.1 Calculation: nodes that are a shorter distance away from all other nodes score higher.
- closeCent = nx.closeness_centrality(G)
- closeCent[34]
- 0.55
- sum(nx.shortest_path_length(G, 34).values())
- 60
- # Closeness is equivalent to (number of other nodes) / (sum of distances to them)
- (len(G.nodes()) - 1)/60
- 0.55
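The relation between the distance sum and the closeness score can be checked in one place. A sketch for node 34 of the karate club graph:

```python
import networkx as nx

G = nx.karate_club_graph()
G = nx.convert_node_labels_to_integers(G, first_label=1)

# Sum of shortest-path distances from node 34 to every node (distance to itself is 0)
total_distance = sum(nx.shortest_path_length(G, 34).values())

# Closeness = (number of other nodes) / (sum of distances to them)
manual = (G.number_of_nodes() - 1) / total_distance

closeCent = nx.closeness_centrality(G)
print(total_distance, manual, closeCent[34])
```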
6.2.2 Measuring Disconnected Nodes
Method One
- # no scaling: a node that can reach only one neighbor gets closeness 1
- # (NetworkX 2.x and later use wf_improved in place of the older normalized argument)
- nx.closeness_centrality(G, wf_improved = False)
- 1
Method Two
- # scale by the fraction of nodes reachable, i.e., (reachable nodes) / (total nodes - 1)
- nx.closeness_centrality(G, wf_improved = True)
- 0.071
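The graph behind the two values above is not shown in these notes, so the sketch below applies both methods to node 'I' of the 11-node example graph from Section 3, where I can reach only J. It assumes a NetworkX version that provides the wf_improved argument of nx.closeness_centrality.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([('A', 'K'), ('A', 'B'), ('A', 'C'), ('B', 'C'), ('B', 'K'),
                  ('C', 'E'), ('C', 'F'), ('D', 'E'), ('E', 'F'), ('E', 'H'),
                  ('F', 'G'), ('I', 'J')])

# Method One: only reachable nodes count, so I (one neighbor at distance 1) scores 1
unscaled = nx.closeness_centrality(G, u='I', wf_improved=False)

# Method Two: multiply by reachable nodes / (total nodes - 1) = 1 / 10
scaled = nx.closeness_centrality(G, u='I', wf_improved=True)

print(unscaled, scaled)
```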
6.3 Betweenness Centrality (computationally expensive)
Essence: Find nodes that show up in many of the shortest paths between pairs of other nodes.
6.3.1 Method One: Use all 34 nodes in karate club
- btwnCent = nx.betweenness_centrality(G,normalized = True, endpoints = False)
- import operator
- sorted(btwnCent.items(), key = operator.itemgetter(1), reverse = True)[0:5]
- [(1, 0.43763528138528146),
- (34, 0.30407497594997596),
- (33, 0.145247113997114),
- (3, 0.14365680615680618),
- (32, 0.13827561327561325)]
6.3.2 Method Two: Use a random sample of 10 nodes as an approximation (results vary between runs)
- btwnCent_approx = nx.betweenness_centrality(G, normalized = True, endpoints = False, k = 10)
- sorted(btwnCent_approx.items(), key = operator.itemgetter(1), reverse = True)[0:5]
- [(1, 0.3674031986531986),
- (34, 0.3048388648388649),
- (32, 0.17290028258778256),
- (3, 0.13572044853294854),
- (33, 0.130249518999519)]
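Because the k sampled nodes are chosen at random, the approximate scores change from run to run. A sketch of how the seed argument of nx.betweenness_centrality can make the sample repeatable; the seed value 0 is arbitrary.

```python
import networkx as nx

G = nx.karate_club_graph()
G = nx.convert_node_labels_to_integers(G, first_label=1)

exact = nx.betweenness_centrality(G, normalized=True, endpoints=False)

# Same seed, same sample of 10 source nodes, same estimate
approx_a = nx.betweenness_centrality(G, k=10, normalized=True, endpoints=False, seed=0)
approx_b = nx.betweenness_centrality(G, k=10, normalized=True, endpoints=False, seed=0)

top_exact = max(exact, key=exact.get)
print(top_exact, approx_a == approx_b)
```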
6.3.3 Method Three: Specify subsets
- btwnCent_subset = nx.betweenness_centrality_subset(G,
- [34, 33, 21, 30, 16, 27, 15, 23, 10],
- [1, 4, 13, 11, 6, 12, 17, 7],
- normalized = True)
- sorted(btwnCent_subset.items(), key = operator.itemgetter(1), reverse = True)[0:5]
- [(1, 0.04899515993265994),
- (34, 0.028807419432419434),
- (3, 0.018368205868205867),
- (33, 0.01664712602212602),
- (9, 0.014519450456950456)]
6.3.4 Method Four: Edges
- btwnCent_edge = nx.edge_betweenness_centrality(G, normalized = True)
- sorted(btwnCent_edge.items(), key = operator.itemgetter(1), reverse = True)[0:5]
- # node 1 is the club's instructor
- [((1, 32), 0.1272599949070537),
- ((1, 7), 0.07813428401663695),
- ((1, 6), 0.07813428401663694),
- ((1, 3), 0.0777876807288572),
- ((1, 9), 0.07423959482783014)]
- btwnCent_edge_subset = nx.edge_betweenness_centrality_subset(G,
- [34, 33, 21, 30, 16, 27, 15, 23, 10],
- [1, 4, 13, 11, 6, 12, 17, 7],
- normalized = True)
- sorted(btwnCent_edge_subset.items(), key = operator.itemgetter(1), reverse = True)[0:5]
- [((1, 9), 0.01366536513595337),
- ((1, 32), 0.01366536513595337),
- ((14, 34), 0.012207509266332794),
- ((1, 3), 0.01211343123107829),
- ((1, 6), 0.012032085561497326)]