Link Analysis_1_Basic Elements
1. Edge Attributes
1.1 Ways to categorize graphs
1.1.1 Three basic graph types, distinguished by edge direction and by whether parallel edges are allowed:
import networkx as nx
G = nx.DiGraph() # 1. directed
G = nx.Graph() # 2. undirected
G = nx.MultiGraph() # 3. multigraph: several parallel edges (layers of relationships) between the same two nodes
1.1.2 Categories based on how nodes are grouped, e.g. bipartite graphs:
from networkx.algorithms import bipartite
B = nx.Graph() # start with an empty graph; nodes are not yet split into two sets
B.add_nodes_from(['H', 'I', 'J', 'K', 'L'], bipartite = 0) # label the first group
B.add_nodes_from([7, 8, 9, 10], bipartite = 1) # label the second group
# add a list of edges at one time
B.add_edges_from([('H', 7), ('I', 7), ('J', 9),('K', 8), ('K', 10), ('L', 10)])
# Check whether the graph is bipartite
bipartite.is_bipartite(B)
A bipartite graph cannot contain a cycle with an odd number of nodes.
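The odd-cycle rule can be checked directly. The sketch below rebuilds the graph from above and then adds one extra edge, 'H'-'I' (an edge chosen here purely for illustration), which closes the three-node cycle H, I, 7:

```python
import networkx as nx
from networkx.algorithms import bipartite

# Rebuild the bipartite graph from the notes.
B = nx.Graph()
B.add_nodes_from(['H', 'I', 'J', 'K', 'L'], bipartite=0)
B.add_nodes_from([7, 8, 9, 10], bipartite=1)
B.add_edges_from([('H', 7), ('I', 7), ('J', 9), ('K', 8), ('K', 10), ('L', 10)])
before = bipartite.is_bipartite(B)

# Illustrative extra edge: H-I together with H-7 and I-7 forms a 3-node cycle,
# so the nodes can no longer be split into two groups with no internal edges.
B.add_edge('H', 'I')
after = bipartite.is_bipartite(B)

print(before, after)
```

The first check should report a bipartite graph and the second should not.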
1.2 Edges can carry attributes:
G.add_edge('A', 'B', weight = 6, relation = 'family', sign = '+')
G.remove_edge('A', 'B') # remove edge
1.3 Access edges:
G.edges() # all edges
G.edges(data = True) # all edges with their attribute dictionaries
G.edges(data = 'relation') # all edges with the value of the 'relation' attribute
2. Node Attributes
2.1 Nodes can carry attributes such as a name.
G.add_node('A', name = 'Sophie')
G.add_node('B', name = 'Cumberbatch')
G.add_node('C', name = 'Miko') # pet dog
2.2 Access nodes:
G.nodes['A']['name'] # G.node was removed in newer NetworkX versions; use G.nodes
3. Network Connectivity
3.1 Triadic Closure: the tendency for people who share connections to become connected themselves, i.e., to cluster.
3.1.1 Local Clustering Coefficient
# local clustering coefficient, computed here on a simple undirected graph (not a multigraph)
G = nx.Graph()
G.add_edges_from([('A', 'K'),
('A', 'B'),
('A', 'C'),
('B', 'C'),
('B', 'K'),
('C', 'E'),
('C', 'F'),
('D', 'E'),
('E', 'F'),
('E', 'H'),
('F', 'G'),
('I', 'J')])
nx.clustering(G, 'A')
0.6666666666666666
Check: A has 3 neighbors (K, B, C), so there are C(3,2) = 3 × 2 ÷ 2 = 3 possible pairs; 2 of them (B-C and B-K) are connected, giving 2 / 3.
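The same count can be done by hand in code. This is a minimal sketch; `local_clustering` is a helper written for this illustration, not a NetworkX function:

```python
from itertools import combinations
import networkx as nx

G = nx.Graph()
G.add_edges_from([('A', 'K'), ('A', 'B'), ('A', 'C'), ('B', 'C'), ('B', 'K'),
                  ('C', 'E'), ('C', 'F'), ('D', 'E'), ('E', 'F'), ('E', 'H'),
                  ('F', 'G'), ('I', 'J')])

def local_clustering(graph, node):
    """Fraction of pairs of the node's neighbors that are themselves linked."""
    neighbors = list(graph[node])
    possible = len(neighbors) * (len(neighbors) - 1) / 2
    if possible == 0:
        return 0.0  # fewer than two neighbors: no pairs to close
    linked = sum(1 for u, v in combinations(neighbors, 2) if graph.has_edge(u, v))
    return linked / possible

manual_A = local_clustering(G, 'A')
print(manual_A, nx.clustering(G, 'A'))
```

Both values should agree with the 2 / 3 worked out above.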
3.1.2 Global Clustering Coefficient
# Method 1: Take average of all local clustering coefficients.
nx.average_clustering(G)
0.28787878787878785
# Method 2: Transitivity, the fraction of connected triples that are closed into triangles
# Triangle: 3 nodes connected by 3 edges
# Open triad: 3 nodes connected by 2 edges
# Transitivity = (3 * number of triangles) / number of connected triples (paths of two edges, counting the three inside each triangle)
nx.transitivity(G)
0.4090909090909091
Transitivity puts more weight on high-degree nodes than the average local clustering coefficient does.
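As a rough check of the transitivity formula, the triangles and connected triples of the example graph can be counted directly. The variable names are chosen for this sketch:

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([('A', 'K'), ('A', 'B'), ('A', 'C'), ('B', 'C'), ('B', 'K'),
                  ('C', 'E'), ('C', 'F'), ('D', 'E'), ('E', 'F'), ('E', 'H'),
                  ('F', 'G'), ('I', 'J')])

# nx.triangles counts, per node, the triangles it belongs to;
# every triangle is therefore counted three times in the sum.
n_triangles = sum(nx.triangles(G).values()) // 3

# A node of degree d is the middle of d * (d - 1) / 2 connected triples.
n_triples = sum(d * (d - 1) // 2 for _, d in G.degree())

manual_transitivity = 3 * n_triangles / n_triples
print(n_triangles, n_triples, manual_transitivity)
```

The high-degree nodes C and E contribute 6 triples each out of 22, which is why they dominate this measure while counting as only one node each in the average of local coefficients.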
3.2 Distances
3.2.1 Single Pair Pattern:
Find path and length of the shortest path between two nodes.
nx.shortest_path(G, 'A', 'H')
['A', 'C', 'E', 'H']
nx.shortest_path_length(G, 'A', 'H')
3
3.2.2 One Node to Every Others Pattern:
Breadth-first Search: discover nodes in layers step by step.
T = nx.bfs_tree(G, 'A')
T.edges() # to get the tree
OutEdgeView([('A', 'K'), ('A', 'B'), ('A', 'C'), ('C', 'E'), ('C', 'F'), ('E', 'D'), ('E', 'H'), ('F', 'G')])
nx.shortest_path_length(G, 'A') # get dictionary of distances from A to others
{'A': 0, 'K': 1, 'B': 1, 'C': 1, 'E': 2, 'F': 2, 'D': 3, 'H': 3, 'G': 3}
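The layer-by-layer idea behind breadth-first search can be sketched with a queue. `bfs_distances` is a helper written for this illustration; it should reproduce the dictionary above:

```python
from collections import deque
import networkx as nx

G = nx.Graph()
G.add_edges_from([('A', 'K'), ('A', 'B'), ('A', 'C'), ('B', 'C'), ('B', 'K'),
                  ('C', 'E'), ('C', 'F'), ('D', 'E'), ('E', 'F'), ('E', 'H'),
                  ('F', 'G'), ('I', 'J')])

def bfs_distances(graph, source):
    """Distance from source to every reachable node, found layer by layer."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:          # first visit is along a shortest path
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

dist_from_A = bfs_distances(G, 'A')
print(dist_from_A)
```

Nodes 'I' and 'J' do not appear in the result because they cannot be reached from 'A'.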
3.2.3 Measures of Distance Patterns
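# The measures below require a connected graph; on the graph above ('I'-'J' is a separate component) they raise NetworkXError
# Average distance between all pairs of nodes
nx.average_shortest_path_length(G)
# Maximum distance between any pair of nodes
nx.diameter(G)
The eccentricity of a node n is the largest distance between n and any other node.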
Radius is the minimum eccentricity.
Periphery is the set of nodes that have eccentricity equal to the diameter.
Center is the set of nodes with eccentricity equal to radius.
nx.eccentricity(G)
nx.radius(G)
nx.periphery(G)
nx.center(G)
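Because the example graph from 3.1.1 is disconnected, one way to try these measures is to restrict them to its largest connected component. That restriction is an assumption of this sketch, not something the notes prescribe:

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([('A', 'K'), ('A', 'B'), ('A', 'C'), ('B', 'C'), ('B', 'K'),
                  ('C', 'E'), ('C', 'F'), ('D', 'E'), ('E', 'F'), ('E', 'H'),
                  ('F', 'G'), ('I', 'J')])

# Keep only the largest connected component (drops the separate pair I-J).
largest = max(nx.connected_components(G), key=len)
core = G.subgraph(largest).copy()

ecc = nx.eccentricity(core)      # dict: node -> largest distance to any other node
radius = nx.radius(core)         # minimum eccentricity
diameter = nx.diameter(core)     # maximum eccentricity
center = nx.center(core)         # nodes whose eccentricity equals the radius
periphery = nx.periphery(core)   # nodes whose eccentricity equals the diameter

print(radius, diameter, center, sorted(periphery))
```

On this component, 'C' is the only center node (eccentricity 2), and 'K', 'D', 'G', 'H' form the periphery (eccentricity 4).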
3.2.4 Application
import numpy as np
import pandas as pd
%matplotlib notebook
# Instantiate the graph
G = nx.karate_club_graph()
nx.draw_networkx(G)
4. Connectivity
4.1 Connectivity in Undirected Graphs
# find the number of communities (connected components)
nx.number_connected_components(G)
# list them
sorted(nx.connected_components(G))
# find the component to which 'M' belongs
nx.node_connected_component(G, 'M')
4.2 Connectivity in Directed Graphs
# find strongly connected components: subsets in which every node has a directed path to
# every other node, and no outside node has directed paths both to and from the subset
sorted(nx.strongly_connected_components(G))
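A small directed example may make the definition concrete. The four-node graph below is made up for this sketch: A, B and C form a directed cycle, while D can be reached from C but has no path back:

```python
import networkx as nx

# Hypothetical directed graph: cycle A -> B -> C -> A plus a one-way edge C -> D.
D = nx.DiGraph()
D.add_edges_from([('A', 'B'), ('B', 'C'), ('C', 'A'), ('C', 'D')])

# Each component is a set of nodes; sort for a stable, readable result.
strong = sorted(sorted(component) for component in nx.strongly_connected_components(D))
whole_graph_strong = nx.is_strongly_connected(D)

print(strong, whole_graph_strong)
```

'D' ends up in a component of its own because nothing leads from 'D' back into the cycle.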
5. Network Robustness
5.1 Definition: the ability of a network to maintain its general structural properties (connectivity) when faced with attacks (removal of edges or nodes).
# G_un: an undirected graph (not defined in these notes)
# smallest number of nodes that must be removed to disconnect it
nx.node_connectivity(G_un)
# which nodes
nx.minimum_node_cut(G_un)
# smallest number of edges needed to disconnect
nx.edge_connectivity(G_un)
# which edges
nx.minimum_edge_cut(G_un)
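Since `G_un` is not defined in the notes, the sketch below uses a stand-in graph invented for illustration: two triangles joined by the single edge 3-4. Its weak point is easy to see, so the results can be checked by eye:

```python
import networkx as nx

# Hypothetical stand-in for G_un: two triangles connected by the bridge 3-4.
G_un = nx.Graph()
G_un.add_edges_from([(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)])

node_conn = nx.node_connectivity(G_un)   # how many nodes must be removed
node_cut = nx.minimum_node_cut(G_un)     # one smallest set of such nodes
edge_conn = nx.edge_connectivity(G_un)   # how many edges must be removed
edge_cut = nx.minimum_edge_cut(G_un)     # one smallest set of such edges

# Simulate the attack: removing the cut nodes should disconnect the graph.
attacked = G_un.copy()
attacked.remove_nodes_from(node_cut)
still_connected = nx.is_connected(attacked)

print(node_conn, node_cut, edge_conn, edge_cut, still_connected)
```

Either node 3 or node 4 is a valid minimum node cut here, so the exact node reported may differ between versions; the edge cut is the bridge between them.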
5.2 Node Connectivity
# all ways to deliver a message from 'G' to 'L'
sorted(nx.all_simple_paths(G, 'G', 'L'))
# to block every such path, how many nodes need to be removed
nx.node_connectivity(G, 'G', 'L')
# which nodes
nx.minimum_node_cut(G, 'G', 'L')
5.3 Edge Connectivity
# how many
nx.edge_connectivity(G, 'G', 'L')
# show in details
nx.minimum_edge_cut(G, 'G', 'L')
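The pairwise versions can be tried on a tiny made-up network in which 'G' reaches 'L' through exactly two independent relays, 'X' and 'Y'. The graph and the relay names are assumptions of this sketch:

```python
import networkx as nx

# Hypothetical network: two node-disjoint routes from 'G' to 'L'.
net = nx.Graph()
net.add_edges_from([('G', 'X'), ('X', 'L'), ('G', 'Y'), ('Y', 'L')])

paths = sorted(nx.all_simple_paths(net, 'G', 'L'))    # every route for the message
nodes_needed = nx.node_connectivity(net, 'G', 'L')    # nodes to remove to block all routes
node_cut = nx.minimum_node_cut(net, 'G', 'L')         # which nodes
edges_needed = nx.edge_connectivity(net, 'G', 'L')    # edges to remove to block all routes
edge_cut = nx.minimum_edge_cut(net, 'G', 'L')         # which edges

print(paths, nodes_needed, node_cut, edges_needed, edge_cut)
```

Both relays have to be removed to cut 'G' off from 'L', and likewise one edge on each route.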
6. Centrality
6.1 Degree Centrality
6.1.1 Undirected Network
G = nx.karate_club_graph()
G = nx.convert_node_labels_to_integers(G, first_label = 1)
degCent = nx.degree_centrality(G)
degCent[34]
0.5151515151515151
6.1.2 Directed Network
# requires a directed graph (nx.DiGraph); the karate club graph above is undirected
indegCent = nx.in_degree_centrality(G)
outdegCent = nx.out_degree_centrality(G)
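A minimal directed example, with a four-node graph made up for this sketch, shows how the two measures differ. Each value is the in-degree or out-degree divided by (number of nodes - 1):

```python
import networkx as nx

# Hypothetical directed graph: A and C point to B, and B points to D.
D = nx.DiGraph()
D.add_edges_from([('A', 'B'), ('C', 'B'), ('B', 'D')])

in_cent = nx.in_degree_centrality(D)    # incoming edges / (n - 1)
out_cent = nx.out_degree_centrality(D)  # outgoing edges / (n - 1)

print(in_cent['B'], out_cent['B'])
```

Node 'B' receives two of three possible incoming edges but sends only one, so its in-degree centrality is 2/3 and its out-degree centrality is 1/3.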
6.2 Closeness Centrality
6.2.1 Calculation: a node is more central the shorter its total distance to all other nodes.
closeCent = nx.closeness_centrality(G)
closeCent[34]
0.55
sum(nx.shortest_path_length(G, 34).values())
60
# This is equivalent to (number of nodes - 1) divided by the sum of distances
(len(G.nodes()) - 1)/60
0.55
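The relationship between the sum of distances and the closeness score can be verified end to end. This sketch recomputes the value for node 34 by hand and compares it with the library result:

```python
import networkx as nx

G = nx.karate_club_graph()
G = nx.convert_node_labels_to_integers(G, first_label=1)

# Sum of shortest-path distances from node 34 to every other node.
total_distance = sum(nx.shortest_path_length(G, 34).values())

# Closeness = (number of other nodes) / (sum of distances to them).
manual_closeness = (G.number_of_nodes() - 1) / total_distance
library_closeness = nx.closeness_centrality(G)[34]

print(total_distance, manual_closeness, library_closeness)
```

With 33 other nodes and a distance sum of 60, both values come out at 0.55.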
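6.2.2 Measuring Disconnected Nodes
When a node cannot reach every other node, there are two options. In current NetworkX versions the old normalized argument of closeness_centrality is named wf_improved.
Method One
# no normalization: only reachable nodes count, so a node whose single reachable node is one step away scores 1
nx.closeness_centrality(G, wf_improved = False)
1
Method Two
# normalization: additionally scale by (reachable nodes) / (total nodes - 1)
nx.closeness_centrality(G, wf_improved = True)
0.071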
6.3 Betweenness Centrality (computationally expensive)
Essence: Find nodes that show up on many of the shortest paths between pairs of other nodes.
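On a graph small enough to count by hand, the idea is easy to verify. The five-node path 0-1-2-3-4 used here is an illustrative choice, not part of the karate club analysis:

```python
import networkx as nx

# Path graph 0 - 1 - 2 - 3 - 4: every shortest path is a stretch of the line.
P = nx.path_graph(5)

raw = nx.betweenness_centrality(P, normalized=False)   # number of pairs routed through each node
scaled = nx.betweenness_centrality(P, normalized=True) # divided by (n - 1)(n - 2) / 2 = 6

print(raw)
print(scaled)
```

The middle node 2 lies between the pairs (0,3), (0,4), (1,3) and (1,4), so its raw score is 4; node 1 lies between (0,2), (0,3) and (0,4), giving 3; the end nodes lie between no pairs and score 0.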
6.3.1 Method One: Use all 34 nodes in karate club
btwnCent = nx.betweenness_centrality(G,normalized = True, endpoints = False)
import operator
sorted(btwnCent.items(), key = operator.itemgetter(1), reverse = True)[0:5]
[(1, 0.43763528138528146),
(34, 0.30407497594997596),
(33, 0.145247113997114),
(3, 0.14365680615680618),
(32, 0.13827561327561325)]
6.3.2 Method Two: Approximate using a random sample of k = 10 nodes (results vary between runs)
btwnCent_approx = nx.betweenness_centrality(G,normalized = True, endpoints = False, k = 10)
sorted(btwnCent_approx.items(), key = operator.itemgetter(1), reverse = True)[0:5]
[(1, 0.3674031986531986),
(34, 0.3048388648388649),
(32, 0.17290028258778256),
(3, 0.13572044853294854),
(33, 0.130249518999519)]
6.3.3 Method Three: Specify subsets
btwnCent_subset = nx.betweenness_centrality_subset(G,
[34, 33, 21, 30, 16, 27, 15, 23, 10],
[1, 4, 13, 11, 6, 12, 17, 7],
normalized = True)
sorted(btwnCent_subset.items(), key = operator.itemgetter(1), reverse = True)[0:5]
[(1, 0.04899515993265994),
(34, 0.028807419432419434),
(3, 0.018368205868205867),
(33, 0.01664712602212602),
(9, 0.014519450456950456)]
6.3.4 Method Four: Edges
btwnCent_edge = nx.edge_betweenness_centrality(G, normalized = True)
sorted(btwnCent_edge.items(), key = operator.itemgetter(1), reverse = True)[0:5]
# node 1 is the instructor of the club
[((1, 32), 0.1272599949070537),
((1, 7), 0.07813428401663695),
((1, 6), 0.07813428401663694),
((1, 3), 0.0777876807288572),
((1, 9), 0.07423959482783014)]
btwnCent_edge_subset = nx.edge_betweenness_centrality_subset(G,
[34, 33, 21, 30, 16, 27, 15, 23, 10],
[1, 4, 13, 11, 6, 12, 17, 7],
normalized = True)
sorted(btwnCent_edge_subset.items(), key = operator.itemgetter(1), reverse = True)[0:5]
[((1, 9), 0.01366536513595337),
((1, 32), 0.01366536513595337),
((14, 34), 0.012207509266332794),
((1, 3), 0.01211343123107829),
((1, 6), 0.012032085561497326)]