Paper information

Title: Rethinking and Scaling Up Graph Contrastive Learning: An Extremely Efficient Approach with Group Discrimination
Authors: Yizhen Zheng, Shirui Pan, Vincent Cs Lee, Yu Zheng, Philip S. Yu
Venue: NeurIPS 2022
Paper: download
Code: download

1 Introduction

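　　GCL methods need a large number of training epochs on a dataset. This paper takes its cue from two representative GCL works, DGI and MVGRL: because of a flaw in how they apply the Sigmoid function, the paper proposes Group Discrimination (GD) and, built on it, the model Graph Group Discrimination (GGD).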

　　The difference between Graph Contrastive Learning and Group Discrimination:

GD directly discriminates a group of positive nodes from a group of negative nodes.
GCL maximises the mutual information (MI) between an anchor node and its positive counterparts, which share similar semantic information, while doing the opposite for negative counterparts.

　　Contributions:

1) We re-examine existing GCL approaches (e.g., DGI and MVGRL), and we introduce a novel and efficient self-supervised GRL paradigm, namely, Group Discrimination (GD).
2) Based on GD, we propose a new self-supervised GRL model, GGD, which is fast in training and convergence, and possesses high scalability.
3) We conduct extensive experiments on eight datasets, including an extremely large dataset, ogbn-papers100M, with over a billion edges.

2 Rethinking Representative GCL Methods

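　　Taking the classic DGI and MVGRL as examples, this section argues that mutual information maximization is not what makes these contrastive methods work; the real contributing factor is a new paradigm, Group Discrimination.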

2.1 Rethinking GCL Methods

　　A recap of DGI:

　　Code:

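import torch
import torch.nn as nn

# Encoder and Discriminator are defined elsewhere (not shown in this post).
class DGI(nn.Module):
    def __init__(self, g, in_feats, n_hidden, n_layers, activation, dropout):
        super(DGI, self).__init__()
        self.encoder = Encoder(g, in_feats, n_hidden, n_layers, activation, dropout)
        self.discriminator = Discriminator(n_hidden)
        self.loss = nn.BCEWithLogitsLoss()

    def forward(self, features):
        # Node embeddings from the original input and from the corrupted input.
        positive = self.encoder(features, corrupt=False)
        negative = self.encoder(features, corrupt=True)
        # Summary vector s: sigmoid of the mean positive embedding.
        summary = torch.sigmoid(positive.mean(dim=0))
        # Score every node embedding against the summary vector.
        positive = self.discriminator(positive, summary)
        negative = self.discriminator(negative, summary)
        # Positive scores are pushed toward label 1, negative scores toward label 0.
        l1 = self.loss(positive, torch.ones_like(positive))
        l2 = self.loss(negative, torch.zeros_like(negative))
        return l1 + l2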

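　　The paper's finding about DGI: applying a Sigmoid function to the summary vector produced by a GNN whose weights are Xavier-initialized is inappropriate, because the elements of the resulting summary vector all end up very close to the same value.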
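　　A minimal sketch of why this can happen, under assumptions that are not from the paper: row-normalized non-negative features, a single bias-free linear layer standing in for the GNN, and hypothetical sizes. Each pre-activation is then a convex combination of Xavier-bounded weights, so the mean embedding stays close to 0 and its sigmoid close to 0.5.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
num_nodes, in_feats, n_hidden = 1000, 500, 64

# Assumption: row-normalized, non-negative features (similar to bag-of-words inputs).
x = torch.rand(num_nodes, in_feats)
x = x / x.sum(dim=1, keepdim=True)

# Assumption: one bias-free linear layer stands in for the GNN encoder.
layer = nn.Linear(in_feats, n_hidden, bias=False)
nn.init.xavier_uniform_(layer.weight)

with torch.no_grad():
    h = torch.relu(layer(x))                  # node embeddings
    summary = torch.sigmoid(h.mean(dim=0))    # DGI-style summary vector

# Largest distance of any summary element from 0.5.
max_dev = (summary - 0.5).abs().max().item()
print(f"summary min={summary.min().item():.4f}, max={summary.max().item():.4f}")
```

　　In this setting the Xavier bound is $\sqrt{6/(500+64)} \approx 0.103$, so no element of the summary vector can move further than about 0.026 from 0.5.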

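　　Next, the paper sets every element of the summary vector to various constants (from 0 to 1):

　　Conclusions:

- Setting every element of the summary vector to 0 makes all similarity scores 0, so the positive term of the loss (the log of the score) diverges to negative infinity and the model cannot be optimized;
- Setting the summary vector to other values leads to numerical instability;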
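　　The zero case can also be checked on the sigmoid-based score used in the DGI code. In this sketch, the explicit weight matrix `W` and the random embeddings `h` are hypothetical stand-ins. An all-zero summary vector makes every score 0, the loss becomes the constant $\log 2$, and the gradient reaching the embeddings vanishes, so nothing can be learned.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, d = 8, 16                                  # hypothetical sizes
h = torch.randn(n, d, requires_grad=True)     # stand-in for encoder output
W = torch.randn(d, d)                         # stand-in for the bilinear weight
s = torch.zeros(d)                            # all-zero summary vector

scores = h @ W @ s                            # shape (n,), every entry is 0
loss = F.binary_cross_entropy_with_logits(scores, torch.ones(n))
loss.backward()                               # gradient w.r.t. h is W @ s = 0
```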

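　　Simplifying DGI:

　　① Set the summary vector to the all-ones vector (rescaling it does not affect the loss);

　　② Remove the weight matrix of the Discriminator (Bilinear: a linear transformation followed by an inner-product similarity); [the $W$ of the bilinear layer is really just a linear transformation layer]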

　　　　$\begin{aligned}\mathcal{L}_{D G I} &=\frac{1}{2 N}\left(\sum\limits_{i=1}^{N} \log \mathcal{D}\left(\mathbf{h}_{i}, \mathbf{s}\right)+\log \left(1-\mathcal{D}\left(\tilde{\mathbf{h}}_{i}, \mathbf{s}\right)\right)\right) \\&=\frac{1}{2 N}\left(\sum\limits_{i=1}^{N} \log \left(\mathbf{h}_{i} \cdot \mathbf{s}\right)+\log \left(1-\tilde{\mathbf{h}}_{i} \cdot \mathbf{s}\right)\right) \\&=\frac{1}{2 N}\left(\sum\limits_{i=1}^{N} \log \left(\operatorname{sum}\left(\mathbf{h}_{i}\right)\right)+\log \left(1-\operatorname{sum}\left(\tilde{\mathbf{h}}_{i}\right)\right)\right)\end{aligned} \quad\quad\quad(1)$

　　The bilinear discriminator:

　　　　$\mathcal{D}\left(\mathbf{h}_{i}, \mathbf{s}\right)=\sigma_{s i g}\left(\mathbf{h}_{i} \cdot \mathbf{W} \cdot \mathbf{s}\right)\quad\quad\quad(2)$
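　　As a quick numeric check of the two simplifications, the sketch below (random embeddings and sizes are hypothetical) computes the pre-sigmoid bilinear score $\mathbf{h}_{i} \cdot \mathbf{W} \cdot \mathbf{s}$, then sets $\mathbf{W}$ to the identity and $\mathbf{s}$ to all ones and compares the result with $\operatorname{sum}(\mathbf{h}_{i})$.

```python
import torch

torch.manual_seed(0)
n, d = 5, 4                      # hypothetical sizes
h = torch.randn(n, d)            # stand-in for node embeddings

def bilinear_score(h, W, s):
    """Pre-sigmoid score h_i . W . s for every row h_i of h."""
    return h @ W @ s

W_identity = torch.eye(d)        # step 2: the weight matrix is removed
s_ones = torch.ones(d)           # step 1: the summary vector is all ones

simplified = bilinear_score(h, W_identity, s_ones)
row_sums = h.sum(dim=1)          # sum(h_i) from the last line of Eq. 1

# Rescaling s rescales every score by the same factor.
scaled = bilinear_score(h, W_identity, 3.0 * s_ones)
```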

　　Experiment: replace the aggregation function in $\text{Eq.1}$, i.e. the sum function.

　　After the replacement the loss takes the form:

　　　　$\mathcal{L}_{B C E}=-\frac{1}{2 N}\left(\sum\limits _{i=1}^{2 N} y_{i} \log \hat{y}_{i}+\left(1-y_{i}\right) \log \left(1-\hat{y}_{i}\right)\right)\quad\quad\quad(3)$

　　Here $\hat{y}_{i}=\operatorname{agg}\left(\mathbf{h}_{i}\right)$, $y_{i} \in \mathbb{R}^{1 \times 1}$, and $\hat{y}_{i} \in \mathbb{R}^{1 \times 1}$. According to the paper, $y_{i}$ indicates whether node $i$ is a positive sample and $\hat{y}_{i}$ is the corresponding prediction. Q: when the aggregation function is $\text{mean}$, does $\hat{y}_{i}$ really tend to $1$ for a positive sample $i$?
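　　A minimal sketch of $\text{Eq.3}$, assuming the aggregated value is passed through a sigmoid so that $\hat{y}_{i}$ lies in $(0, 1)$; that choice, the function name `gd_bce_loss`, and the `agg` argument are assumptions of this sketch. Positive embeddings get label 1, negative embeddings get label 0, and averaging over all $2N$ samples supplies the $\frac{1}{2N}$ factor.

```python
import torch
import torch.nn.functional as F

def gd_bce_loss(h_pos, h_neg, agg=torch.sum):
    """Binary cross-entropy over 2N aggregated node embeddings (sketch of Eq. 3)."""
    h = torch.cat([h_pos, h_neg], dim=0)             # (2N, d)
    logits = agg(h, dim=1)                           # one scalar per node
    labels = torch.cat([torch.ones(h_pos.size(0)),   # y_i = 1 for positives
                        torch.zeros(h_neg.size(0))]) # y_i = 0 for negatives
    # Applies the sigmoid internally and averages over the 2N samples.
    return F.binary_cross_entropy_with_logits(logits, labels)

torch.manual_seed(0)
h_pos, h_neg = torch.randn(6, 8), torch.randn(6, 8)  # hypothetical embeddings
loss_sum = gd_bce_loss(h_pos, h_neg, agg=torch.sum)
loss_mean = gd_bce_loss(h_pos, h_neg, agg=torch.mean)
loss_zero = gd_bce_loss(torch.zeros(6, 8), torch.zeros(6, 8))
```

　　Swapping `agg` between `torch.sum` and `torch.mean` is the kind of replacement the experiment above describes.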

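　　What DGI really does is discriminate a group of nodes generated with the correct topology from a group generated with the corrupted topology, as shown in Figure 1. Put differently, DGI uses a fixed vector $s$ to tell apart two node embedding matrices (positive and negative).

　　To address the issue described above, the paper replaces the loss function in DGI with $\text{Eq.3}$. The benefits, which the paper plays up heavily: lower GPU memory use and faster computation, with essentially no change in accuracy.

　　Note: a method whose variance is even slightly larger is easily disparaged.

　　Definition of Group Discrimination: a GRL method that assigns nodes from different groups to different classes, with positive samples assigned to group "1" and negative samples to group "0".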

3 Methodology

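　　Overall framework:

　　Components:

- Siamese Network: modeled after the architecture of MVGRL;
- Data Augmentation: provides views that carry similar semantic information, at the price of extra time; [edge dropout, feature masking]
- Loss function: $\text{Eq.3}$;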
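　　The two augmentations named above can be sketched as follows. This is an illustration, not the authors' implementation: the function names, the `(2, E)` edge-index layout, and the choice to mask whole feature dimensions are all assumptions.

```python
import torch

def mask_features(x, p, generator=None):
    """Zero out each feature dimension (column of x) with probability p."""
    keep = torch.rand(x.size(1), generator=generator) >= p
    return x * keep.to(x.dtype)

def drop_edges(edge_index, p, generator=None):
    """Remove each edge (column of a (2, E) index tensor) with probability p."""
    keep = torch.rand(edge_index.size(1), generator=generator) >= p
    return edge_index[:, keep]

gen = torch.Generator().manual_seed(0)
x = torch.rand(4, 6) + 1.0                     # strictly positive toy features
edge_index = torch.tensor([[0, 1, 2, 3, 0],    # source nodes
                           [1, 2, 3, 0, 2]])   # target nodes
x_aug = mask_features(x, p=0.5, generator=gen)
edge_index_aug = drop_edges(edge_index, p=0.5, generator=gen)
```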

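　　Model inference:

　　First: freeze the parameters of the GNN encoder and the MLP predictor, and obtain the preliminary node representations $\mathbf{H}_{\theta}$;

　　Second: inspired by the multi-view contrastive work MVGRL, introduce global information: $\mathbf{H}_{\theta}^{\text {global }}=\mathbf{A}^{n} \mathbf{H}_{\theta}$;

　　Finally: combine the local and global representations into $\mathbf{H}=\mathbf{H}_{\theta}^{\text {global }}+\mathbf{H}_{\theta}$;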
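　　The three inference steps can be sketched as below, under assumptions: `adj` is a dense (here row-normalized) adjacency matrix, `h` stands for the frozen encoder's output $\mathbf{H}_{\theta}$, and the function name is hypothetical. Multiplying by `adj` repeatedly avoids forming $\mathbf{A}^{n}$ explicitly.

```python
import torch

def global_local_embedding(adj, h, n):
    """Return A^n H + H, computed with n successive propagations."""
    h_global = h
    for _ in range(n):
        h_global = adj @ h_global
    return h_global + h

torch.manual_seed(0)
# Toy 3-node path graph with self-loops, row-normalized (illustrative only).
adj = torch.tensor([[1., 1., 0.],
                    [1., 1., 1.],
                    [0., 1., 1.]])
adj = adj / adj.sum(dim=1, keepdim=True)
h = torch.randn(3, 4)                          # stand-in for frozen encoder output

out = global_local_embedding(adj, h, n=3)
# Reference value computed with an explicit matrix power.
reference = torch.linalg.matrix_power(adj, 3) @ h + h
```

　　On large graphs a sparse adjacency matrix would normally take the place of the dense one used here.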

4 Experiments

4.1 Datasets

4.2 Result

Node classification

Training time and memory consumption

4.3 Evaluating on Large-scale datasets

5 Future Work

　　 For example, can we extend the current binary Group Discrimination scheme (i.e., classifying nodes generated with different topology) to discrimination among multiple groups?
