Learning Latent Graph Representations for Relational VQA
The key mechanism of transformer-based models is cross-attention, which implicitly forms a graph over tokens and acts as a diffusion operator, propagating information through that graph for question answering that requires reasoning over the scene.
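To make the "attention as a graph" reading concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the learned query/key/value projections are omitted, the toy feature vectors are made up, and the function names `attention_graph` and `diffuse` are hypothetical. It only shows that softmax attention scores form a row-stochastic weighted adjacency matrix, and that applying it to the value vectors is one propagation step over that graph.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_graph(queries, keys):
    """Scaled dot-product attention weights.

    Returns a row-stochastic matrix A where A[i][j] can be read as the
    weight of a directed edge from query token i to key token j.
    """
    scale = math.sqrt(len(keys[0]))
    return [
        softmax([sum(q * k for q, k in zip(qi, kj)) / scale for kj in keys])
        for qi in queries
    ]

def diffuse(adjacency, values):
    """One propagation step: each query token receives the edge-weighted
    average of the value vectors it is connected to."""
    dim = len(values[0])
    return [
        [sum(a * v[d] for a, v in zip(row, values)) for d in range(dim)]
        for row in adjacency
    ]

# Toy features (assumed for illustration): 2 text tokens, 3 image tokens.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

A = attention_graph(text_tokens, image_tokens)  # 2 x 3 soft adjacency
updated = diffuse(A, image_tokens)              # text tokens after one hop
```

Stacking several such layers corresponds to multi-hop propagation over the token graph, which is the sense in which attention can be read as a diffusion operator.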
We reinterpret and reformulate the transformer-based model to explicitly construct latent graphs over tokens, and thereby improve performance on visual questions about relations between objects.
Coincidentally, transformer-based language encoders not only take advantage of the tokenization trend but are also intrinsically built for information fusion and alignment, owing to their core self-attention mechanism.
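The success of transformer-based VQA systems points to the effectiveness of two insights: image tokenization, and pairwise token interactions between text tokens and image tokens.
We observe that these pairwise token interactions collectively form a graph, and that traversing this graph constitutes a form of reasoning, which may explain the reasoning capabilities claimed for these transformer-based models.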
We reinterpret transformer-based VQA systems as graph convolutions.
We show that our model benefits from its latent graph representations.
To the best of our knowledge, current transformer-based models cannot benefit from graph information, and there has been no work on taking advantage of scene graphs, or graph representations in general, for VQA.
In our model, the goal is to learn to generate a latent graph representation and then perform node classification on the resulting heterogeneous graph.
A typical task for a GCN is node classification, as GCN is capable of learning node representations from a given static homogeneous graph.
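As a rough illustration of node classification with a GCN on a static homogeneous graph, the sketch below applies the standard propagation rule `H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)` twice and reads off a class per node. The graph, features, and weight matrices are toy values chosen for illustration (a real model would learn the weights), and the helper names are hypothetical.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalize_adjacency(adj):
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn_layer(adj_norm, features, weights, activate=True):
    """One graph convolution: aggregate neighbor features, then project."""
    out = matmul(matmul(adj_norm, features), weights)
    if activate:
        out = [[max(0.0, v) for v in row] for row in out]  # ReLU
    return out

# Toy homogeneous graph (assumed): an undirected path 0 - 1 - 2 - 3.
adjacency = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
w1 = [[1.0, -0.5], [-0.5, 1.0]]  # illustrative weights, not trained
w2 = [[1.0, 0.0], [0.0, 1.0]]

norm = normalize_adjacency(adjacency)
hidden = gcn_layer(norm, features, w1)
logits = gcn_layer(norm, hidden, w2, activate=False)

# Node classification: pick the highest-scoring class for each node.
predictions = [row.index(max(row)) for row in logits]
```

The adjacency matrix here is fixed and has a single edge type, which is exactly the limitation that motivates learning a latent, heterogeneous graph instead.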
Graph Transformer Networks (GTN) are a model for handling heterogeneous graphs, graphs with various types of edges, as well as generating new graphs.
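The way a GTN generates new graphs can be sketched as follows, under the assumption that it softly selects among per-edge-type adjacency matrices with softmax weights and multiplies two selections to compose a new (meta-path) adjacency. This is a simplified illustration, not the reference implementation: the edge types `mentions` and `next_to`, the selection scores, and the function names are all made up, and in a real model the scores would be learned.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_select(adjacencies, scores):
    """Convex combination of per-edge-type adjacency matrices."""
    weights = softmax(scores)
    n = len(adjacencies[0])
    return [
        [sum(w * adj[i][j] for w, adj in zip(weights, adjacencies)) for j in range(n)]
        for i in range(n)
    ]

def compose(first, second):
    """Matrix product: entry (i, j) accumulates two-hop paths i -> k -> j."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*second)] for row in first]

# Toy heterogeneous graph (assumed): node 0 is a word, nodes 1 and 2 are objects.
mentions = [  # word -> object edges
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
next_to = [  # object -> object edges
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]
edge_types = [mentions, next_to]

# Illustrative scores: the first hop favors "mentions", the second "next_to".
first_hop = soft_select(edge_types, [10.0, -10.0])
second_hop = soft_select(edge_types, [-10.0, 10.0])
new_graph = compose(first_hop, second_hop)
```

In `new_graph`, the word node 0 gains an edge to object 2 that exists in neither input graph: the composed relation "mentions an object that is next to". A node classifier can then run on this generated graph rather than on a fixed one.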
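Takeaway: the paper addresses how to exploit scene graphs and graph representations for VQA, using graph convolutions realized through the transformer mechanism.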
