[Paper Reading] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
论文链接:https://arxiv.org/pdf/1502.03044.pdf
代码链接:https://github.com/kelvinxu/arctic-captions & https://github.com/yunjey/show-attend-and-tell & https://github.com/jazzsaxmafia/show_attend_and_tell.tensorflow
主要贡献
在这篇文章中,作者将“注意力机制(Attention Mechanism)”引入了神经机器翻译(Neural Image Captioning)领域,提出了两种不同的注意力机制:‘Soft’ Deterministic Attention Mechanism & ‘Hard’ Stochastic Attention Mechanism。下图展示了"Show, Attend and Tell"模型的整体框架。
注意力机制的关键点在于,如何从图像的特征向量ai中计算得到上下文向量zt。对于每一个位置i,注意力机制能够产生一个权重eti。在Hard Attention机制中,权重αti所扮演的角色是图像区域向量ai在t时刻被选中作为解码器的信息的概率,有且只有一个区域会被选中,为此,引入变量st,i,当区域i被选中时为1,否则为0;在Soft Attention机制中,权重αti所扮演的角色是图像区域向量ai在t时刻输入解码器的信息中所占的比例。(参考Attention机制论文阅读——Soft和Hard Attention,Multimodal —— 看图说话(Image Caption)任务的论文笔记(二)引入attention机制)
实验细节
- 在文章中,作者提出使用在ImageNet数据集上预训练好、不进行微调的VGGNet提取图像特征,将block5_conv4(Conv2D)提取到的feature map(14×14×512)reshape为196×512(L×D,L=196,D=512,即196个图像区域,每个区域特征向量的维度是512)的图像区域向量ai。
To create the annotations ai used by our decoder, we used the Oxford VGGnet pretrained on ImageNet without finetuning.
In our experiments we use the 14×14×512 feature map of the fourth convolutional layer before max pooling. This means our decoder operates on the flattened 196×512 (i.e L × D) encoding.
- 在文章中,作者指出,解码器LSTM初始的细胞状态(init_c)与隐层状态(init_h)由从图像中提取到的特征向量及两个独立的多层感知机(Multi-Layer Perception, MLP)决定。
The initial memory state and hidden state of the LSTM are predicted by an average of the annotation vectors fed through two separate MLPs(init,c and init,h).
版权声明:本文为博主原创文章,欢迎转载,转载请注明作者及原文出处!
[Paper Reading] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention的更多相关文章
- Paper Reading - Show, Attend and Tell: Neural Image Caption Generation with Visual Attention ( ICML 2015 )
Link of the Paper: https://arxiv.org/pdf/1502.03044.pdf Main Points: Encoder-Decoder Framework: Enco ...
- 论文笔记:Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention 2018-08-10 10:15:06 Pap ...
- 论文:Show, Attend and Tell: Neural Image Caption Generation with Visual Attention-阅读总结
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention-阅读总结 笔记不能简单的抄写文中的内容,得有自 ...
- Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1411.4555 Main Points: A generative model ( NIC, GoogLeNet ...
- [Paper Reading] Show and Tell: A Neural Image Caption Generator
论文链接:https://arxiv.org/pdf/1411.4555.pdf 代码链接:https://github.com/karpathy/neuraltalk & https://g ...
- [Paper Reading] Image Captioning using Deep Neural Architectures (arXiv: 1801.05568v1)
Main Contributions: A brief introduction about two different methods (retrieval based method and gen ...
- Paper Reading - CNN+CNN: Convolutional Decoders for Image Captioning
Link of the Paper: https://arxiv.org/abs/1805.09019 Innovations: The authors propose a CNN + CNN fra ...
- Paper Reading: Stereo DSO
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...
- Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )
Link of the Paper: https://ieeexplore.ieee.org/document/7298856/ A Correlative Paper: Learning a Rec ...
随机推荐
- tomcat web的URL解析(web.xml)
1.一个tomcat可以配置多个host: 2.一个host可以包含多个应用:context: 3.一个应用可以包含多个servlet:servlet-path; 4.一个servlet可以包含多个r ...
- BZOJ 2095 [Poi2010]Bridges (二分+最大流判断混合图的欧拉回路)
题面 nnn个点,mmm条双向边(正向与反向权值不同),求经过最大边权最小的欧拉回路的权值 分析 见 commonc大佬博客 精髓就是通过最大流调整无向边的方向使得所有点的入度等于出度 CODE #i ...
- BZOJ 2281: [Sdoi2011]黑白棋 (Nim游戏+dp计数)
题意 这题目有一点问题,应该是在n个格子里有k个棋子,k是偶数.从左到右一白一黑间隔出现.有两个人不妨叫做小白和小黑.两个人轮流操作,每个人可以选 1~d 枚自己颜色的棋子,如果是白色则只能向右移动, ...
- mysql 1040 连接数太多 mysql Error 1040 too many connection解决办法
近在用SpringMVC开发的时候,突然出现1040 too many connection的错误,看错误的意思是连接的人数太多了.百度经验:jingyan.baidu.com 方法/步骤 1 当 ...
- Jenkins 自动化构建
def pipeId = 1130561944231279390 def pipeLogId def isTagOrBranch def tagOrBranch def imageId def add ...
- CF796D Police Stations BFS+染色
题意:给定一棵树,树上有一些点是警察局,要求所有点到最近的警察局的距离不大于 $d$,求最多能删几条边 ? 题解: 考虑什么时候一条边可以被断开:这条边的两个端点被两个不同的警察局覆盖掉. 我们要设计 ...
- leetcode解题报告(6):Remove Duplicates from Sorted List
描述 Given a sorted linked list, delete all duplicates such that each element appear only once. For ex ...
- 把字符串当做js代码执行的方法
字符串还能当做javascript代码来执行?你能想到哪些方法? 1.setInterval("要执行的字符串",500);window对象的方法既可以传字符串,也可以传函数.该函 ...
- linux安装maven简易步骤
版本要求maven3.6.1 软件下载 wget http://mirror.bit.edu.cn/apache/maven/maven-3/3.6.1/binaries/apache-maven-3 ...
- 连接Android模拟器
一.如何找到adb? 安装夜神安卓模拟器后,电脑桌面会有“夜神模拟器”的启动图标,鼠标右键--打开文件所在的位置,就会进入***\Nox\bin,默认路径是C:\Program Files (x ...