[Paper Reading] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

Paper: https://arxiv.org/pdf/1502.03044.pdf

Code: https://github.com/kelvinxu/arctic-captions & https://github.com/yunjey/show-attend-and-tell & https://github.com/jazzsaxmafia/show_attend_and_tell.tensorflow

Main Contributions

In this paper, the authors bring the attention mechanism into neural image captioning and propose two variants: "soft" deterministic attention and "hard" stochastic attention. [Figure: overall framework of the "Show, Attend and Tell" model.]

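The key question for the attention mechanism is how to compute the context vector z_t from the image annotation vectors a_i. For each location i, the mechanism produces a score e_ti, which a softmax over all locations normalizes into a weight α_ti. In hard attention, α_ti is the probability that region vector a_i is selected at time t as the information passed to the decoder. Exactly one region is selected, so an indicator variable s_t,i is introduced that equals 1 when region i is selected and 0 otherwise. In soft attention, α_ti is the proportion that region vector a_i contributes to the information fed to the decoder at time t. (See the posts "Attention机制论文阅读——Soft和Hard Attention" and "Multimodal —— 看图说话（Image Caption）任务的论文笔记（二）引入attention机制".)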
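The difference between the two variants can be sketched in a few lines of NumPy. This is only an illustration, not the authors' implementation: the scores e are random stand-ins for the output of the paper's attention model (which conditions on the previous hidden state), and the function names soft_context and hard_context are hypothetical.

```python
import numpy as np

def softmax(e):
    # Subtract the max for numerical stability before exponentiating.
    w = np.exp(e - e.max())
    return w / w.sum()

def soft_context(a, e):
    """Soft attention: z_t is the alpha-weighted average of all regions.

    a: (L, D) annotation vectors, e: (L,) unnormalized scores e_ti.
    """
    alpha = softmax(e)
    return alpha @ a, alpha

def hard_context(a, e, rng):
    """Hard attention: sample exactly one region with probability alpha_ti."""
    alpha = softmax(e)
    i = rng.choice(len(alpha), p=alpha)
    s = np.zeros_like(alpha)  # one-hot indicator s_t,i
    s[i] = 1.0
    return s @ a, s

rng = np.random.default_rng(0)
L, D = 196, 512
a = rng.standard_normal((L, D))  # stand-in for the image annotation vectors
e = rng.standard_normal(L)       # assumption: random scores instead of a learned model

z_soft, alpha = soft_context(a, e)
z_hard, s = hard_context(a, e, rng)
print(z_soft.shape, z_hard.shape, int(s.sum()))
```

In both cases z_t has dimension D. The soft version mixes all 196 regions, while the hard version copies a single sampled region.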

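Experimental Details

The authors extract image features with a VGGNet pretrained on ImageNet and not fine-tuned. The feature map from block5_conv4 (Conv2D), of size 14×14×512, is reshaped to 196×512 (L×D with L=196 and D=512, i.e. 196 image regions, each with a 512-dimensional feature vector) to form the annotation vectors a_i.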

To create the annotations a_i used by our decoder, we used the Oxford VGGnet pretrained on ImageNet without finetuning.

In our experiments we use the 14×14×512 feature map of the fourth convolutional layer before max pooling. This means our decoder operates on the flattened 196×512 (i.e L × D) encoding.
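The flattening step can be sketched as follows. The feature map here is a random placeholder rather than real VGG output, and the row-major ordering of regions is an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder for the 14x14x512 convolutional feature map of one image.
feature_map = rng.standard_normal((14, 14, 512)).astype(np.float32)

# Flatten the two spatial axes into L = 196 regions of dimension D = 512.
D = feature_map.shape[-1]
annotations = feature_map.reshape(-1, D)

print(annotations.shape)
```

With this ordering, the region at spatial position (i, j) becomes row i * 14 + j of the annotation matrix.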

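The authors state that the initial cell state (init_c) and hidden state (init_h) of the decoder LSTM are predicted from the average of the annotation vectors extracted from the image, using two separate multilayer perceptrons (Multi-Layer Perceptron, MLP).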

The initial memory state and hidden state of the LSTM are predicted by an average of the annotation vectors fed through two separate MLPs(init,c and init,h).
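A minimal sketch of this initialization is given below. The two-layer tanh MLPs, the hidden size H = 1024, the weight scale, and the helper names make_mlp and run_mlp are all assumptions made for illustration. The quoted passage only says that the averaged annotation vector is fed through two separate MLPs.

```python
import numpy as np

def make_mlp(d_in, d_hidden, d_out, rng):
    # Hypothetical helper: small random weights for a two-layer MLP.
    return {
        "W1": rng.standard_normal((d_in, d_hidden)) * 0.01,
        "b1": np.zeros(d_hidden),
        "W2": rng.standard_normal((d_hidden, d_out)) * 0.01,
        "b2": np.zeros(d_out),
    }

def run_mlp(p, x):
    # The tanh activations are an assumption of this sketch.
    hidden = np.tanh(x @ p["W1"] + p["b1"])
    return np.tanh(hidden @ p["W2"] + p["b2"])

rng = np.random.default_rng(0)
L, D, H = 196, 512, 1024  # H (LSTM state size) is an assumed value

annotations = rng.standard_normal((L, D))  # stand-in for the a_i
a_mean = annotations.mean(axis=0)          # average over the L regions

mlp_c = make_mlp(D, H, H, rng)  # separate parameters for init_c
mlp_h = make_mlp(D, H, H, rng)  # separate parameters for init_h

init_c = run_mlp(mlp_c, a_mean)
init_h = run_mlp(mlp_h, a_mean)
print(init_c.shape, init_h.shape)
```

Because the two MLPs have their own parameters, init_c and init_h generally differ even though both start from the same averaged vector.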


