[Paper Reading] Show and Tell: A Neural Image Caption Generator

Paper link: https://arxiv.org/pdf/1411.4555.pdf

Code links: https://github.com/karpathy/neuraltalk & https://github.com/karpathy/neuraltalk2 & https://github.com/zsdonghao/Image-Captioning

Main Contributions

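In this paper, the authors borrow from Neural Machine Translation and bring the Encoder-Decoder model to Neural Image Captioning, proposing an end-to-end model for the image captioning problem. Two figures taken from the paper accompany this description: the first gives an overview of the NIC model, and the second shows the details of the network. NIC uses a convolutional neural network (CNN) as the encoder and a Long Short-Term Memory network (LSTM) as the decoder.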

Experimental Details

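In the paper, the authors propose using Inception v2, pre-trained on an Image Classification Task, as the encoder, and passing the features from its last hidden layer to the decoder as input. In karpathy's neuraltalk source code, however, a pre-trained VGG16 serves as the encoder, and the features from Layer FC-4096 are passed to the LSTM (see neuraltalk/py_caffe_feat_extract.py line160). neuraltalk2 likewise uses VGG16 as the encoder to extract image features (see neuraltalk2/train.lua line27). zsdonghao's TensorFlow implementation of the method uses Inception v3 as the encoder (see zsdonghao/Image-Captioning/inception_v3(for TF 0.10).py).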

Hence, it is natural to use a CNN as an image “encoder”, by first pre-training it for an image classification task and using the last hidden layer as an input to the RNN decoder that generates sentences.

An “encoder” RNN reads the source sentence and transforms it into a rich fixed-length vector representation, which in turn is used as the initial hidden state of a “decoder” RNN that generates the target sentence.
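
To make the hand-off from encoder to decoder concrete, here is a minimal sketch of greedy caption generation in which the CNN feature is fed to the decoder once, before any word, and the resulting state is then carried through the word-by-word loop. Everything in it is an assumption for illustration: greedy_caption, toy_step, toy_embed and VOCAB are hypothetical names, and toy_step is only a stand-in for a real LSTM step (its state is just a counter). It is not the code of any of the repositories above.

```python
def greedy_caption(image_feature, step, embed, start_id, end_id, max_len=20):
    """Greedy decoding: the image feature is fed once, then words one at a time.

    step(input_vector, state) -> (new_state, scores over the vocabulary)
    embed(word_id) -> input vector for that word
    """
    # The CNN feature is the decoder's very first input; its scores are ignored.
    state, _ = step(image_feature, None)
    word = start_id
    caption = []
    for _ in range(max_len):
        state, scores = step(embed(word), state)
        word = max(range(len(scores)), key=scores.__getitem__)  # argmax
        if word == end_id:
            break
        caption.append(word)
    return caption


# Toy stand-ins (hypothetical) so the sketch runs without a trained model.
VOCAB = ["<s>", "a", "dog", "</s>"]


def toy_step(x, state):
    """Stand-in for an LSTM step: the state is only a step counter."""
    t = 0 if state is None else state + 1
    scores = [0.0] * len(VOCAB)
    scores[min(t, len(VOCAB) - 1)] = 1.0  # always favours the word with id t
    return t, scores


def toy_embed(word_id):
    return [float(word_id)]


ids = greedy_caption([0.1, 0.2], toy_step, toy_embed, start_id=0, end_id=3)
print([VOCAB[i] for i in ids])
```

With a real model, step would be one LSTM update followed by a softmax layer, and greedy argmax could be swapped for beam search.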

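In the paper, the authors propose training the network with Stochastic Gradient Descent. neuraltalk2 provides several optimizers and their parameters (rmsprop, adagrad, sgd, ...; see neuraltalk2/misc/optim_updates.lua). zsdonghao/Image-Captioning trains the network with SGD, using an initial learning rate of 2.0, a learning rate decay factor of 0.5, and 8.0 epochs per decay.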

It is a neural net which is fully trainable using stochastic gradient descent.
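
The three SGD settings quoted for zsdonghao/Image-Captioning (initial learning rate 2.0, decay factor 0.5, 8.0 epochs per decay) can be read as an exponential decay schedule. The sketch below assumes a staircase schedule, where the rate is halved in discrete steps every 8 epochs; whether the repository decays in steps or continuously is not stated here, and decayed_learning_rate is a hypothetical helper, not a function from that code base.

```python
def decayed_learning_rate(epoch, initial_lr=2.0, decay_factor=0.5, epochs_per_decay=8.0):
    """Staircase exponential decay (assumed): multiply by decay_factor every epochs_per_decay epochs."""
    num_decays = int(epoch // epochs_per_decay)  # completed decay periods so far
    return initial_lr * decay_factor ** num_decays


for epoch in (0, 7.9, 8, 16, 24):
    print(epoch, decayed_learning_rate(epoch))
```

Under this assumption the rate stays at 2.0 for the first 8 epochs, then drops to 1.0, 0.5, 0.25, and so on.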

In the paper, the authors propose training the model parameters by maximum likelihood. zsdonghao/Image-Captioning uses tensorlayer.cost.cross_entropy_seq_with_mask() for this (see zsdonghao/Image-Captioning/buildmodel.py line665).

The model is trained to maximize the likelihood of the target description sentence given the training image.
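
Maximizing the likelihood of the target sentence is equivalent to minimizing the negative log-probability of each target word, and a mask keeps padded positions out of the loss. The sketch below shows that idea in plain Python. It is not TensorLayer's implementation: masked_sequence_cross_entropy is a hypothetical helper, and averaging over the number of unmasked words is an assumption about the normalization.

```python
import math


def masked_sequence_cross_entropy(probs, targets, mask):
    """Mean negative log-likelihood of the target words, ignoring masked (padded) steps.

    probs[t][w] : predicted probability of word w at step t
    targets[t]  : id of the correct word at step t
    mask[t]     : 1 for a real word, 0 for padding
    """
    total = 0.0
    count = 0
    for step_probs, target, keep in zip(probs, targets, mask):
        if keep:
            total -= math.log(step_probs[target])
            count += 1
    return total / count if count else 0.0


# Two real words followed by one padded step over a 2-word vocabulary.
probs = [[0.5, 0.5], [0.25, 0.75], [0.9, 0.1]]
targets = [0, 1, 0]
mask = [1, 1, 0]
loss = masked_sequence_cross_entropy(probs, targets, mask)
print(loss)
```

Only the first two steps contribute, so the loss equals -(log 0.5 + log 0.75) / 2; the padded third step has no effect regardless of its prediction.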

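In neuraltalk2, the dimension of the LSTM input vectors (the output of the Embedding layer) and the dimension of the LSTM hidden state are both set to 512. zsdonghao/Image-Captioning uses the same settings.
In zsdonghao/Image-Captioning, vocabulary_size is set to 12000.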
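
To get a feel for what these sizes imply, the following sketch counts the decoder's parameters for an embedding dimension of 512, a hidden dimension of 512 and a vocabulary of 12000. It assumes a standard LSTM cell (four gates with biases, no peepholes or projection) and a single softmax layer on top; the actual layers in the repositories may differ, and the three helper functions are hypothetical.

```python
def embedding_params(vocab_size, embed_dim):
    """One embed_dim-sized vector per vocabulary word."""
    return vocab_size * embed_dim


def lstm_params(input_dim, hidden_dim):
    """Standard LSTM (assumed): 4 gates, each with weights over [input; hidden] plus a bias."""
    return 4 * (hidden_dim * (input_dim + hidden_dim) + hidden_dim)


def softmax_params(hidden_dim, vocab_size):
    """Linear layer from the hidden state to vocabulary scores, with bias."""
    return hidden_dim * vocab_size + vocab_size


vocab_size, embed_dim, hidden_dim = 12000, 512, 512
print("embedding:", embedding_params(vocab_size, embed_dim))
print("lstm:     ", lstm_params(embed_dim, hidden_dim))
print("softmax:  ", softmax_params(hidden_dim, vocab_size))
```

Under these assumptions the word embedding and the softmax layer (about 6.1 million parameters each) are each roughly three times larger than the LSTM itself (about 2.1 million), which is why the vocabulary size matters for model size.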
