Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )

Link of the Paper: https://arxiv.org/abs/1411.4555

Main Points:

A generative model ( NIC, GoogLeNet + LSTM ) based on a deep recurrent architecture: the model is trained to maximize the likelihoodP(S|I) of the target description sentence given the training image I. S = { S₁, S₂, ... } is the target sequence of words and each word S_t comes from a given dictionary, that describes the image adequately.
The authors use a CNN as an image "encoder", by first pre-training it for an image classification task and using the last hidden layer as an input to the RNN decoder that generates sentences. They call this model the Neural Image Caption, or NIC.

Other Key Points:

A description must capture not only the objects contained in an image, but it also must express how these objects relate to each other as well as their attributes and the activities they are involved in.
The inspiration of Image Captioning could come from advances in Machine Translation.
There are multiple approaches that can be used to generate a sentence given an image, with NIC. The first one is Sampling where the authors just sample the first word according to p₁, then provide the corresponding embedding as input and sample p₂, continuing like this until we sample the special end-of-sentence token or some maximum length. The second one is Beamsearch: iteratively consider the set of the k best sentences up to time t as candidates to generate sentences of size t+1, and keep only the resulting best k of them. This better approximates S = arg max_S'p(S'|I).

Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )的更多相关文章

[Paper Reading] Show and Tell: A Neural Image Caption Generator
论文链接:https://arxiv.org/pdf/1411.4555.pdf 代码链接:https://github.com/karpathy/neuraltalk & https://g ...
Paper Reading - Show, Attend and Tell: Neural Image Caption Generation with Visual Attention ( ICML 2015 )
Link of the Paper: https://arxiv.org/pdf/1502.03044.pdf Main Points: Encoder-Decoder Framework: Enco ...
[Paper Reading] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
论文链接:https://arxiv.org/pdf/1502.03044.pdf 代码链接:https://github.com/kelvinxu/arctic-captions & htt ...
Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )
Link of the Paper: https://ieeexplore.ieee.org/document/7298856/ A Correlative Paper: Learning a Rec ...
[Paper Reading] Image Captioning using Deep Neural Architectures (arXiv: 1801.05568v1)
Main Contributions: A brief introduction about two different methods (retrieval based method and gen ...
Paper Reading - Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge
Link of the Paper: https://arxiv.org/abs/1609.06647 A Correlative Paper: Show and Tell: A Neural Ima ...
论文：Show and Tell: A Neural Image Caption Generator-阅读总结
Show and Tell: A Neural Image Caption Generator-阅读总结笔记不能简单的抄写文中的内容,得有自己的思考和理解. 一.基本信息标题作者作者单位发表 ...
Paper Reading: Stereo DSO
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...
CVPR 2016 paper reading (6)
1. Neuroaesthetics in fashion: modeling the perception of fashionability, Edgar Simo-Serra, Sanja Fi ...

随机推荐

oracle 子查询的几个种类
1.where型子查询: select cat_id,good_id,good_name from goods where good_id in (selct max(good_id) from go ...
这次的PION的总结
这次的PION的总结果然不出所料,才$129$分. 同级的巨佬们$170,180,\color {red}{280}$$\small{wc这什么神仙啊QAQ}$,都比我强那我还有什么可 ...
『C++』Temp_2019_03_14 不同类的循环引用
#include <iostream> #include <string> #include <regex> using namespace std; //提前声明 ...
10JavaScript作用域
(作用域可访问变量的集合) 1.JavaScript 作用域在 JavaScript 中, 对象和函数同样也是变量. 在 JavaScript 中, 作用域为可访问变量,对象,函数的集合. Java ...
$.trim() 去除空格方法 (验证使用)
PHP保存数组到数据库
数组是 PHP 开发中使用最多的数据类型之一,对于结构化的数据尤为重要. 很多时候我们需要把数组保存到数据库中,实现对结构化数据的直接存储和读取. 其中一个案例就是,对于 Form 提交的多选 che ...
hive工作记录-20180513
Hive的数据导入: 1.从本地文件系统中导入数据到Hive表基础语法1 : create table 表名(列名1 数据类型, 列名2 数据类型, … …) row format delimite ...
jenkins+maven+docker集成java发布(二)#远程发布
jenkins+maven+docker集成java发布(一)中写了在Jenkins服务器自动部署业务,那需要将java项目部署到其他服务器怎么操作这里需要依赖插件Publish Over SSH ...
爬虫常用的 urllib 库知识点
urllib 库 urllib 库是 Python 中一个最基本的网络请求库.它可以模仿浏览器的行为向指定的服务器发送请求,同时可以保存服务器返回的数据. urlopen() 在 Python3 的 ...
20155209林虹宇虚拟机的安装及一点Linux的学习
预备作业3 虚拟机的安装首先,我先了解了一下Linux和安装虚拟机的有关常识. Linux:Linux是一套免费使用和自由传播的类Unix操作系统,是一个基于POSIX和UNIX的多用户.多任务.支 ...

Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )

Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )的更多相关文章

随机推荐

热门专题