Link to the Paper: https://arxiv.org/abs/1411.4555

Main Points:

  1. A generative model (NIC, GoogLeNet + LSTM) based on a deep recurrent architecture: the model is trained to maximize the likelihood p(S|I) of the target description sentence S given the training image I. S = {S1, S2, ...} is the target sequence of words that adequately describes the image, and each word St comes from a given dictionary.
  2. The authors use a CNN as an image "encoder", by first pre-training it for an image classification task and using the last hidden layer as an input to the RNN decoder that generates sentences. They call this model the Neural Image Caption, or NIC.
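
The training objective from point 1 can be written out explicitly. The block below restates the formulation as it appears in the paper; θ denotes the model parameters, N the sentence length, and S0 a start token, none of which are named in the notes above.

```latex
\theta^{*} = \arg\max_{\theta} \sum_{(I, S)} \log p(S \mid I; \theta),
\qquad
\log p(S \mid I) = \sum_{t=0}^{N} \log p(S_t \mid I, S_0, \ldots, S_{t-1})
```

The chain rule turns the sentence-level likelihood into a sum of per-word terms, and each term is what the LSTM decoder predicts at step t.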

  

Other Key Points:

  1. A description must capture not only the objects contained in an image but also how these objects relate to each other, their attributes, and the activities they are involved in.
  2. The approach is inspired by advances in Machine Translation: an encoder RNN reads the source sentence and a decoder RNN generates the target sentence. NIC replaces the encoder RNN with a CNN.
  3. NIC can generate a sentence for a given image in several ways. The first is Sampling: sample the first word according to p1, feed the corresponding embedding as input and sample from p2, and continue until the special end-of-sentence token is sampled or some maximum length is reached. The second is BeamSearch: iteratively consider the set of the k best sentences up to time t as candidates to generate sentences of size t+1, and keep only the best k of the results. This better approximates S = arg max_{S'} p(S'|I).
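
The two inference strategies in point 3 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `next_word_probs`, the toy vocabulary and its probability table are all hypothetical stand-ins for the LSTM decoder, and the image conditioning is omitted so the example stays self-contained.

```python
import math
import random

START, END = "<s>", "</s>"  # hypothetical start / end-of-sentence tokens


def next_word_probs(prefix):
    """Stand-in for the decoder: returns p(S_t | I, S_0..S_{t-1}).

    A real NIC model would condition on the image and the full prefix;
    this toy table (an assumption) only looks at the previous word.
    """
    table = {
        "<s>": {"a": 0.9, "dog": 0.1},
        "a": {"dog": 0.6, "cat": 0.4},
        "dog": {"runs": 0.7, "</s>": 0.3},
        "cat": {"runs": 0.2, "</s>": 0.8},
        "runs": {"</s>": 1.0},
    }
    return table[prefix[-1]]


def sample_caption(max_len=10, rng=None):
    """Sampling: draw each word from p_t until END or max_len words."""
    rng = rng or random.Random(0)
    sentence = [START]
    while sentence[-1] != END and len(sentence) < max_len:
        probs = next_word_probs(sentence)
        words = list(probs)
        weights = [probs[w] for w in words]
        sentence.append(rng.choices(words, weights=weights, k=1)[0])
    return sentence


def beam_search(k=2, max_len=10):
    """BeamSearch: keep the k best partial sentences at every step."""
    beams = [(0.0, [START])]  # (log-probability, words so far)
    for _ in range(max_len - 1):
        candidates = []
        for logp, sent in beams:
            if sent[-1] == END:
                candidates.append((logp, sent))  # finished: carry over
                continue
            for word, p in next_word_probs(sent).items():
                candidates.append((logp + math.log(p), sent + [word]))
        # Keep only the k highest-scoring candidates.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
        if all(sent[-1] == END for _, sent in beams):
            break
    return beams


if __name__ == "__main__":
    print("sampled:", " ".join(sample_caption()))
    for logp, sent in beam_search(k=2):
        print(f"beam: {' '.join(sent)}  p={math.exp(logp):.3f}")
```

With this toy table and k=2, the top beam is "a dog runs" with probability about 0.378, followed by "a cat" at about 0.288. Scores are accumulated as log-probabilities so that long sentences do not underflow.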

Paper: Show and Tell: A Neural Image Caption Generator (CVPR 2015)
