Paper: Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )
Link to the Paper: https://arxiv.org/abs/1411.4555

Main Points:

  1. A generative model ( NIC, GoogLeNet + LSTM ) based on a deep recurrent architecture: the model is trained to maximize the likelihood P(S|I) of the target description sentence S given the training image I. S = { S1, S2, ... } is the target sequence of words that adequately describes the image, and each word St comes from a given dictionary.
  2. The authors use a CNN as an image "encoder", by first pre-training it for an image classification task and using the last hidden layer as an input to the RNN decoder that generates sentences. They call this model the Neural Image Caption, or NIC.
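
The objective in point 1 can be written out with the chain rule over the words of the sentence. This is a sketch of the notation only; the symbols θ (the model parameters) and N (the sentence length) are assumptions introduced here and do not appear in the notes above.

```latex
\theta^{\star} = \arg\max_{\theta} \sum_{(I,\,S)} \log p(S \mid I;\, \theta),
\qquad
\log p(S \mid I) = \sum_{t=1}^{N} \log p(S_t \mid I,\, S_1, \ldots, S_{t-1})
```

Each term of the second sum is the probability the RNN decoder assigns to the next word, given the image and the words generated so far.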

  

Other Key Points:

  1. A description must capture not only the objects contained in an image but also how these objects relate to each other, their attributes, and the activities they are involved in.
  2. Image Captioning draws inspiration from advances in Machine Translation: the encoder RNN that reads the source sentence is replaced by a CNN that encodes the image.
  3. There are multiple approaches for generating a sentence from an image with NIC. The first is Sampling: sample the first word according to p1, provide the corresponding embedding as input and sample from p2, and continue until the special end-of-sentence token is sampled or some maximum length is reached. The second is BeamSearch: iteratively consider the set of the k best sentences up to time t as candidates to generate sentences of size t+1, and keep only the best k of the resulting sentences. This better approximates S = arg max_{S'} p(S'|I).
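
The two decoding strategies in point 3 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the trained NIC decoder is replaced by a hypothetical callable `next_word_probs(prefix)` that returns a word-to-probability dictionary, and `END`, `toy_model`, `sample_caption`, and `beam_search` are names invented for this example.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Tuple

END = "</s>"  # hypothetical end-of-sentence token
Sentence = Tuple[str, ...]
Model = Callable[[Sentence], Dict[str, float]]


def sample_caption(next_word_probs: Model, max_len: int = 20,
                   seed: Optional[int] = None) -> Sentence:
    """Sampling: draw each word from p_t until END or max_len is reached."""
    rng = random.Random(seed)
    sentence: Sentence = ()
    for _ in range(max_len):
        probs = next_word_probs(sentence)
        words = list(probs)
        word = rng.choices(words, weights=[probs[w] for w in words], k=1)[0]
        sentence += (word,)
        if word == END:
            break
    return sentence


def beam_search(next_word_probs: Model, k: int = 3,
                max_len: int = 20) -> List[Tuple[Sentence, float]]:
    """Beam search: return up to k (sentence, log-probability) pairs, best first."""
    beams: List[Tuple[Sentence, float]] = [((), 0.0)]
    finished: List[Tuple[Sentence, float]] = []
    for _ in range(max_len):
        # Extend every sentence of size t by one word to get size t+1.
        candidates = []
        for prefix, logp in beams:
            for word, p in next_word_probs(prefix).items():
                if p > 0.0:
                    candidates.append((prefix + (word,), logp + math.log(p)))
        # Keep only the k best extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for sent, logp in candidates[:k]:
            (finished if sent[-1] == END else beams).append((sent, logp))
        if not beams:
            break
    finished.extend(beams)  # sentences cut off at max_len
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:k]


def toy_model(prefix: Sentence) -> Dict[str, float]:
    """Hand-written stand-in for the decoder (an assumption, not NIC)."""
    table = {
        (): {"a": 0.6, "the": 0.4},
        ("a",): {"dog": 0.5, "cat": 0.5},
        ("the",): {"dog": 0.9, "cat": 0.1},
    }
    return table.get(prefix, {END: 1.0})
```

On this toy table, `beam_search(toy_model, k=1)` behaves greedily and returns "a dog" with probability 0.3, while `beam_search(toy_model, k=2)` finds "the dog" with probability 0.36. This illustrates why a wider beam gives a better approximation of the arg max.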
