Link of the Paper: https://ieeexplore.ieee.org/document/7298856/

A Related Paper: Learning a Recurrent Visual Representation for Image Caption Generation ( Link of the Paper: https://arxiv.org/abs/1411.5654 )

Main Points:

  1. A bi-directional mapping model using recurrent neural networks. Previous approaches map both sentences and images to a common embedding ( and then, I guess, compute similarity there to match or rank ), which may be used for image search or for ranking image captions. This model instead maps in both directions between sentences and visual features.
  2. A bi-directional representation: generates both novel descriptions from images and visual representations from descriptions.
  3. A novel recurrent visual memory: automatically learns to remember long-term visual concepts.
  4. A set of latent variables U_{t-1} that encodes the visual interpretation of the previously generated or read words W_{t-1}. Using U, the goal is to compute P(w_t | V, W_{t-1}, U_{t-1}) and P(V | W_{t-1}, U_{t-1}). Combining these two likelihoods, the global objective is to maximize P(w_t, V | W_{t-1}, U_{t-1}) = P(w_t | V, W_{t-1}, U_{t-1}) P(V | W_{t-1}, U_{t-1}). That is, we want to maximize the likelihood of the word w_t and the observed visual features V given the previous words and their visual interpretation. Note that in previous papers the objective was only to compute P(w_t | V, W_{t-1}) and not P(V | W_{t-1}).
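
The factorization in point 4 can be checked numerically on toy values. The sketch below is not the paper's implementation. It assumes a softmax over a three-word vocabulary for the word likelihood and an isotropic Gaussian around a predicted feature vector for the visual likelihood. All function names, scores, and vectors are hypothetical.

```python
import math

def log_softmax(scores):
    """Log-probabilities of a softmax over raw scores (numerically stable)."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def gaussian_log_likelihood(observed, predicted, sigma=1.0):
    """Log-density of `observed` under an isotropic Gaussian centred on `predicted`.

    Assumption for illustration: this stands in for log P(V | W_{t-1}, U_{t-1}).
    """
    d = len(observed)
    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return -0.5 * sq_err / sigma ** 2 - 0.5 * d * math.log(2 * math.pi * sigma ** 2)

def joint_log_likelihood(word_scores, target_word, v_observed, v_predicted, sigma=1.0):
    """log P(w_t, V | ...) = log P(w_t | V, ...) + log P(V | ...)."""
    log_p_word = log_softmax(word_scores)[target_word]
    log_p_visual = gaussian_log_likelihood(v_observed, v_predicted, sigma)
    return log_p_word + log_p_visual, log_p_word, log_p_visual

# Toy inputs (made up): scores for 3 candidate words, and 3-d visual features.
word_scores = [2.0, 0.5, -1.0]
v_observed = [0.2, 0.9, 0.4]
v_predicted = [0.1, 1.0, 0.5]

joint, log_p_word, log_p_visual = joint_log_likelihood(
    word_scores, 0, v_observed, v_predicted
)
print("log P(w_t | V, W, U):", log_p_word)
print("log P(V | W, U):     ", log_p_visual)
print("joint log-likelihood:", joint)
```

Under these assumptions, a model trained only on the first term has no incentive to predict V well. Adding the second term is what pushes the latent variables to keep a visual interpretation of the words read so far.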

Other Key Points:

  1. Previous approaches project both semantics and visual features to a common embedding, but they are not able to perform the inverse projection. That is, they cannot generate novel sentences or visual depictions from the embedding.
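
To make the limitation concrete, here is a minimal sketch of what a common-embedding approach can do: score existing captions against an image and rank them. The embeddings are made-up numbers rather than the output of any model, and cosine similarity is assumed as the score. The function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_captions(image_embedding, caption_embeddings):
    """Return the candidate captions sorted from most to least similar to the image."""
    scored = [
        (cosine_similarity(image_embedding, emb), caption)
        for caption, emb in caption_embeddings.items()
    ]
    return [caption for _, caption in sorted(scored, reverse=True)]

# Made-up embeddings in a shared 3-d space.
image_embedding = [0.9, 0.1, 0.3]
caption_embeddings = {
    "a dog runs on grass": [0.8, 0.2, 0.3],
    "a plate of food": [0.1, 0.9, 0.2],
    "a man rides a bike": [0.4, 0.4, 0.6],
}

ranking = rank_captions(image_embedding, caption_embeddings)
print(ranking)
```

The ranking can only reorder captions that already exist in the candidate set. Nothing here maps a point in the embedding back to a new sentence or to visual features, which is the inverse projection the note refers to.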

Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )
