Link of the Paper: https://arxiv.org/abs/1412.2306

Main Points:

  1. An Alignment Model: Convolutional Neural Networks over image regions ( An image -> RCNN -> Top 19 detected locations in addition to the whole image -> the representations based on the pixels Ib inside each bounding box -> a set of h-dimensional vectors {vi | i = 1 ... 20} ), Bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding ( CNN - Structured Objective - BiRNN ).
  2. A Multimodal Recurrent Neural Network architecture: On the image side, Convolutional Neural Networks ( CNNs ) have recently emerged as a powerful class of models for image classification and object detection. On the sentence side, our work takes advantage of pretrained word vectors to obtain low-dimensional representations of words. Finally, Recurrent Neural Networks have been previously used in language modeling, but we additionally condition these models on images.
  3. Authors use bidirectional recurrent neural network to compute word representations in the sentence, dispensing of the need to compute dependency trees and allowing unbounded interactions of words and their context in the sentence.

Other Key Points:

  1. The primary challenge towards generating descriptions of images is in the design of a model that is rich enough to simultaneously reason about contents of images and their representation in the domain of natural language. Additionally, the model should be free of assumptions about specific hard-coded templates, rules or categories and instead rely on learning from the training data. The second, practical challenge is that datasets of image captions are available in large quantities on the internet, but these descriptions multiplex mentions of several entities whose locations in the images are unknown.

Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )的更多相关文章

  1. Paper Reading - Deep Captioning with Multimodal Recurrent Neural Networks ( m-RNN ) ( ICLR 2015 ) ★

    Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal ...

  2. Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )

    Link of the Paper: https://arxiv.org/abs/1411.4555 Main Points: A generative model ( NIC, GoogLeNet ...

  3. Deep Visual-Semantic Alignments for Generating Image Descriptions(深度视觉-语义对应对于生成图像描述)

    https://cs.stanford.edu/people/karpathy/deepimagesent/ Abstract We present a model that generates na ...

  4. Paper Reading:Deep Neural Networks for YouTube Recommendations

    论文:Deep Neural Networks for YouTube Recommendations 发表时间:2016 发表作者:(Google)Paul Covington, Jay Adams ...

  5. Paper Reading:Deep Neural Networks for Object Detection

    发表时间:2013 发表作者:(Google)Szegedy C, Toshev A, Erhan D 发表刊物/会议:Advances in Neural Information Processin ...

  6. 论文笔记:Visual Semantic Navigation Using Scene Priors

    Visual Semantic Navigation Using Scene Priors 2018-10-21 19:39:26 Paper:  https://arxiv.org/pdf/1810 ...

  7. Paper Reading: Stereo DSO

    开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...

  8. 论文笔记:Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association

    Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language ...

  9. 论文:利用深度强化学习模型定位新物体(VISUAL SEMANTIC NAVIGATION USING SCENE PRIORS)

    这是一篇被ICLR 2019 接收的论文.论文讨论了如何利用场景先验知识 (scene priors)来定位一个新场景(novel scene)中未曾见过的物体(unseen objects).举例来 ...

随机推荐

  1. 获取某商家当前每个月销量sql语句。

    用两个mysql函数 FROM_UNIXTIME( ordertime )将日期格式转换成时间戳 month( FROM_UNIXTIME( ordertime ) ) 获取当前日期的月 select ...

  2. LeetCode41.缺失的第一个正数 JavaScript

    给定一个未排序的整数数组,找出其中没有出现的最小的正整数. 示例 1: 输入: [1,2,0] 输出: 3 示例 2: 输入: [3,4,-1,1] 输出: 2 示例 3: 输入: [7,8,9,11 ...

  3. ZooKeeper系列(2)--基于ZooKeeper实现简单的配置中心

    ZooKeeper节点的类型分为以下几类:  1. 持久节点:节点创建后就一直存在,直到有删除操作来主动删除该节点 2. 临时节点:临时节点的生命周期和创建该节点的客户端会话绑定,即如果客户端会话失效 ...

  4. Vcenter虚拟化三部曲----Vcenter server 5.5安装部署

    配置SQL Server 2008 R2 1.选择启动 SQL Server Management Studio. 2.选择SQL Server 身份验证登录 ---- 输入sa用户及密码. 3.右键 ...

  5. [译文]程序员能力矩阵 Programmer Competency Matrix

    注意:每个层次的知识都是渐增的,位于层次n,也蕴涵了你需了解所有低于层次n的知识. 计算机科学 Computer Science   2n (Level 0) n2 (Level 1) n (Leve ...

  6. iOS通过切片仿断点机制上传文件

    项目开发中,有时候我们需要将本地的文件上传到服务器,简单的几张图片还好,但是针对iPhone里面的视频文件进行上传,为了用户体验,我们有必要实现断点上传.其实也不是真的断点,这里我们只是模仿断点机制. ...

  7. MongoDB4.0在windows10下的安装与服务配置

    本地安装及网页测试 在官网下载最新的安装文件 下载地址 : https://www.mongodb.com/download-center#community 可以在MongoDB官网选择Commun ...

  8. 十个常见的Java异常出现原因

    异常是 Java 程序中经常遇到的问题,我想每一个 Java 程序员都讨厌异常,一 个异常就是一个 BUG,就要花很多时间来定位异常问题. 1.NullPointerException 空指针异常,操 ...

  9. 弄清Spark、Storm、MapReduce的这几点区别才能学好大数据

    很多初学者在刚刚接触大数据的时候会有很多疑惑,比如对MapReduce.Storm.Spark三个计算框架的理解经常会产生混乱. 哪一个适合对大量数据进行处理?哪一个又适合对实时的流数据进行处理?又该 ...

  10. C语言 编程练习22

    一.题目 1.编一个程序,输入x的值,按下列公式计算并输出y值: 2.已知数A与B,由键盘输入AB的值,交换它们的值,并输出. 3.给一个不多于5位的正整数,要求:一.求它是几位数,二.逆序打印出各位 ...