Link of the Paper: https://arxiv.org/abs/1412.2306

Main Points:

  1. An Alignment Model: Convolutional Neural Networks over image regions ( An image -> RCNN -> Top 19 detected locations in addition to the whole image -> the representations based on the pixels Ib inside each bounding box -> a set of h-dimensional vectors {vi | i = 1 ... 20} ), Bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding ( CNN - Structured Objective - BiRNN ).
  2. A Multimodal Recurrent Neural Network architecture: On the image side, Convolutional Neural Networks ( CNNs ) have recently emerged as a powerful class of models for image classification and object detection. On the sentence side, our work takes advantage of pretrained word vectors to obtain low-dimensional representations of words. Finally, Recurrent Neural Networks have been previously used in language modeling, but we additionally condition these models on images.
  3. Authors use bidirectional recurrent neural network to compute word representations in the sentence, dispensing of the need to compute dependency trees and allowing unbounded interactions of words and their context in the sentence.

Other Key Points:

  1. The primary challenge towards generating descriptions of images is in the design of a model that is rich enough to simultaneously reason about contents of images and their representation in the domain of natural language. Additionally, the model should be free of assumptions about specific hard-coded templates, rules or categories and instead rely on learning from the training data. The second, practical challenge is that datasets of image captions are available in large quantities on the internet, but these descriptions multiplex mentions of several entities whose locations in the images are unknown.

Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )的更多相关文章

  1. Paper Reading - Deep Captioning with Multimodal Recurrent Neural Networks ( m-RNN ) ( ICLR 2015 ) ★

    Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal ...

  2. Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )

    Link of the Paper: https://arxiv.org/abs/1411.4555 Main Points: A generative model ( NIC, GoogLeNet ...

  3. Deep Visual-Semantic Alignments for Generating Image Descriptions(深度视觉-语义对应对于生成图像描述)

    https://cs.stanford.edu/people/karpathy/deepimagesent/ Abstract We present a model that generates na ...

  4. Paper Reading:Deep Neural Networks for YouTube Recommendations

    论文:Deep Neural Networks for YouTube Recommendations 发表时间:2016 发表作者:(Google)Paul Covington, Jay Adams ...

  5. Paper Reading:Deep Neural Networks for Object Detection

    发表时间:2013 发表作者:(Google)Szegedy C, Toshev A, Erhan D 发表刊物/会议:Advances in Neural Information Processin ...

  6. 论文笔记:Visual Semantic Navigation Using Scene Priors

    Visual Semantic Navigation Using Scene Priors 2018-10-21 19:39:26 Paper:  https://arxiv.org/pdf/1810 ...

  7. Paper Reading: Stereo DSO

    开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...

  8. 论文笔记:Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association

    Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language ...

  9. 论文:利用深度强化学习模型定位新物体(VISUAL SEMANTIC NAVIGATION USING SCENE PRIORS)

    这是一篇被ICLR 2019 接收的论文.论文讨论了如何利用场景先验知识 (scene priors)来定位一个新场景(novel scene)中未曾见过的物体(unseen objects).举例来 ...

随机推荐

  1. ORACLE逐行累计求和方法(OVER函数)

    1.RANK ( ) OVER ( [QUERY_PARTITION_CLAUSE] ORDER_BY_CLAUSE ) DENSE_RANK ( ) OVER ( [QUERY_PARTITION_ ...

  2. C++笔记015:C++对C的扩展——三目运算符功能增强

    原创笔记,转载请注明出处! 点击[关注],关注也是一种美德~ 三目运算符在C编译器中的表现: int main() { int a=10; int b=20; //三目运算符是一个表达式,表达式不能做 ...

  3. 大数据学习--day16(集合总体架构--ArrayList--LinkedList)

    集合总体架构--ArrayList--LinkedList Collection接口的实现类用法上都有相似的方法.Map同理. List: 特性 :      1. 有索引     2. 有序     ...

  4. 飞控入门之C语言结构体、枚举

    结构体 先来说明一下,结构体存在的意义.比如说有一只猫,要在C语言程序中综合描述它,那么可以这样说,它的体重是float类型的,颜色是char类型的,它的一些食物名字是一个数组,那么如果分开定义这些变 ...

  5. s3c2440存储控制器详解

    从上图可知,外部内存类的设备与存储管理器相连,那么CPU是怎样访问到内存的呢?通过存储管理器.CPU比较单纯,只会按照指令执行,CPU只负责发出地址,怎样找到内存类设备呢?这些都交给存储管理器来管理. ...

  6. 《阿里巴巴Java开发手册》代码格式部分应用——idea中checkstyle的使用教程

    <阿里巴巴Java开发手册>代码格式部分应用--idea中checkstyle的使用教程 1.<阿里巴巴Java开发手册> 这是阿里巴巴工程师送给各位软件工程师的宝典,就像开车 ...

  7. 20155311 实验三 敏捷开发与XP实践 实验报告

    20155311 实验三 敏捷开发与XP实践 实验报告 实验内容 XP基础 xp核心工具 相关工具 实验要求 没有Linux基础的同学建议先学习<Linux基础入门(新版)><Vim ...

  8. day6 RHCE

    17.创建一个脚本 在server0上创建一个名为/root/foo.sh的脚本,让其提供下列特性: 当运行/root/foo.sh redhat,输出fedora 当运行/root/foo.sh f ...

  9. 【BZOJ4803】逆欧拉函数

    [BZOJ4803]逆欧拉函数 题面 bzoj 题解 题目是给定你\(\varphi(n)\)要求前\(k\)小的\(n\). 设\(n=\prod_{i=1}^k{p_i}^{c_i}\) 则\(\ ...

  10. 在Centos7下安装与部署.net core

    在Centos7下安装与部署.net core 2018年02月28日 19:36:16 阅读数:388 个人安装流程,参照文档 https://www.cnblogs.com/Burt/p/6566 ...