Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1412.2306
Main Points:
- An Alignment Model: Convolutional Neural Networks over image regions ( An image -> RCNN -> Top 19 detected locations in addition to the whole image -> the representations based on the pixels Ib inside each bounding box -> a set of h-dimensional vectors {vi | i = 1 ... 20} ), Bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding ( CNN - Structured Objective - BiRNN ).
- A Multimodal Recurrent Neural Network architecture: On the image side, Convolutional Neural Networks ( CNNs ) have recently emerged as a powerful class of models for image classification and object detection. On the sentence side, our work takes advantage of pretrained word vectors to obtain low-dimensional representations of words. Finally, Recurrent Neural Networks have been previously used in language modeling, but we additionally condition these models on images.
- Authors use bidirectional recurrent neural network to compute word representations in the sentence, dispensing of the need to compute dependency trees and allowing unbounded interactions of words and their context in the sentence.


Other Key Points:
- The primary challenge towards generating descriptions of images is in the design of a model that is rich enough to simultaneously reason about contents of images and their representation in the domain of natural language. Additionally, the model should be free of assumptions about specific hard-coded templates, rules or categories and instead rely on learning from the training data. The second, practical challenge is that datasets of image captions are available in large quantities on the internet, but these descriptions multiplex mentions of several entities whose locations in the images are unknown.
Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )的更多相关文章
- Paper Reading - Deep Captioning with Multimodal Recurrent Neural Networks ( m-RNN ) ( ICLR 2015 ) ★
Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal ...
- Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1411.4555 Main Points: A generative model ( NIC, GoogLeNet ...
- Deep Visual-Semantic Alignments for Generating Image Descriptions(深度视觉-语义对应对于生成图像描述)
https://cs.stanford.edu/people/karpathy/deepimagesent/ Abstract We present a model that generates na ...
- Paper Reading:Deep Neural Networks for YouTube Recommendations
论文:Deep Neural Networks for YouTube Recommendations 发表时间:2016 发表作者:(Google)Paul Covington, Jay Adams ...
- Paper Reading:Deep Neural Networks for Object Detection
发表时间:2013 发表作者:(Google)Szegedy C, Toshev A, Erhan D 发表刊物/会议:Advances in Neural Information Processin ...
- 论文笔记:Visual Semantic Navigation Using Scene Priors
Visual Semantic Navigation Using Scene Priors 2018-10-21 19:39:26 Paper: https://arxiv.org/pdf/1810 ...
- Paper Reading: Stereo DSO
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...
- 论文笔记:Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association
Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language ...
- 论文:利用深度强化学习模型定位新物体(VISUAL SEMANTIC NAVIGATION USING SCENE PRIORS)
这是一篇被ICLR 2019 接收的论文.论文讨论了如何利用场景先验知识 (scene priors)来定位一个新场景(novel scene)中未曾见过的物体(unseen objects).举例来 ...
随机推荐
- JBDC—③数据库连接池的介绍、使用和配置
首先要知道数据库连接(Connection对象)的创建和关闭是非常浪费系统资源的,如果是使用常规的数据库连接方式来操作数据库,当用户变多时,每次访问数据库都要创建大量的Connnection对象,使用 ...
- source .bashrc 报错:virtualenvwrapper.sh: There was a problem running the initialization hooks.
在Ubuntu下安装完virtualenv.virtualenvwrapper,然后设置环境文件 .bashrc 接着 source .bashrc,产生错误信息 首先确认了 libpam-mount ...
- MySQL的库表详细操作
MySQL数据库 本节目录 一 库操作 二 表操作 三 行操作 一 库操作 1.创建数据库 1.1 语法 CREATE DATABASE 数据库名 charset utf8; 1.2 数据库命名规则 ...
- js清除浏览器缓存
浏览器缓存 所有的数据都可以存到服务器中,但这样并不高效,当我们访问网页的时候,一会卡顿,二会浪费服务器的存储空间,三会给服务器造成压力 浏览器缓存,可以提高网站性能和浏览器的速度,但对于需要经常更新 ...
- html网站meta标签大全
案例 一.天猫 <meta charset="utf-8"> <title>天猫TMALL</title> <meta name=&quo ...
- 课时14.DTD文档声明上(掌握)
1.什么是DTD文档声明? 由于HTML有很多格版本的规范,每个版本的规范之间又又一些差异,所以为了让浏览器能够正确的编译/解析/渲染我们的网页,我们需要在HTML文件的第一行告诉浏览器,我们当前这个 ...
- php面试题,百度答案
一公司: 1.@当将其放置在一个PHP表达式之前有什么作用? 2.用foreach把$arr=array(1,2,3,4)每个values值乘2输出: 3.PHP定界符如何使用? 4.说出mysql_ ...
- linux3.4.2内核之块设备驱动
1. 基本概念: 扇区(Sectors):任何块设备硬件对数据处理的基本单位.通常,1个扇区的大小为512byte.(对设备而言) 块 (Blocks):由Linux制定对内核或文件系统等数据处理的 ...
- C语言编程练习 GPS数据处理
题目内容: NMEA-0183协议是为了在不同的GPS(全球定位系统)导航设备中建立统一的BTCM(海事无线电技术委员会)标准,由美国国家海洋电子协会(NMEA-The National Marine ...
- 旭日图(sunburst chart)绘制:R语言 & excel
旭日图(sunburst chart)也叫太阳图,一种圆环镶接图,每一个圆环就代表了同一级别的比例数据,离原点越近的圆环级别越高,最内层的圆表示层次结构的顶级.除了圆环外,旭日图还有若干从原点放射出去 ...