Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1412.2306
Main Points:
- An Alignment Model: Convolutional Neural Networks over image regions ( An image -> RCNN -> Top 19 detected locations in addition to the whole image -> the representations based on the pixels Ib inside each bounding box -> a set of h-dimensional vectors {vi | i = 1 ... 20} ), Bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding ( CNN - Structured Objective - BiRNN ).
- A Multimodal Recurrent Neural Network architecture: On the image side, Convolutional Neural Networks ( CNNs ) have recently emerged as a powerful class of models for image classification and object detection. On the sentence side, our work takes advantage of pretrained word vectors to obtain low-dimensional representations of words. Finally, Recurrent Neural Networks have been previously used in language modeling, but we additionally condition these models on images.
- Authors use bidirectional recurrent neural network to compute word representations in the sentence, dispensing of the need to compute dependency trees and allowing unbounded interactions of words and their context in the sentence.


Other Key Points:
- The primary challenge towards generating descriptions of images is in the design of a model that is rich enough to simultaneously reason about contents of images and their representation in the domain of natural language. Additionally, the model should be free of assumptions about specific hard-coded templates, rules or categories and instead rely on learning from the training data. The second, practical challenge is that datasets of image captions are available in large quantities on the internet, but these descriptions multiplex mentions of several entities whose locations in the images are unknown.
Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )的更多相关文章
- Paper Reading - Deep Captioning with Multimodal Recurrent Neural Networks ( m-RNN ) ( ICLR 2015 ) ★
Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal ...
- Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1411.4555 Main Points: A generative model ( NIC, GoogLeNet ...
- Deep Visual-Semantic Alignments for Generating Image Descriptions(深度视觉-语义对应对于生成图像描述)
https://cs.stanford.edu/people/karpathy/deepimagesent/ Abstract We present a model that generates na ...
- Paper Reading:Deep Neural Networks for YouTube Recommendations
论文:Deep Neural Networks for YouTube Recommendations 发表时间:2016 发表作者:(Google)Paul Covington, Jay Adams ...
- Paper Reading:Deep Neural Networks for Object Detection
发表时间:2013 发表作者:(Google)Szegedy C, Toshev A, Erhan D 发表刊物/会议:Advances in Neural Information Processin ...
- 论文笔记:Visual Semantic Navigation Using Scene Priors
Visual Semantic Navigation Using Scene Priors 2018-10-21 19:39:26 Paper: https://arxiv.org/pdf/1810 ...
- Paper Reading: Stereo DSO
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...
- 论文笔记:Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association
Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language ...
- 论文:利用深度强化学习模型定位新物体(VISUAL SEMANTIC NAVIGATION USING SCENE PRIORS)
这是一篇被ICLR 2019 接收的论文.论文讨论了如何利用场景先验知识 (scene priors)来定位一个新场景(novel scene)中未曾见过的物体(unseen objects).举例来 ...
随机推荐
- 前端基础-HTML的的标签详解
阅读目录 一.head内常用标签 二. HTML语义化 三. 字符实体 四. h系列标签 五. p标签 六. img标签 七. a标签 八. 列表标签 九. table标签 十. form标签 一. ...
- log4j与logback包冲突原因及解决,不可忽视的Warning
场景 一个简单的spring-boot程序,需要用kafka做消息队列,于是在maven中引入kafka依赖,一切看似没问题,在启动时,打印出Warning信息: SLF4J: Class path ...
- MySQL->处理重复数据[20180517]
限制数据重复的方式:表上增加主键(Primary Key)或增加唯一性索引(Unique) 主键对重复资料进行限制,这样资料在导入时就无法重复插入 create table primary_t ...
- MySQL:如何导入导出数据表和如何清空有外建关联的数据表
1.导入导出 导入数据库:前提:数据库和数据表要存在(已经被创建) (1)将数据表 test_user.sql 导入到test 数据库的test_user 表中 [root@test ~]# mysq ...
- 最新学习springboot 配置注解
一.概述 Spring Boot设计目的是用来简化新Spring应用的初始搭建以及开发过程.Spring Boot并不是对Spring功能上的增强,而是提供了一种快速使用Spring的方式. ...
- ElasticSearch优化系列五:机器设置(硬盘、CPU)
硬盘对集群非常重要,特别是建索引多的情况.磁盘是一个服务器最慢的系统,对于写比较重的集群,磁盘很容易成为集群的瓶颈. 如果可以承担的器SSD盘,最好使用SSD盘.如果使用SSD,最好调整I/O调度算法 ...
- Python学习 :socket基础
socket基础 什么是socket? - socket为接口通道,内部封装了IP地址.端口.协议等信息:我们可以看作是以前的通过电话机拨号上网的年代,socket即为电话线 socket通信流程 我 ...
- golang 错误处理与异常
原文地址 golang 中的错误处理的哲学和 C 语言一样,函数通过返回错误类型(error)或者 bool 类型(不需要区分多种错误状态时)表明函数的执行结果,调用检查返回的错误类型值是否是 nil ...
- python正则表达式03--字符串中匹配数字
import re # \d+ 匹配字符串中的数字部分,返回列表 ss = 'adafasw12314egrdf5236qew' num = re.findall('\d+',ss) print(nu ...
- (转)ASP.NET中常见文件类型及用途
从入门导师那继承来的习惯,也是加上自己的所谓经验判断,一直对WEB开发不太感冒,可惜呀,从业近二十年,还得从头开始对付HTML.CSS.JS.ASPX,以前的经验,用不上啦!!!先从好好学习ASPX开 ...