Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1412.2306
Main Points:
- An Alignment Model: Convolutional Neural Networks over image regions ( An image -> RCNN -> Top 19 detected locations in addition to the whole image -> the representations based on the pixels Ib inside each bounding box -> a set of h-dimensional vectors {vi | i = 1 ... 20} ), Bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding ( CNN - Structured Objective - BiRNN ).
- A Multimodal Recurrent Neural Network architecture: On the image side, Convolutional Neural Networks ( CNNs ) have recently emerged as a powerful class of models for image classification and object detection. On the sentence side, our work takes advantage of pretrained word vectors to obtain low-dimensional representations of words. Finally, Recurrent Neural Networks have been previously used in language modeling, but we additionally condition these models on images.
- Authors use bidirectional recurrent neural network to compute word representations in the sentence, dispensing of the need to compute dependency trees and allowing unbounded interactions of words and their context in the sentence.
Other Key Points:
- The primary challenge towards generating descriptions of images is in the design of a model that is rich enough to simultaneously reason about contents of images and their representation in the domain of natural language. Additionally, the model should be free of assumptions about specific hard-coded templates, rules or categories and instead rely on learning from the training data. The second, practical challenge is that datasets of image captions are available in large quantities on the internet, but these descriptions multiplex mentions of several entities whose locations in the images are unknown.
Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )的更多相关文章
- Paper Reading - Deep Captioning with Multimodal Recurrent Neural Networks ( m-RNN ) ( ICLR 2015 ) ★
Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal ...
- Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1411.4555 Main Points: A generative model ( NIC, GoogLeNet ...
- Deep Visual-Semantic Alignments for Generating Image Descriptions(深度视觉-语义对应对于生成图像描述)
https://cs.stanford.edu/people/karpathy/deepimagesent/ Abstract We present a model that generates na ...
- Paper Reading:Deep Neural Networks for YouTube Recommendations
论文:Deep Neural Networks for YouTube Recommendations 发表时间:2016 发表作者:(Google)Paul Covington, Jay Adams ...
- Paper Reading:Deep Neural Networks for Object Detection
发表时间:2013 发表作者:(Google)Szegedy C, Toshev A, Erhan D 发表刊物/会议:Advances in Neural Information Processin ...
- 论文笔记:Visual Semantic Navigation Using Scene Priors
Visual Semantic Navigation Using Scene Priors 2018-10-21 19:39:26 Paper: https://arxiv.org/pdf/1810 ...
- Paper Reading: Stereo DSO
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...
- 论文笔记:Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association
Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language ...
- 论文:利用深度强化学习模型定位新物体(VISUAL SEMANTIC NAVIGATION USING SCENE PRIORS)
这是一篇被ICLR 2019 接收的论文.论文讨论了如何利用场景先验知识 (scene priors)来定位一个新场景(novel scene)中未曾见过的物体(unseen objects).举例来 ...
随机推荐
- Template Method(模板方法)模式
1.概述 在面向对象开发过程中,通常我们会遇到这样的一个问题:我们知道一个算法所需的关键步骤,并确定了这些步骤的执行顺序.但是某些步骤的具体实现是未知的,或者说某些步骤的实现与具体的环境相关.例子1: ...
- [leetcode] 二叉树的前序,中序,后续,层次遍历
前序遍历 [144] Binary Tree Preorder Traversal 递归遍历 使用递归,先保存父节点的值,再对左子树进行遍历(递归),最后对右子树进行遍历(递归) vector< ...
- ionic ios 打包
1.安装Xcode 从appstore 安装就行 2.安装node.js 3.安装cordova 由于权限问题 网络问题 可以考虑一下方式 1️⃣使用淘宝镜像 npm install ...
- json数组按照日期先后排序
var allMyApp = [ {"startDate": "2018-07-07 12:30:00",'name':'aa'}, {"startD ...
- ElasticSearch优化系列七:优化建议
尽量运行在Sun/Oracle JDK1.7以上环境中,低版本的jdk容易出现莫名的bug,ES性能体现在在分布式计算中,一个节点是不足以测试出其性能,一个生产系统至少在三个节点以上. ES集群节点规 ...
- 一步一步学习大数据:Hadoop 生态系统与场景
Hadoop概要 到底是业务推动了技术的发展,还是技术推动了业务的发展,这个话题放在什么时候都会惹来一些争议. 随着互联网以及物联网的蓬勃发展,我们进入了大数据时代.IDC预测,到2020年,全球会有 ...
- ACM1013:Digital Roots
Problem Description The digital root of a positive integer is found by summing the digits of the int ...
- sqlserver之on与where条件
在进行两个表乃至多个表进行联接时需要on条件进行匹配,很多时候我们会对过滤条件放在on还是where中心存疑惑.一般来讲,在外联接中on是两个表进行关联的匹配条件,在该条件匹配下会生成一个虚拟表. 如 ...
- 2014年第五届蓝桥杯B组(C/C++)预赛题目及个人答案(欢迎指正)
参考:https://blog.csdn.net/qq_30076791/article/details/50573512 第3题: #include<bits/stdc++.h> usi ...
- [Err] ERROR: wrong record type supplied in RETURN NEXT
在写GP 输出不定长列数据表 函数时,报了一个错,百思不得其解.在公司大佬帮助下,知道是什么鬼了.. 先看看例子吧: ---- 函数定义 CREATE OR REPLACE FUNCTION &quo ...