Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )
Link of the Paper: https://ieeexplore.ieee.org/document/7298856/
A Related Paper: Learning a Recurrent Visual Representation for Image Caption Generation (Link of the Paper: https://arxiv.org/abs/1411.5654)
Main Points:
- A bi-directional mapping model using recurrent neural networks: previous approaches map both sentences and images to a common embedding ( and then calculate the similarity in that space to match or rank, I guess ), which may be used for image search or for ranking image captions. This model instead maps in both directions between images and sentences.
- A bi-directional representation: generates both novel descriptions from images and visual representations from descriptions.
- A novel recurrent visual memory: automatically learns to remember long-term visual concepts.
- A set of latent variables U_{t-1} that encodes the visual interpretation of the previously generated or read words W_{t-1}. Using U, the goal is to compute P(w_t | V, W_{t-1}, U_{t-1}) and P(V | W_{t-1}, U_{t-1}). Combining these two likelihoods, the global objective is to maximize P(w_t, V | W_{t-1}, U_{t-1}) = P(w_t | V, W_{t-1}, U_{t-1}) P(V | W_{t-1}, U_{t-1}). That is, the model maximizes the likelihood of the word w_t and the observed visual features V given the previous words and their visual interpretation. Note that in previous papers, the objective was only to compute P(w_t | V, W_{t-1}) and not P(V | W_{t-1}).
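
To make the factorized objective in the last point concrete, here is a minimal sketch in Python of how the two terms add up in log space. It is not the authors' implementation. The word term is modeled as a softmax over made-up vocabulary scores, and the visual term is modeled as an isotropic Gaussian around a reconstructed feature vector; the Gaussian form, the function names, and all the numbers are assumptions chosen only for illustration.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def gaussian_log_likelihood(v, v_hat, sigma=1.0):
    """log N(v; v_hat, sigma^2 I).

    Assumed stand-in for log P(V | W_{t-1}, U_{t-1}): v is the observed
    visual feature vector, v_hat is the model's reconstruction of it.
    """
    d = len(v)
    sq_err = sum((a - b) ** 2 for a, b in zip(v, v_hat))
    return -0.5 * sq_err / sigma ** 2 - 0.5 * d * math.log(2 * math.pi * sigma ** 2)


def joint_log_likelihood(word_scores, target_word, v, v_hat, sigma=1.0):
    """log P(w_t, V | W_{t-1}, U_{t-1}) as the sum of the two log terms."""
    log_p_word = log_softmax(word_scores)[target_word]        # log P(w_t | V, W_{t-1}, U_{t-1})
    log_p_visual = gaussian_log_likelihood(v, v_hat, sigma)   # log P(V | W_{t-1}, U_{t-1})
    return log_p_word + log_p_visual, log_p_word, log_p_visual


# Hypothetical toy values: a 3-word vocabulary and a 3-dimensional visual feature.
word_scores = [2.0, 0.5, -1.0]
v = [0.2, 0.7, 0.1]
v_hat = [0.25, 0.6, 0.0]

joint, log_p_word, log_p_visual = joint_log_likelihood(word_scores, 0, v, v_hat)
print(joint, log_p_word, log_p_visual)
```

Dropping the visual term leaves only the word term, which corresponds to the objective P(w_t | V, W_{t-1}) that the notes attribute to previous papers.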
Other Key Points:
- Previous approaches project both semantics and visual features to a common embedding, but they are not able to perform the inverse projection. That is, they cannot generate novel sentences or visual depictions from the embedding.