Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )

More articles related to "Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )"

Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )

Link of the Paper: https://ieeexplore.ieee.org/document/7298856/ A Correlative Paper: Learning a Recurrent Visual Representation for Image Caption Generation (Link of the Paper: https://arxiv.org/abs/1411.5654) Main Points: A bi-directional mapping m…

Paper Reading - Deep Captioning with Multimodal Recurrent Neural Networks ( m-RNN ) ( ICLR 2015 ) ★

Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal Recurrent Neural Network ( AlexNet/VGGNet + a multimodal layer + RNNs ). Their work has two major differences from these methods. Firstly, they inco…

Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )

Link of the Paper: https://arxiv.org/abs/1411.4555 Main Points: A generative model ( NIC, GoogLeNet + LSTM ) based on a deep recurrent architecture: the model is trained to maximize the likelihood P(S|I) of the target description sentence given the tr…

Paper Reading: Stereo DSO

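I'll open this blog with a paper reading. Switching between Chinese and English while writing in markdown + vim is a hassle, so I took the lazy route and wrote some parts entirely in English. Stereo DSO: Large-Scale Direct Sparse Visual Odometry with Stereo Cameras Abstract Optimization objectives: intrinsic/extrinsic parameters of all keyframes; all selected pixels' depth Inte…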

Paper Reading - CNN+CNN: Convolutional Decoders for Image Captioning

Link of the Paper: https://arxiv.org/abs/1805.09019 Innovations: The authors propose a CNN + CNN framework for image captioning. There are four modules in the framework: vision module ( VGG-16 ), which is adopted to "watch" images; language modu…

Paper Reading: In Defense of the Triplet Loss for Person Re-Identification

In Defense of the Triplet Loss for Person Re-Identification 2017-07-02 14:04:20 This blog comes from: http://blog.csdn.net/shuzfan/article/details/70069822 Paper: https://arxiv.org/abs/1703.07737 Github: https://github.com/VisualComputingInstitu…

CVPR 2016 paper reading (6)

1. Neuroaesthetics in fashion: modeling the perception of fashionability, Edgar Simo-Serra, Sanja Fidler, Francesc Moreno-Noguer, Raquel Urtasun, in CVPR 2015. Goal: learn and predict how fashionable a person looks in a photograph, and suggest subtle…

Paper Notes: Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association

Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association 2018-09-29 19:36:43 Paper: http://openaccess.thecvf.com/content_ECCV_2018/papers/Dapeng_Chen_Improving_Deep_Visual_ECCV_2018_paper.pdf 1. I…

Paper Notes: Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention 2018-08-10 10:15:06 Paper (ICML-2015): http://proceedings.mlr.press/v37/xuc15.pdf Theano (Official Implementation): https://github.com/kelvinxu/arctic-captions TensorFlow: htt…

【CV】ICCV2015_Unsupervised Visual Representation Learning by Context Prediction

Unsupervised Visual Representation Learning by Context Prediction Note here: this is a learning note on an unsupervised learning model from Prof. Gupta's group. Link: http://120.52.73.9/www.cv-foundation.org/openaccess/content_iccv_2015/papers/Doersch_Unsu…

Momentum Contrast for Unsupervised Visual Representation Learning (MoCo)

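Momentum Contrast for Unsupervised Visual Representation Learning I. Methods Previously Proposed 1. End-to-end Mechanisms Method overview: each image in a mini-batch is augmented, and every image yields two augmented views, q and $ k_+ $, which are positive samples of each other. Two different encoders are applied to q and to the keys in the dictionary, respectively (the keys include q's positive sample $ k_+…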

Momentum Contrast for Unsupervised Visual Representation Learning

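Momentum Contrast for Unsupervised Visual Representation Learning I. Methods Previously Proposed 1. End-to-end Mechanisms Method overview: each image in a mini-batch is augmented, and every image yields two augmented views, q and $ k_+ $, which are positive samples of each other. Two different encoders are applied to q and to the keys in the dictionary, respectively (the keys include q's positive sample $ k_+…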

Paper: Show, Attend and Tell: Neural Image Caption Generation with Visual Attention - Reading Summary

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention - Reading Summary. Notes should not simply copy the paper's content; they need your own thinking and understanding. I. Basic Information **1. Title:** Show, Attend and Tell: Neural Image Caption Generation with Visual Attention **2. Authors:** Kelvin Xu, Jimmy Lei Ba, Ryan Kiros, Kyu…

Paper Explained: Momentum Contrast for Unsupervised Visual Representation Learning (commonly known as MoCo)

Paper title: Momentum Contrast for Unsupervised Visual Representation Learning Authors: Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, Ross Girshick Paper source: arXiv Paper source: https://github.com/facebookresearch/moco 1 Main Idea The core idea of the paper is to use contrastive learning to train, in a self-supervised manner, an image…

Paper Reading - Long-term Recurrent Convolutional Networks for Visual Recognition and Description ( CVPR 2015 )

Link of the Paper: https://arxiv.org/abs/1411.4389 Main Points: A novel Recurrent Convolutional Architecture ( CNN + LSTM ): both Spatially and Temporally Deep. The recurrent long-term models are directly connected to modern visual convnet models and…

Paper Reading - Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images ( ICCV 2015 )

Link of the Paper: https://arxiv.org/pdf/1504.06692.pdf Innovations: The authors propose the Novel Visual Concept learning from Sentences ( NVCS ) task. In this task, methods need to learn novel concepts from sentence descriptions of a few images. Th…

Paper Reading - Show, Attend and Tell: Neural Image Caption Generation with Visual Attention ( ICML 2015 )

Link of the Paper: https://arxiv.org/pdf/1502.03044.pdf Main Points: Encoder-Decoder Framework: Encoder uses a convolutional neural network to extract a set of feature vectors which the authors refer to as annotation vectors. The extractor produces L…

[Paper Reading] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

Paper link: https://arxiv.org/pdf/1502.03044.pdf Code links: https://github.com/kelvinxu/arctic-captions & https://github.com/yunjey/show-attend-and-tell & https://github.com/jazzsaxmafia/show_attend_and_tell.tensorflow Main contributions: In this paper, the authors take the “attention mechanism (Attention Mechanism…

Paper Reading - Attention Is All You Need ( NIPS 2017 ) ★

Link of the Paper: https://arxiv.org/abs/1706.03762 Motivation: The inherently sequential nature of Recurrent Models precludes parallelization within training examples. Attention mechanisms have become an integral part of compelling sequence modeling…

Paper Reading - Convolutional Sequence to Sequence Learning ( CoRR 2017 ) ★

Link of the Paper: https://arxiv.org/abs/1705.03122 Motivation: Compared to recurrent layers, convolutions create representations for fixed-size contexts; however, the effective context size of the network can easily be made larger by stacking severa…

Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )

Link of the Paper: https://arxiv.org/abs/1412.2306 Main Points: An Alignment Model: Convolutional Neural Networks over image regions ( An image -> RCNN -> Top 19 detected locations in addition to the whole image -> the representations based on th…

Paper Reading - Im2Text: Describing Images Using 1 Million Captioned Photographs ( NIPS 2011 )

Link of the Paper: http://papers.nips.cc/paper/4470-im2text-describing-images-using-1-million-captioned-photographs.pdf Main Points: A large novel data set containing images from the web with associated captions written by people, filtered so that th…

CVPR 2016 paper reading (2)

1. Sketch me that shoe, Qian Yu, Feng Liu, Yi-Zhe Song, Tao Xiang, Timothy M. Hospedales, Chen Change Loy, in CVPR 2016. A unique characteristic of sketches in the context of image retrieval is that they offer inherently fine-grained visual descript…

Paper Reading - Learning to Evaluate Image Captioning ( CVPR 2018 ) ★

Link of the Paper: https://arxiv.org/abs/1806.06422 Innovations: The authors propose a novel learning based discriminative evaluation metric that is directly trained to distinguish between human and machine-generated captions. They train an automatic…

Paper Reading - Convolutional Image Captioning ( CVPR 2018 )

Link of the Paper: https://arxiv.org/abs/1711.09151 Motivation: LSTM units are complex and inherently sequential across time. Convolutional networks have shown advantages on machine translation and conditional image generation. Innovation: The author…