Link of the Paper: https://ieeexplore.ieee.org/document/7298856/ A Correlative Paper: Learning a Recurrent Visual Representation for Image Caption Generation (Link of the Paper: https://arxiv.org/abs/1411.5654) Main Points: A bi-directional mapping m…
Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal Recurrent Neural Networks ( AlexNet/VGGNet + a multimodal layer + RNNs ). Their work has two major differences from these methods. Firstly, they inco…
Link of the Paper: https://arxiv.org/abs/1411.4555 Main Points: A generative model ( NIC, GoogLeNet + LSTM ) based on a deep recurrent architecture: the model is trained to maximize the likelihoodP(S|I) of the target description sentence given the tr…
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse Visual Odometry with Stereo Cameras Abstract Optimization objectives: intrinsic/extrinsic parameters of all keyframes all selected pixels' depth Inte…
Link of the Paper: https://arxiv.org/abs/1805.09019 Innovations: The authors propose a CNN + CNN framework for image captioning. There are four modules in the framework: vision module ( VGG-16 ), which is adopted to "watch" images; language modu…
In Defense of the Triplet Loss for Person Re-Identification  2017-07-02  14:04:20   This blog comes from: http://blog.csdn.net/shuzfan/article/details/70069822 Paper:  https://arxiv.org/abs/1703.07737 Github: https://github.com/VisualComputingInstitu…
1. Neuroaesthetics in fashion: modeling the perception of fashionability, Edgar Simo-Serra, Sanja Fidler, Francesc Moreno-Noguer, Raquel Urtasun, in CVPR 2015. Goal: learn and predict how fashionable a person looks on a photograph, and suggest subtle…
Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association2018-09-29 19:36:43 Paper:http://openaccess.thecvf.com/content_ECCV_2018/papers/Dapeng_Chen_Improving_Deep_Visual_ECCV_2018_paper.pdf 1. I…
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention 2018-08-10 10:15:06 Paper (ICML-2015):http://proceedings.mlr.press/v37/xuc15.pdf Theano (Offical Implementation): https://github.com/kelvinxu/arctic-captions TensorFlow: htt…
Unsupervised Visual Representation Learning by Context Prediction Note here: it's a learning note on unsupervised learning model from Prof. Gupta's group. Link: http://120.52.73.9/www.cv-foundation.org/openaccess/content_iccv_2015/papers/Doersch_Unsu…
Momentum Contrast for Unsupervised Visual Representation Learning 一.Methods Previously Proposed 1. End-to-end Mechanisms 方法简介:对于每个mini-batch中的 image 进行增强,每一张图片经过增强处理都得到两张图片q 和 $ k_+ $, 这两张互为正样本.采用两个不同的 encoder 分别对 q和 dictionary中的keys(包含q对应的正样本 $ k_+…
Momentum Contrast for Unsupervised Visual Representation Learning 一.Methods Previously Proposed 1. End-to-end Mechanisms 方法简介:对于每个mini-batch中的 image 进行增强,每一张图片经过增强处理都得到两张图片q 和 $ k_+ $, 这两张互为正样本.采用两个不同的 encoder 分别对 q和 dictionary中的keys(包含q对应的正样本 $ k_+…
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention-阅读总结 笔记不能简单的抄写文中的内容,得有自己的思考和理解. 一.基本信息 **\1.标题:**Show, Attend and Tell: Neural Image Caption Generation with Visual Attention **\2.作者:**Kelvin Xu,Jimmy Lei Ba,Ryan Kiros,Kyu…
论文题目:<Momentum Contrast for Unsupervised Visual Representation Learning> 论文作者: Kaiming He.Haoqi Fan. Yuxin Wu. Saining Xie. Ross Girshick 论文来源:arXiv 论文来源:https://github.com/facebookresearch/moco 1 主要思想 文章核心思想是使用基于 Contrastive learning 的方式自监督的训练一个图片表…
Link of the Paper: https://arxiv.org/abs/1411.4389 Main Points: A novel Recurrent Convolutional Architecture ( CNN + LSTM ): both Spatially and Temporally Deep. The recurrent long-term models are directly connected to modern visual convnet models and…
Link of the Paper: https://arxiv.org/pdf/1504.06692.pdf Innovations: The authors propose the Novel Visual Concept learning from Sentences ( NVCS ) task. In this task, methods need to learn novel concepts from sentence descriptions of a few images. Th…
Link of the Paper: https://arxiv.org/pdf/1502.03044.pdf Main Points: Encoder-Decoder Framework: Encoder uses a convolutional neural network to extract a set of feature vectors which the authors refer to as annotation vectors. The extractor produces L…
论文链接:https://arxiv.org/pdf/1502.03044.pdf 代码链接:https://github.com/kelvinxu/arctic-captions & https://github.com/yunjey/show-attend-and-tell & https://github.com/jazzsaxmafia/show_attend_and_tell.tensorflow 主要贡献 在这篇文章中,作者将“注意力机制(Attention Mechanism…
Link of the Paper: https://arxiv.org/abs/1706.03762 Motivation: The inherently sequential nature of Recurrent Models precludes parallelization within training examples. Attention mechanisms have become an integral part of compelling sequence modeling…
Link of the Paper: https://arxiv.org/abs/1705.03122 Motivation: Compared to recurrent layers, convolutions create representations for fixed size contexts, however, the effective context size of the network can easily be made larger by stacking severa…
Link of the Paper: https://arxiv.org/abs/1412.2306 Main Points: An Alignment Model: Convolutional Neural Networks over image regions ( An image -> RCNN -> Top 19 detected locations in addition to the whole image -> the representations based on th…
Link of the Paper: http://papers.nips.cc/paper/4470-im2text-describing-images-using-1-million-captioned-photographs.pdf Main Points: A large novel data set containing images from the web with associated captions written by people, filtered so that th…
1. Sketch me that shoe, Qian Yu, Feng Liu, Yi-Zhe Song, Tao Xiang, Timothy M. Hospedales, Cheng Change Loy, in CVPR 2016. A unique characteristic of sketches in the context of image retrieval is that they offer inherently fine-grained visual descript…
Link of the Paper: https://arxiv.org/abs/1806.06422 Innovations: The authors propose a novel learning based discriminative evaluation metric that is directly trained to distinguish between human and machine-generated captions. They train an automatic…
Link of the Paper: https://arxiv.org/abs/1711.09151 Motivation: LSTM units are complex and inherently sequential across time. Convolutional networks have shown advantages on machine translation and conditional image generation. Innovation: The author…
Inside-Outside Net (ION) 论文:Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks 发表时间:2016 发表作者:(Cornell University)Sean Bell, C. Lawrence Zitnick,(Microsoft Research)Kavita Bala, Ross Girshick 论文链接:论文链接 本文…
<Exploiting Relevance Feedback in Knowledge Graph> Publication: KDD 2015 Authors: Yu Su, Shengqi Yang, etc. Affiliation: UCSB... 1. Short description: p { margin-bottom: 0.1in; line-height: 120% } a:link { } This paper formulate the novice graph rel…
Perceptual Generative Adversarial Networks for Small Object Detection 2017-07-11  19:47:46   CVPR 2017 This paper use GAN to handle the issue of small object detection which is a very hard problem in general object detection. As shown in the followin…
Link of the Paper: https://arxiv.org/pdf/1409.3215.pdf Main Points: Encoder-Decoder Model: Input sequence -> A vector of a fixed dimensionality -> Target sequence. A multilayered  LSTM: The LSTM did not have difficulty on long sentences. Deep LSTMs…
来源:NIPS 2013 作者:DeepMind 理解基础: 增强学习基本知识 深度学习 特别是卷积神经网络的基本知识 创新点:第一个将深度学习模型与增强学习结合在一起从而成功地直接从高维的输入学习控制策略 详细是将卷积神经网络和Q Learning结合在一起.卷积神经网络的输入是原始图像数据(作为状态)输出则为每一个动作相应的价值Value Function来预计未来的反馈Reward 实验成果:使用同一个网络学习玩Atari 2600 游戏.在測试的7个游戏中6个超过了以往的方法而且好几个超…