More articles related to "(Repost) Awesome Image Captioning"

Awesome Image Captioning 2018-12-03 19:19:56 From: https://github.com/zhjohnchan/awesome-image-captioning Papers 2010 I2t: Image parsing to text description - Yao B Z et al, P IEEE 2011. 2011 Im2Text: Describing Images Using 1 Million Captioned Photo…
Convolutional Image Captioning 2018-11-04 20:42:07 Paper: http://openaccess.thecvf.com/content_cvpr_2018/papers/Aneja_Convolutional_Image_Captioning_CVPR_2018_paper.pdf Code: https://github.com/aditya12agd5/convcap Related Papers: 1. Convolutional Se…
Main Contributions: A brief introduction about two different methods (retrieval based method and generative method) for image captioning task. The authors implemented the classical model, Show and Tell, and gave analyses based on the experiments. Exc…
Video Analysis, introduction to related fields: Video Captioning (video-to-text description) http://blog.csdn.net/wzmsltw/article/details/71192385 Information from the video frames: image (spatial) features extracted simply with a CNN (VGGNet, ResNet, etc.), and video dynamics (spatial+temporal) features extracted with an action recognition model (such as C3D). Prior features: for example the category of the video, which provides very strong prior information. Text-based features: here, based on…
Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal Recurrent Neural Network ( AlexNet/VGGNet + a multimodal layer + RNNs ). Their work has two major differences from these methods. Firstly, they inco…
Papers Published in 2018 Convolutional Image Captioning - Jyoti Aneja et al., CVPR 2018 - [ Paper Reading ] Learning to Evaluate Image Captioning - Yin Cui et al., CVPR 2018 - [ Paper Reading ] CNN+CNN: Convolutional Decoders for Image Captioning - Q…
Link of the Paper: https://arxiv.org/abs/1805.09019 Innovations: The authors propose a CNN + CNN framework for image captioning. There are four modules in the framework: vision module ( VGG-16 ), which is adopted to "watch" images; language modu…
Link of the Paper: https://arxiv.org/abs/1806.06422 Innovations: The authors propose a novel learning based discriminative evaluation metric that is directly trained to distinguish between human and machine-generated captions. They train an automatic…
Link of the Paper: https://arxiv.org/abs/1711.09151 Motivation: LSTM units are complex and inherently sequential across time. Convolutional networks have shown advantages on machine translation and conditional image generation. Innovation: The author…
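Lecture 9: Image Generation. Image Captioning. Generative Adversarial Network: learning the data distribution (probability density function estimation + data sample generation). Generative models capture a co-occurrence relationship; discriminative models capture a causal relationship. Where GANs sit among generative models. Characteristics of GANs. GAN: an unsupervised network framework with a generator and a discriminator. Train the discriminator first, then fix the discriminator and optimize the generator. Generator network: generates sample data. Discriminator network: its samples are real sampled data + samples produced by the generator. EM optimization optimizes in the same direction, GAN…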
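Lecture 7: Image Captioning. Chapter outline: recurrent neural networks; backpropagation through time (BPTT); vanilla RNN: basic model, severe vanishing gradients when sigmoid is used; LSTM (long short-term memory, proposed in 1997): basic model, model comparison, LSTM mathematical model, understanding the role of the control gates, LSTM structure diagram; LSTM variants: peephole, coupled forget and input gates; GRU (Gated Recurrent Unit): improvements, comparison of LSTM and GRU. Image captioning: generating descriptive language for an image; requires multimodal understanding and reasoning: compound…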
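For reference, the "LSTM mathematical model" item in the outline above most likely refers to the standard gate equations. The block below is the common textbook formulation, not necessarily the notation used in the lecture; the weight names $W$, $U$ and biases $b$ are assumptions introduced here for illustration.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

The additive update of $c_t$ is what eases the vanishing-gradient problem the outline mentions for the vanilla RNN: gradients can flow through $c_{t-1}$ scaled by $f_t$ rather than through repeated squashing nonlinearities.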
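When I first encountered the captioning problem, my first impression was that Andrej Karpathy is really smart. I got started mainly with two of his papers, "Deep Fragment Embeddings for Bidirectional Image Sentence Mapping" and "Deep Visual-Semantic Alignments for Generating Image Descriptions". Basically, once you understand the first paper the second is easy, because the research approach is the same. That said, the second model is indeed somewhat more powerful…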
Image Caption: Automatically describing the content of an image domain:CV+NLP Category:(by myself, you can read the survey for detail.) CNN+RNN, with attention mechanisms Reinforcement Learning GAN Compositional Architecture: Review Network, Guiding…
This post covers the following three papers: Unpaired Image Captioning by Language Pivoting (ECCV 2018) Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data (ECCV 2018) Unsupervised Image Captioning (CVPR 2019) 1. Unpaired Image Captioning by Lan…
Image caption generation: https://github.com/eladhoffer/captionGen Simple encoder-decoder image captioning: https://github.com/udacity/CVND---Image-Captioning-Project (Paper)StyleNet: Generating Attractive Visual Captions with Styles:  https://github…
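1. Unsupervised learning of video representations using LSTMs. Method: predict the future frame sequence from an encoding of the preceding frames. Similar to the paper Sequence to sequence learning with neural networks, whose method uses one LSTM to encode the input text into a fixed representation and another LSTM to decode it into a different language. 2. Describing Videos by Exploiting Temporal Structure. This paper was published at ICCV 2015 and is the first to use temporal…
Title: SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning Authors: Long Chen et al. (Zhejiang University, National University of Singapore, Shandong University) Venue: CVPR 2017 1 Background: Attention mechanisms have been very successful in natural language processing and computer vision, but most existing attention-based models consider only spatial features: they attend to the locally more "important" information in the feature map and ignore the relative importance of the different channels. This paper introduces a new…
Video captioning: as the name suggests, video captioning means having a computer generate a description of a video. As shown in the figure, which takes two frames from a video, the description is "A man is doing stunts on his bike". This is very helpful for tasks such as online video retrieval. The progress of image captioning in recent years has also led people to consider generating descriptions for video. Unlike an image, which carries only static spatial information, a video also contains temporal information and audio. A video therefore contains more information than an image and requires more features to be extracted, which makes generating an accurate description a major challenge. 1. Long-term Recurrent…
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering - Reading Summary. Notes should not simply copy the text of the paper; they need your own thinking and understanding. I. Basic information 1. Title: Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering 2. Authors: Peter Anderson, Xiaodong…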
Link of the Paper: https://arxiv.org/abs/1609.06647 A Correlative Paper: Show and Tell: A Neural Image Caption Generator (Link of the Paper: https://arxiv.org/abs/1411.4555) Main Points ( Improvements Over the CVPR2015 Model  ): Image Model Improveme…
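Preamble: the authors said they would release the code on GitHub, but it still has not appeared. I emailed to ask about the code and have had no reply either, which is rather awkward. All that aside, it is still good work, no argument there, so today we walk through the paper in detail. Introduction: readers unfamiliar with captioning can read these two Zhihu column posts: 看图说话的AI小朋友--图像标注趣谈(上) and 看图说话的AI小朋友--图像标注趣谈(下). I. Abstract: The authors propose a new attention model. It differs from previous ones in that it considers not only the relationship between the state and the predicted word, but also the image regions…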
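Implementing load_img_as_np_array def load_img_as_np_array(path, target_size): """Load an image from the given file, resize it to the given target_size, and return a float numpy array in a form Keras supports. # Arguments path: path to the image file target_size: tuple (image height, image width). # Returns numpy array. """ Using the PIL library: from PIL…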
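The snippet above is cut off before the function body, so the following is only a minimal sketch of how such a function might look, not the original author's code. It assumes Pillow and NumPy are installed, that "a float array Keras supports" means a float32 array of shape (height, width, 3), and that converting to RGB is acceptable; the interpolation method is left at Pillow's default because the source does not specify one.

```python
import numpy as np
from PIL import Image


def load_img_as_np_array(path, target_size):
    """Load an image file, resize it, and return a float32 numpy array.

    path: path to the image file.
    target_size: tuple (image height, image width), as in the docstring above.
    Returns an array of shape (height, width, 3).
    """
    height, width = target_size
    with Image.open(path) as img:
        # Assumption: force three channels so the output shape is predictable.
        rgb = img.convert("RGB")
        # PIL's resize takes (width, height), the reverse of target_size.
        resized = rgb.resize((width, height))
        return np.asarray(resized, dtype=np.float32)
```

Note the argument order: target_size follows the (height, width) convention from the docstring, while Pillow's resize expects (width, height), so the two values are swapped before the call.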
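Background: images and text are processed with a CNN and an LSTM respectively, and the two neural networks are then combined. Application areas: image search, security, pornographic-content detection. Topics covered: digital image processing (image reading, image resizing, reshaping image data dimensions); natural language processing (text cleaning, word embedding); CNN (convolutional neural network): image feature extraction, transfer learning; LSTM (recurrent neural network): feature extraction from text sequences; DNN (deep neural network): predicting the next word from the image features and the text sequence features. Dataset used: Framing Image…
Yesterday I summarized deep learning resources; today I will do the same for machine learning (friendly reminder: some sites require a VPN ^_^). A few good books: 1. Pattern Recognition and Machine Learning (by Bishop) 2. The Elements of Statistical Learning (by Hastie, Tibshirani, and Friedman) Both are in English but very comprehensive. The first requires some mathematical background, so you can read the second first. If reading English is a struggle, I recommend looking at the following…
Some problems encountered when using cordova + ionic: No Content-Security-Policy meta tag found. Please add one when using the cordova-plugin-whitelist plugin. Fix: add the following to index.html: <meta http-equiv="Content-Security-Policy" content="default-src *; style-src 'self'…
From: http://www.cnblogs.com/bandry/archive/2006/10/11/526229.html Embedding Media Player in a web page is fairly simple: just use the HTML <Object></Object> element, as shown below. <OBJECT ID="WMPlay" WIDTH=320 HEIGHT=240 CLASSID="CLSID:22D6f312-B0F6-11D0-94AB-0080C74C7E95" C…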
Top Deep Learning Projects A list of popular github projects related to deep learning (ranked by stars). Last Update: 2016.08.09 Project Name Stars Description TensorFlow 29622              Computation using data flow graphs for scalable machine lear…
Inserting Images Images are essential elements in most scientific documents. LATEX provides several options to handle images and make them look exactly how you need. This article explains how to include images in the most common format…
UIImageWriteToSavedPhotosAlbum: Next UIKit Function Reference Overview The UIKit framework defines a number of functions, many of them used in graphics and drawing operations. Functions by Task Application Launch UIApplicationMain Image Manipulation…
http://handong1587.github.io/deep_learning/2015/10/09/rnn-and-lstm.html //RNN and LSTM http://handong1587.github.io/deep_learning/2015/10/09/saliency-prediction.html //Saliency Prediction http://handong1587.github.io/deep_learning/2015/10/09/scene-l…