A Hierarchical Approach for Generating Descriptive Image Paragraphs (CPVR 2017) Li Fei-Fei.

数据集地址: http://cs.stanford.edu/people/ranjaykrishna/im2p/index.html

Workflow:

1.decompose the input image by detecting objects and other regions of interest

2.aggregate features across these regions to produce a pooled representation richly expressing the image semantics

3.take this feature vector as input by a hierarchical recurrent neural network composed of two levels: a sentence RNN and a word RNN.

4.sentence RNN receives the image features ,decides how many sentences to generate in the resulting paragraph, and produce an input topic vector for each sentence.

5.word RNN use this topic vector to generate the words of a single sentence.

Region Detector:

CNN+RPN

resize image-->pass through a CNN to get feature maps-->region proposal network(RPN) process the resulting feature maps-->regions of interest are projected onto the convolutional feature maps-->the corresponding region of the feature map is resized to a fixed size using bilinear interpolation and processed by two fully-connected layers to give a vector of dimension D for each region.

Given a dataset of images and ground-truth regions of interest, the region detector can be trained end-to-end fashion for object detection and for dense captioning.

Region Pooling:

elementwise maximum, Wpool and bpool are learned parameters, vi stands for a set of vectors produced by the region detector.

Hierarchical Recurrent Network:

Why Hierachical?

1.It reduces the length of time over which the recurrent networks must reason.

2.the generated paragraphs contain numbers of sentences, both the paragraph and sentence RNNs need only reason over much shorter time-scales, making learning an appropriate representation much more tractable

Sentence RNN: take the pooled region vector vp as input and produce a sequence of hidden states h1,h2,...,hS one for each sentence in the paragraph. Each hidden state used in two ways, produce a distributin pi to determine whether to stop and produce the topic vector ti for the i-th sentence of the paragraph ,which is the input of the word RNN.

Word RNN: the same as the LSTM components in the image captionings.

Training and Sampling:

training loss l(x,y) for the example (x,y) is a weighted sum of the two cross-entropy terms: a sentence loss lsent on the stopping distribution pi , and a word loss lword on the word distribution pij

Experiments:

Recurrent Topic-Transition GAN for Visual Paragraph Generation (ICCV 2017)
Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, Eric Xing
RTT-GAN

Towards Diverse and Natural Image Descriptions via a Conditional GAN (ICCV 2017)

Previous approaches, including both generation methods and evaluation metrics, primarily focus on the resemblance to the training samples.

Instead of emphasizing n-gram matching, we aim to improve the naturalness and diversity.

Generation.Under the MLE principle, the joint probability of a sentence is, to a large extent, determined by whether it contains the frequent n-grams from the training set.

When the generator yields a few of words that match the prefix of a frequent n-gram, the remaining words of that n-gram will likely be produced following the Markov chain.

Evaluation.Classical metrics include BLEU, and ROUGE, which respectively focuses on the precision and recall of n-grams. Beyond them, METEOR uses a combination of both the precison and the recall of n-grams. CIDEr uses weighted statistics over n-grams. As we can see, such metrics mostly rely on matching n-grams with the "groundtruths". As a result, sentences that contain frequent n-grams will get higher scores as compared to those using variant expressions. SPICE: Instead of matching between n-grams, it focues on those linguistic entities that reflect visual concepts (e.g. objects and relationships). However, other qualities, e.g. the naturalness of the expressions, are not considered in this metric.

The generator G takes two inputs: an image feature f(I) derived from a CNN and a ramdom vector z.

Diverse and Coherent Paragraph Generation from Images (ECCV 2018)

github: https://github.com/metro-smiles/CapG_RevG_Code

The authors propose to augment paragraph generation techniques with "coherence vectors," "global topic vectors," and modeling of the inherent ambiguity of associating paragraphs with images, via a variational auto-encoder formulation.

Topic Generation Net and Sentence Generation Net

Training for Diversity in Image Paragraph Captioning (EMNLP 2018)

github: https://github.com/lukemelas/image-paragraph-captioning

Image Paragraph论文合辑的更多相关文章

  1. Image Caption论文合辑2

    说明: 这个合辑里面的论文不全是Image Caption, 但大多和Image Caption相关, 同时还有一些Workshop论文. Guiding Long-Short Term Memory ...

  2. Image Captioning 经典论文合辑

    Image Caption: Automatically describing the content of an image domain:CV+NLP Category:(by myself, y ...

  3. Medical Image Report论文合辑

    Learning to Read Chest X-Rays:Recurrent Neural Cascade Model for Automated Image Annotation (CVPR 20 ...

  4. 【Tips】史上最全H1B问题合辑——保持H1B身份终级篇

    [Tips]史上最全H1B问题合辑——保持H1B身份终级篇 2015-04-10留学小助手留学小助手 留学小助手 微信号 liuxue_xiaozhushou 功能介绍 提供最真实全面的留学干货,帮您 ...

  5. SSH三大框架合辑的搭建步骤

    v\:* {behavior:url(#default#VML);} o\:* {behavior:url(#default#VML);} w\:* {behavior:url(#default#VM ...

  6. 【OpenCV新手教程之十二】OpenCV边缘检測:Canny算子,Sobel算子,Laplace算子,Scharr滤波器合辑

    本系列文章由@浅墨_毛星云 出品,转载请注明出处. 文章链接:http://blog.csdn.net/poem_qianmo/article/details/25560901 作者:毛星云(浅墨) ...

  7. 【OpenCV新手教程之十八】OpenCV仿射变换 & SURF特征点描写叙述合辑

    本系列文章由@浅墨_毛星云 出品,转载请注明出处. 文章链接:http://blog.csdn.net/poem_qianmo/article/details/33320997 作者:毛星云(浅墨)  ...

  8. 【OpenCV新手教程之十七】OpenCV重映射 & SURF特征点检測合辑

    本系列文章由@浅墨_毛星云 出品.转载请注明出处. 文章链接:http://blog.csdn.net/poem_qianmo/article/details/30974513 作者:毛星云(浅墨)  ...

  9. [OpenCV入门教程之十二】OpenCV边缘检测:Canny算子,Sobel算子,Laplace算子,Scharr滤波器合辑

    http://blog.csdn.net/poem_qianmo/article/details/25560901 本系列文章由@浅墨_毛星云 出品,转载请注明出处. 文章链接:http://blog ...

随机推荐

  1. 移动端UI界面设计:APP字体排版设计的七个原则

    移动端UI界面设计:APP字体排版设计的七个原则 发布于: 2015 年 2 月 9 日 by admin 再来谈移动端APP字体排版设计,也许有人会说,这个还有什么好说的呢?但是真正能够运用好APP ...

  2. [Ramda] Refactor to Point Free Functions with Ramda using compose and converge

    In this lesson we'll take some existing code and refactor it using some functions from the Ramda lib ...

  3. Python实战:如何隐藏自己的爬虫身份

    使用爬虫访问网站,需要尽可能的隐藏自己的身份,以防被服务器屏蔽,在工作工程中,我们有2种方式来实现这一目的,分别是延时访问和动态代理,接下来我们会对这两种方式进行讲解 1.延时访问 见名之意,延时访问 ...

  4. iOS中js与objective-c的简单交互

    1.首先是objective-c调用js中的代码,可以用UIWebview中的一个方法 stringByEvaluatingJavaScriptFromString:后面接的是js中的方法名.这个函数 ...

  5. 【codeforces 782D】 Innokenty and a Football League

    [题目链接]:http://codeforces.com/contest/782 [题意] 每个队名有两种选择, 然后第一个选择队名相同的那些队只能选第二种; 让你安排队名 [题解] 首先全都选成第一 ...

  6. JS----checked----checked选中和未选中的获取

    , allValue.length - 1); allValue = allValue.replace(/[ ]/g, ""); var checkedIds = allValue ...

  7. 【t029】Mobile Service

    Time Limit: 3 second Memory Limit: 256 MB [问题描述] 一个公司有三个移动服务员.如果某个地方有一个请求,某个员工必须赶到那个地方去(那个地方没有其他员工), ...

  8. hbase 配置高可用hmaster

    1.先停掉hbase bin/stop-hbase.sh 2.在hbase的conf目录下创建 backup-masters 添加hadoop003 3.分发 4.重新启动hbase并查看 bin/s ...

  9. windows 下使用 virtualenv 创建虚拟环境

    virtualenv虚拟环境为每个项目隔离了一套运行类库,不同的项目在各自的虚拟环境中使用不同的类库,避免了将所有类库都安装到系统环境中导致的不同项目需要不同(版本)类库的问题,项目与项目之间的类库依 ...

  10. android开发之微信支付功能的实现

    移动开发中,支付类的App越来越多,对于开发者来说也是不可少的,不可不会的:下面就来说一说支付开发的流程 1.申请你的AppID 请到 开发者应用登记页面 进行登记,登记并选择移动应用进行设置后,将该 ...