Paper Reading - Im2Text: Describing Images Using 1 Million Captioned Photographs ( NIPS 2011 )

Link of the Paper: http://papers.nips.cc/paper/4470-im2text-describing-images-using-1-million-captioned-photographs.pdf

Main Points:

A large novel data set containing images from the web with associated captions written by people, filtered so that the descriptions are likely to refer to visual content.
A description generation method that utilizes global image representations to retrieve and transfer captions from their data set to a query image: authors achieve this by computing the global similarity ( a sum of gist similarity and tiny image color similarity ) of a query image to their large web-collection of captioned images; they find the closest matching image ( or images ) and simply transfer over the description from the matching image to the query image.
A description generation method that utilizes both global representations and direct estimates of image content (objects, actions, stuff, attributes, and scenes) to produce relevant image descriptions.

Other Key Points:

Image captioning will help advance progress toward more complex human recognition goals, such as how to tell the story behind an image.
An approach from Every picture tells a story: generating sentences for images produces image descriptions via a retrieval method, by translating both images and text descriptions to a shared meaning space represented by a single < object, action, scene > tuple. A description for a query image is produced by retrieving whole image descriptions via this meaning space from a set of image descriptions.
The retrieval method relies on collecting and filtering a large data set of images from the internet to produce a novel web-scale captioned photo collection.

Paper Reading - Im2Text: Describing Images Using 1 Million Captioned Photographs ( NIPS 2011 )的更多相关文章

Paper Reading: Stereo DSO
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...
Paper Reading - Deep Captioning with Multimodal Recurrent Neural Networks ( m-RNN ) ( ICLR 2015 ) ★
Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal ...
[Paper Reading]--Exploiting Relevance Feedback in Knowledge Graph
<Exploiting Relevance Feedback in Knowledge Graph> Publication: KDD 2015 Authors: Yu Su, Sheng ...
Paper Reading: Perceptual Generative Adversarial Networks for Small Object Detection
Perceptual Generative Adversarial Networks for Small Object Detection 2017-07-11 19:47:46 CVPR 20 ...
Paper Reading: In Defense of the Triplet Loss for Person Re-Identification
In Defense of the Triplet Loss for Person Re-Identification 2017-07-02 14:04:20 This blog comes ...
Paper Reading - Attention Is All You Need ( NIPS 2017 ) ★
Link of the Paper: https://arxiv.org/abs/1706.03762 Motivation: The inherently sequential nature of ...
Paper Reading - Convolutional Sequence to Sequence Learning ( CoRR 2017 ) ★
Link of the Paper: https://arxiv.org/abs/1705.03122 Motivation: Compared to recurrent layers, convol ...
Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1412.2306 Main Points: An Alignment Model: Convolutional Ne ...
Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )
Link of the Paper: https://ieeexplore.ieee.org/document/7298856/ A Correlative Paper: Learning a Rec ...

随机推荐

iOS App占用太多磁盘空间
问题:随着App的不断运行,发现所占磁盘空间越来越大分析:应该是网络下载中的缓存,包括利用SDWebImage产生的.和下载单个文件被取消后的缓存验证:查看App目录中的Tmp(系统存放未下载完成 ...
MAC升级openssl
Mac OSX EI Capitan 10.11.6升级自带Openssl - 简书 Mac10.11升级安装openssl _ 刘春桂的博客 openssl_openssl_ TLS_SSL and ...
python2与python3的input函数的区别
Python3.x 中 input() 函数接受一个标准输入数据,返回为 string 类型. Python2.x 中 input() 相等于 eval(raw_input(prompt)) ,用来获 ...
Mysql存中文字符出错：Incorrect string value: '\xC2\xE9\xD7\xED\解决方法
1.数据库连接设置编码格式为UTF-8 jdbc:mysql://localhost:3306/jbpm_test?useUnicode=true&characterEncoding=UTF- ...
遍历语法for...in for...of iterator
1.Javascript最常见的遍历语法是for循环缺点:写法较为麻烦 for (let index = 0; index < array.length; index++) { const e ...
python面试题之基础2
2.3 考虑以下 Python 代码,如果运行结束,命令行中的运行结果是什么? 两者用法相同,不同的是 range 返回的结果是一个列表,而 xrange 的结果是一个生成器,前者是直接开辟一块内存 ...
LEFT JOIN个别问题
SELECT a.loginuser, a.schoolid, count(b.id)FROM vhs_school AS aLEFT JOIN vhs_attence AS b ON a.schoo ...
MongoDB固定集合（capped collection）
一 . 什么是固定集合 MongoDB中有一种特殊类型的集合,值得我们特别留意,那就是固定集合(capped collection). 固定集合可以声明collection的容量大小,其行为类似于循环 ...
基于visual studio 2017 以及cubemx 搭建stm32的开发环境（2）
主要解决 vs2017中,printf无法打印数据的问题. 在keil环境下正常使用printf功能,但是以下的重定向代码在vs2017下使用不了: #ifdef __GNUC__ /* With G ...
C语言编程练习 GPS数据处理
题目内容: NMEA-0183协议是为了在不同的GPS(全球定位系统)导航设备中建立统一的BTCM(海事无线电技术委员会)标准,由美国国家海洋电子协会(NMEA-The National Marine ...

Paper Reading - Im2Text: Describing Images Using 1 Million Captioned Photographs ( NIPS 2011 )

Paper Reading - Im2Text: Describing Images Using 1 Million Captioned Photographs ( NIPS 2011 )的更多相关文章

随机推荐

热门专题