Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1411.4555
Main Points:
- A generative model ( NIC, GoogLeNet + LSTM ) based on a deep recurrent architecture: the model is trained to maximize the likelihoodP(S|I) of the target description sentence given the training image I. S = { S1, S2, ... } is the target sequence of words and each word St comes from a given dictionary, that describes the image adequately.
- The authors use a CNN as an image "encoder", by first pre-training it for an image classification task and using the last hidden layer as an input to the RNN decoder that generates sentences. They call this model the Neural Image Caption, or NIC.
Other Key Points:
- A description must capture not only the objects contained in an image, but it also must express how these objects relate to each other as well as their attributes and the activities they are involved in.
- The inspiration of Image Captioning could come from advances in Machine Translation.
- There are multiple approaches that can be used to generate a sentence given an image, with NIC. The first one is Sampling where the authors just sample the first word according to p1, then provide the corresponding embedding as input and sample p2, continuing like this until we sample the special end-of-sentence token or some maximum length. The second one is Beamsearch: iteratively consider the set of the k best sentences up to time t as candidates to generate sentences of size t+1, and keep only the resulting best k of them. This better approximates S = arg maxS' p(S'|I).
Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )的更多相关文章
- [Paper Reading] Show and Tell: A Neural Image Caption Generator
论文链接:https://arxiv.org/pdf/1411.4555.pdf 代码链接:https://github.com/karpathy/neuraltalk & https://g ...
- Paper Reading - Show, Attend and Tell: Neural Image Caption Generation with Visual Attention ( ICML 2015 )
Link of the Paper: https://arxiv.org/pdf/1502.03044.pdf Main Points: Encoder-Decoder Framework: Enco ...
- [Paper Reading] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
论文链接:https://arxiv.org/pdf/1502.03044.pdf 代码链接:https://github.com/kelvinxu/arctic-captions & htt ...
- Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )
Link of the Paper: https://ieeexplore.ieee.org/document/7298856/ A Correlative Paper: Learning a Rec ...
- [Paper Reading] Image Captioning using Deep Neural Architectures (arXiv: 1801.05568v1)
Main Contributions: A brief introduction about two different methods (retrieval based method and gen ...
- Paper Reading - Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge
Link of the Paper: https://arxiv.org/abs/1609.06647 A Correlative Paper: Show and Tell: A Neural Ima ...
- 论文:Show and Tell: A Neural Image Caption Generator-阅读总结
Show and Tell: A Neural Image Caption Generator-阅读总结 笔记不能简单的抄写文中的内容,得有自己的思考和理解. 一.基本信息 标题 作者 作者单位 发表 ...
- Paper Reading: Stereo DSO
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...
- CVPR 2016 paper reading (6)
1. Neuroaesthetics in fashion: modeling the perception of fashionability, Edgar Simo-Serra, Sanja Fi ...
随机推荐
- css让不同大小的图片适应div的大小,且不变形。
做成背景图片 单个 .imgdiv { width: 100px; // 你要的正方形 height: 100px; // 你要的正方形 background-image: url(/your/ima ...
- GoBelieve IM 消息推送的方案
消息推送设计方案如下: 所有接入im SDK的deviceTOken都会存储到IM服务器.就可以 IM服务器来根据你们服务器指定的useId来下发消息.判断客户端在线,并且APP在前台.就是socke ...
- Objective-C基础知识之“类”
Objective-C语言是iOS开发的专用语言,虽然现在在逐步被swift语言取代,但是仍可以作为基础学习,学会Objective-C之后入手swift也是相当快速.今天我来简谈一下关于OC中的类. ...
- QueryRunner cannot be resolved to a type:关于包不能正常导入的问题
在操作一个功能模块的时候,出现一个问题: 我原则是按着项目指导一步一步走的,但却出现, QueryRunner cannot be resolved to a type,这个问题应该属于Xxx can ...
- angular setInterval计时操作
在angular中setInterval方法是单向绑定,只绑定一次,无法实现计时效果, 可以使用$interval实现.附上代码:
- python3爬虫-爬取B站排行榜信息
import requests, re, time, os category_dic = { "all": "全站榜", "origin": ...
- mysql数据库的系统操作基本操作
本文主要总结并记录一下简单且常用的mysql 在cmd 窗口中操作的基本命令 命令停止mysql 数据库服务 1.(cmd)命令行 启动:net start mysql 停止:net stop mys ...
- js获取dom元素的子元素,父元素,兄弟元素小记
原生jsvar a = document.getElementById("dom"); del_space(a); //清理空格 var b = a.childNodes; //获 ...
- jQuery $ 的作用
$符号总体来说有两个作用: 1.作为一般函数调用:$(param) (1).参数为函数:当DOM加载完成后,执行此回调函数 $(function(){//dom加载完成后执行 //代码 }) (2). ...
- 关于虚拟机linux网络的一个问题(基于cntos7)
刚刚开始学习搭建Linux集群,目前出现的比较麻烦的一个问题是Linux网络ip问题.其实网上好多出现类似问题的解答大部分说是因为克隆的问题,但实际情况先克隆产生的问题应该是很好排查的.所幸,有博主针 ...