Paper Reading - Convolutional Image Captioning ( CVPR 2018 )
Link of the Paper: https://arxiv.org/abs/1711.09151
Motivation:
- LSTM units are complex and inherently sequential across time.
- Convolutional networks have shown advantages on machine translation and conditional image generation.
Innovation:
- The authors develop a convolutional ( CNN-based ) image captioning method that shows comparable performance to an LSTM based method on standard metrics.
- The authors analyze the characteristics of CNN and LSTM nets and provide useful insights such as -- CNNs produce more entropy ( useful for diverse predictions ), better classification accuracy, and do not suffer from vanishing gradients.
Improvement:
- Improved performance with a CNN model that uses Attention Mechanism to leverage spatial image features.
General Points:
- Image Captioning is applicable to virtual assistants, editing tools, image indexing and support of the disabled.
- Image Captioning is a basic ingredient for more complex operations such as storytelling and visual summarization.
- An illustration of a classical RNN architecture for image captioning is provided below.
Paper Reading - Convolutional Image Captioning ( CVPR 2018 )的更多相关文章
- Paper Read: Convolutional Image Captioning
Convolutional Image Captioning 2018-11-04 20:42:07 Paper: http://openaccess.thecvf.com/content_cvpr_ ...
- Paper Reading - Learning to Evaluate Image Captioning ( CVPR 2018 ) ★
Link of the Paper: https://arxiv.org/abs/1806.06422 Innovations: The authors propose a novel learnin ...
- Paper Reading - Convolutional Sequence to Sequence Learning ( CoRR 2017 ) ★
Link of the Paper: https://arxiv.org/abs/1705.03122 Motivation: Compared to recurrent layers, convol ...
- Paper Reading: Stereo DSO
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...
- 爬取CVPR 2018过程中遇到的坑
爬取 CVPR 2018 过程中遇到的坑 使用语言及模块 语言: Python 3.6.6 模块: re requests lxml bs4 过程 一开始都挺顺利的,先获取到所有文章的链接再逐个爬取获 ...
- 在矩池云上复现 CVPR 2018 LearningToCompare_FSL 环境
这是 CVPR 2018 的一篇少样本学习论文:Learning to Compare: Relation Network for Few-Shot Learning 源码地址:https://git ...
- Paper Reading - Long-term Recurrent Convolutional Networks for Visual Recognition and Description ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1411.4389 Main Points: A novel Recurrent Convolutional Arch ...
- Paper Reading - CNN+CNN: Convolutional Decoders for Image Captioning
Link of the Paper: https://arxiv.org/abs/1805.09019 Innovations: The authors propose a CNN + CNN fra ...
- Paper Reading - Deep Captioning with Multimodal Recurrent Neural Networks ( m-RNN ) ( ICLR 2015 ) ★
Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal ...
随机推荐
- 阿里前端测试题--关于ES6中Promise函数的理解与应用
今天做了阿里前端的笔试题目,原题目是这样的 //实现mergePromise函数,把传进去的数组顺序先后执行,//并且把返回的数据先后放到数组data中 const timeout = ms => ...
- 浅析Vue原理(部分源码解析)
响应式 Object.defineProperty Object.defineProperty(obj, prop, descriptor) // 对象.属性.描述符 Object.definePro ...
- C++分享笔记:扑克牌的洗牌发牌游戏设计
笔者在大学二年级期间,做过的一次C++程序设计:扑克牌的洗牌发牌游戏.具体内容是:除去大王和小王,将52张扑克牌洗牌,并发出5张牌.然后判断这5张牌中有几张相同大小的牌,是否是一条链,有几个同花等. ...
- Spring的绿草丛
Spring 轻量级框架,JavaEE的春天,当前主流框架 “站立式”的企业应用开发框架 目标 实现有的技术更加易用,推进编码最佳实践 内容:loC容器,AOP实现,数据访问支持:简化JDBC/ORM ...
- 开发和调试第一个 LLVM Pass
1. 下载和编译 LLVM LLVM 下载地址 http://releases.llvm.org/download.html,目前最新版是 6.0.0,下载完成之后,执行 tar 解压 llvm 包: ...
- HTML-CSS的几种布局
第一种 两栏式布局 <body> <!-- 两栏式布局 --> <!-- 想要的效果是左边图片右边文字 拉伸盒子文字的高度宽度自动改变 --> <div c ...
- OpenCV-Python 视频读取
import numpy as np import cv2 # 读取视频文件 cap = cv2.VideoCapture('./law.mp4') # 或者电影每秒的帧数 fps = cap.get ...
- mysql 多主多从配置,自增id解决方案
MySQL两主(多主)多从架构配置 一.角色划分 1.MySQL数据库规划 我现在的环境是:zhdy04和zhdy05已经做好了主主架构配置,现在需要的是把两台或者多台从服务器与主一一同步. 主机名 ...
- Laravel 生成二维码
(本实例laravel 版本 >=5.6, PHP版本 >=7.0) 1.首先,添加 QrCode 包添加到你的 composer.json 文件的 require 里: "re ...
- sample采样倾斜key并单独进行join代码
/** * sample采样倾斜key单独进行join */ JavaPairRDD<Long, String> sampledRDD = userid2PartAggrInfoRDD.s ...