Unsupervised Learning of Video Representations using LSTMs

Note here: it's a learning notes on new LSTMs architecture used as an unsupervised learning way of video representations.

(More unsupervised learning related topics, you can refer to:

Learning Temporal Embeddings for Complex Video Analysis

Unsupervised Learning of Visual Representations using Videos

Unsupervised Visual Representation Learning by Context Prediction)

Link: http://arxiv.org/abs/1502.04681

Motivation:

- Understanding temporal sequences is important for solving many video related problems. We should utilize temporal structure of videos as a supervisory signal for unsupervised learning.

Proposed model:

In this paper, the author proposed three models based on LSTM:

1) LSTM Autoencoder Model:

  This model is composed of two parts, the encoder and the decoder.

  The encoder accepts sequences of frames as input, and the learned representation generated from encoder are copied to decoder as initial input. Then the decoder should reconstruct similar images like input frames in reverse order.

  (This is called unconditional version, while a conditional version receives last generated output of decoder as input, shown as the dashed boxes below)

Intuition: The reconstruction work requires the network to capture information about the appearance of objects and the background, this is exactly the information that we would like the representation to contain.

2) LSTM Future Predictor Model:

  This model is similar with the one above. The main difference lies in the output. Output of this model is the prediction of frames that come just after the input sequences. It also varies with conditional/unconditional versions just like the description above.

Intuition: In order to predict the next few frames correctly, the model needs information about which objects are present and how they are moving so that the motion can be extrapolated.

3) A Composite Model:

  This model combines "input reconstruction" and "future prediction" together to form a more powerful model. These two modules share a same encoder, which encodes input sequences into a feature vector and copy them to different decoders.

Intuition: this only encoder learns representations that contain not only static appearance of objects&background, but also the dynamic informations like moving objects and their moving pattern.

【ML】ICML2015_Unsupervised Learning of Video Representations using LSTMs的更多相关文章

  1. 【CV】ICCV2015_Unsupervised Learning of Visual Representations using Videos

    Unsupervised Learning of Visual Representations using Videos Note here: it's a learning note on Prof ...

  2. 论文阅读笔记(三)【AAAI2017】:Learning Heterogeneous Dictionary Pair with Feature Projection Matrix for Pedestrian Video Retrieval via Single Query Image

    Introduction (1)IVPR问题: 根据一张图片从视频中识别出行人的方法称为 image to video person re-id(IVPR) 应用: ① 通过嫌犯照片,从视频中识别出嫌 ...

  3. ZH奶酪:【阅读笔记】Deep Learning, NLP, and Representations

    中文译文:深度学习.自然语言处理和表征方法 http://blog.jobbole.com/77709/ 英文原文:Deep Learning, NLP, and Representations ht ...

  4. 【ML】Two-Stream Convolutional Networks for Action Recognition in Videos

    Two-Stream Convolutional Networks for Action Recognition in Videos & Towards Good Practices for ...

  5. 【ML】ICLR2016_Delving Deeper into Convolutional Networks

    ICLR2016_DELVING DEEPER INTO CONVOLUTIONAL NETWORKS Note here: Ballas recently proposed a novel fram ...

  6. 【RS】CoupledCF: Learning Explicit and Implicit User-item Couplings in Recommendation for Deep Collaborative Filtering-CoupledCF:在推荐系统深度协作过滤中学习显式和隐式的用户物品耦合

    [论文标题]CoupledCF: Learning Explicit and Implicit User-item Couplings in Recommendation for Deep Colla ...

  7. 【RS】List-wise learning to rank with matrix factorization for collaborative filtering - 结合列表启发排序和矩阵分解的协同过滤

    [论文标题]List-wise learning to rank with matrix factorization for collaborative filtering   (RecSys '10 ...

  8. 【RS】Deep Learning based Recommender System: A Survey and New Perspectives - 基于深度学习的推荐系统:调查与新视角

    [论文标题]Deep Learning based Recommender System: A Survey and New Perspectives ( ACM Computing Surveys  ...

  9. 【ML】Predict and Constrain: Modeling Cardinality in Deep Structured Prediction -预测和约束:在深度结构化预测中建模基数

    [论文标题]Predict and Constrain: Modeling Cardinality in Deep Structured Prediction   (35th-ICML,PMLR) [ ...

随机推荐

  1. 第 16 章 C 预处理器和 C 库(可变参数:stdarg.h)

    /*------------------------------------------------- varargs.c -- use variable number of arguments -- ...

  2. bootstrap table使用及遇到的问题

    本人前端菜鸟一枚,最近使用bootstrap table实现表格,记录一下以便日后翻阅,废话不多说,先看效果图: 1.首先说下要实现该效果需要添加的css样式及所需的js文件,具体下载地址就不粘贴了( ...

  3. 洛谷P2342-叠积木

    Problem 洛谷P2342-叠积木 Accept: 373   Submit: 1.1k Time Limit: 1000 mSec    Memory Limit : 128MB Problem ...

  4. PHP数组转为树的算法

    一.使用引用 function listToTree($list, $pk = 'id', $pid = 'pid', $child = '_child', $root = 0) { $tree = ...

  5. Redis String类型的API使用

    package com.daxin.jedis_datastructure; import org.junit.After; import org.junit.Before; import org.j ...

  6. ADB安装及使用

    环境安装: 下载.安装和配置ADB     https://jingyan.baidu.com/article/22fe7cedf67e353002617f25.html 安装驱动adbdriver  ...

  7. Linux 下的 Redis 安装 && 启动 && 关闭 && 卸载

    转自https://blog.csdn.net/zgf19930504/article/details/51850594 Redis 在Linux 和 在Windows 下的安装是有很大的不同的,和通 ...

  8. 网站性能优化小结和spring整合redis

    现在越来越多的地方需要非关系型数据库了,最近网站优化,当然从页面到服务器做了相应的优化后,通过在线网站测试工具与之前没优化对比,发现有显著提升. 服务器优化目前主要优化tomcat,在tomcat目录 ...

  9. [07] 使用注解完成IOC配置

    1.扫描配置 之前使用的Spring的Bean管理都是通过xml的配置文件来操作的,在Spring3.0之后已经引入了注解形式,Spring可以在指定路径下进行扫描,寻找标注了@Component.@ ...

  10. eclipse中使用svn提交,更新代码。

    在新公司工作,版本管理工具变成了svn,之前一直用git作为版本管理,用的编辑IDE是IntelliJIDEA,在这个编辑器下工作,还是很方便的,但是现在使用eclipse和svn.有点不习惯,但还是 ...