Unsupervised Learning of Video Representations using LSTMs

Note here: it's a learning notes on new LSTMs architecture used as an unsupervised learning way of video representations.

(More unsupervised learning related topics, you can refer to:

Learning Temporal Embeddings for Complex Video Analysis

Unsupervised Learning of Visual Representations using Videos

Unsupervised Visual Representation Learning by Context Prediction)

Link: http://arxiv.org/abs/1502.04681

Motivation:

- Understanding temporal sequences is important for solving many video related problems. We should utilize temporal structure of videos as a supervisory signal for unsupervised learning.

Proposed model:

In this paper, the author proposed three models based on LSTM:

1) LSTM Autoencoder Model:

  This model is composed of two parts, the encoder and the decoder.

  The encoder accepts sequences of frames as input, and the learned representation generated from encoder are copied to decoder as initial input. Then the decoder should reconstruct similar images like input frames in reverse order.

  (This is called unconditional version, while a conditional version receives last generated output of decoder as input, shown as the dashed boxes below)

Intuition: The reconstruction work requires the network to capture information about the appearance of objects and the background, this is exactly the information that we would like the representation to contain.

2) LSTM Future Predictor Model:

  This model is similar with the one above. The main difference lies in the output. Output of this model is the prediction of frames that come just after the input sequences. It also varies with conditional/unconditional versions just like the description above.

Intuition: In order to predict the next few frames correctly, the model needs information about which objects are present and how they are moving so that the motion can be extrapolated.

3) A Composite Model:

  This model combines "input reconstruction" and "future prediction" together to form a more powerful model. These two modules share a same encoder, which encodes input sequences into a feature vector and copy them to different decoders.

Intuition: this only encoder learns representations that contain not only static appearance of objects&background, but also the dynamic informations like moving objects and their moving pattern.

【ML】ICML2015_Unsupervised Learning of Video Representations using LSTMs的更多相关文章

  1. 【CV】ICCV2015_Unsupervised Learning of Visual Representations using Videos

    Unsupervised Learning of Visual Representations using Videos Note here: it's a learning note on Prof ...

  2. 论文阅读笔记(三)【AAAI2017】:Learning Heterogeneous Dictionary Pair with Feature Projection Matrix for Pedestrian Video Retrieval via Single Query Image

    Introduction (1)IVPR问题: 根据一张图片从视频中识别出行人的方法称为 image to video person re-id(IVPR) 应用: ① 通过嫌犯照片,从视频中识别出嫌 ...

  3. ZH奶酪:【阅读笔记】Deep Learning, NLP, and Representations

    中文译文:深度学习.自然语言处理和表征方法 http://blog.jobbole.com/77709/ 英文原文:Deep Learning, NLP, and Representations ht ...

  4. 【ML】Two-Stream Convolutional Networks for Action Recognition in Videos

    Two-Stream Convolutional Networks for Action Recognition in Videos & Towards Good Practices for ...

  5. 【ML】ICLR2016_Delving Deeper into Convolutional Networks

    ICLR2016_DELVING DEEPER INTO CONVOLUTIONAL NETWORKS Note here: Ballas recently proposed a novel fram ...

  6. 【RS】CoupledCF: Learning Explicit and Implicit User-item Couplings in Recommendation for Deep Collaborative Filtering-CoupledCF:在推荐系统深度协作过滤中学习显式和隐式的用户物品耦合

    [论文标题]CoupledCF: Learning Explicit and Implicit User-item Couplings in Recommendation for Deep Colla ...

  7. 【RS】List-wise learning to rank with matrix factorization for collaborative filtering - 结合列表启发排序和矩阵分解的协同过滤

    [论文标题]List-wise learning to rank with matrix factorization for collaborative filtering   (RecSys '10 ...

  8. 【RS】Deep Learning based Recommender System: A Survey and New Perspectives - 基于深度学习的推荐系统:调查与新视角

    [论文标题]Deep Learning based Recommender System: A Survey and New Perspectives ( ACM Computing Surveys  ...

  9. 【ML】Predict and Constrain: Modeling Cardinality in Deep Structured Prediction -预测和约束:在深度结构化预测中建模基数

    [论文标题]Predict and Constrain: Modeling Cardinality in Deep Structured Prediction   (35th-ICML,PMLR) [ ...

随机推荐

  1. C#中类为什么要实例化

    在使用C#语言时,发现一下有关类实例化的问题,在此之前先复习一下类和对象的概念,类是一个抽象体,是对一类事物的抽象体:而对象就是一个具体的事物,对象的抽象就是类.车就是一个类,而车包括面包车,小汽车, ...

  2. python五十九课——正则表达式的拓展内容

    演示正则表达式的拓展内容:函数:finditer(regex,string,[flags=0]):参数:和match.search.findall一样理解功能:将所有匹配的数据封装为一个一个的matc ...

  3. CentOS 7.X 系统安装及优化

    centos的演变 启动流程sysvinit 串行启动:一次一个,一个一个启动 并行启动:全部的一起启动 init优点 运行非常良好.主要依赖于shell脚本 init缺点 1.启动慢 2.容易夯住, ...

  4. Qt+Qgis二次开发:地理实体抽象

    1  概述 地理实体抽象是指点.线.面及其组合而成的,用于描述实际地物的数据结构. 其中包含几何实体和属性数据. GIS中进行几何操作,以各种实体类为基础进行操作. 在OGC中,地理实体可以由WKT表 ...

  5. Matlab中要显示数学公式或符号Latex

    \rho 代表  ρ, \sigma  代表 σ \alpha   α \beta    β \gamma   γ \delta   δ \epsilon    ϵ \zeta    ζ \eta   ...

  6. 深入浅出的webpack4构建工具---比mock模拟数据更简单的方式(二十一)

    如果想要了解mock模拟数据的话,请看这篇文章(https://www.cnblogs.com/tugenhua0707/p/9813122.html) 在实际应用场景中,总感觉mock数据比较麻烦, ...

  7. B-Tree外存数据结构 _(B 树)第二部分

    2. B 树 B 树是为了磁盘或其它存储设备而设计的一种多叉(相对于二叉,B树每个内结点有多个分支,即多叉)平衡查找树 一棵B树,一棵关键字为英语中辅音字母的B树,现在要从树中查找字母R(包含n[x] ...

  8. treap学习笔记

    treap是个很神奇的数据结构. 给你一个问题,你可以解决它吗? 这个问题需要treap这个数据结构. 众所周知,二叉查找树的查找效率低的原因是不平衡,而我们又不希望用各种奇奇怪怪的旋转来使它平衡,那 ...

  9. java 数据类型和运算符

    1.注释 单行注释:  //哈哈哈 多行注释: /* 啦啦啦 */ 文档注释: /**    */注释中包含一些说明性的文字及一些JavaDoc标签(后期写项目时,可以生成项目的API)        ...

  10. ELF格式文件分析以及运用

    基于本文的一个实践<使用Python分析ELF文件优化Flash和Sram空间的案例>. 1.背景 ELF是Executable and Linkable Format缩写,其官方规范在& ...