Unsupervised Learning of Video Representations using LSTMs

Note: these are study notes on a new LSTM architecture used for unsupervised learning of video representations.

(For more topics related to unsupervised learning, see:

Learning Temporal Embeddings for Complex Video Analysis

Unsupervised Learning of Visual Representations using Videos

Unsupervised Visual Representation Learning by Context Prediction)

Link: http://arxiv.org/abs/1502.04681 (published at ICML 2015)

Motivation:

- Understanding temporal sequences is important for solving many video-related problems. The temporal structure of videos can therefore serve as a supervisory signal for unsupervised learning.

Proposed model:

In this paper, the authors propose three LSTM-based models:

1) LSTM Autoencoder Model:

  This model is composed of two parts, the encoder and the decoder.

  The encoder accepts a sequence of frames as input. Once the last frame has been read, the learned representation (the encoder's state) is copied to the decoder as its initial state. The decoder is then asked to reconstruct the input frames in reverse order.

  (This is the unconditional version. A conditional version additionally receives the decoder's last generated output as input at each step, shown as the dashed boxes in the paper's model figure.)

Intuition: Reconstructing the input requires the network to capture information about the appearance of objects and the background, which is exactly the information we would like the representation to contain.
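The encoder/decoder flow described above can be sketched in a few dozen lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the class `TinyLSTMCell`, the helpers `encode` and `decode`, the linear `readout` layer, the toy sizes, and the choice to feed zeros to the decoder at its first step are all assumptions made for this example, and the weights are random and untrained.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMCell:
    """Minimal LSTM cell on plain Python lists (hypothetical, illustration only)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        width = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.weights = {
            name: [[rng.uniform(-0.1, 0.1) for _ in range(width)]
                   for _ in range(hidden_size)]
            for name in ("i", "f", "o", "g")
        }
        self.biases = {name: [0.0] * hidden_size for name in ("i", "f", "o", "g")}

    def _gate(self, name, activation, z):
        return [activation(sum(w * v for w, v in zip(row, z)) + b)
                for row, b in zip(self.weights[name], self.biases[name])]

    def step(self, x, state):
        h, c = state
        z = list(x) + list(h)  # concatenate input and previous hidden state
        i = self._gate("i", sigmoid, z)
        f = self._gate("f", sigmoid, z)
        o = self._gate("o", sigmoid, z)
        g = self._gate("g", math.tanh, z)
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def encode(cell, frames):
    """Run the encoder over all frames; the final state is the representation."""
    state = ([0.0] * cell.hidden_size, [0.0] * cell.hidden_size)
    for frame in frames:
        state = cell.step(frame, state)
    return state


def decode(cell, readout, state, steps, frame_size, conditional):
    """Unroll the decoder, starting from the copied encoder state."""
    outputs = []
    previous = [0.0] * frame_size  # assumption: zeros before any frame is generated
    for _ in range(steps):
        # Conditional: feed back the last generated frame. Unconditional: zeros.
        x = previous if conditional else [0.0] * frame_size
        state = cell.step(x, state)
        frame = [sum(w * hj for w, hj in zip(row, state[0])) for row in readout]
        outputs.append(frame)
        previous = frame
    return outputs


rng = random.Random(0)
frame_size, hidden_size = 4, 8  # toy sizes; real frames are far larger
frames = [[rng.random() for _ in range(frame_size)] for _ in range(5)]

encoder = TinyLSTMCell(frame_size, hidden_size, rng)
decoder = TinyLSTMCell(frame_size, hidden_size, rng)
readout = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
           for _ in range(frame_size)]

representation = encode(encoder, frames)
target = frames[::-1]  # reconstruction target: the input in reverse order
recon_uncond = decode(decoder, readout, representation, len(frames), frame_size, False)
recon_cond = decode(decoder, readout, representation, len(frames), frame_size, True)
```

Training would compare `recon_uncond` or `recon_cond` against `target` and backpropagate the error through both LSTMs; that step is omitted here to keep the sketch short.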

2) LSTM Future Predictor Model:

  This model is similar to the one above; the main difference lies in the output. Here the decoder predicts the frames that come just after the input sequence. Like the autoencoder, it comes in conditional and unconditional versions.

Intuition: In order to predict the next few frames correctly, the model needs information about which objects are present and how they are moving so that the motion can be extrapolated.
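The difference between the two training targets can be made concrete with a small sketch. The function name `split_sequence`, the clip contents, and the 4/2 split are hypothetical choices for illustration; string labels stand in for real frames.

```python
def split_sequence(frames, num_input, num_future):
    """Split a clip into the encoder input and the future frames to predict."""
    if num_input + num_future > len(frames):
        raise ValueError("clip is too short for the requested split")
    inputs = frames[:num_input]
    future = frames[num_input:num_input + num_future]
    return inputs, future


clip = ["f0", "f1", "f2", "f3", "f4", "f5"]
inputs, future = split_sequence(clip, num_input=4, num_future=2)

# Autoencoder target: the input frames in reverse order.
reconstruction_target = inputs[::-1]
# Future predictor target: the frames that follow the input, in forward order.
prediction_target = future
```

Both targets come from the video itself, so no labels are needed in either case.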

3) A Composite Model:

  This model combines input reconstruction and future prediction into a single, more powerful model. The two decoders share one encoder, which encodes the input sequence into a feature vector and copies it to each decoder.

Intuition: the single shared encoder must learn a representation that contains not only the static appearance of objects and background, but also dynamic information such as which objects are moving and how they move.
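One way to picture the composite objective is as the sum of the two decoders' errors, so that gradients from both tasks flow back into the shared encoder. The sketch below assumes a squared-error loss and equal weighting of the two terms purely for illustration; the function names and the toy numbers are hypothetical and are not taken from the paper.

```python
def squared_error(predicted, target):
    """Sum of squared differences over all frames and all values per frame."""
    return sum((p - t) ** 2
               for pred_frame, target_frame in zip(predicted, target)
               for p, t in zip(pred_frame, target_frame))


def composite_loss(reconstruction, prediction, inputs, future):
    """Reconstruction error (against reversed input) plus future prediction error."""
    recon_loss = squared_error(reconstruction, inputs[::-1])
    pred_loss = squared_error(prediction, future)
    return recon_loss + pred_loss  # assumption: both terms weighted equally


inputs = [[0.0, 1.0], [1.0, 0.0]]          # two toy input frames
future = [[1.0, 1.0]]                      # one toy future frame
reconstruction = [[1.0, 0.0], [0.0, 0.5]]  # decoder 1 output (reverse order)
prediction = [[0.5, 1.0]]                  # decoder 2 output

total = composite_loss(reconstruction, prediction, inputs, future)
```

Because both terms depend on the same encoder state, minimizing `total` pushes that state to keep appearance information (needed for reconstruction) and motion information (needed for prediction) at the same time.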
