ICLR2016_DELVING DEEPER INTO CONVOLUTIONAL NETWORKS

Note here: Ballas recently proposed a novel framework on learning video representation, following is the review note after reading his paper.

Link: http://arxiv.org/pdf/1511.06432v4.pdf

[Brief introduction to some neural networks]

CNN: excellent in static image classification

RNN: can understand temporal sequences in various learning tasks
(however, with exploding or vanishing weights problem)
---> LSTM/GRU are proposed to avoid this problem

RCN: leverage properties from both CNN and RNN, use CNN top level feature map as input of RNN, it has recently introduced to learn video representations.

[Video reprensentation]

Mmotivation:
Adopt RCN as basic model.
- Top-level feature map presents high sementic features, namely the spatial naunces are ignored after pooling.
- However, frame-to-frame temporal variation is known to be smooth, which is the key for action recognition from videos.
(we need a new model to adapt this problem)

[Proposed models]

GRU-RCN:
- replace recurrent units in RCN with GRU.

(z: activation gate, decides to what degree previous hidden state would contribute to the next hidden state)
(r: reset gate, decides whether or not last hidden state should be propagated into next state)
(~h: candidate hidden state, it'll pass through the activatin gate)
(h: final hidden state)

Problems:
- number of parameters in fully-connected layer is huge due to size of conv map.
- fully-connected layers break the spatial structure of conv map.

Trick:
- replace the fully-connected units in GRU with convolution operations, which can keep spatial structure and reduce number of parameters meanwhile.

Intuition:
- we can see the propagation of hidden states as a process of convolution.
if so, the next hidden state percepts spatial structure of all the previous states. as the sequence goes further, the receptive field on previous states are larger, and we only get a general concept of frames in the beginning.
- compare to our cognition system, it does make sense!

Stacked GRU-RCN:
- it applies L GRU-RCNs independently on each convolutional map.
- tile up L GRU-RCNs.
- feed L final time-step hidden states into a classifier.

【ML】ICLR2016_Delving Deeper into Convolutional Networks的更多相关文章

  1. 【ML】Two-Stream Convolutional Networks for Action Recognition in Videos

    Two-Stream Convolutional Networks for Action Recognition in Videos & Towards Good Practices for ...

  2. 【论文笔记】Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition

    Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition 2018-01-28  15:4 ...

  3. 【ML】Predict and Constrain: Modeling Cardinality in Deep Structured Prediction -预测和约束:在深度结构化预测中建模基数

    [论文标题]Predict and Constrain: Modeling Cardinality in Deep Structured Prediction   (35th-ICML,PMLR) [ ...

  4. 【网络结构可视化】Visualizing and Understanding Convolutional Networks(ZF-Net) 论文解析

    目录 0. 论文地址 1. 概述 2. 可视化结构 2.1 Unpooling 2.2 Rectification: 2.3 Filtering: 3. Feature Visualization 4 ...

  5. 【转载】 卷积神经网络(Convolutional Neural Network,CNN)

    作者:wuliytTaotao 出处:https://www.cnblogs.com/wuliytTaotao/ 本作品采用知识共享署名-非商业性使用-相同方式共享 4.0 国际许可协议进行许可,欢迎 ...

  6. 【翻译】给初学者的 Neural Networks / 神经网络 介绍

    本文翻译自 SATYA MALLICK 的  "Neural Networks : A 30,000 Feet View for Beginners" 原文链接: https:// ...

  7. 【ML】从特征分解,奇异值分解到主成分分析

    1.理解特征值,特征向量 一个对角阵\(A\),用它做变换时,自然坐标系的坐标轴不会发生旋转变化,而只会发生伸缩,且伸缩的比例就是\(A\)中对角线对应的数值大小. 对于普通矩阵\(A\)来说,是不是 ...

  8. 【ML】ICML2015_Unsupervised Learning of Video Representations using LSTMs

    Unsupervised Learning of Video Representations using LSTMs Note here: it's a learning notes on new L ...

  9. 【ML】人脸识别

    https://github.com/colipso/face_recognition https://medium.com/@ageitgey/machine-learning-is-fun-par ...

随机推荐

  1. 【PAT】B1051 复数乘法(15 分)

    要会使用math函数, 还要注意到用四舍五入的方法判断是否应该输出0.00 #include <math.h> #include<stdio.h> int main() { d ...

  2. zabbix疑难之时区问题

    zabbix疑难总结: 1.zabbix的web界面的时间不对.晚12个小时整 适用于:安装网上的说法来修改,但是时间仍然不对的情况   我们按照以前的网上的那些方法在配置zabbix,需要配置时区: ...

  3. do-while语句及for语句(初学者)

    1.do-while语句的一般形式为: do 语句 while(表达式): 这个循环与while循环的不同在于:它先执行循环中的语句,然后再判断这个表达式是否为真,如果为真则继续循环:如果为假,则中止 ...

  4. WebService Client Generation Error with JDK8

    java.lang.AssertionError: org.xml.sax.SAXParseException; systemId: jar:file:/path/to/glassfish/modul ...

  5. 美人鱼 hdu 5784

    Peter has a sequence a1,a2,...,ana1,a2,...,an and he define a function on the sequence -- F(a1,a2,.. ...

  6. 简单规划dp sumsets

    Farmer John commanded his cows to search for different sets of numbers that sum to a given number. T ...

  7. eclipse中xml下Namespaces显示不全的解决办法

    1.问题描述: 如图,有时候编写spring相关的xml文件时,使用namepace中显示不全或者完全不显示 2.解决方法: Window —— Spring ——     Beans Support ...

  8. Zabbix安装 Grafana安装

    每天学习一点点 编程PDF电子书免费下载: http://www.shitanlife.com/code 前提: 先需要安装好 lamp环境. 官方文档: https://www.zabbix.com ...

  9. 转载 线程初步了解 - <第一篇>

    操作系统通过线程对程序的执行进行管理,当操作系统运行一个程序的时候,首先,操作系统将为这个准备运行的程序分配一个进程,以管理这个程序所需要的各种资源.在这些资源之中,会包含一个称为主线程的线程数据结构 ...

  10. gulp学习-metamask前端使用

    https://www.gulpjs.com.cn/docs/getting-started/ ,这个是3.9.0版本 后面发现安装的版本是4.0.0,看下面这个: https://github.co ...