Learning Temporal Embeddings for Complex Video Analysis

Note here: it's a review note on novel work from Feifei-Li's group about video representations, published on ICCV2015.

Link: http://www.cv-foundation.org/openaccess/content_iccv_2015/html/Ramanathan_Learning_Temporal_Embeddings_ICCV_2015_paper.html

Motivation:

- Labeled video data is short for learning video representations, we need an unsupervised way.

- Context(temporal structure) is significant for video representations.

Proposed model:

- give one query frame, we can predict corresponding context representations(embeddings) of it through this model.

- Pipline:

\(f_{vj}(s_{vj};w_{e})\): embedding function

(\(W_{e}\) is the only parameter here we need to train for)

- Training:

\(h_{vj}=\frac{1}{2T}\sum_{t=1}^T(f_{vj+t}+f_{vj-t})\): context vector

Unsupervised learning objective (SVM Loss):

\(J(W_{e})=\sum_{v\in V}\sum_{S_{vj\in V},S\neq S_{vj}}max(0,1-(f_{vj}-f_{\_})\cdot h_{vj})\)

(\(f_{vj}\) is the embedding of frame \(S_{vj}\))

(\(f_{\_}\) is a negative frame which is not highly relevant to \(S_{vj}\))

(\(h_{vj}\) is the context embedding of frame \(S_{vj}\))

We’ll go further into the choosing of negative frames and context range later.

Intuition:

This model momorizes the context of specific frame. It utilizes the spatial appearance of the frame to form an embedding vector, which infers its context information.

Spatial feature learned from CNN \(\xrightarrow{\;\;\;W_{e}\;\;projection\;\;\;}\) Temporal feature embeds context

(\(W_{e}\) memorizes the temporal pattern during training)

With the temporal structure, even though some frames are not appearance similar, they can also be near in the feature space as long as they share similar context. Like following:

There’re two takeaways in the training process:

- Multi-resolution sampling: it’s hard to decide a generic context range(T), for videos own different paces, some may be quick while some are slow. This paper proposed a multi-resolution sampling strategy, instead of only sampling the context with same frame gap, it sampling with various gap lengths. That’s a trade-off between semantic relatedness and visual variaty.

- Hard Negative: choosing of negative samples are important for a robust model. It’s natural to come up with sampling negative frames in other videos and context frames from the same video, but this may cause the model overfit for some video-specific, less sementic properties, like lighting, camera characteristics and background. As a result, this paper also samples negative frames that are out of context range from the same video to avoid this problem.

【CV】ICCV2015_Learning Temporal Embeddings for Complex Video Analysis的更多相关文章

  1. 【CV】ICCV2015_Describing Videos by Exploiting Temporal Structure

    Describing Videos by Exploiting Temporal Structure Note here: it's a learning note on the topic of v ...

  2. 【转载】Hierarchal Temporal Memory (HTM)

    最近在看机器学习,看能否根据已有的历史来预测Hardware的故障发生概率.下文是一篇很有意思的文章,转自 http://numenta.org/htm.html. NuPIC是一个开源项目,用来实现 ...

  3. 【CV】ICCV2015_Unsupervised Learning of Spatiotemporally Coherent Metrics

    Unsupervised Learning of Spatiotemporally Coherent Metrics Note here: it's a learning note on the to ...

  4. 【DB2】SQL0437W Performance for this complex query may be sub-optimal

    参考链接 Technote (troubleshooting) Problem(Abstract) Error [IBM][CLI Driver][DB2/6000] SQL0437W Perform ...

  5. 【CV】CVPR2015_A Discriminative CNN Video Representation for Event Detection

    A Discriminative CNN Video Representation for Event Detection Note here: it's a learning note on the ...

  6. 【CV】ICCV2015_Unsupervised Visual Representation Learning by Context Prediction

    Unsupervised Visual Representation Learning by Context Prediction Note here: it's a learning note on ...

  7. 【CV】ICCV2015_Unsupervised Learning of Visual Representations using Videos

    Unsupervised Learning of Visual Representations using Videos Note here: it's a learning note on Prof ...

  8. 【题解】[USACO12JAN]视频游戏的连击Video Game Combos

    好久没有写博客了,好惭愧啊……虽然这是一道弱题但还是写一下吧. 这道题目的思路应该说是很容易形成:字符串+最大值?自然联想到学过的AC自动机与DP.对于给定的字符串建立出AC自动机,dp状态dp[i] ...

  9. 【ML】ICML2015_Unsupervised Learning of Video Representations using LSTMs

    Unsupervised Learning of Video Representations using LSTMs Note here: it's a learning notes on new L ...

随机推荐

  1. QSetting的值不能保存。

    最近在使用QSetting的时候,setting的值死活保存不下来,后来添加了如何设置后,settting的可以获取到. QCoreApplication::setOrganizationName(& ...

  2. 【PAT】B1067 试密码(20 分)

    注意读取时的换行符用getchar吸收 第十个错误后直接输出锁定 #include<cstdio> #include<string.h> int main(){ char mi ...

  3. ccf-20161203--权限查询

    这题我的思路是将用户直接与他的权限联系起来.比如: 用户 角色 权限 Alice hr crm:2直接转变为:Alice: crm:2 题目与代码如下: 问题描述 试题编号: 201612-3 试题名 ...

  4. Eclipse中定位当前文件在项目中的位置

    点击红色框内的按钮,就能定位当前文件在项目中的位置, 另外, 找到位置后记得再点击一下这个按钮, 要不然每次打开一个文件都会自动定位

  5. oracle使用with as提高查询效率

    经常在开发过程中会用到视图或组合查询的情况,但由于涉及表数据经常达到千万级别的笛卡尔积,而且一段查询时会反复调用,但结果输出往往不需要那么多,可以使用with将过滤或处理后的结果先缓存到临时表(此处原 ...

  6. React框架简介

    React的基本认识 Facebook开源的一个js库,一个用来动态构建用户界面的js库 英文官网,中文官网 React的特点 Declarative(声明式编码),Component-Based(组 ...

  7. 泰泽智能电视(Tizen smart TV)问世

        6月2日至4日,泰泽开发人员大会(TDC)在美国洛杉矶举行,会上韓国三星公司展出了一台泰泽智能电视(原型机).        智能电视(Smart TV not to be confused ...

  8. 阿里巴巴Java开发手册要点笔记 (一)

    1:[强制]Object 的 equals 方法容易抛空指针异常,应使用常量或确定有值的对象来调用 equals. 正例:"test".equals(object); 反例:obj ...

  9. Java 缓存技术之 ehcache

    1. EHCache 的特点,是一个纯Java ,过程中(也可以理解成插入式)缓存实现,单独安装Ehcache ,需把ehcache-X.X.jar 和相关类库方到classpath中.如项目已安装了 ...

  10. jQuery添加自定义函数的三种方法

    原文链接:http://caibaojian.com/284.html 方法一: jQuery.fn.setApDiv=function () { //apDiv浮动层显示位置居中控制 var whe ...