【CV】ICCV2015_Learning Temporal Embeddings for Complex Video Analysis
Note here: it's a review note on novel work from Fei-Fei Li's group on video representations, published at ICCV 2015.
Motivation:
- Labeled video data is scarce for learning video representations, so an unsupervised approach is needed.
- Context (temporal structure) is important for video representations.
Proposed model:
- Given a query frame, the model predicts the representation (embedding) of its context.
- Pipeline:

\(f_{vj}=f(S_{vj};W_e)\): embedding function applied to frame \(S_{vj}\)
(\(W_e\) is the only parameter that needs to be trained)
- Training:
\(h_{vj}=\frac{1}{2T}\sum_{t=1}^{T}(f_{vj+t}+f_{vj-t})\): context vector, i.e. the average embedding of the \(T\) frames before and the \(T\) frames after \(S_{vj}\)
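Unsupervised learning objective (SVM-style hinge loss):
\(J(W_e)=\sum_{v\in V}\sum_{S_{vj}\in v}\sum_{S_-\neq S_{vj}}\max\bigl(0,\,1-(f_{vj}-f_-)\cdot h_{vj}\bigr)\)
(\(f_{vj}\) is the embedding of frame \(S_{vj}\))
(\(f_-\) is the embedding of a negative frame \(S_-\), one that is not closely related to \(S_{vj}\))
(\(h_{vj}\) is the context vector of frame \(S_{vj}\))
Each term is zero once the query embedding scores at least 1 higher against its own context than the negative does, i.e. \(f_{vj}\cdot h_{vj}\ge f_-\cdot h_{vj}+1\).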
The choice of negative frames and the context range are discussed further below.
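To make the training formulas concrete, here is a minimal pure-Python sketch of the embedding, the context vector \(h_{vj}\) and the hinge term for a single query frame. It assumes \(f(S;W_e)=W_e\,\phi(S)\), a linear projection of a precomputed CNN feature \(\phi(S)\), in line with the \(W_e\) projection described in this note. The names `embed`, `context_vector`, `hinge_term` and `objective` are hypothetical, and raising an error when the context window crosses a video boundary is a simplification, not necessarily how the paper handles boundaries.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def embed(W_e: Matrix, phi: Sequence[float]) -> Vector:
    """f(S; W_e) = W_e @ phi(S): linear projection of a CNN feature (assumed)."""
    return [sum(w * x for w, x in zip(row, phi)) for row in W_e]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def context_vector(f: Sequence[Vector], j: int, T: int) -> Vector:
    """h_vj = 1/(2T) * sum_{t=1..T} (f_{j+t} + f_{j-t})."""
    if j - T < 0 or j + T >= len(f):
        raise ValueError("context window falls outside the video")
    h = [0.0] * len(f[j])
    for t in range(1, T + 1):
        for k in range(len(h)):
            h[k] += f[j + t][k] + f[j - t][k]
    return [x / (2 * T) for x in h]


def hinge_term(f_vj: Vector, f_neg: Vector, h_vj: Vector) -> float:
    """max(0, 1 - (f_vj - f_neg) . h_vj) for one (query, negative) pair."""
    diff = [p - n for p, n in zip(f_vj, f_neg)]
    return max(0.0, 1.0 - dot(diff, h_vj))


def objective(W_e: Matrix, video: Sequence[Sequence[float]], j: int, T: int,
              negatives: Sequence[Sequence[float]]) -> float:
    """Part of J(W_e) contributed by query frame j, summed over its negatives."""
    f = [embed(W_e, phi) for phi in video]
    h = context_vector(f, j, T)
    return sum(hinge_term(f[j], embed(W_e, phi_neg), h) for phi_neg in negatives)


# Toy usage: 2-D "CNN features", identity projection, T = 1.
W = [[1.0, 0.0], [0.0, 1.0]]
video = [[0.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
# Only the second negative (identical to the context frames) violates the margin.
print(objective(W, video, j=2, T=1, negatives=[[0.0, 1.0], [1.0, 0.0]]))  # 1.0
```

The full \(J(W_e)\) sums `objective` over every query frame of every video; restricting it to one frame keeps the sketch short.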
Intuition:
This model memorizes the context of each frame: it maps the spatial appearance of a frame to an embedding vector from which the frame's context can be inferred.
Spatial (appearance) feature learned by a CNN \(\xrightarrow{\;W_e\ \text{projection}\;}\) temporal feature that embeds context
(\(W_{e}\) memorizes the temporal pattern during training)
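One way to see how \(W_e\) comes to memorize temporal patterns is a single subgradient step on one hinge term. Under the assumption \(f=W_e\phi\) (linear projection of CNN features), the context vector is \(h_{vj}=W_e\bar\phi\), where \(\bar\phi\) is the mean CNN feature of the context frames, so the margin is \((W_e a)\cdot(W_e b)\) with \(a=\phi_{vj}-\phi_-\) and \(b=\bar\phi\); its gradient with respect to \(W_e\) is \((W_e b)a^\top+(W_e a)b^\top\). The sketch below uses plain subgradient descent with a hypothetical learning rate `lr` and a hypothetical function name `sgd_step`; this note does not say which optimizer the paper actually used.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(W: Matrix, x: Sequence[float]) -> List[float]:
    """Row-by-row product W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sgd_step(W_e: Matrix, phi_query: Sequence[float], phi_neg: Sequence[float],
             phi_context_mean: Sequence[float], lr: float = 0.1) -> Matrix:
    """One subgradient step on max(0, 1 - (f_q - f_neg) . h) with f = W_e @ phi.

    Because f is linear, h = W_e @ phi_context_mean, so the margin equals
    (W_e a) . (W_e b) with a = phi_query - phi_neg and b = phi_context_mean.
    """
    a = [q - n for q, n in zip(phi_query, phi_neg)]
    b = list(phi_context_mean)
    Wa, Wb = matvec(W_e, a), matvec(W_e, b)
    margin = sum(x * y for x, y in zip(Wa, Wb))  # (f_q - f_neg) . h
    if margin >= 1.0:
        # Hinge is inactive: the loss is 0, so the subgradient is 0 too.
        return [row[:] for row in W_e]
    # Loss = 1 - margin, so descending the loss means ascending the margin,
    # whose gradient w.r.t. W_e is (W_e b) a^T + (W_e a) b^T.
    return [[W_e[i][k] + lr * (Wb[i] * a[k] + Wa[i] * b[k])
             for k in range(len(a))]
            for i in range(len(W_e))]


# The margin for this pair rises from 0.0 to 0.1 after one step.
print(sgd_step([[1.0, 0.0]], [1.0, 0.0], [0.0, 0.0], [0.0, 1.0]))  # [[1.0, 0.1]]
```

The update pulls the query's projection toward the projected context and away from the negative, which is the sense in which the weights store temporal co-occurrence.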
Thanks to the temporal structure, frames that do not look alike can still lie close together in the embedding space as long as they share similar context.
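A toy illustration of this effect, using a hand-picked (not learned) \(W_e\): suppose the first two CNN dimensions carry context-relevant content and the third captures a nuisance factor such as lighting. A projection that discards the third dimension maps two visually different frames to the same embedding while keeping a frame with different content apart. The feature values and the choice of `W_e` are made up for illustration; a trained \(W_e\) would only approximate this behavior.

```python
import math


def embed(W_e, phi):
    """f = W_e @ phi (linear projection, as assumed in this note)."""
    return [sum(w * x for w, x in zip(row, phi)) for row in W_e]


# Hand-picked projection that keeps dims 0-1 and drops dim 2 (e.g. lighting).
W_e = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0]]

phi_a = [1.0, 0.0, 5.0]   # content A, bright
phi_b = [1.0, 0.0, -5.0]  # content A, dark: same context, different look
phi_c = [0.0, 1.0, 5.0]   # content B, bright: similar look, other context

appearance = {"a-b": math.dist(phi_a, phi_b), "a-c": math.dist(phi_a, phi_c)}
embedding = {
    "a-b": math.dist(embed(W_e, phi_a), embed(W_e, phi_b)),
    "a-c": math.dist(embed(W_e, phi_a), embed(W_e, phi_c)),
}
print(appearance)  # a-b = 10.0, a-c ≈ 1.414: a looks closer to c
print(embedding)   # a-b = 0.0,  a-c ≈ 1.414: a embeds closer to b
```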

There are two key techniques in the training process:
- Multi-resolution sampling: it is hard to choose a single context range (\(T\)) that suits every video, because videos have different paces: some are fast and some are slow. Instead of always sampling context frames with the same frame gap, the paper proposes a multi-resolution strategy that samples with various gap lengths. This trades off semantic relatedness (nearby frames are more related) against visual variety (distant frames look more different).

- Hard negatives: the choice of negative samples is important for a robust model. A natural approach is to sample negative frames from other videos and context frames from the same video, but this may cause the model to overfit to video-specific, less semantic properties such as lighting, camera characteristics and background. To avoid this, the paper also samples negative frames from the same video that lie outside the context range.
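Both sampling tricks can be sketched together. This is a minimal illustration under stated assumptions, not the paper's code: frames are referred to by index, the candidate gaps `GAPS` and the same-video probability `p_same` are hypothetical values, and "multi-resolution" is modeled as picking one gap \(g\) per query and taking context frames at \(j\pm g t\) for \(t=1,\dots,T\). Negatives come either from the same video, outside the sampled context span \(|i-j|\le gT\), or from a different video.

```python
import random
from typing import List, Sequence, Tuple

GAPS = (1, 2, 4, 8)  # hypothetical candidate frame gaps (resolutions)


def sample_context(j: int, num_frames: int, T: int,
                   rng: random.Random) -> Tuple[int, List[int]]:
    """Multi-resolution context: pick a gap g, return frames j +/- g*t, t = 1..T."""
    valid = [g for g in GAPS if j - g * T >= 0 and j + g * T < num_frames]
    if not valid:
        raise ValueError("no candidate gap fits inside the video at this frame")
    g = rng.choice(valid)
    context = [j + sign * g * t for t in range(1, T + 1) for sign in (-1, 1)]
    return g, context


def sample_negative(video_id: int, j: int, g: int, T: int,
                    frames_per_video: Sequence[int], rng: random.Random,
                    p_same: float = 0.5) -> Tuple[int, int]:
    """Return (video, frame) of one negative for query frame j of video_id.

    With probability p_same the negative is a frame of the same video lying
    outside the sampled context span |i - j| <= g*T (a hard negative);
    otherwise it is any frame of a different video.
    """
    far = [i for i in range(frames_per_video[video_id]) if abs(i - j) > g * T]
    others = [v for v in range(len(frames_per_video)) if v != video_id]
    # Fall back to the same video when no other video exists.
    use_same = bool(far) and (not others or rng.random() < p_same)
    if use_same:
        return video_id, rng.choice(far)
    if not others:
        raise ValueError("no valid negative frame available")
    v = rng.choice(others)
    return v, rng.randrange(frames_per_video[v])


rng = random.Random(0)
g, context = sample_context(j=10, num_frames=30, T=2, rng=rng)
negatives = [sample_negative(0, 10, g, 2, [30, 20, 25], rng) for _ in range(5)]
print(g, context, negatives)
```

Keeping `p_same` strictly between 0 and 1 mixes both kinds of negatives, so the model cannot rely only on video-specific cues such as lighting or background to separate context frames from negatives.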