A Discriminative CNN Video Representation for Event Detection

Note here: it's a learning note on the topic of video representation, based on the paper below.

Link: http://arxiv.org/pdf/1411.4006v1.pdf

Motivation:

The use of improved Dense Trajectories (IDT) has led good performance on the task of event detection, while the performance of CNN based video representation is worse than that. The author argues the following three main reasons:

  • Lack of labeled video data to train good models.
  • Video level event labels are too coarse to finetune a pre-trained model for adapting the event detection task.
  • The use of average pooling to generate a discriminative video level representation from CNN frame level descriptors works worse than hand-crafted features like IDT.

Proposed Model:

This paper proposes a model mainly targetting at the third problem, namely how to build a cost-efficient and discrimintive video representation based on CNN.

1)     Firstly, we should extract frame-level descriptor:

We adopt M filters in the last convolutional layers as M latent concept classifiers. Each convolutional filter is corresponding to one latent concept, and each of it will apply on different location of the frame. So we’ll get responses of discrimintive latent concepts on different locations of the frame.

After that, we apply max-pooling operation on all concepts descriptors and concatenate different responses at the same location to form vectors each of which containing various concepts descriptions at this location.

By now, we’ve extract frame-level features.

(Actually, they didn’t do anything special at this step, they just give a new illustration of responses in CNN and rearrange those responses for further process.)

2)     Secondly, we need to encode a discrimintive video-level descriptor from all these frame-level descriptors:

They introduce and compare three different encoding methods in the paper.

However, as I’m not proficient in the mathematical meanings of them, I can just give a briefly look at them instead of going further.

Through experiment, they find out VLAD is better than other encoding methods. (You can refer to the paper for details about that experiment.)

"This is the first work on the video pooling of CNN descriptors and we broaden the encoding methods from local descriptors to CNN descriptors in video analysis."

(That's the takeaway in their work. They're the first to apply these encoding methods on the CNN descriptors. Previously, most of the works utilize Fisher Vector Encoding to encode a general feature of an image from local descriptors like HOG, SIFT, HOF and so on.)

3)     Lastly, we get a video-level descriptor and feed it into a SVM to do detection task.

Two Tricks:

1)     Spatial Pyramid Pooling: they apply four different CNN max-pooling operations to give more spatial locations for a single frame, which makes the descriptor more discrimintive. And that’s also more cost-friendly than applying spatial pyramid on raw frame.

2)     Representation Compression: they do Product Quntization to compress the final representation while still maintain or even slightly improve the original performance.

【CV】CVPR2015_A Discriminative CNN Video Representation for Event Detection的更多相关文章

  1. 论文阅读(Weilin Huang——【TIP2016】Text-Attentional Convolutional Neural Network for Scene Text Detection)

    Weilin Huang--[TIP2015]Text-Attentional Convolutional Neural Network for Scene Text Detection) 目录 作者 ...

  2. 【CV】ICCV2015_Unsupervised Learning of Visual Representations using Videos

    Unsupervised Learning of Visual Representations using Videos Note here: it's a learning note on Prof ...

  3. 【PSMA】Progressive Sample Mining and Representation Learning for One-Shot Re-ID

    目录 主要挑战 主要的贡献和创新点 提出的方法 总体框架与算法 Vanilla pseudo label sampling (PLS) PLS with adversarial learning Tr ...

  4. 【CV】ICCV2015_Learning Temporal Embeddings for Complex Video Analysis

    Learning Temporal Embeddings for Complex Video Analysis Note here: it's a review note on novel work ...

  5. 【CV】ICCV2015_Unsupervised Visual Representation Learning by Context Prediction

    Unsupervised Visual Representation Learning by Context Prediction Note here: it's a learning note on ...

  6. 【CV】ICCV2015_Unsupervised Learning of Spatiotemporally Coherent Metrics

    Unsupervised Learning of Spatiotemporally Coherent Metrics Note here: it's a learning note on the to ...

  7. 【CV】ICCV2015_Describing Videos by Exploiting Temporal Structure

    Describing Videos by Exploiting Temporal Structure Note here: it's a learning note on the topic of v ...

  8. 【ML】ICML2015_Unsupervised Learning of Video Representations using LSTMs

    Unsupervised Learning of Video Representations using LSTMs Note here: it's a learning notes on new L ...

  9. 【实战问题】【3】iPhone无法播放video标签中的视频

    问题:视频都是MP4格式,视频可以在手机上正常播放.video标签中的视频在安卓点击可以播放,但在iPhone无法播放 解决方案: 1,视频编码格式问题,具体iPhone手机支持的是哪些格式可见官方的 ...

随机推荐

  1. Html引入百度富文本编辑器ueditor

    在日常工作用,肯定有用到富文本编辑器的时候,富文本编辑器功能强大使用方便,我用的是百度富文本编辑器,首先需要下载好百度编辑器的demo, 然后创建ueditor.html文件,引入百度编辑器,然后在h ...

  2. [BZOJ 3829][POI2014] FarmCraft

    先贴一波题面... 3829: [Poi2014]FarmCraft Time Limit: 20 Sec  Memory Limit: 128 MBSubmit: 421  Solved: 197[ ...

  3. kafka的Java客户端示例代码(kafka_2.11-0.8.2.2)

    1.使用Producer API发送消息到Kafka 从版本0.9开始被KafkaProducer替代. HelloWorldProducer.java package cn.ljh.kafka.ka ...

  4. postgresql中uuid的使用

    本文总共介绍两种方法 : 1.使用create extension命令 create extension "uuid-ossp" 安装扩展成功以后,就可以通过uuid_genera ...

  5. Linter pylint is not installed

    问题 Linter 'pylint' is not installed. Please install it or select another linter". Error: Module ...

  6. 有关科学计算方面的python解决

    在科学计算方面,一般觉得matlab是一个超强的东西.此外还有R. 至于某种语言来说,一般都要讲究一些特别的算法,包含但不限于: 矩阵方面的计算 指数计算 对数计算 多项式运算 各类方程求解 总之.仅 ...

  7. 统计硬币 HDU - 2566 (三种解法:线性代数解法,背包解法,奇思妙想解法 >_< )

    题号放这里自己去找吧. HDU-2566 这题最开始用的dp,然后,被同学用奇思妙想过了.  >_<  开心! -_- !! 然后,被我线性代数给过了. 方法一:dp 将其化为01背包,只 ...

  8. 最近的linux工作记录

    最近的linux工作记录 最近公司走了一些同事,部分服务器交到了我的手里,总结一些常用的操作 注:大写的字符串一般是用来占位,需要替换 创建账户和使用密钥对登陆 1,账户系列 useradd 选项 用 ...

  9. go标准库的学习-mime

    参考:https://studygolang.com/pkgdoc 导入方法: import "mime" mime实现了MIME的部分规定. 什么是MIME: MIME(Mult ...

  10. PAT A1124 Raffle for Weibo Followers (20 分)——数学题

    John got a full mark on PAT. He was so happy that he decided to hold a raffle(抽奖) for his followers ...