【CV】ICCV2015_Unsupervised Learning of Visual Representations using Videos

Unsupervised Learning of Visual Representations using Videos

Note: this is a learning note on Prof. Gupta's work published at ICCV 2015. It is exciting to see how unsupervised learning can contribute to learning visual representations. Fei-Fei Li's group also published a paper on unsupervised video representation learning at ICCV 2015 at almost the same time; I wrote a review of it as well.

Link: http://arxiv.org/pdf/1505.00687v2.pdf

Motivation:

- Supervised learning is the popular way to train strong CNN models for many visual problems, while unsupervised learning remains largely unexplored for this purpose.

- People learn concepts quickly without numerous training instances, and we learn in a dynamic, mostly unsupervised environment.

- Labeled video data is scarce, which limits supervised learning, but huge amounts of unlabeled video are easily available on the Internet and can be exploited by unsupervised learning.

Proposed Model:

Target: learning visual representations from videos in an unsupervised way

Key idea: tracking a moving object provides the supervision

Brief introduction:

- Objective function (constraint): take a first patch p1 of a moving object, keep tracking it, and take a second patch p2 several frames later; then randomly select a negative patch p- from elsewhere. The objective constrains the distance between p1 and p2 in feature space to be smaller than the distance between p1 and p-.

- Selection of the tracking patch: use IDT (Improved Dense Trajectories) together with SURF interest points to find out which part of the frame moves most. A threshold on the ratio of SURF interest points filters out noise and camera motion.

- Tracking: use the KCF tracker to track the patch

- Overall pipeline:

Feed the triplet into three identical CNNs, add two fully-connected layers on top of the pool5 layer to project into the feature space, then compute the ranking loss and back-propagate it through the network. (Note: the three CNNs share parameters.)
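The constraint described above (p1 should be closer to p2 than to p- in feature space) can be sketched as a hinge-style ranking loss. This is only a minimal illustration, not the authors' implementation: the function names are hypothetical, the features are plain Python lists standing in for the CNN outputs, and both the cosine distance and the margin value of 0.5 are assumptions made for the example.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def ranking_loss(f_p1, f_p2, f_neg, margin=0.5):
    """Hinge ranking loss over one triplet of feature vectors.

    The loss is zero once the negative patch is farther from p1 than
    the tracked patch p2 by at least `margin` (an assumed value).
    """
    d_pos = cosine_distance(f_p1, f_p2)   # distance p1 <-> p2
    d_neg = cosine_distance(f_p1, f_neg)  # distance p1 <-> p-
    return max(0.0, d_pos - d_neg + margin)


# Toy features: a well-separated triplet and a violated one.
good = ranking_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
bad = ranking_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
print(good, bad)
```

Because the three CNNs share parameters, the gradient of this single scalar would update one set of weights through all three branches.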

Training strategy:

Training a more powerful CNN in this work involves many empirical details. I am not going to dive into them here and will only briefly review some of the tricks.

- Choice of negative samples:

- Random selection in the first 10 epochs of training

- Hard negative mining in later epochs: search over all the possible negative patches and choose the top K patches that give the maximum loss
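The hard negative mining step can be sketched as follows: score every candidate negative with the ranking loss and keep the K candidates with the largest loss. This is a rough sketch under stated assumptions, not the paper's code: `mine_hard_negatives` is a hypothetical helper, features are plain lists, and the cosine distance and margin of 0.5 are assumed for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def mine_hard_negatives(f_p1, f_p2, candidates, k, margin=0.5):
    """Return indices of the k candidate negatives with the highest loss."""
    d_pos = cosine_distance(f_p1, f_p2)
    scored = []
    for index, f_neg in enumerate(candidates):
        loss = max(0.0, d_pos - cosine_distance(f_p1, f_neg) + margin)
        scored.append((loss, index))
    # Largest loss first: these negatives violate the constraint the most.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [index for _, index in scored[:k]]


# Toy example: candidates 1 and 3 lie close to p1, so they are "hard".
p1 = [1.0, 0.0]
p2 = [1.0, 0.1]
negatives = [[0.0, 1.0], [1.0, 0.2], [-1.0, 0.0], [1.0, 1.0]]
print(mine_hard_negatives(p1, p2, negatives, k=2))
```

Negatives that already sit far from p1 contribute zero loss, so mining concentrates the gradient on the patches the network still confuses with the tracked object.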

Intuition on the result:

As the table above shows, [unsup + fp(3 ensemble)] outperforms the other methods on detecting bus, car, person and train, but falls far behind on bird, cat, dog and sofa, which may give us some intuition.
