Unsupervised Learning of Visual Representations using Videos

Note here: it's a learning note on Prof. Gupta's novel work published on ICCV2015. It's really exciting to know how unsupervised learning method can contribute to learn visual representations! Also, Feifei-Li's group published a paper on video representation using unsupervised method in ICCV2015 almost at the same time! I also wrote a review on it, check it here!

Link: http://arxiv.org/pdf/1505.00687v2.pdf

Motivation:

- Supervised learning is popular for CNN to train an excellent model on various visual problems, while the application of unsupervised learning leaves blank.

- People learn concepts quickly without numerous instances for training, and we learning things in a dynamic, mostly unsupervised environment.

- We’re short of labeled video data to do supervised learning, but we can easily access to tons of unlabeled data through Internet, which can be made use of by unsupervised learning.

Proposed Model:

Target: learning visual representations from videos in an unsupervised way

Key idea: tracking of moving object provides supervision

Brief introduction:

- Objective function (constraint): capture the first patch p1 of a moving object, keep tracking of it and get another patch p2 after several frames, then randomly select a negative patch p- from other places. The idea of objective function constrains the distance of p1 and p2 in feature space should be shorter than distance of p1 and p-

- Selection of tracking patch: using IDT to obtain SURF interest points to find out which part of the frame moves most. Setting threshold on the ratio of SURF interest points to avoid noise and camera motion.

- Tracking: using KCF tracker to track the patch

- Overrall pipline:

Feed triplet into three identical CNN, put two fully-connected layers on the top of pooling-5 layer to project into feature space, then computing the ranking loss to back-propagate the network. (note that: these three CNN shares parameters)

Training strategy:

There’re many empirical details to train a more powerful CNN in this work, however I’m not going to dive into it, only give some brief reviews on some the trick.

- Choose of negative samples:

- Random selection in the first 10 epochs of training

- Hard negative mining in later epochs, we search for all the possible negative patches and choose the top K patches which give maximum loss

* Intuition on the result:

See from the table above, [unsup + fp(3 ensemble)] outperforms other methods on the detection task of bus, car, person and train, but falls far behind on detecting bird, cat, dog and sofa, which may give us some intuitions.

【CV】ICCV2015_Unsupervised Learning of Visual Representations using Videos的更多相关文章

  1. 【CV】ICCV2015_Unsupervised Learning of Spatiotemporally Coherent Metrics

    Unsupervised Learning of Spatiotemporally Coherent Metrics Note here: it's a learning note on the to ...

  2. 【ML】ICML2015_Unsupervised Learning of Video Representations using LSTMs

    Unsupervised Learning of Video Representations using LSTMs Note here: it's a learning notes on new L ...

  3. 【CV】ICCV2015_Unsupervised Visual Representation Learning by Context Prediction

    Unsupervised Visual Representation Learning by Context Prediction Note here: it's a learning note on ...

  4. 【翻译】我钟爱的Visual Studio前端开发工具/扩展

    原文:[翻译]我钟爱的Visual Studio前端开发工具/扩展 怎么样让Visual Studio更好地编写HTML5, CSS3, JavaScript, jQuery,换句话说就是如何更好地做 ...

  5. 论文解读(SimCLR)《A Simple Framework for Contrastive Learning of Visual Representations》

    1 题目 <A Simple Framework for Contrastive Learning of Visual Representations> 作者: Ting Chen, Si ...

  6. A Simple Framework for Contrastive Learning of Visual Representations

    目录 概 主要内容 流程 projection head g constractive loss augmentation other 代码 Chen T., Kornblith S., Norouz ...

  7. ZH奶酪:【阅读笔记】Deep Learning, NLP, and Representations

    中文译文:深度学习.自然语言处理和表征方法 http://blog.jobbole.com/77709/ 英文原文:Deep Learning, NLP, and Representations ht ...

  8. 【RS】CoupledCF: Learning Explicit and Implicit User-item Couplings in Recommendation for Deep Collaborative Filtering-CoupledCF:在推荐系统深度协作过滤中学习显式和隐式的用户物品耦合

    [论文标题]CoupledCF: Learning Explicit and Implicit User-item Couplings in Recommendation for Deep Colla ...

  9. 【RS】List-wise learning to rank with matrix factorization for collaborative filtering - 结合列表启发排序和矩阵分解的协同过滤

    [论文标题]List-wise learning to rank with matrix factorization for collaborative filtering   (RecSys '10 ...

随机推荐

  1. safari 与 chrome 的小区别大BUG

    safari 与 chrome 的小区别大BUG 时间:2016-11-01 17:33:19 作者:zhongxia 原文地址:https://github.com/zhongxia245/blog ...

  2. Java JDK1.5、1.6、1.7新特性整理

    转载请注明出处:http://www.cnblogs.com/tony-yang-flutter 一.Java JDK1.5的新特性 1.泛型: List<String> strs = n ...

  3. Tomcat 下配置一个ip绑定多个域名

    原文:http://pkblog.blog.sohu.com/68921246.html 在网上找了半天也没找到相关的资料,都说的太含糊.本人对tomcat下配置 一ip对多域名的方法详细如下,按下面 ...

  4. 写给spring版本的那些事儿

    1.远程调用rmi协议 Exception in thread "main" java.rmi.UnmarshalException: error unmarshalling re ...

  5. [Jsoi2015]染色问题

    题目 看到这个限制条件有点多,我们就一直容斥好了 先容斥颜色,我们枚举至少不用\(i\)种颜色 再容斥列,我们枚举至少不用\(j\)列 最后容斥行,枚举至少不用\(k\)行 容斥系数显然是\((-1) ...

  6. 【转】字符编码笔记:ASCII、Unicode、UTF-8 和 Base64

    1. ASCII码 我们知道,在计算机内部,所有的信息最终都表示为一个二进制的字符串.每一个二进制位(bit)有0和1两种状态,因此八个二进制位就可以组合出256种状态(-128~127),这被称为一 ...

  7. 【转】理解js中的原型链,prototype与__proto__的关系

    说到prototype,就不得不先说下new的过程. 我们先看看这样一段代码: 1 <script type="text/javascript"> 2 var Pers ...

  8. laravel orm进行增删改查

    https://laravelacademy.org/post/9699.html 建议用DB门面直接操作数据库,因为ORM性能低.数据查询上面,ORM不会比DB差的,就比如with,是用了sql最基 ...

  9. leetcode 338. Counting Bits,剑指offer二进制中1的个数

    leetcode是求当前所有数的二进制中1的个数,剑指offer上是求某一个数二进制中1的个数 https://www.cnblogs.com/grandyang/p/5294255.html 第三种 ...

  10. MySQL 基础十 性能优化

    1.优化sql 2.建立索引 3.优化表结构,避免多个join查询