【CV】ICCV2015_Unsupervised Learning of Visual Representations using Videos
Unsupervised Learning of Visual Representations using Videos
Note here: it's a learning note on Prof. Gupta's novel work published on ICCV2015. It's really exciting to know how unsupervised learning method can contribute to learn visual representations! Also, Feifei-Li's group published a paper on video representation using unsupervised method in ICCV2015 almost at the same time! I also wrote a review on it, check it here!
Link: http://arxiv.org/pdf/1505.00687v2.pdf
Motivation:
- Supervised learning is popular for CNN to train an excellent model on various visual problems, while the application of unsupervised learning leaves blank.
- People learn concepts quickly without numerous instances for training, and we learning things in a dynamic, mostly unsupervised environment.
- We’re short of labeled video data to do supervised learning, but we can easily access to tons of unlabeled data through Internet, which can be made use of by unsupervised learning.
Proposed Model:
Target: learning visual representations from videos in an unsupervised way
Key idea: tracking of moving object provides supervision
Brief introduction:
- Objective function (constraint): capture the first patch p1 of a moving object, keep tracking of it and get another patch p2 after several frames, then randomly select a negative patch p- from other places. The idea of objective function constrains the distance of p1 and p2 in feature space should be shorter than distance of p1 and p-
- Selection of tracking patch: using IDT to obtain SURF interest points to find out which part of the frame moves most. Setting threshold on the ratio of SURF interest points to avoid noise and camera motion.
- Tracking: using KCF tracker to track the patch
- Overrall pipline:
Feed triplet into three identical CNN, put two fully-connected layers on the top of pooling-5 layer to project into feature space, then computing the ranking loss to back-propagate the network. (note that: these three CNN shares parameters)
Training strategy:
There’re many empirical details to train a more powerful CNN in this work, however I’m not going to dive into it, only give some brief reviews on some the trick.
- Choose of negative samples:
- Random selection in the first 10 epochs of training
- Hard negative mining in later epochs, we search for all the possible negative patches and choose the top K patches which give maximum loss
* Intuition on the result:
See from the table above, [unsup + fp(3 ensemble)] outperforms other methods on the detection task of bus, car, person and train, but falls far behind on detecting bird, cat, dog and sofa, which may give us some intuitions.
【CV】ICCV2015_Unsupervised Learning of Visual Representations using Videos的更多相关文章
- 【CV】ICCV2015_Unsupervised Learning of Spatiotemporally Coherent Metrics
Unsupervised Learning of Spatiotemporally Coherent Metrics Note here: it's a learning note on the to ...
- 【ML】ICML2015_Unsupervised Learning of Video Representations using LSTMs
Unsupervised Learning of Video Representations using LSTMs Note here: it's a learning notes on new L ...
- 【CV】ICCV2015_Unsupervised Visual Representation Learning by Context Prediction
Unsupervised Visual Representation Learning by Context Prediction Note here: it's a learning note on ...
- 【翻译】我钟爱的Visual Studio前端开发工具/扩展
原文:[翻译]我钟爱的Visual Studio前端开发工具/扩展 怎么样让Visual Studio更好地编写HTML5, CSS3, JavaScript, jQuery,换句话说就是如何更好地做 ...
- 论文解读(SimCLR)《A Simple Framework for Contrastive Learning of Visual Representations》
1 题目 <A Simple Framework for Contrastive Learning of Visual Representations> 作者: Ting Chen, Si ...
- A Simple Framework for Contrastive Learning of Visual Representations
目录 概 主要内容 流程 projection head g constractive loss augmentation other 代码 Chen T., Kornblith S., Norouz ...
- ZH奶酪:【阅读笔记】Deep Learning, NLP, and Representations
中文译文:深度学习.自然语言处理和表征方法 http://blog.jobbole.com/77709/ 英文原文:Deep Learning, NLP, and Representations ht ...
- 【RS】CoupledCF: Learning Explicit and Implicit User-item Couplings in Recommendation for Deep Collaborative Filtering-CoupledCF:在推荐系统深度协作过滤中学习显式和隐式的用户物品耦合
[论文标题]CoupledCF: Learning Explicit and Implicit User-item Couplings in Recommendation for Deep Colla ...
- 【RS】List-wise learning to rank with matrix factorization for collaborative filtering - 结合列表启发排序和矩阵分解的协同过滤
[论文标题]List-wise learning to rank with matrix factorization for collaborative filtering (RecSys '10 ...
随机推荐
- 20个最常用的Windows命令行
1. 中断命令执行Ctrl + Z 2. 文件/目录cd 切换目录例:cd // 显示当前目录例:cd .. // 进入父目录 3.创建目录md d:\mp3 // 在C:\建立mp3文件夹md d: ...
- 阿里八八β阶段Scrum(4/5)
今日进度 黄梅玲: 图表绘制与实时更新的完成 刘晓: 数据分析表格部分生成完成 张岳: 初步完成简易的桌面控件 陈裕鹏: 事件添加TAG标签的功能完成,此外信息抽取算法也基本完成并PULL,但与项目产 ...
- Flex布局新写法兼容写法详解
很久之前用过flex,但是没有考虑过兼容性问题,为了兼容ios一定要加上-webkit前缀: ul{ display: flex; /* 新版本语法: Opera 12.1, Firefox 22+ ...
- 【opatch打补丁】oracle10.2.0.5.0升级10.2.0.5.9 for linux
https://wenku.baidu.com/view/c38702b56edb6f1afe001f59.html 这篇文章也不错,可参考 任务:oracle 10.2.0.5.0 打补丁升级 ...
- mybatis collection使用注意
背景 今天我在使用collection时候,出现数据库有两条数据,但是却返回一条,在复制这条数据到四条后,依然返回一条 分析 这四条数据,数据库的每个字段值完全相同,所以估计是当成一条处理了 如果随便 ...
- Vxlan学习笔记——原理(转)
文章转自http://www.cnblogs.com/hbgzy/p/5279269.html 1. 为什么需要Vxlan 普通的VLAN数量只有4096个,无法满足大规模云计算IDC的需求,而IDC ...
- mysql 存儲emjoy表情是報錯Incorrect string value:
解决方法: [mysqld] character-set-client-handshake=FALSE character-set-server=utf8mb4 collation-server=ut ...
- 解决VC++6.0打开文件或添加文件到工程出错的问题
相信很多朋友在安装VC++6.0之后,发现无法使用打开文件命令.同时,打开了工程,却无法实现文件添加到工程的问题.一旦进行如此操作,便会出现应用程序错误,需要关闭应用程序.为此,不胜其烦.更有甚者,以 ...
- PAT A1012 The Best Rank (25 分)——多次排序,排名
To evaluate the performance of our first year CS majored students, we consider their grades of three ...
- springzuul本地路由和跨服务器路由问题
阿里云服务器在旧新服务迁移过程中,发现路由到认证中心找不到服务 解决办法: 在路由配置里面使用下面的配置 #zuul.routes.claimconf.path=/claimconf/**#zuul. ...