Paper Notes: Real-Time MDNet

Real-Time MDNet

ECCV 2018 2018-10-22 15:52:01

Paper: http://openaccess.thecvf.com/content_ECCV_2018/papers/Ilchae_Jung_Real-Time_MDNet_ECCV_2018_paper.pdf

Code (PyTorch): https://github.com/IlchaeJung/RT-MDNet

Reference Papers:

1. Learning Multi-Domain Convolutional Neural Networks for Visual Tracking. CVPR 2016.

2. BranchOut: Regularization for Online Ensemble Tracking with Convolutional Neural Networks. CVPR 2017.

3. Meta-Tracker: Fast and Robust Online Adaptation for Visual Object Trackers. ECCV 2018.

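The two flowcharts above show MDNet and BranchOut, an improved variant of MDNet. This paper builds on MDNet and mainly targets a large speed-up: the original MDNet follows the R-CNN approach and extracts features by brute force, running the network on each candidate separately, whereas this paper uses an improved RoIAlign for much more efficient feature extraction. In addition, the authors propose a new loss function that better separates foreground from background. The main contributions are as follows:

The proposed network architecture is shown below: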

Efficient Feature Extraction and Discriminative Feature Learning:

1. Network Architecture:

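As shown in Figure 1, the architecture is largely the same as MDNet. The biggest change is that an improved RoIAlign replaces the original brute-force feature extraction, so the network becomes: 3 conv layers + adaptive RoIAlign layer + fc layers.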
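To make the idea of "one shared feature map, many RoIs" concrete, here is a minimal pure-Python sketch. It is a simplified RoIAlign-style pooling with a single bilinear sample per bin, not the improved variant used in the paper; the function names, the toy 8x8 map, and the candidate boxes are all hypothetical.

```python
import math

def bilinear_sample(fmap, y, x):
    """Bilinearly interpolate a 2D feature map (list of rows) at a fractional point."""
    h, w = len(fmap), len(fmap[0])
    # Clamp the point so that all four neighbours stay inside the map.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align(fmap, roi, out_size=3):
    """Pool one RoI (y_min, x_min, y_max, x_max in feature-map coordinates)
    into an out_size x out_size grid, sampling the centre of each bin."""
    y_min, x_min, y_max, x_max = roi
    bin_h = (y_max - y_min) / out_size
    bin_w = (x_max - x_min) / out_size
    return [
        [bilinear_sample(fmap, y_min + (i + 0.5) * bin_h, x_min + (j + 0.5) * bin_w)
         for j in range(out_size)]
        for i in range(out_size)
    ]

# Stand-in for the shared conv feature map, computed once per frame.
shared_map = [[float(r * 10 + c) for c in range(8)] for r in range(8)]
# Hypothetical candidate boxes; every candidate reuses the same shared map.
candidates = [(1.0, 1.0, 4.0, 4.0), (1.5, 2.25, 5.5, 6.25)]
roi_features = [roi_align(shared_map, roi) for roi in candidates]
```

The conv layers run once per frame and each candidate only costs a handful of interpolations, instead of a full forward pass per cropped patch as in the R-CNN-style pipeline.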

2. Improved RoIAlign for Visual Tracking:

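The feature map obtained by applying RoIAlign directly is rather coarse (compared to the ones from individual proposal bounding boxes). To improve the quality of the RoI features, we need a feature map with both high resolution and rich semantic information. These requirements can be met by computing a denser fully convolutional feature map and enlarging the receptive field of each activation. The authors therefore remove the max pooling layer that follows the conv2 layer in the VGG-M network and apply dilated convolution (with rate r = 3), so that the denser feature map still has a large receptive field. This strategy yields a larger feature map than regular convolution would, which substantially improves the quality of the representation. Figure 2 compares the original MDNet with the network after the dilated layers are added.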
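The effect of dropping the pooling layer and dilating the next convolution can be checked with the standard output-size formula. The sketch below works along one axis only, and the input size of 24 and the 3-wide kernels are hypothetical values chosen to keep the arithmetic readable; they are not taken from the paper.

```python
def effective_kernel(kernel, dilation):
    """Span covered by a dilated kernel: dilation * (kernel - 1) + 1."""
    return dilation * (kernel - 1) + 1

def conv_out_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - effective_kernel(kernel, dilation)) // stride + 1

# Hypothetical length of the conv2 output along one axis.
conv2_out = 24

# Original path: 3-wide max pooling with stride 2, then a regular 3-wide conv.
pooled = conv_out_size(conv2_out, kernel=3, stride=2)
regular = conv_out_size(pooled, kernel=3)

# Modified path: no pooling, and the 3-wide conv is dilated with rate 3.
dilated = conv_out_size(conv2_out, kernel=3, dilation=3)

print(regular, dilated, effective_kernel(3, 3))  # prints "9 18 7"
```

In this toy setting the dilated path produces a map twice as long per axis (18 versus 9). Each output still covers a 7-wide span of the conv2 output, the same extent that pooling followed by a regular 3-wide convolution would cover.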

3. Pretraining for Discriminative Instance Embedding:

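The goal of the learning algorithm is to train a discriminative feature embedding that applies to multiple domains. MDNet separates shared layers from domain-specific layers to learn a representation that distinguishes foreground from background. On top of this objective, the authors propose a new loss, the instance embedding loss, which enforces target objects in different domains to be embedded far from each other in a shared feature space and makes it possible to learn discriminative representations of unseen target objects in new test sequences. In other words, MDNet only tries to separate target from background within each individual domain, so it may be weak at telling apart foreground objects from different domains, especially when they belong to the same semantic class or look alike. This may be because the original CNN was trained for classification. To address this, the algorithm adds a constraint that embeds foreground objects from multiple videos apart from each other.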

Given an image $x^d$ in domain $d$ and a bounding box $R$, the network's output score, denoted $f^d$, is formed by concatenating the activations of the last fc layers.
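One way to write this concatenation, assuming the symbol $\phi^k(x^d; R)$ for the score produced by the last fc layer of domain branch $k$ (the symbol name is chosen here for illustration):

$$
f^d = \left[ \phi^1(x^d; R),\ \phi^2(x^d; R),\ \ldots,\ \phi^D(x^d; R) \right] \in \mathbb{R}^{2 \times D}
$$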

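Each per-domain entry is a 2D binary classification score in domain $d$, and $D$ is the number of domains in the training set. The output feature is fed to a softmax function for binary classification, which decides whether a bounding box $R$ is foreground or background in domain $d$. In addition, the output feature is passed through another softmax operator to discriminate between instances across multiple domains.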
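Treating $f^d$ as a $2 \times D$ matrix whose rows index the class (background or target) and whose columns index the domain, the two softmax operators can be sketched as follows. The names $\sigma_{cls}$ and $\sigma_{inst}$ are labels assumed here for the binary-classification softmax and the cross-domain instance softmax:

$$
[\sigma_{cls}(f^d)]_{ij} = \frac{\exp(f^d_{ij})}{\sum_{k=1}^{2} \exp(f^d_{kj})},
\qquad
[\sigma_{inst}(f^d)]_{ij} = \frac{\exp(f^d_{ij})}{\sum_{k=1}^{D} \exp(f^d_{ik})}
$$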

The first softmax compares the target and background scores within each domain, while the second compares the positive scores of objects across all domains.

The network optimizes a multi-task loss $L$ built on these two softmax operators.

Here $L_{cls}$ and $L_{inst}$ are the loss terms for binary classification and discriminative instance embedding, respectively.
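A sketch of how these terms can be written, under assumed notation: $N$ is the minibatch size, $\hat{d}$ is the domain the current minibatch comes from, $y_i \in \{0,1\}^{2 \times D}$ is a one-hot label for sample $i$, the row index $+$ marks the positive (target) class, $\sigma_{cls}$ is the softmax over the two classes within a domain, $\sigma_{inst}$ is the softmax over the domains for one class, and $\alpha$ is a balancing weight between the two terms:

$$
L = L_{cls} + \alpha \cdot L_{inst}
$$

$$
L_{cls} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{2} [y_i]_{c\hat{d}} \, \log [\sigma_{cls}(f_i^{\hat{d}})]_{c\hat{d}}
$$

$$
L_{inst} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{d=1}^{D} [y_i]_{+d} \, \log [\sigma_{inst}(f_i^{\hat{d}})]_{+d}
$$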

Note that the instance embedding loss is applied only to positive examples.
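The two loss terms can be illustrated with a small pure-Python sketch. It assumes each sample's score is a 2 x D table (row 0 = background, row 1 = target, one column per domain branch), that a minibatch comes from a single domain, and that the terms are combined with a weight `alpha`. The function name, the data layout, and the default `alpha=0.1` are hypothetical choices for this example.

```python
import math

NEG, POS = 0, 1  # row indices of the background and target scores

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def multitask_loss(scores, labels, domain, alpha=0.1):
    """scores[i][c][d]: score of sample i for class c in domain branch d.
    labels[i]: POS or NEG for sample i in the current `domain`.
    Returns (L_cls, L_inst, L_cls + alpha * L_inst)."""
    n = len(scores)
    l_cls = 0.0
    l_inst = 0.0
    for f, label in zip(scores, labels):
        # Binary classification: softmax over the two classes of the current domain.
        p_cls = softmax([f[NEG][domain], f[POS][domain]])
        l_cls -= math.log(p_cls[label])
        # Instance embedding: softmax over the positive scores of all domains,
        # accumulated for positive examples only.
        if label == POS:
            p_inst = softmax(f[POS])
            l_inst -= math.log(p_inst[domain])
    l_cls /= n
    l_inst /= n
    return l_cls, l_inst, l_cls + alpha * l_inst

# Two samples from domain 0 out of D = 3 domains: one target, one background.
scores = [
    [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]],
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
]
labels = [POS, NEG]
l_cls, l_inst, total = multitask_loss(scores, labels, domain=0)
```

The instance term shrinks when the positive score of the current domain's branch dominates the positive scores of the other branches, which is what pushes targets from different videos apart.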

Online Tracking Algorithm:

4.2 Online Model Updates:

We perform two complementary update strategies as in MDNet [1]: long-term and short-term updates to maintain robustness and adaptiveness, respectively. Long-term updates are regularly applied using the samples collected for a long period of time, while short-term updates are triggered whenever the score of the estimated target is below a threshold and the result is unreliable.
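The interplay of the two update strategies can be sketched as a small scheduler. The interval of 10 frames, the score threshold of 0.0, the buffer lengths, and the choice to store samples only from reliable frames are all assumptions made for this illustration, and `UpdateScheduler` is a hypothetical name.

```python
from collections import deque

class UpdateScheduler:
    """Decides, frame by frame, whether to run a short-term or long-term update."""

    def __init__(self, long_interval=10, score_threshold=0.0,
                 long_frames=100, short_frames=20):
        self.long_interval = long_interval
        self.score_threshold = score_threshold
        self.short_frames = short_frames
        # Samples from reliable frames; old frames drop out automatically.
        self.buffer = deque(maxlen=long_frames)

    def step(self, frame_idx, target_score, samples):
        """Return (kind, training_samples) where kind is 'short', 'long', or None."""
        reliable = target_score >= self.score_threshold
        if reliable:
            self.buffer.append(samples)
        if not reliable:
            # Unreliable result: adapt quickly using only the most recent frames.
            return "short", list(self.buffer)[-self.short_frames:]
        if frame_idx % self.long_interval == 0:
            # Regular update using everything collected over the long period.
            return "long", list(self.buffer)
        return None, []

sched = UpdateScheduler(long_interval=3, short_frames=2)
decisions = [
    sched.step(1, 1.0, "a"),
    sched.step(2, -0.5, "b"),
    sched.step(3, 0.8, "c"),
]
```

Here frame 2 triggers a short-term update because its score falls below the threshold, and frame 3 triggers the regular long-term update.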

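Unlike MDNet, the authors do not train on VOT and test on OTB (or the other way around). Instead they use videos from ImageNet-VID, which contains nearly 4500 videos, and randomly pick 100 videos for offline pretraining.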

5. Experiments:

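As can be seen, the authors use BBox regression during the subsequent tracking, but they do not say whether the Hard Negative Mining used in MDNet is applied (it is not mentioned, so presumably it is not used o(╯□╰)o).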
