2019-ICCV-PDARTS-Progressive Differentiable Architecture Search Bridging the Depth Gap Between Search and Evaluation-论文阅读
P-DARTS
2019-ICCV-Progressive Differentiable Architecture Search Bridging the Depth Gap Between Search and Evaluation
- Tongji University && Huawei
- GitHub: 200+ stars
- Citation:49
Motivation
Question:
DARTS has to search the architecture in a shallow network while evaluate in a deeper one.
DARTS在浅层网络上搜索,在深层网络上评估(cifar search in 8-depth, eval in 20-depth)。
This brings an issue named the depth gap (see Figure 1(a)), which means that the search stage finds some operations that work well in a shallow architecture, but the evaluation stage actually prefers other operations that fit a deep architecture better.
Such gap hinders these approaches in their application to more complex visual recognition tasks.
Contribution
propose Progressive DARTS (P-DARTS), a novel and efficient algorithm to bridge the depth gap.
Bring two questions:
Q1: While a deeper architecture requires heavier computational overhead
we propose search space approximation which, as the depth increases, reduces the number of candidates (operations) according to their scores in the elapsed search process.
Q2:Another issue, lack of stability, emerges with searching over a deep architecture, in which the algorithm can be biased heavily towards skip-connect as it often leads to rapidest error decay during optimization, but, actually, a better option often resides in learnable operations such as convolution.
we propose search space regularization, which (i) introduces operation-level Dropout [25] to alleviate the dominance of skip-connect during training, and (ii) controls the appearance of skip-connect during evaluation.
Method
search space approximation
在初始阶段,搜索网络相对较浅,但是cell中每条边上的候选操作最多(所有操作)。在阶段 \(S_{k-1}\)中,根据学习到的网络结构参数(权值)来排序并筛选出权值(重要性)较高的 \(O_k\) 个操作,并由此搭建一个拥有 \(L_k\) 个cell的搜索网络用于下一阶段的搜索,其中, \(L_k > L_{k-1} , O_k < O_{k-1}\) .
这个过程可以渐进而持续地增加搜索网络的深度,直到足够接近测试网络深度。
search space regularization
we observe that information prefers to flow through skip-connect instead of convolution or pooling, which is arguably due to the reason that skip-connect often leads to rapid gradient descent.
实验结果表明在本文采用的框架下,信息往往倾向于通过skip-connect流动,而不是卷积。这是因为skip-connect通常处在梯度下降最速的路径上。
the search process tends to generate architectures with many skip-connect operations, which limits the number of learnable parameters and thus produces unsatisfying performance at the evaluation stage.
在这种情况下,最终搜索得到的结构往往包含大量的skip-connect操作,可训练参数较少,从而使得性能下降。
We address this problem by search space regularization, which consists of two parts.
First, we insert operation-level Dropout [25] after each skip-connect operation, so as to partially ‘cut off’ the straightforward path through skip-connect, and facilitate the algorithm to explore other operations.
作者采用搜索空间正则来解决这个问题。一方面,作者在skip-connect操作后添加Operations层面的随机Dropout来部分切断skip-connect的连接,迫使算法去探索其他的操作。
However, if we constantly block the path through skip-connect, the algorithm will drop them by assigning low weights to them, which is harmful to the final performance.
然而,持续地阻断这些路径的话会导致在最终生成结构的时候skip-connect操作仍然受到抑制,可能会影响最终性能。
we gradually decay the Dropout rate during the training process in each search stage, thus the straightforward path through skip-connect is blocked at the beginning and treated equally afterward when parameters of other operations are well learned, leaving the algorithm itself to make the decision.
因此,作者在训练的过程中逐渐地衰减Dropout的概率,在训练初期施加较强的Dropout,在训练后期将其衰减到很轻微的程度,使其不影响最终的网络结构参数的学习。
Despite the use of Dropout, we still observe that skip-connect, as a special kind of operation, has a significant impact on recognition accuracy at the evaluation stage.
另一方面,尽管使用了Operations层面的Dropout,作者依然观察到了skip-connect操作对实验性能的强烈影响。
This motivates us to design the second regularization rule, architecture refinement, which simply controls the number of preserved skip-connects, after the final search stage, to be a constant M.
因此,作者提出第二个搜索空间正则方法,即在最终生成的网络结构中,保留固定数量的skip-connect操作。具体的,作者根据最终阶段的结构参数,只保留权值最大的M个skip-connect操作,这一正则方法保证了搜索过程的稳定性。在本文中, M=2 .
We emphasize that the second regularization technique must be applied on top of the first one, otherwise, in the situations without operation-level Dropout, the search process is producing low-qualityarchitectureweights, basedon which we could not build up a powerful architecture even with a fixed number of skip-connects.
需要强调的是,第二种搜索空间正则是建立在第一种搜索空间正则的基础上的。在没有执行第一种正则的情况下,即使执行第二种正则,算法依旧会生成低质量的网络结构。
Experiments
Cell arch in different Search Stage
cifar10
ImageNet
·
Conclusion
we propose a progressive version of differentiable architecture search to bridge the depth gap between search and evaluation scenarios.
The core idea is to gradually increase the depth of candidate architectures during the search process.
- 2Q: computational overhead and instability
Search space approximate and Search space regularize
Our research defends the importance of depth in differentiable architecture search, depth is still the dominant factor in exploring the architecture space.
2019-ICCV-PDARTS-Progressive Differentiable Architecture Search Bridging the Depth Gap Between Search and Evaluation-论文阅读的更多相关文章
- 论文笔记:Progressive Differentiable Architecture Search:Bridging the Depth Gap between Search and Evaluation
Progressive Differentiable Architecture Search:Bridging the Depth Gap between Search and Evaluation ...
- 论文笔记:DARTS: Differentiable Architecture Search
DARTS: Differentiable Architecture Search 2019-03-19 10:04:26accepted by ICLR 2019 Paper:https://arx ...
- 论文笔记:Progressive Neural Architecture Search
Progressive Neural Architecture Search 2019-03-18 20:28:13 Paper:http://openaccess.thecvf.com/conten ...
- (转)Illustrated: Efficient Neural Architecture Search ---Guide on macro and micro search strategies in ENAS
Illustrated: Efficient Neural Architecture Search --- Guide on macro and micro search strategies in ...
- 2019-ICLR-DARTS: Differentiable Architecture Search-论文阅读
DARTS 2019-ICLR-DARTS Differentiable Architecture Search Hanxiao Liu.Karen Simonyan.Yiming Yang GitH ...
- 2019 ICCV、CVPR、ICLR之视频预测读书笔记
2019 ICCV.CVPR.ICLR之视频预测读书笔记 作者 | 文永亮 学校 | 哈尔滨工业大学(深圳) 研究方向 | 视频预测.时空序列预测 ICCV 2019 CVP github地址:htt ...
- 微软的一篇ctr预估的论文:Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft’s Bing Search Engine。
周末看了一下这篇论文,觉得挺难的,后来想想是ICML的论文,也就明白为什么了. 先简单记录下来,以后会继续添加内容. 主要参考了论文Web-Scale Bayesian Click-Through R ...
- LeetCode 33 Search in Rotated Sorted Array [binary search] <c++>
LeetCode 33 Search in Rotated Sorted Array [binary search] <c++> 给出排序好的一维无重复元素的数组,随机取一个位置断开,把前 ...
- 33. Search in Rotated Sorted Array & 81. Search in Rotated Sorted Array II
33. Search in Rotated Sorted Array Suppose an array sorted in ascending order is rotated at some piv ...
随机推荐
- C/S程序设计范式
在socket编程之并发回射服务器3篇文章中,提到了3种设计范式: 多进程 父进程阻塞于accept调用,然后为每个连接创建一个子进程. 多线程 主线程阻塞于accept调用,然后为每个连接创建一个子 ...
- SpringBoot:模板引擎 thymeleaf、ContentNegotiatingViewResolver、格式化转换器
目录 模板引擎 thymeleaf ContentNegotiatingViewResolver 格式化转换器 模板引擎 thymeleaf.ContentNegotiatingViewResolve ...
- 步入LTE、多址技术
LTE系统的主要性能和目标 与3G相比,LTE主要性能特性: 带宽灵活配置:支持1.4MHz, 3MHz, 5MHz, 10Mhz, 15Mhz, 20MHz 峰值速率(20MHz带宽):下行100M ...
- 201771030117-祁甜 实验一 软件工程准备—<阅读《现代软件工程——构建之法》提出的三个问题>
项目 内容 课程班级博客链接 https://edu.cnblogs.com/campus/xbsf/nwnu2020SE 这个作业要求链接 https://www.cnblogs.com/nwnu- ...
- 记录一下关于在工具类中更新UI使用RunOnUiThread犯的极其愚蠢的错误
由于Android中不能在子线程中更新ui,所以平时在子线程中需要更新ui时可以使用Android提供的RunOnUiThread接口,但是最近在写联网工具类的时候,有时候会出现联网异常,这个时候为了 ...
- LeetCode--Squares of a Sorted Array && Robot Return to Origin (Easy)
977. Squares of a Sorted Array (Easy)# Given an array of integers A sorted in non-decreasing order, ...
- 基于mykernel 2.0编写一个操作系统内核
一.配置mykernel 2.0,熟悉Linux内核的编译 1.实验环境:VMware 15 Pro,Ubuntu 18.04.4 2.配置环境 1)在电脑上先下载好以下两个文件,之后通过共享文件夹, ...
- 使用npoi导入Excel - 带合并单元格--附代码
之前我们在使用npoi导入excel表格的时候,往往会遇见那种带有合并单元格的数据在导入的时候出现合并为空的问题, 也就是只有第一条有数据,其余均为空白.在网上翻了半天也没有找到合适的解决方案,最后还 ...
- 使用python对oracle进行简单性能测试
一.概述 dba在工作中避不开的两个问题,sql使用绑定变量到底会有多少的性能提升?数据库的审计功能如果打开对数据库的性能会产生多大的影响?最近恰好都碰到了,索性做个实验. sql使用绑定变量对性能的 ...
- 「雕爷学编程」Arduino动手做(38)——joystick双轴摇杆模块
37款传感器与模块的提法,在网络上广泛流传,其实Arduino能够兼容的传感器模块肯定是不止37种的.鉴于本人手头积累了一些传感器和模块,依照实践出真知(一定要动手做)的理念,以学习和交流为目的,这里 ...