PNAS

2018-ECCV-Progressive Neural Architecture Search

Johns Hopkins University && Google AI && Stanford
GitHub: 300+ stars
Citation: 504

Motivation

current techniques usually fall into one of two categories: evolutionary algorithms(EA) or reinforcement learning(RL).

Although both EA and RL methods have been able to learn network structures that outperform manually designed architectures, they require significant computational resources.

The two current NAS approaches, EA and RL, are both computationally expensive.

Contribution

We describe a method that requires 5 times fewer model evaluations during the architecture search.

Only 1/5 as many models need to be evaluated.

We propose to use heuristic search to search the space of cell structures, starting with simple (shallow) models and progressing to complex ones, pruning out unpromising structures as we go.

The search is progressive: it starts from shallow networks and gradually moves on to more complex ones.

Since this process is expensive, we also learn a model or surrogate function which can predict the performance of a structure without needing to train it.

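An approximate evaluation function (a predictor) is proposed that predicts a model's performance directly, instead of training each candidate network from scratch.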

Several advantages:

First, the simple structures train faster, so we get some initial results to train the surrogate quickly.

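The simple networks are small and train quickly (at negligible cost), so training data for the surrogate is available early.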

Second, we only ask the surrogate to predict the quality of structures that are slightly different (larger) from the ones it has seen.

The predictor only has to predict networks that differ slightly from those it has already seen.

Third, we factorize the search space into a product of smaller search spaces, allowing us to potentially search models with many more blocks.

The large search space is factorized into a product of smaller search spaces.

we show that our approach is 5 times more efficient than the RL method of [41] in terms of number of models evaluated, and 8 times faster in terms of total compute.

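Compared with the RL method, the approach is 5 times more efficient in number of models evaluated and 8 times faster in total compute.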

Method

Search Space

we first learn a cell structure, and then stack this cell a desired number of times, in order to create the final CNN.

First learn the cell structure, then stack the cell to the target depth.

A cell takes an H x W x F tensor as input. With stride 1 it outputs an H x W x F tensor; with stride 2 it outputs an H/2 x W/2 x 2F tensor.
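The shape rule above can be written as a small helper. This is only a sketch: the function name `cell_output_shape` is made up for illustration, and rounding odd spatial sizes down with integer division is an assumption the notes do not cover.

```python
def cell_output_shape(h, w, f, stride):
    """Return the (H, W, F) shape produced by a cell with the given stride."""
    if stride == 1:
        # A stride-1 cell keeps the spatial size and the filter count.
        return (h, w, f)
    if stride == 2:
        # A stride-2 cell halves the spatial size and doubles the filter count.
        # Assumption: odd sizes are rounded down.
        return (h // 2, w // 2, 2 * f)
    raise ValueError("stride must be 1 or 2")


print(cell_output_shape(32, 32, 64, stride=1))  # (32, 32, 64)
print(cell_output_shape(32, 32, 64, stride=2))  # (16, 16, 128)
```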

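A cell consists of B blocks. Each block has 2 inputs and 1 output and can be described by a five-tuple \(\left(I_{1}, I_{2}, O_{1}, O_{2}, C\right)\): the two inputs, the operator applied to each input, and the way the two results are combined. The output of the c-th cell is denoted \(H^c\), and the output of the b-th block of the c-th cell is denoted \(H^c_b\).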

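The inputs of a block are chosen from the union of {the outputs of all earlier blocks in the current cell} and {the output of the previous cell, the output of the cell before that}.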

The operator space contains 8 operations.
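To get a feel for how fast this space grows, the number of possible blocks can be counted directly from the description above. This is a rough sketch: the names `Block`, `num_block_choices` and `num_cell_choices` are hypothetical, a single combination operator (element-wise addition) is assumed, and symmetric duplicates such as swapping the two inputs are not removed.

```python
from collections import namedtuple

# Five-tuple describing one block: two inputs, two operators, one combiner.
Block = namedtuple("Block", ["i1", "i2", "o1", "o2", "c"])

NUM_OPS = 8        # size of the operator space, as stated in the notes
NUM_COMBINERS = 1  # assumption: a single combiner (element-wise addition)


def num_block_choices(b):
    """Count the possible structures for the b-th block (1-indexed) of a cell."""
    # Inputs: the outputs of the two preceding cells plus the b-1 earlier blocks.
    num_inputs = 2 + (b - 1)
    return num_inputs ** 2 * NUM_OPS ** 2 * NUM_COMBINERS


def num_cell_choices(num_blocks):
    """Count cells with `num_blocks` blocks (symmetric duplicates included)."""
    total = 1
    for b in range(1, num_blocks + 1):
        total *= num_block_choices(b)
    return total


print(num_block_choices(1))  # 256
print(num_cell_choices(2))   # 147456
```

Under these assumptions the count is a product of per-block counts, which is why training every cell stops being practical after the first block.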

We stack a predefined number of copies of the basic cell (with the same structure, but untied weights), using either stride 1 or stride 2, as shown in Figure 1 (right).

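Once the best cell structure has been found, it is stacked a predefined number of times to form the complete network shown on the right. Weights are not inherited: the stacked network is retrained.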

The number of stride-1 cells between stride-2 cells is then adjusted accordingly with up to N number of repeats.

The number of Normal cells (stride = 1) is determined by the hyperparameter N.

We only use one cell type (we do not distinguish between Normal and Reduction cells, but instead emulate a Reduction cell by using a Normal cell with stride 2).

Normal and Reduction cells are not distinguished; a Reduction cell is simply a Normal cell with its stride set to 2.
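The stacking scheme can be sketched as a list of strides. The layout here is an assumption made for illustration (three stages of N stride-1 cells with one stride-2 cell between stages), and the helper names `stack_strides` and `trace_shapes` are hypothetical.

```python
def stack_strides(n_repeats, num_stages=3):
    """List the stride of every cell in the stacked network, in order."""
    strides = []
    for stage in range(num_stages):
        strides.extend([1] * n_repeats)  # N stride-1 cells per stage
        if stage < num_stages - 1:
            strides.append(2)            # one stride-2 cell between stages
    return strides


def trace_shapes(h, w, f, strides):
    """Follow the (H, W, F) tensor shape through the stacked cells."""
    shapes = []
    for stride in strides:
        if stride == 2:
            h, w, f = h // 2, w // 2, 2 * f
        shapes.append((h, w, f))
    return shapes


strides = stack_strides(n_repeats=2)
print(strides)                                # [1, 1, 2, 1, 1, 2, 1, 1]
print(trace_shapes(32, 32, 16, strides)[-1])  # (8, 8, 64)
```

Every entry in the list uses the same cell structure; only the stride differs, which is how a single cell type covers both the Normal and the Reduction role.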

Progressive Neural Architecture Search

Many previous approaches directly search in the space of full cells, or worse, full CNNs.

Previous methods search directly over complete cell structures or, worse, over entire CNNs.

While this is a more direct approach, we argue that it is difficult to directly navigate in an exponentially large search space, especially at the beginning where there is no knowledge of what makes a good model.

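Although this approach is direct, the search space is far too large, and at the start there is no prior knowledge to indicate which direction to explore within it.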

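The search starts from cells that contain a single block. All possible \(B_1\) cells are trained, the predictor is trained on the \(B_1\) results, and the \(B_1\) cells are then expanded into \(B_2\) cells.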

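Training all possible \(B_2\) cells is too expensive, so the predictor is used to score every \(B_2\) cell and the best K are selected. The process then repeats: the selected K \(B_2\) cells are trained and used to update the predictor, they are expanded into \(B_3\) cells, the predictor again selects the best K, and so on.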
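The loop described above can be sketched with toy stand-ins. Everything here is illustrative rather than the paper's implementation: a block is just an integer from a made-up space of 6 choices, `train_and_eval` returns a synthetic score instead of training a CNN, and `MeanBlockPredictor` is a deliberately crude surrogate swapped in for the RNN/MLP predictor.

```python
BLOCK_CHOICES = list(range(6))  # hypothetical stand-in for the real block space


def train_and_eval(cell):
    """Stand-in for training the CNN built from `cell`; returns a synthetic score."""
    return sum(0.1 * blk for blk in cell) / len(cell)


class MeanBlockPredictor:
    """Toy surrogate: scores a cell by the mean observed accuracy of its blocks."""

    def __init__(self):
        self.totals = {}
        self.counts = {}

    def update(self, cells, accuracies):
        for cell, acc in zip(cells, accuracies):
            for blk in cell:
                self.totals[blk] = self.totals.get(blk, 0.0) + acc
                self.counts[blk] = self.counts.get(blk, 0) + 1

    def predict(self, cell):
        scores = [self.totals[b] / self.counts[b] for b in cell if b in self.counts]
        return sum(scores) / len(scores) if scores else 0.0


def progressive_search(max_blocks, k):
    # Step 1: train and evaluate every 1-block cell, then fit the predictor.
    cells = [(blk,) for blk in BLOCK_CHOICES]
    accs = [train_and_eval(c) for c in cells]
    predictor = MeanBlockPredictor()
    predictor.update(cells, accs)

    for _ in range(2, max_blocks + 1):
        # Expand every surviving cell by one block.
        candidates = [c + (blk,) for c in cells for blk in BLOCK_CHOICES]
        # Rank all candidates with the predictor only; nothing is trained here.
        candidates.sort(key=predictor.predict, reverse=True)
        cells = candidates[:k]
        # Train only the K most promising cells, then update the predictor.
        accs = [train_and_eval(c) for c in cells]
        predictor.update(cells, accs)

    best_acc, best_cell = max(zip(accs, cells))
    return best_cell, best_acc


print(progressive_search(max_blocks=3, k=4))
```

With these toy settings only 6 + 4 + 4 = 14 cells are ever trained, whereas exhaustively training every cell up to 3 blocks would require 6 + 36 + 216 = 258.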

Performance Prediction with Surrogate Model

Requirements of the predictor:

Handle variable-sized inputs (accept cells of different sizes)
Correlated with true performance (predicted values should track the true values)
Sample efficiency (train well from only a few evaluated models)

The requirement that the predictor be able to handle variable-sized strings immediately suggests the use of an RNN.

Two predictor methods

RNN and MLP (multilayer perceptron)

However, since the sample size is very small, we fit an ensemble of 5 predictors. We observed empirically that this reduced the variance of the predictions.

Because the number of training samples is very small, an ensemble of 5 predictors is used (RNN-ensemble, MLP-ensemble), which reduces the variance.
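The ensembling idea can be illustrated with a much simpler model than an RNN or MLP. In this sketch each of the 5 members is a one-dimensional least-squares line fitted on a different 4/5 of the data, and the ensemble prediction is the average of the members. The scalar feature, the 4/5 split, and the function names `fit_line`, `fit_ensemble` and `predict` are all assumptions made for illustration.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on 1-D data."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return a, my - a * mx


def fit_ensemble(xs, ys, n_members=5, seed=0):
    """Fit each member on a different subset that leaves out one fold."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_members] for i in range(n_members)]
    members = []
    for held_out in folds:
        skip = set(held_out)
        keep = [i for i in idx if i not in skip]
        members.append(fit_line([xs[i] for i in keep], [ys[i] for i in keep]))
    return members


def predict(members, x):
    """Average the predictions of all ensemble members."""
    return statistics.fmean([a * x + b for a, b in members])


# Hypothetical data: one scalar feature per cell and a noisy accuracy.
xs = [float(i) for i in range(10)]
ys = [0.5 + 0.03 * x + (0.01 if i % 2 else -0.01) for i, x in enumerate(xs)]
members = fit_ensemble(xs, ys)
print(len(members), round(predict(members, 10.0), 3))
```

Because each member sees a slightly different subset, their individual errors partly cancel in the average, which is the variance reduction mentioned above.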

Experiments

Performance of the Surrogate Predictors

we train the predictor on the observed performance of cells with up to b blocks, but we apply it to cells with b+1 blocks.

Train on {B=b}; predict on the set {B=b+1}.

We therefore consider predictive accuracy both for cells with sizes that have been seen before (but which have not been trained on), and for cells which are one block larger than the training data.

Prediction accuracy is considered both on cells from {B=b} that were not trained on and on cells from {B=b+1}.

From the set of all cells with {B=b}, 10k are randomly selected to form the dataset \(U_{b,1:R}\), and each is trained for 20 epochs.

Randomly select K = 256 models (each of size b) from \(U_{b,1:R}\) to generate a training set \(S_{b,t,1:K}\);

In each round, 256 models are randomly selected from the dataset U as the training set S.

In total, 20*256 = 5120 data points are used for training.

We now use this random dataset to evaluate the performance of the predictors using the pseudocode in Algorithm 2, where A(H) returns the true validation set accuracies of the models in some set H.

A(H) returns the true accuracy, after training, of the cells in the set H.

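For B=b, the training set is a subset of all cells with {B=b}. The first row shows the correlation between predicted and true results on the training set drawn from {B=b} (256*20 = 5120 points).

The second row shows the correlation between predicted and true results on the dataset of cells with {B=b+1} (10k points).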

We see that the predictor performs well on models from the training set, but not so well when predicting larger models. However, performance does increase as the predictor is trained on more (and larger) cells.

The predictor performs well on the training set {B=b} but less well on the larger cells {B=b+1}; it improves as b grows.

We see that for predicting the training set, the RNN does better than the MLP, but for predicting the performance on unseen larger models (which is the setting we care about in practice), the MLP seems to do slightly better.

The RNN predictor does better on the training set {B=b}, while the MLP does better on the larger cells {B=b+1} (the case we care about).
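Since the predictor is only used to rank candidates, one natural way to measure the agreement between predicted and true accuracy is a rank correlation. The sketch below computes the Spearman rank correlation in plain Python, assuming that this is the metric of interest. The accuracy values are made up for illustration.

```python
def ranks(values):
    """Assign 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def spearman(predicted, actual):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(predicted), ranks(actual))


# Hypothetical predicted vs. true accuracies for four cells.
predicted = [0.71, 0.80, 0.65, 0.90]
actual = [0.70, 0.85, 0.75, 0.88]
print(spearman(predicted, actual))
```

For this made-up data the result is approximately 0.8: the predictor gets the best cell right but swaps the order of the two weakest ones.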

Conclusion

The main contribution of this work is to show how we can accelerate the search for good CNN structures by using progressive search through the space of increasingly complex graphs

Progressive search (gradually increasing the depth of the cell) is used to accelerate NAS.

combined with a learned prediction function to efficiently identify the most promising models to explore.

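A learnable predictor is used to identify the potentially best networks. (A network P is introduced to search for the best structure of the target network. E.g. network C is used to search for the best structure of network B, and network B in turn searches for the best structure of network A, like nested matryoshka dolls.)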

The resulting models achieve the same level of performance as previous work but with a fraction of the computational cost.

SOTA is reached at a small cost.

Appendix
