Illustrated: Efficient Neural Architecture Search

--- Guide on macro and micro search strategies in ENAS

2019-03-27 09:41:07

This blog is copied from https://towardsdatascience.com/illustrated-efficient-neural-architecture-search-5f7387f9fb6

Designing neural networks for various tasks like image classification and natural language understanding often requires significant architecture engineering and expertise. Enter Neural Architecture Search (NAS), a task to automate the manual process of designing neural networks. NAS owes its growing research interest to the increasing prominence of deep learning models of late.

There are many ways to search for or discover neural architectures. Over the past couple of years, the community has proposed a variety of search methods.

In this post, we will look at Efficient Neural Architecture Search (ENAS), which employs reinforcement learning to build convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The authors (Hieu Pham, Melody Guan, Barret Zoph, Quoc V. Le, and Jeff Dean) proposed a predefined neural network to generate new neural networks guided by a reinforcement learning framework using macro and micro search. That’s right: a neural network building another neural network.

The purpose of this article is to provide readers with a tutorial on how the macro and micro search strategies generate neural networks. While the illustrations and animations serve to guide the reader, the sequence of animations does not necessarily reflect the flow of operations (due to vectorisation etc.).

We shall narrow the scope of this tutorial to neural architecture search for CNNs in an image classification task. This article assumes that the reader is familiar with the basics of RNNs, CNNs, and reinforcement learning. Familiarity with deep learning concepts like transfer learning and skip/residual connections will greatly help as they are heavily used in the architecture search. It is not required to have read the paper, but it would speed up your understanding.



0. Overview

In ENAS, there are 2 types of neural networks involved:

  • Controller – a predefined RNN, which is a long short-term memory (LSTM) cell
  • Child model – the desired CNN for image classification

Like most other NAS algorithms, ENAS involves 3 concepts:

  1. Search space — all the different possible architectures or child models that can possibly be generated
  2. Search strategy — a method to generate these architectures or child models
  3. Performance evaluation — a method to measure the effectiveness of the generated child models

Let’s see how these five ideas form the ENAS story.

The controller controls or directs the building of the child model’s architecture by “generating a set of instructions” (or, more rigorously, making decisions or sampling decisions) using a certain search strategy. These decisions are things like what types of operations (convolutions, pooling etc.) to perform at a particular layer of the child model. Using these decisions, a child model is built. A generated child model is one of the many possible child models that can be built in the search space.

This particular child model is then trained to convergence (~95% training accuracy) using stochastic gradient descent to minimise the expected loss function between the predicted class and ground truth class (for an image classification task). This is done for a specified number of epochs, what I’d like to call child epochs, say 100. Then, a validation accuracy is obtained from this trained model.

Then, we update the controller’s parameters using REINFORCE, a policy-based reinforcement learning algorithm, to maximise the expected reward function which is the validation accuracy. This parameter update hopes to improve the controller in generating better decisions that give higher validation accuracies.

This entire process (from 3 paragraphs before this) is just one epoch — let’s call it controller epoch. We then repeat this for a specified number of controller epochs, say 2000.

Of all the 2000 child models generated, the one with the highest validation accuracy gets the honour to be the neural network for your image classification task. However, this child model must go through just one more round of training (again specified by the number of child epochs), before it can be used for deployment.

A pseudo algorithm for the entire training is written below:

CONTROLLER_EPOCHS = 2000
CHILD_EPOCHS = 100
Build controller network
for i in range(CONTROLLER_EPOCHS):
    1. Generate a child model
    2. Train this child model for CHILD_EPOCHS
    3. Obtain val_acc
    4. Update controller parameters
Get child model with the highest val_acc
Train this child model for CHILD_EPOCHS
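To make the pseudo algorithm a little more concrete, here is a minimal runnable sketch of the same loop in Python. Everything in it is an assumption for illustration: `sample_child` and `train_and_validate` are hypothetical stand-ins (the "child model" is just a list of operation names and the "validation accuracy" is a made-up score), the epoch counts are shrunk so it runs instantly, and the controller update in step 4 is left out, so this is effectively random search rather than ENAS itself.

```python
import random

OPS = ["conv3x3", "conv5x5", "sep3x3", "sep5x5", "max3x3", "avg3x3"]


def sample_child(rng, num_layers=4):
    # Hypothetical stand-in for the controller: pick one operation per layer.
    return [rng.choice(OPS) for _ in range(num_layers)]


def train_and_validate(child, child_epochs, rng):
    # Hypothetical stand-in for SGD training plus validation. It returns a
    # made-up score in (0, 1) that happens to favour separable convolutions.
    base = 0.5 + 0.1 * sum(op.startswith("sep") for op in child)
    progress = 1.0 - 0.5 ** child_epochs  # more epochs, closer to `base`
    return base * progress + rng.uniform(0.0, 0.02)


def search(controller_epochs=20, child_epochs=3, seed=0):
    rng = random.Random(seed)
    best_child, best_acc = None, -1.0
    for _ in range(controller_epochs):
        child = sample_child(rng)                               # step 1
        val_acc = train_and_validate(child, child_epochs, rng)  # steps 2 and 3
        # Step 4 (updating the controller with REINFORCE) is omitted here.
        if val_acc > best_acc:
            best_child, best_acc = child, val_acc
    # The best child gets one more round of training before deployment.
    final_acc = train_and_validate(best_child, child_epochs, rng)
    return best_child, best_acc, final_acc
```

Calling `search()` returns the best sampled child, its (fake) validation accuracy, and the score after the final round of training. Swapping the two stand-in functions for a real controller and a real training routine gives the structure described above.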

This entire problem is essentially a reinforcement learning framework with the archetypal elements:

  • Agent — Controller
  • Action — The decisions taken to build the child network
  • Reward — Validation accuracy from the child network

The aim of this reinforcement learning task is to maximise the reward (validation accuracy) from the actions taken (decisions taken to build child model architecture) by the agent (controller).
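As a rough illustration of the agent/action/reward loop, the sketch below applies a REINFORCE-style update to a single softmax decision (which operation to pick). It is a toy under stated assumptions, not the ENAS implementation: the policy is a bare vector of logits instead of an LSTM, `toy_reward` is a hypothetical stand-in for validation accuracy, and the moving-average baseline and learning rate are values I picked for the demo.

```python
import math
import random

OPS = ["conv3x3", "conv5x5", "sep3x3", "sep5x5", "max3x3", "avg3x3"]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_reward(action):
    # Hypothetical stand-in for validation accuracy: one op is simply "better".
    return 0.9 if OPS[action] == "sep5x5" else 0.5


def reinforce_step(logits, baseline, rng, lr=0.1):
    probs = softmax(logits)
    action = rng.choices(range(len(logits)), weights=probs)[0]  # sample a decision
    reward = toy_reward(action)
    advantage = reward - baseline
    # Gradient of log pi(action) w.r.t. logit k is 1[k == action] - probs[k].
    new_logits = [
        x + lr * advantage * ((1.0 if k == action else 0.0) - probs[k])
        for k, x in enumerate(logits)
    ]
    return new_logits, reward


def train_policy(steps=2000, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(OPS)
    baseline = 0.0
    for _ in range(steps):
        logits, reward = reinforce_step(logits, baseline, rng)
        baseline = 0.95 * baseline + 0.05 * reward  # moving average of rewards
    return softmax(logits)
```

After training, `train_policy()` returns the policy's probabilities over the operations. The probability mass should drift towards the operation with the higher reward, which is the same mechanism that nudges the controller towards decisions with higher validation accuracy.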

1. Search Strategy

Recall in the previous section that the controller generates the child model’s architecture using a certain search strategy. There are two questions that you should ask in this statement — (1) how does the controller make decisions and (2) what search strategy?

How does the controller make decisions?

This brings us to the model of the controller, which is an LSTM. This LSTM samples decisions via softmax classifiers, in an auto-regressive fashion: the decision in the previous step is fed as input embedding into the next step.
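Here is a small sketch of what "sampling decisions in an auto-regressive fashion" can look like. Please treat it as an assumption-laden toy: `TinyController` is a hypothetical class that uses a plain tanh RNN cell with random, untrained weights in place of the LSTM, and a single softmax head for every step, whereas a real controller would be trained and would use different heads for operations and connections.

```python
import math
import random


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class TinyController:
    """Toy controller: a tanh RNN cell standing in for the LSTM."""

    def __init__(self, num_choices, hidden=8, seed=0):
        self.rng = random.Random(seed)
        self.num_choices = num_choices
        self.hidden = hidden
        # One extra embedding row serves as the "start" input for step 1.
        self.embed = self._matrix(num_choices + 1, hidden)
        self.w_in = self._matrix(hidden, hidden)
        self.w_rec = self._matrix(hidden, hidden)
        self.w_out = self._matrix(num_choices, hidden)

    def _matrix(self, rows, cols):
        return [[self.rng.uniform(-0.5, 0.5) for _ in range(cols)]
                for _ in range(rows)]

    def sample(self, steps):
        state = [0.0] * self.hidden
        token = self.num_choices  # index of the start embedding
        decisions = []
        for _ in range(steps):
            x = self.embed[token]  # embedding of the previous decision
            state = [
                math.tanh(dot(self.w_in[i], x) + dot(self.w_rec[i], state))
                for i in range(self.hidden)
            ]
            probs = softmax([dot(row, state) for row in self.w_out])
            token = self.rng.choices(range(self.num_choices), weights=probs)[0]
            decisions.append(token)  # fed back in as the next step's input
        return decisions
```

For example, `TinyController(num_choices=6).sample(steps=7)` returns 7 integers between 0 and 5, one per time step. The important part is the feedback loop: each sampled decision is embedded and becomes the input to the next step.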

What are the search strategies?

The authors of ENAS proposed 2 strategies for searching for or generating an architecture.

  1. Macro search
  2. Micro search

Macro search is an approach where the controller designs the entire network. Examples of publications that use this include NAS by Zoph and Le, FractalNet and SMASH. On the other hand, micro search is an approach where the controller designs modules or building blocks, which are combined to build the final network. Some papers that implement this approach are Hierarchical NAS, Progressive NAS and NASNet.

In the following 2 sub-sections we will see how ENAS implements these 2 strategies.

1.1 Macro Search

In macro search, the controller makes 2 decisions for every layer in the child model:

  • the operation to perform on the previous layer (see Notes for the list of operations)
  • the previous layer to connect to for skip connections

In this macro search example, we will see how the controller generates a 4-layer child model. Each layer in this child model is colour-coded with red, green, blue and purple respectively.

Convolutional Layer 1 (Red)

We’ll start with running the first time step of the controller. The output of this time step is softmaxed to get a vector, which translates to a conv3×3 operation.

What this means for the child model is that we perform a convolution with a 3×3 filter on the input image.

 

The output from the first time step (conv3×3) of the controller corresponds to building the first layer (red) in the child model. This means the child model will first perform 3×3 convolution on the input image.

I know I mentioned that the controller needs to make 2 decisions but there’s only 1 here. Since this is the first layer, we can only sample one decision which is the operation to perform, because there’s nothing else to connect to except for the input image itself.

Convolutional Layer 2 (Green)

To build the subsequent convolutional layers, the controller makes 2 decisions (no more lies): (i) operation and (ii) layer(s) to connect to. Here, we see that it generated 1 and sep5×5.

What this means for the child model is that we first perform a sep5×5 operation on the output of the previous layer. Then, this output is concatenated along the depth together with the output of Layer 1, i.e. the output from the red layer.

 

The outputs from the 2nd and 3rd time steps (1 and sep5×5) in the controller correspond to building Convolutional Layer 2 (green) in the child model.

Convolutional Layer 3 (Blue)

We repeat the previous step again to generate the 3rd convolutional layer. Again, we see here that the controller generates 2 things: (i) operation and (ii) layer(s) to connect to. Below, the controller generated 1 and 2, and the operation max3×3.

So, the child model performs the operation max3×3 on the output of the previous layer (Layer 2, green). Then, the result of this operation is concatenated along the depth dimension with Layers 1 and 2.

 

The outputs from the 4th and 5th time steps (1, 2 and max3×3) in the controller correspond to building Convolutional Layer 3 (blue) in the child model.

Convolutional Layer 4 (Purple)

We repeat the previous step again to generate the 4th convolutional layer. This time the controller generated 1 and 3, and the operation conv5×5.

The child model performs the operation conv5×5 on the output of the previous layer (Layer 3, blue). Then, the result of this operation is concatenated along the depth dimension with Layers 1 and 3.

 

The outputs from the 6th and 7th time steps (1, 3 and conv5×5) in the controller correspond to building Convolutional Layer 4 (purple) in the child model.
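The four layers walked through so far can be written down as data. The sketch below is only one possible encoding, and the function names and dictionary keys are my own: each layer is a pair of (layers to connect to, operation), and `decode_macro` turns the controller's decisions into a readable description of the child model.

```python
# Decisions from the walkthrough: (skip connections, operation) per layer.
EXAMPLE_DECISIONS = [
    ([], "conv3x3"),      # Layer 1 (red): operation only
    ([1], "sep5x5"),      # Layer 2 (green)
    ([1, 2], "max3x3"),   # Layer 3 (blue)
    ([1, 3], "conv5x5"),  # Layer 4 (purple)
]


def decode_macro(decisions):
    layers = []
    for idx, (skips, op) in enumerate(decisions, start=1):
        # A layer can only connect to layers that were built before it.
        if any(not 1 <= s < idx for s in skips):
            raise ValueError(f"layer {idx} has an invalid connection: {skips}")
        layers.append({
            "layer": idx,
            "op": op,
            # The operation is applied to the previous layer (or the image).
            "op_input": "image" if idx == 1 else f"layer{idx - 1}",
            # The result is concatenated along depth with these layers.
            "concat_with": [f"layer{s}" for s in sorted(set(skips))],
        })
    return layers


def controller_steps(num_layers):
    # 1 time step for the first layer, then 2 for every later layer.
    return 1 + 2 * (num_layers - 1)
```

Running `decode_macro(EXAMPLE_DECISIONS)` gives four entries, the last of which applies conv5x5 to layer3 and concatenates with layer1 and layer3. `controller_steps(4)` is 7, matching the seven time steps counted in this section.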

End

And there you have it — a child model generated using the macro search! Now on to micro search. Heads up: micro search isn’t as straightforward as macro search.

1.2 Micro Search

As mentioned earlier, micro search designs modules or building blocks which are then connected together to form the final architecture. ENAS calls these building blocks convolutional cells and reduction cells. Simply put, a convolutional cell or reduction cell is just a block of operations. Both are similar — the only thing different about reduction cells is that the operations are applied with a stride of 2, thus reducing the spatial dimensions.

How to connect these cells to form the final network, you may ask?

The final network

Below is an image that gives you a quick overview of the final generated child model.

 

Fig. 1.2.1: Overview of the final neural network generated. Image source.

Let’s come back to this in a bit.

Building units for networks derived for micro search

There’s sort of a hierarchy of the ‘building units’ of child networks derived from micro search. From biggest to smallest:

  • block
  • convolutional cell / reduction cell
  • node

A child model consists of several blocks. Each block consists of N convolutional cells followed by 1 reduction cell. Each convolutional/reduction cell comprises B nodes. And each node consists of standard convolutional operations (we’ll see this later). (N and B are hyperparameters that can be tuned by the architect.)

Below is a child model with 3 blocks. Each block consists of N=3 convolutional cells and 1 reduction cell. The operations within each cell are not shown here.

 

Fig. 1.2.2: Overview of the final neural network generated. Image source.
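The block/cell layout is easy to spell out in a few lines. This is just a sketch of the ordering described above; the function name and the label format are made up for illustration.

```python
def cell_sequence(num_blocks, n_conv):
    """List the cells of a child model in order, block by block."""
    sequence = []
    for block in range(1, num_blocks + 1):
        # N convolutional cells come first...
        for cell in range(1, n_conv + 1):
            sequence.append(f"block{block}/conv_cell{cell}")
        # ...followed by a single reduction cell.
        sequence.append(f"block{block}/reduction_cell")
    return sequence
```

With 3 blocks and N=3, `cell_sequence(3, 3)` has 12 entries: 9 convolutional cells and 3 reduction cells, in the order conv, conv, conv, reduction within each block.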

So how to generate this child model from micro search, you may ask? Continue reading!

Generate a child model from micro search

For this micro search tutorial, let’s build a child model that has 1 block, for simplicity’s sake. Each block (there’s only one though) comprises N=3 convolutional cells and 1 reduction cell. Each cell comprises B=4 nodes. This means our generated child model should look like this:

 

Fig. 1.2.3: A neural network generated from micro search which has 1 block, consisting of 3 convolutional cells and 1 reduction cell. The individual operations are not shown here.

Let’s now build a convolutional cell!

Fast forward

To explain how to build a convolutional cell, let me take you to a stage where we have already built the first 2 convolutional cells for you. Notice that the last operations from each of these 2 cells are add operations. Let’s just take it for granted for now.

 

Fig. 1.2.4: Two convolutional cells already built in micro search.

With 2 convolutional cells built for us, let’s move on to the third.

Convolutional Cell #3

Now, let’s ‘prepare’ the third convolutional cell — the cell that you and I will be building together.

 

Fig. 1.2.5: ‘Preparing’ the third convolutional cell in micro search.

Recall that every convolutional cell consists of 4 nodes. Now you might say: Okay sure so where are these nodes?

The first two nodes — read this very carefully and slowly — are the two previous cells from the current cell — yes, cells. What about the other 2 nodes? These 2 nodes fall in this very convolutional cell that we are building right now. Let’s make known where these nodes are:

 

Fig. 1.2.6: Identifying the 4 nodes while building Convolutional Cell #3.

From this section onwards, you can safely disregard the ‘Convolutional cell’ labels you see on the image above and concentrate on the ‘Nodes’ labels:

Node 1 — red (Convolutional Cell #1)

Node 2 — blue (Convolutional Cell #2)

Node 3 — green

Node 4 — purple

If you’re wondering if these nodes will change for every convolutional cell we’re building, the answer is yes! Every cell will ‘assign’ the nodes in this manner.

You might also wonder — since we’ve already built the operations in Node 1 and Node 2 (which are Convolutional Cells #1 and #2), what’s there left to build in these nodes? You asked the right question.

Convolutional Cell #3: Node 1 (red) and Node 2 (blue)

For any cell that we’re building, the first 2 nodes do not have to be built but instead become the inputs to the other nodes. In our example, since we are building 4 nodes, Nodes 1 and 2 can be inputs to Node 3 and Node 4. So, yay! We don’t have to do anything for Node 1 and Node 2 and we can now move on to building Node 3 and Node 4. Phew!

Convolutional Cell #3: Node 3 (Green)

Node 3 is where the building starts. Unlike in macro search, where the controller samples 2 decisions for every layer, here in micro search the controller samples 4 decisions for us (or rather 2 sets of decisions):

  • 2 nodes to connect to
  • the respective 2 operations to perform on the nodes to connect to

With 4 decisions to make, the controller runs 4 time steps. Have a look below:

 

Fig. 1.2.7: The outputs of the first four controller time steps (2, 1, avg5×5, sep5×5), which will be used to build Node 3.

From the above we see that the controller sampled 2, 1, avg5×5 and sep5×5 from each of the four time steps. How does this translate to the architecture of the child model? Let’s see:

 

Fig. 1.2.8: How the outputs of the first four controller time steps (2, 1, avg5×5, sep5×5) are translated to build Node 3.

From the above, there are three things that just happened:

  1. The output from Node 2 (blue) undergoes the avg5×5 operation.
  2. The output from Node 1 (red) undergoes a sep5×5 operation.
  3. Both the results from these two operations undergo an add operation.

The output of this node is the tensor produced by the add operation. This explains why Nodes 1 and 2 (Convolutional Cells #1 and #2) end with add operations.

Convolutional Cell #3: Node 4 (Purple)

Now for Node 4. We repeat the same steps, just that the controller now has 3 nodes to choose from (Nodes 1, 2 and 3). Below, the controller generated 3, 1, id and avg3×3.

 

Fig. 1.2.9: The outputs of the next four controller time steps (3, 1, id, avg3×3), which will be used to build Node 4.

This translates to building the following:

 

Fig. 1.2.10: How the outputs of the next four controller time steps (3, 1, id, avg3×3) are translated to build Node 4.

What just happened?

  1. The output from Node 3 (green) undergoes an id operation.
  2. The output from Node 1 (red) undergoes an avg3×3 operation.
  3. Both the results from these two operations undergo an add operation.

And that’s it, we’re done with Convolutional Cell #3.
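Putting Node 3 and Node 4 together, the decisions for this cell can be decoded as in the sketch below. The encoding is my own assumption for illustration: each built node is a tuple of (first input node, second input node, first operation, second operation), and the function also reports the 'loose ends' mentioned in the Notes, i.e. built nodes whose output is not consumed by any other node in the cell.

```python
# Decisions from the walkthrough, for Node 3 and Node 4 respectively.
CELL_DECISIONS = [
    (2, 1, "avg5x5", "sep5x5"),  # Node 3 (green)
    (3, 1, "id", "avg3x3"),      # Node 4 (purple)
]


def decode_cell(node_decisions):
    nodes = {}
    used_as_input = set()
    for offset, (in_a, in_b, op_a, op_b) in enumerate(node_decisions):
        node_id = offset + 3  # Nodes 1 and 2 are the two previous cells
        for src in (in_a, in_b):
            # A node can only read from nodes that already exist.
            if not 1 <= src < node_id:
                raise ValueError(f"node {node_id} cannot read from node {src}")
        nodes[node_id] = f"add({op_a}(node{in_a}), {op_b}(node{in_b}))"
        used_as_input.update((in_a, in_b))
    loose_ends = [n for n in nodes if n not in used_as_input]
    return nodes, loose_ends


def controller_steps(num_nodes):
    # 4 decisions for every node except the 2 input nodes.
    return 4 * (num_nodes - 2)
```

For the example, `decode_cell(CELL_DECISIONS)` describes Node 3 as `add(avg5x5(node2), sep5x5(node1))` and Node 4 as `add(id(node3), avg3x3(node1))`, with Node 4 as the only loose end. `controller_steps(4)` gives 8, i.e. four time steps for each of the two built nodes.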

Reduction Cell

Recall that for every N convolutional cells, we need to have a reduction cell. Since N=3 in this tutorial, and we’ve just finished with Convolutional Cell #3, it’s time to build a reduction cell. As mentioned earlier, the design of the reduction cell is similar to Convolutional Cell #3, except that the operations that are sampled have a stride of 2.
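To see what a stride of 2 does to the spatial dimensions, here is a tiny sketch that tracks the feature map size through a network. It rests on assumptions that are not from the article: operations use 'same' padding (so a stride of 1 keeps the size unchanged), and the 32×32 input size is just a hypothetical example.

```python
import math


def output_size(size, stride):
    # With 'same' padding (an assumption), the size is divided by the stride.
    return math.ceil(size / stride)


def sizes_through_network(input_size, num_blocks, n_conv):
    """Spatial size after every cell: N conv cells, then 1 reduction cell."""
    size = input_size
    trace = []
    for _ in range(num_blocks):
        for _ in range(n_conv):
            size = output_size(size, stride=1)  # convolutional cell
            trace.append(size)
        size = output_size(size, stride=2)      # reduction cell
        trace.append(size)
    return trace
```

For the 1-block child model in this tutorial, `sizes_through_network(32, 1, 3)` gives `[32, 32, 32, 16]`: the three convolutional cells keep the size and the reduction cell halves it.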

End

And so that wraps up generating a child model out of the micro search strategy. Phew! I hope that wasn’t too much for you, because it was for me when I first read the paper.

2. Notes

Because this post mainly shows the macro and micro search strategies, I’ve left out many small details (especially on the concept of transfer learning). Let me briefly cover them:

  • What’s so ‘efficient’ in ENAS? Answer: transfer learning, which the paper’s title refers to as parameter sharing. If a computation between two nodes has been done (trained) before, the weights from the convolutional filters and 1×1 convolutions (which keep the number of output channels consistent; not mentioned in the previous sections) are reused. This is what makes ENAS faster than its predecessors!
  • It is possible that the controller samples a decision where no skip connection is needed.
  • There are 6 operations available for the controller: convolutions with filter sizes 3×3 and 5×5, depthwise-separable convolutions with filter sizes 3×3 and 5×5, max pooling and average pooling of kernel size 3×3.
  • Do read up on the concatenate operation at the end of each cell which ties up ‘loose ends’ of any nodes.
  • Do read up briefly on the policy gradient algorithm (REINFORCE) in reinforcement learning.
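The notes on the 6 operations and on optional skip connections also give a feel for how big the macro search space is. The sketch below is a back-of-the-envelope count under one assumption: every layer picks 1 of the operations, and each earlier layer is an independent yes/no choice for a skip connection. The function name is made up for illustration.

```python
def macro_search_space_size(num_layers, num_ops=6):
    # One operation choice per layer.
    op_choices = num_ops ** num_layers
    # One on/off choice for every (earlier layer, later layer) pair.
    skip_pairs = num_layers * (num_layers - 1) // 2
    return op_choices * 2 ** skip_pairs
```

For the 4-layer example in the macro search section this gives 6^4 × 2^6 = 82,944 possible child models, and for 12 layers it is already about 1.6 × 10^29, which is why training every candidate from scratch is out of the question.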

3. Summary

Macro search (for an entire network)

The final child model is as shown below.

 

Fig. 3.1: Generating a convolutional neural network with macro search.

Micro search (for a convolutional cell)

Note that only part of the final child model is shown here.

 

Fig. 3.2: Generating a convolutional neural network with micro search. Only part of the full architecture is shown.

4. Implementations

5. References

Efficient Neural Architecture Search via Parameter Sharing

Neural Architecture Search with Reinforcement Learning

Learning Transferable Architectures for Scalable Image Recognition


That’s it! Remember to read the ENAS paper, Efficient Neural Architecture Search via Parameter Sharing.

Special thanks to Ren Jie Tan, Derek, and Yu Xuan Tay for ideas, suggestions and corrections to this article.
