AutoML for Data Augmentation

2019-04-01 09:26:19

This blog is copied from: https://blog.insightdatascience.com/automl-for-data-augmentation-e87cf692c366

 

DeepAugment is an AutoML tool focusing on data augmentation. It utilizes Bayesian optimization for discovering data augmentation strategies tailored to your image dataset. The main benefits and features of DeepAugment are:

  • Reduces the error rate of CNN models (a 60% decrease in error was shown for CIFAR-10 on WRN-28-10)
  • Saves time by automating the process
  • 50 times faster than Google’s previous solution, AutoAugment

The finished package is on PyPI. You can install it from your terminal by running:

$ pip install deepaugment

You can also visit the project’s README or run the Google Colab notebook tutorial. To learn more about how I built this, read on!

Introduction

Data is the most critical piece of AI applications. Not having enough labeled data often leads to overfitting, which means the model will not be able to generalize to unseen examples. This can be mitigated by data augmentation, which effectively increases the amount and diversity of data seen by the network. It is done by artificially producing new data by applying transformations on an original dataset such as rotation, cropping, occlusion, etc. However, determining which augmentations will work best for the dataset at hand is no trivial task. To address this problem, Google published AutoAugment last year, which discovers optimized augmentations for the given dataset using reinforcement learning.
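
To make these transformations concrete, here is a small illustrative snippet (not part of DeepAugment itself) that applies a few of them with the imgaug package, which is also used later in this post; the parameter ranges are arbitrary examples:

# Illustrative augmentation pipeline: rotation, cropping, and crude occlusion.
# Parameter ranges are arbitrary examples, not tuned values.
import numpy as np
import imgaug.augmenters as iaa

seq = iaa.Sequential([
    iaa.Affine(rotate=(-20, 20)),               # random rotation
    iaa.Crop(percent=(0, 0.1)),                 # random cropping
    iaa.CoarseDropout(0.05, size_percent=0.1),  # occlusion via dropped patches
])

images = np.random.randint(0, 255, size=(8, 32, 32, 3), dtype=np.uint8)
augmented = seq(images=images)  # same shapes, randomly transformed copies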

Using Google’s AutoAugment requires powerful computational resources due to the reinforcement learning module. Since obtaining the necessary computational power can be costly, I developed a novel approach, DeepAugment, which employs Bayesian optimization instead of reinforcement learning.

Ways to get better data

Efforts to improve the quality of data often have a higher return on investment than efforts to enhance models. There are three main ways to improve data: collecting more data, synthesizing new data, or augmenting existing data. Collecting additional data is not always possible and can be expensive. Data synthesis, typically done with GANs, is promising but complicated, and the synthesized examples might diverge from realistic ones.

 

Data augmentation, on the other hand, is simple and has high impact. It is applicable to most datasets and is done with simple image transformations. The problem, however, is determining which augmentation technique is best for the dataset at hand. Discovering the proper method requires time-consuming experimentation. Even after many experiments, a machine learning (ML) engineer may still not discover the best option.

Effective augmentation strategies are different for each image dataset, and some augmentation techniques may even be detrimental to the model. For example, applying rotations would make your model worse if you are using it on the MNIST digits dataset, because a 180-degree rotation of a “6” makes it look like a “9” while it is still labeled as a 6. On the other hand, applying rotation to satellite images can improve results significantly, since a car seen from the air is still a car no matter how much the image is rotated.

DeepAugment: lightning fast autoML

DeepAugment is designed as a fast and flexible autoML data augmentation solution. More specifically, it is designed as a faster and more flexible alternative to AutoAugment (Cubuk et al., 2018; blog). AutoAugment was one of the most exciting publications of 2018, and the first method to use reinforcement learning for this particular problem. At the time of this article, the open-source version of AutoAugment did not provide the controller module, which prevents users from running it on their own datasets. Moreover, it takes 15,000 iterations to learn augmentation policies, requiring huge computational resources. Most people could not benefit from it even if its source code were fully available.

DeepAugment addresses these problems with the following design goals:

  1. Minimize the computational complexity of the optimization of data augmentation while maintaining the quality of results.
  2. Be modular and user-friendly.

In order to achieve the first goal, DeepAugment was designed with the following differences, as compared to AutoAugment:

  • Utilizes Bayesian optimization instead of reinforcement learning, which requires far fewer iterations (~100x speed-up)
  • Minimizes the size of the child model, which decreases the computational cost of each training run (~20x speed-up)
  • Uses a less stochastic augmentation search-space design, which decreases the number of iterations needed

To achieve the second goal, making DeepAugment modular and user-friendly, the user interface is designed to give the user a broad range of configuration possibilities and model selections (e.g. selecting the child model or supplying a self-designed child model; see the configuration options).

Designing augmentation policies

DeepAugment aims to find the best augmentation policy for a given image dataset. An augmentation policy consists of five sub-policies, each made of two augmentation techniques and two real values in [0, 1] that determine how strongly each technique is applied. I implemented the augmentation techniques using the imgaug package, which is known for its large collection of augmentation techniques.
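
To make the policy format concrete, here is a minimal sketch of how one candidate policy could be written down; the technique names below are illustrative and not necessarily the exact identifiers used inside the package:

# One policy = five sub-policies; each sub-policy = two augmentation
# techniques plus two magnitudes in [0, 1]. Names are illustrative examples.
policy = [
    # (aug_type_1, magnitude_1, aug_type_2, magnitude_2)
    ("rotate",         0.4, "shear",       0.2),
    ("coarse-dropout", 0.3, "add-to-hue",  0.6),
    ("sharpen",        0.1, "crop",        0.5),
    ("gamma-contrast", 0.7, "translate-x", 0.3),
    ("emboss",         0.2, "flip-lr",     0.4),
]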

 

Augmentations are most effective when they are diverse and randomly applied. For instance, instead of rotating every image, it is better to rotate some portion of images, shear another portion, and apply a color inversion for another. Based on this observation, DeepAugment applies one of five sub-policies (consisting of two augmentations) randomly to the images. During the optimization process, each image has an equal chance (16%) of being augmented by one of five sub-policies and a 20% chance of not being augmented at all.
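
A minimal sketch of this sampling scheme, assuming the five sub-policies are already available as callable transforms:

import numpy as np

# Each image gets one of the five sub-policies with probability 0.16 each,
# or no augmentation at all with probability 0.20.
def apply_random_subpolicy(images, sub_policies, rng=None):
    rng = rng or np.random.default_rng()
    out = []
    for img in images:
        choice = rng.choice(6, p=[0.16] * 5 + [0.20])
        out.append(img if choice == 5 else sub_policies[choice](img))
    return out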

While I was inspired by AutoAugment for this policy design, there is one main difference: I do not use any parameters for the probability of applying sub-policies in order to make policies less stochastic and allow optimization in fewer iterations.

 

This policy design creates a 20-dimensional search space for the Bayesian optimizer, where 10 dimensions are categorical (types of augmentation techniques) and the other 10 are real-valued (magnitudes). Since categorical values are involved, I configured the Bayesian optimizer to use a random forest estimator.
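
With scikit-optimize, which the controller is built on (see the next section), such a search space could be declared roughly as follows; the technique list here is a shortened placeholder rather than DeepAugment's full set:

from skopt.space import Categorical, Real

# Placeholder technique names; the real list is much larger.
AUG_TYPES = ["rotate", "shear", "crop", "coarse-dropout", "gamma-contrast"]

# 5 sub-policies x (2 categorical technique choices + 2 real magnitudes)
# = 10 categorical + 10 real-valued dimensions.
search_space = []
for _ in range(5):
    search_space += [
        Categorical(AUG_TYPES), Real(0.0, 1.0),
        Categorical(AUG_TYPES), Real(0.0, 1.0),
    ]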

How DeepAugment finds the best policies

The three major components of DeepAugment are the controller (Bayesian optimizer), the augmenter, and the child model. The overall workflow is as follows: the controller samples new augmentation policies, the augmenter transforms images according to the new policy, and the child model is trained from scratch on the augmented images.

A reward is calculated from the child model’s training history. The reward is returned to the controller, which updates its surrogate model with this reward and the associated augmentation policy (see the section “How Bayesian optimization works” below). The controller then samples new policies again and the same steps repeat. This process cycles until the user-defined maximum number of iterations is reached.

The controller (Bayesian optimizer) is implemented using the scikit-optimize library’s ask-and-tell interface. It is configured to use a random forest estimator as its base estimator and expected improvement as its acquisition function.
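
The ask-and-tell pattern looks roughly like the toy sketch below. The search space is shrunken to two dimensions and the reward function is a stand-in for augmenting the images and training the child model, so this is an illustration rather than the package's internal code:

from skopt import Optimizer
from skopt.space import Categorical, Real

# Toy 2-dimensional stand-in for the real 20-dimensional policy space.
space = [Categorical(["rotate", "shear", "crop"]), Real(0.0, 1.0)]

def evaluate_policy(policy):
    # Stand-in for: augment images with `policy`, train the child model
    # from scratch, and compute a reward from its training history.
    aug_type, magnitude = policy
    return 0.5 + 0.3 * magnitude * (aug_type == "rotate")  # fake reward

# Random forest surrogate + expected improvement, as described above.
opt = Optimizer(space, base_estimator="RF", acq_func="EI")

for _ in range(20):
    policy = opt.ask()          # controller samples a new policy
    reward = evaluate_policy(policy)
    opt.tell(policy, -reward)   # skopt minimizes, so the reward is negated

print(min(opt.yi))  # best (negated) reward observed so far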

 

Basic workflow of DeepAugment

How Bayesian optimization works

The aim of Bayesian optimization is to find a set of parameters that maximize the value of the objective function. A working cycle of Bayesian optimization can be summarized as:

  1. Build a surrogate model of the objective function
  2. Find parameters that perform best on the surrogate
  3. Execute the objective function with these parameters
  4. Update the surrogate model with these parameters and the score of the objective function
  5. Repeat steps 2–4 until the maximum number of iterations is reached

For more information about Bayesian optimization, read this blog post explaining it at a high level, or take a glance at this review paper.
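
As a self-contained toy example of this cycle (independent of DeepAugment), scikit-optimize's gp_minimize runs exactly these steps on a simple 1-D objective:

from skopt import gp_minimize

# Objective to minimize: a simple 1-D function with its minimum at x = 2.
def objective(params):
    x = params[0]
    return (x - 2.0) ** 2

# Build a surrogate, pick a promising x, evaluate, update, repeat (25 calls).
result = gp_minimize(objective, dimensions=[(-5.0, 5.0)], n_calls=25,
                     random_state=0)
print(result.x, result.fun)  # approximately [2.0] and 0.0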

 

A 2D depiction of Bayesian optimization, where the x and y axes represent types of augmentation, and the color at point (i, j) represents the CNN model’s accuracy when it is trained on data augmented by augmentations i and j.

Trade-offs of Bayesian optimization

Currently, the standard approaches for hyper-parameter optimization are random search, grid search, Bayesian optimization, evolutionary algorithms, and reinforcement learning, in order of method complexity. Bayesian optimization is a better choice than grid search and random search in terms of accuracy, cost, and computation time for hyper-parameter tuning (see an empirical comparison here). This is because Bayesian optimization learns from runs with previous parameters, unlike grid search and random search.

When Bayesian optimization is compared against reinforcement learning and evolutionary algorithms, it provides competitive accuracies while requiring far fewer iterations. Google’s AutoAugment, for example, iterates 15,000 times in order to learn good policies (which means training the child CNN model 15,000 times). Bayesian optimization, on the other hand, learns good policies in 100–300 iterations. A rule of thumb for Bayesian optimization is to set the number of iterations to roughly ten times the number of optimized parameters.

 

An intuitive comparison of hyper-parameter optimization approaches. Number of plus signs (+) indicates how good the approach is, by comparison category.

Challenges and solutions

Challenge 1: Optimizing augmentations requires a lot of computational resources, since the child model must be trained from scratch over and over. This dramatically slowed down the development of my tool. Even though the use of Bayesian optimization made it faster, the optimization process was still not fast enough to make development feasible.

Solutions: I developed two solutions. First, I optimized the child CNN model (see below), which is the computational bottleneck of the process. Second, I designed augmentation policies in a more deterministic way, making the Bayesian optimizer require fewer iterations.

 

Designed child CNN model. It trains in ~30 seconds (120 epochs) with 32x32 images on an AWS p3.2xlarge instance (Tesla V100 GPU with 112 tensor TFLOPS).
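
The exact architecture in the figure is not reproduced here, but a small child CNN in the same spirit (layer sizes are my illustrative assumptions, not the actual DeepAugment child model) could be built like this:

from keras import layers, models

# A deliberately small CNN for 32x32 RGB images; sizes are illustrative.
def build_child_cnn(num_classes=10):
    model = models.Sequential([
        layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Dropout(0.25),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model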

Challenge 2: I encountered an interesting problem during the development of DeepAugment. As the child model was trained over and over during the optimization of augmentations, the policies started to overfit to the validation set: I discovered that my best-found policies performed poorly when I changed the validation set. This is an interesting case because it is different from overfitting in the general sense, where model weights overfit to noise in the data.

Solution: Instead of using a fixed validation set, I reserved the data outside the training set as a "seed validation set" and sampled a fresh validation set of 1,000 images from it at each training of the child CNN model. This solved the augmentation overfitting problem.
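
A minimal sketch of that resampling step (variable names and sizes are illustrative):

import numpy as np

# Draw a fresh validation set from the reserved "seed validation set"
# before each child-model training run.
def sample_validation_set(x_seed_val, y_seed_val, size=1000, rng=None):
    rng = rng or np.random.default_rng()
    idx = rng.choice(len(x_seed_val), size=size, replace=False)
    return x_seed_val[idx], y_seed_val[idx]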

 

How to integrate into your ML pipeline

DeepAugment is published on PyPI. You can install it from your terminal by running:

$ pip install deepaugment

And usage is easy:

from deepaugment.deepaugment import DeepAugment

deepaug = DeepAugment(my_images, my_labels)

best_policies = deepaug.optimize()

A more advanced usage, by configuring DeepAugment:

from keras.datasets import cifar10

# my configuration
my_config = {
    "model": "basiccnn",
    "method": "bayesian_optimization",
    "train_set_size": 2000,
    "opt_samples": 3,
    "opt_last_n_epochs": 3,
    "opt_initial_points": 10,
    "child_epochs": 50,
    "child_first_train_epochs": 0,
    "child_batch_size": 64
}

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
# x_train.shape -> (N, M, M, 3)
# y_train.shape -> (N,)

deepaug = DeepAugment(x_train, y_train, config=my_config)
best_policies = deepaug.optimize(300)

For more detailed installation/usage information, visit the project's README or run the Google Colab notebook tutorial.

Conclusion

To our knowledge, DeepAugment is the first method utilizing Bayesian optimization to find the best data augmentations. Optimization of data augmentation is a recent research area, and AutoAugment was one of the first methods tackling this problem.

The main contribution of DeepAugment to the open-source community is that it makes the process scalable, enabling users to optimize augmentation policies without needing huge computational resources*. It is very modular and >50 times faster than the previous solution, AutoAugment.

 

Comparison of validation accuracies of a WideResNet-28-10 CNN model on CIFAR-10 images when they are augmented by policies found by DeepAugment and when they are not augmented. Validation accuracy is increased by 8.5%, equivalent to a 60% reduction in error.

DeepAugment is shown to reduce error by 60% for a WideResNet-28-10 model using the CIFAR-10 small image dataset when compared to the same model and dataset without augmentation.

DeepAugment currently only optimizes augmentations for the image classification task. It could be expanded to optimize for object detection or segmentation tasks, and I welcome your contributions if you would like to do so. However, I would expect that the best augmentation policies are very dependent on the type of dataset, and less so on the task (such as classification or object detection). This means AutoAugment should find similar strategies regardless of the task, but it would be very interesting if these strategies end up being very different!

While DeepAugment currently works for image datasets, it would be very interesting to extend it for text, audio or video datasets. The same concept is applicable to other types of datasets as well.

*DeepAugment takes 4.2 hours (500 iterations) on the CIFAR-10 dataset, which costs around $13 using an AWS p3.2xlarge instance.



Resources

GitHub: github.com/barisozmen/deepaugment

Demo slide deck: bit.ly/deepaugmentslides

Colab tutorial: bit.ly/deepaugmentusage

Thanks to Holly Szafarek and Amber Roberts
