Dropout & Maxout

This is the 8th post in a series I planned as a journal of my study of deep learning in Professor Bhiksha Raj's deep learning lab course. I am writing these posts as notes on my learning process, and I hope they can help others with a similar background.
Back to Content Page
--------------------------------------------------------------------
PDF Version Available Here
--------------------------------------------------------------------
In the last post, when we looked at techniques for convolutional neural networks, we mentioned dropout as a technique for controlling sparsity. Here let's look at it in detail, along with a similar technique called maxout. Again, these techniques are not restricted to convolutional neural networks; they can be applied to almost any deep network, or at least to feedforward deep networks.

Dropout

Dropout is famous, powerful, and simple. Although it is widely used and very effective, the idea itself is straightforward: randomly drop out some of the units while training. One case is shown in the following figure.

Figure 1. An illustration of the idea of dropout

To state this a little more formally: on each training case, each hidden unit is randomly omitted from the network with probability p. One thing to notice is that the units dropped are different for each training instance; that is why dropout is better thought of as a training technique rather than an architectural one.
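A minimal NumPy sketch of this per-case masking (my own illustration, not code from the paper; the array shapes and variable names are made up):

import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(h, p=0.5):
    """Forward pass of dropout for a batch of hidden activations.

    h: array of shape (num_cases, num_units).
    Each hidden unit is dropped (zeroed) with probability p, and a fresh
    random mask is drawn for every training case.
    """
    mask = rng.random(h.shape) >= p      # True where the unit is kept
    return h * mask, mask

# toy example: 4 training cases, 6 hidden units
h = np.ones((4, 6))
h_dropped, mask = dropout_forward(h, p=0.5)
print(mask.astype(int))  # a different drop pattern on each row (training case)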
As stated in the original paper by Hinton et al., another way to look at dropout makes this solution interesting: dropout can be seen as an efficient way to perform model averaging across a large number of different neural networks, so that overfitting is avoided at a much lower computational cost.
In the paper, dropout is initially discussed with p = 0.5, but of course p can in principle be set to any probability.
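To connect this to the model-averaging view above, one common approximation at test time is to keep every unit and scale its activation (or, equivalently, its outgoing weights) by the retention probability 1 - p. A hedged sketch along the same lines as the code above:

import numpy as np

rng = np.random.default_rng(0)

def dropout_layer(h, p=0.5, train=True):
    """Dropout with drop probability p.

    Training: each unit is zeroed independently for every training case.
    Testing: keep all units but scale activations by (1 - p); this is
    equivalent to scaling the outgoing weights (halving them when p = 0.5)
    and approximates averaging the predictions of the exponentially many
    "thinned" networks sampled during training.
    """
    if train:
        return h * (rng.random(h.shape) >= p)
    return h * (1.0 - p)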

Maxout

Maxout is an idea derived from dropout. It is simply an activation function that takes the max of its inputs, but when combined with dropout it reinforces dropout's properties: it improves the accuracy of the fast approximate model-averaging technique and facilitates optimization.
Unlike max-pooling, maxout is based on a whole hidden layer built on top of the layer we are interested in, so it is more like a layerwise activation function. As stated in the original paper by Ian Goodfellow et al., even with hidden layers that only take the max of their inputs, the network retains its universal approximation power. The reasoning is not very different from what we did in the 3rd post of this series on universal approximation power.
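To make the "layerwise activation function" idea concrete, here is a rough sketch of a maxout layer (my own illustration, with arbitrary sizes and names): each output unit computes k affine functions of the input and keeps only the largest one.

import numpy as np

def maxout_layer(x, W, b):
    """Maxout hidden layer.

    x: (batch, d_in) inputs
    W: (d_in, d_out, k) weights, k linear pieces per output unit
    b: (d_out, k) biases

    Each output unit is the max over its k affine functions of x, so the
    activation is itself a small learned layer rather than a fixed
    pointwise nonlinearity.
    """
    z = np.einsum('bi,iok->bok', x, W) + b   # shape (batch, d_out, k)
    return z.max(axis=-1)                    # shape (batch, d_out)

# toy sizes: 4 inputs, 3 maxout units, k = 5 pieces each
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4))
W = rng.standard_normal((4, 3, 5))
b = rng.standard_normal((3, 5))
print(maxout_layer(x, W, b).shape)   # (2, 3)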
 
Despite the fact that maxout is derived from dropout and works better with it, maxout can only be used in feedforward neural networks such as multi-layer perceptrons or convolutional neural networks. In contrast, dropout is a fundamental idea, though a simple one, that can work with basically any network. Dropout is more like the idea of bagging, both in the sense of bagging's ability to increase accuracy through model averaging, and in the sense of bagging's wide adoption: it can be integrated with almost any machine learning algorithm.
 
In this post we have talked about two simple and powerful ideas that can help increase accuracy through model averaging. In the next post, let's move back to the track of network architectures and start talking about the network architectures of generative models.
----------------------------------------------
If you find this helpful, please cite:
Wang, Haohan, and Bhiksha Raj. "A Survey: Time Travel in Deep Learning Space: An Introduction to Deep Learning Models and How Deep Learning Models Evolved from the Initial Ideas." arXiv preprint arXiv:1510.04781 (2015).
----------------------------------------------

By Haohan Wang
Note: I am still a student and still learning, so there may be mistakes due to my limited knowledge. Please feel free to tell me about anything you find incorrect or unclear. Thank you.

Main Reference:

  1. Hinton, Geoffrey E., et al. "Improving neural networks by preventing co-adaptation of feature detectors." arXiv preprint arXiv:1207.0580 (2012).
  2. Goodfellow, Ian J., et al. "Maxout networks." arXiv preprint arXiv:1302.4389 (2013).
