Dropout & Maxout

This is the 8th post in a series I planned as a journal of my study of deep learning in Professor Bhiksha Raj's deep learning course and lab. I decided to write these posts as notes on my learning process, and I hope they can help others with a similar background. 
Back to Content Page
--------------------------------------------------------------------
PDF Version Available Here
--------------------------------------------------------------------
In the last post, when we looked at techniques for convolutional neural networks, we mentioned dropout as a technique to control sparsity. Here let's look at it in detail, together with a similar technique called maxout. Again, these techniques are not limited to convolutional neural networks; they can be applied to almost any deep network, or at least to feedforward deep networks.

Dropout

Dropout is famous, powerful, and simple. Although it is widely used and very effective, the idea itself is simple: randomly drop out some of the units during training. One example is shown in the following figure.

Figure 1. An illustration of the idea of dropout

To state this a little more formally: on each training case, each hidden unit is randomly omitted from the network with probability p. Note that the set of dropped units differs from one training case to the next; this is why dropout is better seen as a training technique than as an architectural change.
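To make the per-case sampling concrete, here is a minimal NumPy sketch (my own illustration, not code from the paper): each call draws a fresh Bernoulli mask, so every training case effectively trains a different thinned subnetwork.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(h, p=0.5):
    """Training-time dropout on a layer's activations h.

    A fresh mask is sampled on every call, so each training case
    sees a different thinned network. p is the drop probability.
    """
    mask = rng.random(h.shape) >= p   # keep each unit with probability 1 - p
    return h * mask                   # dropped units output exactly 0
```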
As stated in the original paper by Hinton et al., another way to look at dropout makes the technique even more interesting: dropout can be seen as an efficient way to perform model averaging over a large number of different neural networks, avoiding overfitting at a much lower computational cost.
In the paper, dropout is mainly discussed with p = 0.5, but of course p can be set to any probability.  
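At test time, the paper uses the "mean network": all units are kept, but their outgoing weights are halved (scaled by 1 - p in general), which approximates averaging the predictions of the exponentially many thinned subnetworks trained above. A sketch continuing the hypothetical function from before; scaling the activations is equivalent to scaling the outgoing weights:

```python
def dropout_test(h, p=0.5):
    """Test-time "mean network": keep every unit but scale it by the
    retention probability 1 - p, so the next layer sees the same
    expected input it saw during training."""
    return h * (1.0 - p)
```

Many modern implementations use "inverted dropout" instead, dividing by 1 - p during training so that no scaling is needed at test time.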

Maxout

Maxout is an idea derived from dropout. It is simply an activation function that takes the max of its inputs, but when combined with dropout it reinforces the properties dropout has: it improves the accuracy of dropout's fast approximate model averaging and facilitates optimization. 
Unlike max-pooling, maxout takes the max over a whole hidden layer built on top of the layer we are interested in, so it is more like a layer-wise activation function. As stated in the original paper by Goodfellow et al., even with hidden units that only output the max of their inputs, the network retains its universal approximation power. The reasoning is not very different from what we did in the 3rd post of this series on universal approximation.  
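To make the definition concrete, here is a minimal sketch of a maxout hidden layer (shapes and names are my own, for illustration): each of the m maxout units computes k affine feature maps of the input and outputs only their maximum.

```python
import numpy as np

def maxout_forward(x, W, b):
    """Maxout layer: h_i = max_j (x @ W[:, i, j] + b[i, j]).

    x : input, shape (d,)
    W : weights, shape (d, m, k) -- m maxout units, each with k pieces
    b : biases, shape (m, k)
    """
    z = np.einsum('d,dmk->mk', x, W) + b  # k affine maps per unit
    return z.max(axis=1)                  # elementwise max over the k maps
```

With k = 2 pieces, a maxout unit can already represent ReLU (fix one affine map at zero) or the absolute value (max(x, -x)), which hints at why maxout layers lose none of the network's approximation power.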
 
Although maxout is derived from dropout and works better together with it, maxout can only be used in feedforward neural networks such as multi-layer perceptrons and convolutional neural networks. In contrast, dropout, simple as it is, is a fundamental idea that works for basically any network. Dropout is more like bagging, both in bagging's ability to increase accuracy by model averaging and in bagging's wide applicability, since it can be integrated with almost any machine learning algorithm. 
 
In this post we have talked about two simple and powerful ideas that help increase accuracy through model averaging. In the next post, let's move back to the track of network architectures and start to talk about the architectures of generative models. 
----------------------------------------------
If you find this helpful, please cite:
Wang, Haohan, and Bhiksha Raj. "A Survey: Time Travel in Deep Learning Space: An Introduction to Deep Learning Models and How Deep Learning Models Evolved from the Initial Ideas." arXiv preprint arXiv:1510.04781 (2015).
----------------------------------------------

By Haohan Wang
Note: I am still a student and still learning; there may be mistakes due to my limited knowledge. Please feel free to point out anything you find incorrect or unclear. Thank you.

Main Reference:

  1. Hinton, Geoffrey E., et al. "Improving neural networks by preventing co-adaptation of feature detectors." arXiv preprint arXiv:1207.0580 (2012).
  2. Goodfellow, Ian J., et al. "Maxout networks." arXiv preprint arXiv:1302.4389 (2013).
