The state of the art of non-linearity is to use ReLU instead of sigmoid function in deep neural network, what are the advantages?

I know that training a network when ReLU is used would be faster, and it is more biological inspired, what are the other advantages? (That is, any disadvantages of using sigmoid)?

Best answer in stackexchange:

Two additional major benefits of ReLUs are sparsity and a reduced likelihood of vanishing gradient. But first recall the definition of a ReLU is h=max(0,a)h=max(0,a) where a=Wx+ba=Wx+b.

One major benefit is the reduced likelihood of the gradient to vanish. This arises when a>0a>0. In this regime the gradient has a constant value. In contrast, the gradient of sigmoids becomes increasingly small as the absolute value of x increases. The constant gradient of ReLUs results in faster learning.

The other benefit of ReLUs is sparsity. Sparsity arises when a≤0a≤0. The more such units that exist in a layer the more sparse the resulting representation. Sigmoids on the other hand are always likely to generate some non-zero value resulting in dense representations. Sparse representations seem to be more beneficial than dense representations.

Reference: http://stats.stackexchange.com/questions/126238/what-are-the-advantages-of-relu-over-sigmoid-function-in-deep-neural-network

ReLU

ReLU的全称是rectified linear unit。上面的回答基本上涵盖了它胜过sigmoid function的几个方面：

faster
more biological inspired
sparsity
less chance of vanishing gradient (梯度消失问题)

早期使用sigmoid或tanh激活函数的DL在做unsupervised learning时因为 gradient vanishing problem 的问题会无法收敛。ReLU则这没有这个问题。

What are the advantages of ReLU over sigmoid function in deep neural network?的更多相关文章

Sigmoid function in NN
X = [ones(m, ) X]; temp = X * Theta1'; t = size(temp, ); temp = [ones(t, ) temp]; h = temp * Theta2' ...
S性能 Sigmoid Function or Logistic Function
S性能 Sigmoid Function or Logistic Function octave码 x = -10:0.1:10; y = zeros(length(x), 1); for i = 1 ...
logistic function 和 sigmoid function
简单说, 只要曲线是 “S”形的函数都是sigmoid function: 满足公式<1>的形式的函数都是logistic function. 两者的相同点是: 函数曲线都是“S”形. ...
Sigmoid Function
本系列文章由 @yhl_leo 出品,转载请注明出处. 文章链接: http://blog.csdn.net/yhl_leo/article/details/51734189 Sigmodi 函数是一 ...
sigmoid function vs softmax function
DIFFERENCE BETWEEN SOFTMAX FUNCTION AND SIGMOID FUNCTION 二者主要的区别见于, softmax 用于多分类,sigmoid 则主要用于二分类: ...
sigmoid function的直观解释
Sigmoid function也叫Logistic function, 在logistic regression中扮演将回归估计值h(x)从 [-inf, inf]映射到[0,1]的角色. 公式为: ...
神经网络中的激活函数具体是什么？为什么ReLu要好过于tanh和sigmoid function?（转）
为什么引入激活函数? 如果不用激励函数(其实相当于激励函数是f(x) = x),在这种情况下你每一层输出都是上层输入的线性函数,很容易验证,无论你神经网络有多少层,输出都是输入的线性组合,与没有隐藏层 ...
ReLU 和sigmoid 函数对比
详细对比请查看:http://www.zhihu.com/question/29021768/answer/43517930 . 激活函数的作用: 是为了增加神经网络模型的非线性.否则你想想,没有激活 ...
小白学习之pytorch框架(5)-多层感知机(MLP)-(tensor、variable、计算图、ReLU()、sigmoid()、tanh())
先记录一下一开始学习torch时未曾记录(也未好好弄懂哈)导致又忘记了的tensor.variable.计算图计算图计算图直白的来说,就是数学公式(也叫模型)用图表示,这个图即计算图.借用 htt ...

随机推荐

HDFS的概念
1.数据块每个磁盘都有默认的数据块大小,这是磁盘进行数据读/写的最小单位.构建于单个磁盘之上的文件系统通过磁盘块来管理该文件系统中的块,该文件系统块的大小可以是磁盘块的整数倍.文件系统快一半为几千字 ...
控制Wordpress对搜索引擎的可见性
网站通过Robots协议告诉搜索引擎哪些页面可以抓取,哪些页面不能抓取,这些通过robots.txt体现. wordpress本身没有robots.txt,但是用根目录访问/robots.txt,如果 ...
连接mysql问题 mysqlnd cannot connect to MySQL 4.1+ using old authentication
第一篇:PHP5.3开始使用MySqlND作为默认的MySql访问驱动,而且从这个版本开始将不再支持使用旧的用户接口链接Mysql了,你可能会看到类似的提示: #2000 - mysqlnd cann ...
Oracle序列和索引
序列和索引一.序列 1.序列的概念: 序列(Sequence)是用来生成连续的整数数据的对象.它常常用来作为主键的增长列,可以升序,也可以降序. 2.创建序列: 语法:创建序列 ...
sql 在not in 子查询有null值情况下经常出现的陷阱
如果下:TempSalesPriceFixedValues表和SalesPriceFixedValues表,要求查询出在TempSalesPriceFixedValues表中且不在SalesPrice ...
vim - Highlight unwanted spaces
http://vim.wikia.com/wiki/VimTip396 precondition: set hlsearch" Show all tabs:/\t" Show tr ...
分类指标准确率(Precision)和正确率(Accuracy)的区别
http://www.cnblogs.com/fengfenggirl/p/classification_evaluate.html 一.引言分类算法有很多,不同分类算法又用很多不同的变种.不同的分 ...
jquery判断页面是否滑动到最底部
// 滚动到底部,向下的箭头消失 var $down = $('.down'); var $window = $(window); var $document = $(document); $wind ...
python 安装pillow
安装警告 Pillow >= 2.1.0 不支持 “import _imaging”.请使用 “from PIL.Image import core as _imaging” 代替. 警告 P ...
突破软件试用期的"土方法"
机器上有个试用版的vs2008. 想能正常用,又木有去搜序列号.于是用python纠结了这么一段代码: import time import subprocess import os timeNow ...

What are the advantages of ReLU over sigmoid function in deep neural network?

The state of the art of non-linearity is to use ReLU instead of sigmoid function in deep neural network, what are the advantages?

I know that training a network when ReLU is used would be faster, and it is more biological inspired, what are the other advantages? (That is, any disadvantages of using sigmoid)?

ReLU

What are the advantages of ReLU over sigmoid function in deep neural network?的更多相关文章

随机推荐

热门专题