Theano 3.4 - Exercise: Multilayer Perceptron
Source: http://deeplearning.net/tutorial/mlp.html#mlp
Multilayer Perceptron
Note: this section assumes the reader has already worked through the previous exercise, Classifying MNIST digits using Logistic Regression (http://blog.csdn.net/shouhuxianjian/article/details/46375461). It also uses new Theano functions and concepts: T.tanh, shared variables, basic arithmetic ops, T.grad, L1 and L2 regularization, floatX. If you want to run the code on a GPU, also read GPU.
Note: the code for this section is available for download here.
The next architecture we are going to present using Theano is the single-hidden-layer multilayer perceptron (MLP). An MLP can be viewed as a logistic regression classifier in which the input is first transformed by a learned non-linear transformation $\Phi$. This transformation projects the input data into a space where the classes become linearly separable. The intermediate layer is referred to as the hidden layer. A single hidden layer is already sufficient to make an MLP a universal approximator. Nevertheless, as we will see later, there are substantial benefits to using many such hidden layers, which is the very premise of deep learning (i.e., more than one hidden layer). See the lecture notes: Introduction to MLPs, the back-propagation algorithm, and how to train MLPs.
This tutorial will again tackle the problem of MNIST digit classification.
1. The Model
An MLP (or artificial neural network, ANN) with a single hidden layer can be represented graphically as follows:
Formally, a one-hidden-layer MLP is a function $f: R^D \rightarrow R^L$, where $D$ is the size of the input vector $x$ and $L$ is the size of the output vector $f(x)$, such that, in matrix notation:

$f(x) = G\bigl(b^{(2)} + W^{(2)}\, s(b^{(1)} + W^{(1)} x)\bigr)$

with bias vectors $b^{(1)}$, $b^{(2)}$, weight matrices $W^{(1)}$, $W^{(2)}$ and activation functions $G$ and $s$.

The vector $h(x) = \Phi(x) = s(b^{(1)} + W^{(1)} x)$ constitutes the hidden layer. $W^{(1)} \in R^{D \times D_h}$ is the weight matrix connecting the input vector to the hidden layer. Each column $W^{(1)}_{\cdot i}$ represents the weights from the input units to the $i$-th hidden unit. Typical choices for $s$ are tanh, with $\tanh(a) = (e^a - e^{-a})/(e^a + e^{-a})$, or the logistic sigmoid, with $\mathrm{sigmoid}(a) = 1/(1 + e^{-a})$. We will use tanh in this tutorial because it typically trains faster (and sometimes reaches better local minima). Both tanh and sigmoid are scalar-to-scalar functions, but their natural extension to vectors and tensors applies them element-wise (i.e., independently to each element, producing a vector or tensor of the same size).

The output vector is then obtained as $o(x) = G(b^{(2)} + W^{(2)} h(x))$. The reader should recognize this form from the previous exercise (Theano 3.3 - Exercise: Logistic Regression); as before, class-membership probabilities can be obtained by choosing $G$ to be the softmax function (in the multi-class case).
To train an MLP, we learn all of its parameters, here using minibatch Stochastic Gradient Descent. The set of parameters to learn is $\theta = \{W^{(2)}, b^{(2)}, W^{(1)}, b^{(1)}\}$. The gradients $\partial \ell / \partial \theta$ can be obtained through the backpropagation algorithm (a special case of the chain rule of derivation). Fortunately, since Theano performs automatic differentiation, we do not need to cover the derivation in this tutorial.
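As a tiny, self-contained illustration of what T.grad does (my own sketch, not part of the tutorial code; the variables are made up), a symbolic scalar cost can be differentiated with respect to a shared parameter and turned into an SGD update rule:

import theano
import theano.tensor as T

w = theano.shared(0.5, name='w')      # a toy scalar parameter
x = T.scalar('x')
cost = (w * x - 1.) ** 2              # a toy quadratic cost
g_w = T.grad(cost, w)                 # symbolic derivative d(cost)/dw
sgd_step = theano.function([x], cost, updates=[(w, w - 0.1 * g_w)])

Each call to sgd_step(2.0) returns the current cost and moves w a small step along the negative gradient; the training function compiled later in this tutorial follows the same pattern, just with more parameters.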
2. Going from Logistic Regression to MLP
This tutorial focuses on a single-hidden-layer MLP, so we begin by implementing a class that represents one hidden layer. To construct the MLP we will then only need to stack a logistic regression layer on top of it:
class HiddenLayer(object):
def __init__(self, rng, input, n_in, n_out, W=None, b=None,
activation=T.tanh):
"""
Typical hidden layer of a MLP: units are fully-connected and have
sigmoidal activation function. Weight matrix W is of shape (n_in,n_out)
        and the bias vector b is of shape (n_out,).

        NOTE : The nonlinearity used here is tanh

        Hidden unit activation is given by: tanh(dot(input,W) + b)

        :type rng: numpy.random.RandomState
        :param rng: a random number generator used to initialize weights

        :type input: theano.tensor.dmatrix
        :param input: a symbolic tensor of shape (n_examples, n_in)

        :type n_in: int
        :param n_in: dimensionality of input

        :type n_out: int
        :param n_out: number of hidden units

        :type activation: theano.Op or function
:param activation: Non linearity to be applied in the hidden
layer
"""
self.input = input
The initial values of the weights of hidden layer $i$ should be uniformly sampled from a symmetric interval that depends on the activation function. For the tanh activation function, the results in [Xavier10] show that this interval should be $\left[-\sqrt{6/(fan_{in}+fan_{out})},\ \sqrt{6/(fan_{in}+fan_{out})}\right]$, where $fan_{in}$ is the number of units in the $(i-1)$-th layer and $fan_{out}$ is the number of units in the $i$-th layer. For the sigmoid function the interval is $\left[-4\sqrt{6/(fan_{in}+fan_{out})},\ 4\sqrt{6/(fan_{in}+fan_{out})}\right]$. Early in training, this initialization ensures that each neuron operates in the region of its activation function where the response varies most, so information can easily be propagated both upward (activations flowing from inputs to outputs) and backward (gradients flowing from outputs to inputs):
# `W` is initialized with `W_values` which is uniformely sampled
# from sqrt(-6./(n_in+n_hidden)) and sqrt(6./(n_in+n_hidden))
# for tanh activation function
# the output of uniform if converted using asarray to dtype
# theano.config.floatX so that the code is runable on GPU
# Note : optimal initialization of weights is dependent on the
# activation function used (among other things).
# For example, results presented in [Xavier10] suggest that you
# should use 4 times larger initial weights for sigmoid
# compared to tanh
# We have no info for other function, so we use the same as
# tanh.
if W is None:
W_values = numpy.asarray(
rng.uniform(
low=-numpy.sqrt(6. / (n_in + n_out)),
high=numpy.sqrt(6. / (n_in + n_out)),
size=(n_in, n_out)
),
dtype=theano.config.floatX
)
if activation == theano.tensor.nnet.sigmoid:
                W_values *= 4

            W = theano.shared(value=W_values, name='W', borrow=True)

        if b is None:
            b_values = numpy.zeros((n_out,), dtype=theano.config.floatX)
            b = theano.shared(value=b_values, name='b', borrow=True)

        self.W = W
self.b = b
Note that we use a given nonlinear function as the activation function of the hidden layer. By default this is tanh, but in many cases we may want to use something else:
lin_output = T.dot(input, self.W) + self.b
self.output = (
lin_output if activation is None
else activation(lin_output)
)
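As a quick, hypothetical usage sketch (not part of the tutorial code; it assumes numpy, theano and theano.tensor as T are imported as in the full listing of section 3), the finished HiddenLayer can be compiled and evaluated on its own:

rng = numpy.random.RandomState(1234)
x = T.matrix('x')                      # a minibatch of rasterized images
layer = HiddenLayer(rng=rng, input=x, n_in=28 * 28, n_out=500,
                    activation=T.tanh)
get_hidden = theano.function([x], layer.output)   # maps a minibatch to h(x)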
Looking under the hood, this class implements the graph that computes the hidden-layer value $h(x) = \Phi(x) = s(b^{(1)} + W^{(1)} x)$. If you give this value as input to the LogisticRegression class, as in the previous tutorial, you obtain the output of the MLP. A short implementation of the MLP class follows:
class MLP(object):
"""Multi-Layer Perceptron Class A multilayer perceptron is a feedforward artificial neural network model
that has one layer or more of hidden units and nonlinear activations.
Intermediate layers usually have as activation function tanh or the
sigmoid function (defined here by a ``HiddenLayer`` class) while the
top layer is a softmax layer (defined here by a ``LogisticRegression``
class).
""" def __init__(self, rng, input, n_in, n_hidden, n_out):
"""Initialize the parameters for the multilayer perceptron :type rng: numpy.random.RandomState
:param rng: a random number generator used to initialize weights :type input: theano.tensor.TensorType
:param input: symbolic variable that describes the input of the
architecture (one minibatch) :type n_in: int
:param n_in: number of input units, the dimension of the space in
which the datapoints lie :type n_hidden: int
:param n_hidden: number of hidden units :type n_out: int
:param n_out: number of output units, the dimension of the space in
which the labels lie """ # Since we are dealing with a one hidden layer MLP, this will translate
# into a HiddenLayer with a tanh activation function connected to the
# LogisticRegression layer; the activation function can be replaced by
# sigmoid or any other nonlinear function
self.hiddenLayer = HiddenLayer(
rng=rng,
input=input,
n_in=n_in,
n_out=n_hidden,
activation=T.tanh
        )

        # The logistic regression layer gets as input the hidden units
# of the hidden layer
self.logRegressionLayer = LogisticRegression(
input=self.hiddenLayer.output,
n_in=n_hidden,
n_out=n_out
)
In this tutorial we will also use L1 and L2 regularization (see L1 and L2 regularization). For this, we need to compute the L1 norm and the squared L2 norm of the weights $W^{(1)}$ and $W^{(2)}$.
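Written out explicitly (my own summary in the notation above, where $\lambda_1$ and $\lambda_2$ correspond to the L1_reg and L2_reg weights used later), the quantities involved are:

$L1 = \sum_{ij} |W^{(1)}_{ij}| + \sum_{ij} |W^{(2)}_{ij}|$

$L2_{sqr} = \sum_{ij} \bigl(W^{(1)}_{ij}\bigr)^2 + \sum_{ij} \bigl(W^{(2)}_{ij}\bigr)^2$

$cost = NLL(\theta, D) + \lambda_1 \cdot L1 + \lambda_2 \cdot L2_{sqr}$

In code, the two regularization terms are computed as follows: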
# L1 norm ; one regularization option is to enforce L1 norm to
# be small
self.L1 = (
abs(self.hiddenLayer.W).sum()
+ abs(self.logRegressionLayer.W).sum()
        )

        # square of L2 norm ; one regularization option is to enforce
# square of L2 norm to be small
self.L2_sqr = (
(self.hiddenLayer.W ** 2).sum()
+ (self.logRegressionLayer.W ** 2).sum()
        )

        # negative log likelihood of the MLP is given by the negative
# log likelihood of the output of the model, computed in the
# logistic regression layer
self.negative_log_likelihood = (
self.logRegressionLayer.negative_log_likelihood
)
# same holds for the function computing the number of errors
        self.errors = self.logRegressionLayer.errors

        # the parameters of the model are the parameters of the two layer it is
# made out of
self.params = self.hiddenLayer.params + self.logRegressionLayer.params
As before, we train the model using stochastic gradient descent with mini-batches (MSGD). The difference is that we modify the cost function so that it includes the regularization terms. L1_reg and L2_reg are the hyperparameters controlling the weight of these regularization terms in the total cost. The code that computes the new cost is:
# the cost we minimize during training is the negative log likelihood of
# the model plus the regularization terms (L1 and L2); cost is expressed
# here symbolically
cost = (
classifier.negative_log_likelihood(y)
+ L1_reg * classifier.L1
+ L2_reg * classifier.L2_sqr
)
We then update the parameters of the model with the gradients. This code is almost identical to the one for logistic regression; only the number of parameters differs. To get around this (and write code that works for any number of parameters), we use the list of parameters that the model exposes via params and iterate over it, computing a gradient for each parameter in turn:
# compute the gradient of cost with respect to theta (sotred in params)
# the resulting gradients will be stored in a list gparams
    gparams = [T.grad(cost, param) for param in classifier.params]

    # specify how to update the parameters of the model as a list of
    # (variable, update expression) pairs

    # given two list the zip A = [a1, a2, a3, a4] and B = [b1, b2, b3, b4] of
# same length, zip generates a list C of same size, where each element
# is a pair formed from the two lists :
# C = [(a1, b1), (a2, b2), (a3, b3), (a4, b4)]
updates = [
(param, param - learning_rate * gparam)
for param, gparam in zip(classifier.params, gparams)
    ]

    # compiling a Theano function `train_model` that returns the cost, but
# in the same time updates the parameter of the model based on the rules
# defined in `updates`
train_model = theano.function(
inputs=[index],
outputs=cost,
updates=updates,
givens={
x: train_set_x[index * batch_size: (index + 1) * batch_size],
y: train_set_y[index * batch_size: (index + 1) * batch_size]
}
)
3. Putting It All Together
Having covered the basic concepts, writing an MLP class becomes quite easy. The code below shows how this works, along lines similar to our previous logistic regression implementation:
"""
This tutorial introduces the multilayer perceptron using Theano.

A multilayer perceptron is a logistic regressor where
instead of feeding the input to the logistic regression you insert a
intermediate layer, called the hidden layer, that has a nonlinear
activation function (usually tanh or sigmoid) . One can use many such
hidden layers making the architecture deep. The tutorial will also tackle
the problem of MNIST digit classification.

.. math::

    f(x) = G( b^{(2)} + W^{(2)}( s( b^{(1)} + W^{(1)} x))),

References:

    - textbooks: "Pattern Recognition and Machine Learning" -
                 Christopher M. Bishop, section 5

"""
__docformat__ = 'restructedtext en'


import os
import sys
import time

import numpy

import theano
import theano.tensor as T


from logistic_sgd import LogisticRegression, load_data


# start-snippet-1
class HiddenLayer(object):
def __init__(self, rng, input, n_in, n_out, W=None, b=None,
activation=T.tanh):
"""
Typical hidden layer of a MLP: units are fully-connected and have
sigmoidal activation function. Weight matrix W is of shape (n_in,n_out)
        and the bias vector b is of shape (n_out,).

        NOTE : The nonlinearity used here is tanh

        Hidden unit activation is given by: tanh(dot(input,W) + b)

        :type rng: numpy.random.RandomState
        :param rng: a random number generator used to initialize weights

        :type input: theano.tensor.dmatrix
        :param input: a symbolic tensor of shape (n_examples, n_in)

        :type n_in: int
        :param n_in: dimensionality of input

        :type n_out: int
        :param n_out: number of hidden units

        :type activation: theano.Op or function
        :param activation: Non linearity to be applied in the hidden
                           layer
        """
        self.input = input
        # end-snippet-1

        # `W` is initialized with `W_values` which is uniformely sampled
# from sqrt(-6./(n_in+n_hidden)) and sqrt(6./(n_in+n_hidden))
# for tanh activation function
# the output of uniform if converted using asarray to dtype
# theano.config.floatX so that the code is runable on GPU
# Note : optimal initialization of weights is dependent on the
# activation function used (among other things).
# For example, results presented in [Xavier10] suggest that you
# should use 4 times larger initial weights for sigmoid
# compared to tanh
# We have no info for other function, so we use the same as
# tanh.
if W is None:
W_values = numpy.asarray(
rng.uniform(
low=-numpy.sqrt(6. / (n_in + n_out)),
high=numpy.sqrt(6. / (n_in + n_out)),
size=(n_in, n_out)
),
dtype=theano.config.floatX
)
if activation == theano.tensor.nnet.sigmoid:
                W_values *= 4

            W = theano.shared(value=W_values, name='W', borrow=True)

        if b is None:
            b_values = numpy.zeros((n_out,), dtype=theano.config.floatX)
            b = theano.shared(value=b_values, name='b', borrow=True)

        self.W = W
        self.b = b

        lin_output = T.dot(input, self.W) + self.b
self.output = (
lin_output if activation is None
else activation(lin_output)
)
# parameters of the model
        self.params = [self.W, self.b]


# start-snippet-2
class MLP(object):
"""Multi-Layer Perceptron Class A multilayer perceptron is a feedforward artificial neural network model
that has one layer or more of hidden units and nonlinear activations.
Intermediate layers usually have as activation function tanh or the
sigmoid function (defined here by a ``HiddenLayer`` class) while the
top layer is a softmax layer (defined here by a ``LogisticRegression``
class).
""" def __init__(self, rng, input, n_in, n_hidden, n_out):
"""Initialize the parameters for the multilayer perceptron :type rng: numpy.random.RandomState
:param rng: a random number generator used to initialize weights :type input: theano.tensor.TensorType
:param input: symbolic variable that describes the input of the
architecture (one minibatch) :type n_in: int
:param n_in: number of input units, the dimension of the space in
which the datapoints lie :type n_hidden: int
:param n_hidden: number of hidden units :type n_out: int
:param n_out: number of output units, the dimension of the space in
which the labels lie """ # Since we are dealing with a one hidden layer MLP, this will translate
# into a HiddenLayer with a tanh activation function connected to the
# LogisticRegression layer; the activation function can be replaced by
# sigmoid or any other nonlinear function
self.hiddenLayer = HiddenLayer(
rng=rng,
input=input,
n_in=n_in,
n_out=n_hidden,
activation=T.tanh
        )

        # The logistic regression layer gets as input the hidden units
# of the hidden layer
self.logRegressionLayer = LogisticRegression(
input=self.hiddenLayer.output,
n_in=n_hidden,
n_out=n_out
)
# end-snippet-2 start-snippet-3
# L1 norm ; one regularization option is to enforce L1 norm to
# be small
self.L1 = (
abs(self.hiddenLayer.W).sum()
+ abs(self.logRegressionLayer.W).sum()
        )

        # square of L2 norm ; one regularization option is to enforce
# square of L2 norm to be small
self.L2_sqr = (
(self.hiddenLayer.W ** 2).sum()
+ (self.logRegressionLayer.W ** 2).sum()
        )

        # negative log likelihood of the MLP is given by the negative
# log likelihood of the output of the model, computed in the
# logistic regression layer
self.negative_log_likelihood = (
self.logRegressionLayer.negative_log_likelihood
)
# same holds for the function computing the number of errors
        self.errors = self.logRegressionLayer.errors

        # the parameters of the model are the parameters of the two layer it is
# made out of
self.params = self.hiddenLayer.params + self.logRegressionLayer.params
        # end-snippet-3


def test_mlp(learning_rate=0.01, L1_reg=0.00, L2_reg=0.0001, n_epochs=1000,
             dataset='mnist.pkl.gz', batch_size=20, n_hidden=500):
    """
    Demonstrate stochastic gradient descent optimization for a multilayer
    perceptron

    This is demonstrated on MNIST.

    :type learning_rate: float
    :param learning_rate: learning rate used (factor for the stochastic
    gradient

    :type L1_reg: float
    :param L1_reg: L1-norm's weight when added to the cost (see
    regularization)

    :type L2_reg: float
    :param L2_reg: L2-norm's weight when added to the cost (see
    regularization)

    :type n_epochs: int
    :param n_epochs: maximal number of epochs to run the optimizer

    :type dataset: string
    :param dataset: the path of the MNIST dataset file from
                 http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz


    """
    datasets = load_data(dataset)

    train_set_x, train_set_y = datasets[0]
    valid_set_x, valid_set_y = datasets[1]
    test_set_x, test_set_y = datasets[2]

    # compute number of minibatches for training, validation and testing
    n_train_batches = train_set_x.get_value(borrow=True).shape[0] / batch_size
    n_valid_batches = valid_set_x.get_value(borrow=True).shape[0] / batch_size
    n_test_batches = test_set_x.get_value(borrow=True).shape[0] / batch_size

    ######################
    # BUILD ACTUAL MODEL #
    ######################
    print '... building the model'

    # allocate symbolic variables for the data
index = T.lscalar() # index to a [mini]batch
x = T.matrix('x') # the data is presented as rasterized images
y = T.ivector('y') # the labels are presented as 1D vector of
                        # [int] labels

    rng = numpy.random.RandomState(1234)

    # construct the MLP class
classifier = MLP(
rng=rng,
input=x,
n_in=28 * 28,
n_hidden=n_hidden,
n_out=10
    )

    # start-snippet-4
# the cost we minimize during training is the negative log likelihood of
# the model plus the regularization terms (L1 and L2); cost is expressed
# here symbolically
cost = (
classifier.negative_log_likelihood(y)
+ L1_reg * classifier.L1
+ L2_reg * classifier.L2_sqr
)
    # end-snippet-4

    # compiling a Theano function that computes the mistakes that are made
# by the model on a minibatch
test_model = theano.function(
inputs=[index],
outputs=classifier.errors(y),
givens={
x: test_set_x[index * batch_size:(index + 1) * batch_size],
y: test_set_y[index * batch_size:(index + 1) * batch_size]
}
    )

    validate_model = theano.function(
inputs=[index],
outputs=classifier.errors(y),
givens={
x: valid_set_x[index * batch_size:(index + 1) * batch_size],
y: valid_set_y[index * batch_size:(index + 1) * batch_size]
}
    )

    # start-snippet-5
    # compute the gradient of cost with respect to theta (sotred in params)
    # the resulting gradients will be stored in a list gparams
    gparams = [T.grad(cost, param) for param in classifier.params]

    # specify how to update the parameters of the model as a list of
    # (variable, update expression) pairs

    # given two list the zip A = [a1, a2, a3, a4] and B = [b1, b2, b3, b4] of
# same length, zip generates a list C of same size, where each element
# is a pair formed from the two lists :
# C = [(a1, b1), (a2, b2), (a3, b3), (a4, b4)]
updates = [
(param, param - learning_rate * gparam)
for param, gparam in zip(classifier.params, gparams)
    ]

    # compiling a Theano function `train_model` that returns the cost, but
# in the same time updates the parameter of the model based on the rules
# defined in `updates`
train_model = theano.function(
inputs=[index],
outputs=cost,
updates=updates,
givens={
x: train_set_x[index * batch_size: (index + 1) * batch_size],
y: train_set_y[index * batch_size: (index + 1) * batch_size]
}
)
    # end-snippet-5

    ###############
    # TRAIN MODEL #
    ###############
    print '... training'

    # early-stopping parameters
patience = 10000 # look as this many examples regardless
patience_increase = 2 # wait this much longer when a new best is
# found
improvement_threshold = 0.995 # a relative improvement of this much is
# considered significant
validation_frequency = min(n_train_batches, patience / 2)
# go through this many
# minibatche before checking the network
# on the validation set; in this case we
                                  # check every epoch

    best_validation_loss = numpy.inf
best_iter = 0
test_score = 0.
    start_time = time.clock()

    epoch = 0
    done_looping = False

    while (epoch < n_epochs) and (not done_looping):
epoch = epoch + 1
        for minibatch_index in xrange(n_train_batches):

            minibatch_avg_cost = train_model(minibatch_index)
# iteration number
            iter = (epoch - 1) * n_train_batches + minibatch_index

            if (iter + 1) % validation_frequency == 0:
# compute zero-one loss on validation set
validation_losses = [validate_model(i) for i
in xrange(n_valid_batches)]
                this_validation_loss = numpy.mean(validation_losses)

                print(
'epoch %i, minibatch %i/%i, validation error %f %%' %
(
epoch,
minibatch_index + 1,
n_train_batches,
this_validation_loss * 100.
)
                )

                # if we got the best validation score until now
if this_validation_loss < best_validation_loss:
#improve patience if loss improvement is good enough
if (
this_validation_loss < best_validation_loss *
improvement_threshold
):
                        patience = max(patience, iter * patience_increase)

                    best_validation_loss = this_validation_loss
                    best_iter = iter

                    # test it on the test set
test_losses = [test_model(i) for i
in xrange(n_test_batches)]
                    test_score = numpy.mean(test_losses)

                    print(('     epoch %i, minibatch %i/%i, test error of '
'best model %f %%') %
(epoch, minibatch_index + 1, n_train_batches,
                           test_score * 100.))

            if patience <= iter:
                done_looping = True
                break

    end_time = time.clock()
print(('Optimization complete. Best validation score of %f %% '
'obtained at iteration %i, with test performance %f %%') %
(best_validation_loss * 100., best_iter + 1, test_score * 100.))
print >> sys.stderr, ('The code for file ' +
os.path.split(__file__)[1] +
                          ' ran for %.2fm' % ((end_time - start_time) / 60.))


if __name__ == '__main__':
test_mlp()
The user can then run the code with:
python code/mlp.py
The output should be of the form:
Optimization complete. Best validation score of 1.690000 % obtained at iteration 2070000, with test performance 1.650000 %
The code for file mlp.py ran for 97.34m
On an Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz the code runs at roughly 10.3 epochs/minute and reaches a test error rate of 1.65% after 828 epochs. To put this result in perspective, the reader is encouraged to look at this page, which compares the results of different algorithms on MNIST.
4. Tips and Tricks for Training MLPs
The code above contains a number of hyperparameters that are not (and, generally speaking, cannot be) optimized by gradient descent. Strictly speaking, finding an optimal set of hyperparameter values is not a tractable problem. First, we cannot simply optimize each of them independently. Second, we cannot readily apply the gradient techniques described earlier (partly because some hyperparameters are discrete while others are real-valued). Third, the optimization problem is not convex, and finding a (local) minimum involves a non-trivial amount of work. The good news is that over the last 25 years researchers have devised various rules of thumb for choosing hyperparameters in a neural network. A very good overview of these tricks is Efficient BackProp by Yann LeCun, Leon Bottou, Genevieve Orr, and Klaus-Robert Mueller. Here we summarize the same issues, with an emphasis on the parameters and techniques actually used in our code.
Nonlinearity
The two most commonly used activation functions are tanh and the sigmoid. For the reasons explained in Section 4.4 of that paper, nonlinearities that are symmetric around the origin are preferred, because they tend to produce zero-mean inputs for the next layer (which is a desirable property). Empirically, we have observed that tanh has better convergence properties. (Translator's note: as of 2015 there are of course other activation functions such as ReLU and PReLU; interested readers may want to look into them. A small sketch follows below.)
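For readers who want to experiment, the HiddenLayer class above takes the activation as a parameter, so trying a ReLU only requires passing a different callable. A hedged sketch (not tutorial code; it assumes rng, x and HiddenLayer as defined in the listings above, and defines relu by hand because older Theano versions may not provide T.nnet.relu):

relu = lambda a: T.maximum(0., a)      # elementwise rectifier
hidden_relu = HiddenLayer(rng=rng, input=x, n_in=28 * 28, n_out=500,
                          activation=relu)

Keep in mind that the [Xavier10] initialization interval used in HiddenLayer was derived for tanh/sigmoid, so a different initialization scheme may work better with ReLU.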
Weight initialization
At initialization we want the weights to be small enough around the origin (i.e., around zero) that the activation function operates in its near-linear regime (this is easy to see on the sigmoid's graph: around zero it is nearly linear), where gradients are the largest. Another desirable property, especially for deep networks, is to conserve the variance of the activations as well as the variance of the back-propagated gradients from layer to layer. This lets information flow well both upward and downward in the network and reduces discrepancies between layers. Under some assumptions, a compromise between these two constraints leads to the following initializations:

for tanh: $W \sim \mathrm{uniform}\left[-\sqrt{6/(fan_{in}+fan_{out})},\ \sqrt{6/(fan_{in}+fan_{out})}\right]$

for sigmoid: $W \sim \mathrm{uniform}\left[-4\sqrt{6/(fan_{in}+fan_{out})},\ 4\sqrt{6/(fan_{in}+fan_{out})}\right]$

where $fan_{in}$ is the number of inputs and $fan_{out}$ the number of hidden units. For the mathematical considerations, please refer to [Xavier10].
Learning rate
There is a large body of literature on choosing a good learning rate. The simplest solution is to use a constant rate. A rule of thumb: try several values on a logarithmic scale ($10^{-1}, 10^{-2}, \ldots$) and narrow the (logarithmic) grid search down to the region where the lowest validation error is obtained.

Decreasing the learning rate over time is sometimes a good idea. One simple rule is $\mu_0 / (1 + d \cdot t)$, where $\mu_0$ is the initial rate (typically chosen with the grid-search technique described above), $d$ is the so-called "decrease constant", which controls how quickly the learning rate decreases (usually a small positive number, $10^{-3}$ or smaller), and $t$ is the epoch (stage). A short sketch of this schedule follows below.

Section 4.7 of the same paper details procedures for choosing a learning rate for each parameter (weight) of the network, and for choosing them adaptively based on the error of the classifier.
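A minimal sketch of the $\mu_0 / (1 + d \cdot t)$ schedule mentioned above (my own illustration, not tutorial code; the constants are only examples):

mu_0 = 0.1                    # initial rate, typically found by grid search
d = 1e-3                      # decrease constant
rates = [mu_0 / (1. + d * t) for t in range(1000)]   # one rate per epoch
print rates[0], rates[999]    # 0.1 at epoch 0, about 0.05 at epoch 999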
Number of hidden units
This hyperparameter is very much dataset-dependent. Vaguely speaking, the more complicated the input distribution, the more capacity the network needs to model it, and hence the more hidden units it needs. (Note that the number of weights in a layer, which is perhaps a more direct measure of capacity, is $D \times D_h$, where $D$ is the number of input units and $D_h$ the number of hidden units; a back-of-the-envelope count is given below.)

Unless we employ some regularization scheme (early stopping or L1/L2 penalties), a typical plot of the number of hidden units vs. generalization performance is U-shaped: there is a best trade-off somewhere in the middle, and the error increases on either side of it.
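To make the notion of capacity concrete, a back-of-the-envelope count for the configuration used in this tutorial ($D = 784$ inputs, $D_h = 500$ hidden units, $L = 10$ output classes):

$784 \times 500 + 500 + 500 \times 10 + 10 = 397510$

parameters in total, the vast majority of them in the input-to-hidden weight matrix $W^{(1)}$.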
Regularization parameter
Typical values to try for the L1/L2 regularization parameter $\lambda$ are $10^{-2}, 10^{-3}, \ldots$. In the framework we have described so far, optimizing this parameter will not lead to significantly better solutions, but it is nonetheless worth exploring.
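A hedged sketch of such an exploration (my own illustration, not tutorial code; it simply re-runs test_mlp with different L2 weights, using short runs so the comparison stays cheap, and the candidate values are only examples):

for l2 in [1e-2, 1e-3, 1e-4, 1e-5]:
    print 'training with L2_reg = %g' % l2
    test_mlp(L2_reg=l2, n_epochs=10)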
References:
[1] Official tutorial: http://deeplearning.net/tutorial/mlp.html#mlp
[2] Deep learning with Theano official tutorial, Chinese translation, part 3 - Multilayer Perceptron (MLP): http://www.cnblogs.com/charleshuang/p/3648804.html