Denoising Autoencoders (DA) with Theano
1. Autoencoders
An autoencoder first maps the input $x\in[0,1]^{d}$ to a hidden representation $y\in[0,1]^{d'}$ through the following mapping (the encoder):
$y=s(Wx+b)$
where $s$ is a nonlinearity such as the sigmoid. The latent representation $y$, i.e. the code, is then mapped back (via the decoder) to a reconstruction $z$ of the same shape as the input $x$:
$z=s(W^{'}y+b^{'})$
Here $W'$ does not necessarily denote the transpose of $W$. $z$ should be viewed as a prediction of $x$ given the code $y$. The weight matrix $W'$ of the reverse mapping may optionally be constrained to be the transpose of the forward mapping:
$W'=W^{T}$, the so-called tied weights. The parameters of the model ($W$, $b$, $b'$) are then optimized to minimize the average reconstruction error.
The reconstruction error can be measured in many ways, depending on the distributional assumptions one makes about the input given the code. The traditional squared error $L(x,z)=||x-z||^{2}$ can be used. If the input can be interpreted as bit vectors or vectors of bit probabilities, the cross-entropy of the reconstruction can be used instead:
$L_{H}(x,z)=-\sum_{k=1}^{d}[x_{k}\log z_{k}+(1-x_{k})\log(1-z_{k})]$
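To make the formulas above concrete, here is a minimal numpy sketch (not part of the original tutorial) of one forward pass with tied weights, computing both the squared-error and the cross-entropy reconstruction costs; the dimensions, the sigmoid nonlinearity and the random initialization are illustrative assumptions:
- import numpy as np
- rng = np.random.RandomState(0)
- d, d_hidden = 8, 3                               # illustrative input / code sizes
- x = rng.uniform(size=d)                          # an input in [0, 1]^d
- W = rng.normal(scale=0.1, size=(d, d_hidden))    # encoder weights
- b, b_prime = np.zeros(d_hidden), np.zeros(d)     # hidden and visible biases
- y = 1.0 / (1.0 + np.exp(-(np.dot(x, W) + b)))          # encoder: y = s(Wx + b)
- z = 1.0 / (1.0 + np.exp(-(np.dot(y, W.T) + b_prime)))  # tied-weights decoder: z = s(W'y + b')
- squared_error = np.sum((x - z) ** 2)                   # L(x, z) = ||x - z||^2
- cross_entropy = -np.sum(x * np.log(z) + (1 - x) * np.log(1 - z))  # L_H(x, z)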
The hope is that the code $y$ is a distributed representation that captures the coordinates along the main factors of variation in the data, somewhat like PCA.
Indeed, with a single linear hidden layer and the mean-squared-error criterion for training, the $k$ hidden units learn to project the input onto the span of the first $k$ principal components of the data. If the hidden layer is nonlinear, the autoencoder behaves differently from PCA, gaining the ability to capture multi-modal aspects of the input distribution.
Because $y$ can be viewed as a lossy compression of $x$, it cannot be a good (small-loss) compression for every possible input. Optimization makes the compression good for the training examples, and hopefully for other inputs as well, but not for arbitrary inputs: generally only for inputs drawn from the same distribution as the training set. This is the sense in which an autoencoder generalizes: it gives low reconstruction error on test examples from the same distribution as the training data, but generally high reconstruction error on samples drawn at random from the input space.
We first implement the autoencoder with Theano:
- class dA(object):
- """Auto-Encoder class
- A denoising autoencoder reconstructs the clean input by mapping a corrupted
- version of the input to the latent space and then mapping it back to the input space.
- (1) corrupt the clean input
- (2) map the corrupted input to the latent space
- (3) reconstruct the clean input
- (4) compute the reconstruction error
- .. math::
- \tilde{x} ~ q_D(\tilde{x}|x) (1)
- y = s(W \tilde{x} + b) (2)
- x = s(W' y + b') (3)
- L(x,z) = -sum_{k=1}^d [x_k \log z_k + (1-x_k) \log( 1-z_k)] (4)
- """
- def __init__(
- self,
- numpy_rng,
- theano_rng=None,
- input=None,
- n_visible=784,
- n_hidden=500,
- W=None,
- bhid=None,
- bvis=None
- ):
- """
- Initialize the dA class by specifying the number of visible units, the number
- of hidden units and the corruption level. The constructor also receives
- symbolic variables for the input, the weights and the biases.
- :type numpy_rng: numpy.random.RandomState
- :param numpy_rng: number random generator used to generate weights
- :type theano_rng: theano.tensor.shared_randomstreams.RandomStreams
- :param theano_rng: Theano random generator; if None is given one is
- generated based on a seed drawn from `rng`
- :type input: theano.tensor.TensorType
- :param input: a symbolic description of the input or None for
- standalone dA
- :type n_visible: int
- :param n_visible: number of visible units
- :type n_hidden: int
- :param n_hidden: number of hidden units
- :type W: theano.tensor.TensorType
- :param W: Theano variable pointing to a set of weights that should be
- shared between the dA and another architecture; if the dA should
- be standalone set this to None
- :type bhid: theano.tensor.TensorType
- :param bhid: Theano variable pointing to a set of bias values (for
- hidden units) that should be shared between the dA and another
- architecture; if the dA should be standalone set this to None
- :type bvis: theano.tensor.TensorType
- :param bvis: Theano variable pointing to a set of bias values (for
- visible units) that should be shared between the dA and another
- architecture; if the dA should be standalone set this to None
- """
- self.n_visible = n_visible
- self.n_hidden = n_hidden
- # create a Theano random generator that gives symbolic random values
- if not theano_rng:
- theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))
- # note : W' was written as `W_prime` and b' as `b_prime`
- if not W:
- # W is initialized with `initial_W`, which is uniformly sampled
- # from -4*sqrt(6./(n_visible+n_hidden)) to
- # 4*sqrt(6./(n_hidden+n_visible)); the output of uniform is
- # converted using asarray to dtype theano.config.floatX
- # so that the code is runnable on GPU
- initial_W = numpy.asarray(
- numpy_rng.uniform(
- low=-4 * numpy.sqrt(6. / (n_hidden + n_visible)),
- high=4 * numpy.sqrt(6. / (n_hidden + n_visible)),
- size=(n_visible, n_hidden)
- ),
- dtype=theano.config.floatX
- )
- W = theano.shared(value=initial_W, name='W', borrow=True)
- if not bvis:
- bvis = theano.shared(
- value=numpy.zeros(
- n_visible,
- dtype=theano.config.floatX
- ),
- borrow=True
- )
- if not bhid:
- bhid = theano.shared(
- value=numpy.zeros(
- n_hidden,
- dtype=theano.config.floatX
- ),
- name='b',
- borrow=True
- )
- self.W = W
- # b corresponds to the bias of the hidden
- self.b = bhid
- # b_prime corresponds to the bias of the visible
- self.b_prime = bvis
- # tied weights, therefore W_prime is W transpose
- self.W_prime = self.W.T
- self.theano_rng = theano_rng
- # if no input is given, generate a variable representing the input
- if input is None:
- # we use a matrix because we expect a minibatch of several
- # examples, each example being a row
- self.x = T.dmatrix(name='input')
- else:
- self.x = input
- self.params = [self.W, self.b, self.b_prime]
In a stacked autoencoder, the output of one layer becomes the input of the next layer (a brief sketch of this wiring appears after the next code block).
We now compute the latent representation and the reconstructed signal:
- def get_hidden_values(self, input):
- """ Computes the values of the hidden layer """
- return T.nnet.sigmoid(T.dot(input, self.W) + self.b)
- def get_reconstructed_input(self, hidden):
- """Computes the reconstructed input given the values of the
- hidden layer
- """
- return T.nnet.sigmoid(T.dot(hidden, self.W_prime) + self.b_prime)
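As a rough illustration of the stacking mentioned above (the layer sizes 784 -> 500 -> 250 and the variables `rng`, `theano_rng`, `x` are assumptions borrowed from the training script further below, not part of this snippet), the hidden output of one dA can be wired symbolically as the input of the next:
- # hypothetical two-layer stacking; sizes are illustrative
- da1 = dA(numpy_rng=rng, theano_rng=theano_rng, input=x,
-          n_visible=28 * 28, n_hidden=500)
- da2 = dA(numpy_rng=rng, theano_rng=theano_rng,
-          input=da1.get_hidden_values(x),    # layer 1's code feeds layer 2
-          n_visible=500, n_hidden=250)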
Using these functions we can compute the cost and the parameter updates for one training step:
- def get_cost_updates(self, corruption_level, learning_rate):
- """ This function computes the cost and the updates for one trainng
- step of the dA """
- tilde_x = self.get_corrupted_input(self.x, corruption_level)
- y = self.get_hidden_values(tilde_x)
- z = self.get_reconstructed_input(y)
- # note : we sum over the size of a datapoint; if we are using
- # minibatches, L will be a vector, with one entry per
- # example in minibatch
- L = - T.sum(self.x * T.log(z) + (1 - self.x) * T.log(1 - z), axis=1)
- # note : L is now a vector, where each element is the
- # cross-entropy cost of the reconstruction of the
- # corresponding example of the minibatch. We need to
- # compute the average of all these to get the cost of
- # the minibatch
- cost = T.mean(L)
- # compute the gradients of the cost of the `dA` with respect
- # to its parameters
- gparams = T.grad(cost, self.params)
- # generate the list of updates
- updates = [
- (param, param - learning_rate * gparam)
- for param, gparam in zip(self.params, gparams)
- ]
- return (cost, updates)
We can now compile a Theano function that iteratively updates the parameters so as to minimize the reconstruction error:
- da = dA(
- numpy_rng=rng,
- theano_rng=theano_rng,
- input=x,
- n_visible=28 * 28,
- n_hidden=500
- )
- cost, updates = da.get_cost_updates(
- corruption_level=0.,
- learning_rate=learning_rate
- )
- train_da = theano.function(
- [index],
- cost,
- updates=updates,
- givens={
- x: train_set_x[index * batch_size: (index + 1) * batch_size]
- }
- )
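With `train_da` compiled, training is just a sweep over the minibatches in each epoch; this is the same loop used in the complete script at the end of this post (`training_epochs` and `n_train_batches` are defined there):
- for epoch in xrange(training_epochs):
-     c = []
-     for batch_index in xrange(n_train_batches):
-         c.append(train_da(batch_index))
-     print 'Training epoch %d, cost ' % epoch, numpy.mean(c)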
If there is no constraint other than minimizing the reconstruction error, an autoencoder whose code has the same dimension as the input (or a larger one) could in principle just learn the identity mapping and reproduce its input exactly.
However, experiments reported in [Bengio07] suggest that, in practice, nonlinear autoencoders with more hidden units than inputs (so-called overcomplete autoencoders) do yield useful representations ("useful" meaning that a classifier taking the encoding as input achieves lower classification error).
To obtain good reconstruction of continuous inputs, a one-hidden-layer autoencoder with nonlinear hidden units needs small weights in the first (encoding) layer, so that the hidden units stay in the near-linear regime of the activation function, together with large weights in the decoding layer.
For binary inputs, very large weights are likewise needed to fully minimize the reconstruction error. Since explicit or implicit regularization makes it difficult to reach such large-weight solutions, the optimization algorithm finds encodings that only work well for examples similar to those in the training set, which is exactly what we want.
2. Denoising Autoencoders (DA)
The idea behind denoising autoencoders is simple: to force the hidden layer to learn more robust features and prevent it from simply learning the identity, we train the autoencoder to reconstruct the clean input from a corrupted version of it.
The denoising autoencoder does two things: it encodes the input (preserving the information it carries) and it undoes the effect of the corruption process applied to the input.
In [Vincent08], the stochastic corruption process randomly sets some of the inputs to zero, so the denoising autoencoder tries to predict the corrupted (i.e. zeroed-out) values from the uncorrupted ones.
To turn the autoencoder above into a denoising autoencoder, all we need to add is a step that stochastically corrupts the input. The input can be corrupted in many ways; here we simply set a randomly chosen subset of the inputs to zero.
Concretely, we draw a binomial (0/1) random mask of the same shape as the input and multiply it with the input:
- from theano.tensor.shared_randomstreams import RandomStreams
- def get_corrupted_input(self, input, corruption_level):
- """ This function keeps ``1-corruption_level`` entries of the inputs the same
- and zeroes out a randomly selected subset of size ``corruption_level``
- Note : first argument of theano.rng.binomial is the shape(size) of
- random numbers that it should produce
- second argument is the number of trials
- third argument is the probability of success of any trial
- this will produce an array of 0s and 1s where 1 has a probability of
- 1 - ``corruption_level`` and 0 with ``corruption_level``
- """
- return self.theano_rng.binomial(size=input.shape, n=1, p=1 - corruption_level) * input
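To see what the corruption does numerically, here is a small numpy analogue (purely illustrative; the class itself uses the symbolic `theano_rng.binomial` call above):
- import numpy
- rng = numpy.random.RandomState(123)
- x = rng.uniform(size=(2, 8))          # a tiny 'minibatch' of 2 examples
- corruption_level = 0.3
- # 1 with probability 1 - corruption_level, 0 with probability corruption_level
- mask = rng.binomial(n=1, p=1 - corruption_level, size=x.shape)
- tilde_x = mask * x                    # roughly 30% of the entries are zeroed out
- print tilde_x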
The complete dA class then becomes:
- class dA(object):
- """Denoising Auto-Encoder class (dA)
- A denoising autoencoders tries to reconstruct the input from a corrupted
- version of it by projecting it first in a latent space and reprojecting
- it afterwards back in the input space. Please refer to Vincent et al.,2008
- for more details. If x is the input then equation (1) computes a partially
- destroyed version of x by means of a stochastic mapping q_D. Equation (2)
- computes the projection of the input into the latent space. Equation (3)
- computes the reconstruction of the input, while equation (4) computes the
- reconstruction error.
- .. math::
- \tilde{x} ~ q_D(\tilde{x}|x) (1)
- y = s(W \tilde{x} + b) (2)
- x = s(W' y + b') (3)
- L(x,z) = -sum_{k=1}^d [x_k \log z_k + (1-x_k) \log( 1-z_k)] (4)
- """
- def __init__(self, numpy_rng, theano_rng=None, input=None, n_visible=784, n_hidden=500,
- W=None, bhid=None, bvis=None):
- """
- Initialize the dA class by specifying the number of visible units (the
- dimension d of the input ), the number of hidden units ( the dimension
- d' of the latent or hidden space ) and the corruption level. The
- constructor also receives symbolic variables for the input, weights and
- bias. Such a symbolic variables are useful when, for example the input is
- the result of some computations, or when weights are shared between the
- dA and an MLP layer. When dealing with SdAs this always happens,
- the dA on layer 2 gets as input the output of the dA on layer 1,
- and the weights of the dA are used in the second stage of training
- to construct an MLP.
- :type numpy_rng: numpy.random.RandomState
- :param numpy_rng: number random generator used to generate weights
- :type theano_rng: theano.tensor.shared_randomstreams.RandomStreams
- :param theano_rng: Theano random generator; if None is given one is generated
- based on a seed drawn from `rng`
- :type input: theano.tensor.TensorType
- :param input: a symbolic description of the input or None for standalone
- dA
- :type n_visible: int
- :param n_visible: number of visible units
- :type n_hidden: int
- :param n_hidden: number of hidden units
- :type W: theano.tensor.TensorType
- :param W: Theano variable pointing to a set of weights that should be
- shared between the dA and another architecture; if the dA should
- be standalone set this to None
- :type bhid: theano.tensor.TensorType
- :param bhid: Theano variable pointing to a set of bias values (for
- hidden units) that should be shared between the dA and another
- architecture; if the dA should be standalone set this to None
- :type bvis: theano.tensor.TensorType
- :param bvis: Theano variable pointing to a set of bias values (for
- visible units) that should be shared between the dA and another
- architecture; if the dA should be standalone set this to None
- """
- self.n_visible = n_visible
- self.n_hidden = n_hidden
- # create a Theano random generator that gives symbolic random values
- if not theano_rng :
- theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))
- # note : W' was written as `W_prime` and b' as `b_prime`
- if not W:
- # W is initialized with `initial_W`, which is uniformly sampled
- # from -4.*sqrt(6./(n_visible+n_hidden)) to 4.*sqrt(6./(n_hidden+n_visible));
- # the output of uniform is converted using asarray to dtype
- # theano.config.floatX so that the code is runnable on GPU
- initial_W = numpy.asarray(numpy_rng.uniform(
- low=-4 * numpy.sqrt(6. / (n_hidden + n_visible)),
- high=4 * numpy.sqrt(6. / (n_hidden + n_visible)),
- size=(n_visible, n_hidden)), dtype=theano.config.floatX)
- W = theano.shared(value=initial_W, name='W')
- if not bvis:
- bvis = theano.shared(value = numpy.zeros(n_visible,
- dtype=theano.config.floatX), name='bvis')
- if not bhid:
- bhid = theano.shared(value=numpy.zeros(n_hidden,
- dtype=theano.config.floatX), name='bhid')
- self.W = W
- # b corresponds to the bias of the hidden
- self.b = bhid
- # b_prime corresponds to the bias of the visible
- self.b_prime = bvis
- # tied weights, therefore W_prime is W transpose
- self.W_prime = self.W.T
- self.theano_rng = theano_rng
- # if no input is given, generate a variable representing the input
- if input is None:
- # we use a matrix because we expect a minibatch of several examples,
- # each example being a row
- self.x = T.dmatrix(name='input')
- else:
- self.x = input
- self.params = [self.W, self.b, self.b_prime]
- def get_corrupted_input(self, input, corruption_level):
- """ This function keeps ``1-corruption_level`` entries of the inputs the same
- and zeroes out a randomly selected subset of size ``corruption_level``
- Note : first argument of theano.rng.binomial is the shape(size) of
- random numbers that it should produce
- second argument is the number of trials
- third argument is the probability of success of any trial
- this will produce an array of 0s and 1s where 1 has a probability of
- 1 - ``corruption_level`` and 0 with ``corruption_level``
- """
- return self.theano_rng.binomial(size=input.shape, n=1, p=1 - corruption_level) * input
- def get_hidden_values(self, input):
- """ Computes the values of the hidden layer """
- return T.nnet.sigmoid(T.dot(input, self.W) + self.b)
- def get_reconstructed_input(self, hidden ):
- """ Computes the reconstructed input given the values of the hidden layer """
- return T.nnet.sigmoid(T.dot(hidden, self.W_prime) + self.b_prime)
- def get_cost_updates(self, corruption_level, learning_rate):
- """ This function computes the cost and the updates for one trainng
- step of the dA """
- tilde_x = self.get_corrupted_input(self.x, corruption_level)
- y = self.get_hidden_values( tilde_x)
- z = self.get_reconstructed_input(y)
- # note : we sum over the size of a datapoint; if we are using minibatches,
- # L will be a vector, with one entry per example in minibatch
- L = -T.sum(self.x * T.log(z) + (1 - self.x) * T.log(1 - z), axis=1 )
- # note : L is now a vector, where each element is the cross-entropy cost
- # of the reconstruction of the corresponding example of the
- # minibatch. We need to compute the average of all these to get
- # the cost of the minibatch
- cost = T.mean(L)
- # compute the gradients of the cost of the `dA` with respect
- # to its parameters
- gparams = T.grad(cost, self.params)
- # generate the list of updates
- updates = []
- for param, gparam in zip(self.params, gparams):
- updates.append((param, param - learning_rate * gparam))
- return (cost, updates)
3. Training
To visualize what the learned weights look like, we will use the following helper code:
- import numpy
- def scale_to_unit_interval(ndar, eps=1e-8):
- """ Scales all values in the ndarray ndar to be between 0 and 1 """
- ndar = ndar.copy()
- ndar -= ndar.min()
- ndar *= 1.0 / (ndar.max() + eps)
- return ndar
- def tile_raster_images(X, img_shape, tile_shape, tile_spacing=(0, 0),
- scale_rows_to_unit_interval=True,
- output_pixel_vals=True):
- """
- Transform an array with one flattened image per row, into an array in
- which images are reshaped and layed out like tiles on a floor.
- This function is useful for visualizing datasets whose rows are images,
- and also columns of matrices for transforming those rows
- (such as the first layer of a neural net).
- :type X: a 2-D ndarray or a tuple of 4 channels, elements of which can
- be 2-D ndarrays or None;
- :param X: a 2-D array in which every row is a flattened image.
- :type img_shape: tuple; (height, width)
- :param img_shape: the original shape of each image
- :type tile_shape: tuple; (rows, cols)
- :param tile_shape: the number of images to tile (rows, cols)
- :param output_pixel_vals: if output should be pixel values (i.e. int8
- values) or floats
- :param scale_rows_to_unit_interval: if the values need to be scaled before
- being plotted to [0,1] or not
- :returns: array suitable for viewing as an image.
- (See:`Image.fromarray`.)
- :rtype: a 2-d array with same dtype as X.
- """
- assert len(img_shape) == 2
- assert len(tile_shape) == 2
- assert len(tile_spacing) == 2
- # The expression below can be re-written in a more C style as
- # follows :
- #
- # out_shape = [0,0]
- # out_shape[0] = (img_shape[0]+tile_spacing[0])*tile_shape[0] -
- # tile_spacing[0]
- # out_shape[1] = (img_shape[1]+tile_spacing[1])*tile_shape[1] -
- # tile_spacing[1]
- out_shape = [
- (ishp + tsp) * tshp - tsp
- for ishp, tshp, tsp in zip(img_shape, tile_shape, tile_spacing)
- ]
- if isinstance(X, tuple):
- assert len(X) == 4
- # Create an output numpy ndarray to store the image
- if output_pixel_vals:
- out_array = numpy.zeros((out_shape[0], out_shape[1], 4),
- dtype='uint8')
- else:
- out_array = numpy.zeros((out_shape[0], out_shape[1], 4),
- dtype=X.dtype)
- #colors default to 0, alpha defaults to 1 (opaque)
- if output_pixel_vals:
- channel_defaults = [0, 0, 0, 255]
- else:
- channel_defaults = [0., 0., 0., 1.]
- for i in xrange(4):
- if X[i] is None:
- # if channel is None, fill it with zeros of the correct
- # dtype
- dt = out_array.dtype
- if output_pixel_vals:
- dt = 'uint8'
- out_array[:, :, i] = numpy.zeros(
- out_shape,
- dtype=dt
- ) + channel_defaults[i]
- else:
- # use a recurrent call to compute the channel and store it
- # in the output
- out_array[:, :, i] = tile_raster_images(
- X[i], img_shape, tile_shape, tile_spacing,
- scale_rows_to_unit_interval, output_pixel_vals)
- return out_array
- else:
- # if we are dealing with only one channel
- H, W = img_shape
- Hs, Ws = tile_spacing
- # generate a matrix to store the output
- dt = X.dtype
- if output_pixel_vals:
- dt = 'uint8'
- out_array = numpy.zeros(out_shape, dtype=dt)
- for tile_row in xrange(tile_shape[0]):
- for tile_col in xrange(tile_shape[1]):
- if tile_row * tile_shape[1] + tile_col < X.shape[0]:
- this_x = X[tile_row * tile_shape[1] + tile_col]
- if scale_rows_to_unit_interval:
- # if we should scale values to be between 0 and 1
- # do this by calling the `scale_to_unit_interval`
- # function
- this_img = scale_to_unit_interval(
- this_x.reshape(img_shape))
- else:
- this_img = this_x.reshape(img_shape)
- # add the slice to the corresponding position in the
- # output array
- c = 1
- if output_pixel_vals:
- c = 255
- out_array[
- tile_row * (H + Hs): tile_row * (H + Hs) + H,
- tile_col * (W + Ws): tile_col * (W + Ws) + W
- ] = this_img * c
- return out_array
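For reference, this helper is called in the training script below roughly as follows, turning each row of `W.T` (one learned filter per hidden unit) into a 28x28 tile; the output filename here is illustrative:
- image = Image.fromarray(tile_raster_images(
-     X=da.W.get_value(borrow=True).T,          # one learned filter per row
-     img_shape=(28, 28), tile_shape=(10, 10), tile_spacing=(1, 1)))
- image.save('filters.png')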
The complete training script is as follows:
- """
- This tutorial introduces denoising auto-encoders (dA) using Theano.
- Denoising autoencoders are the building blocks for SdA.
- They are based on auto-encoders as the ones used in Bengio et al. 2007.
- An autoencoder takes an input x and first maps it to a hidden representation
- y = f_{\theta}(x) = s(Wx+b), parameterized by \theta={W,b}. The resulting
- latent representation y is then mapped back to a "reconstructed" vector
- z \in [0,1]^d in input space z = g_{\theta'}(y) = s(W'y + b'). The weight
- matrix W' can optionally be constrained such that W' = W^T, in which case
- the autoencoder is said to have tied weights. The network is trained such
- that to minimize the reconstruction error (the error between x and z).
- For the denosing autoencoder, during training, first x is corrupted into
- \tilde{x}, where \tilde{x} is a partially destroyed version of x by means
- of a stochastic mapping. Afterwards y is computed as before (using
- \tilde{x}), y = s(W\tilde{x} + b) and z as s(W'y + b'). The reconstruction
- error is now measured between z and the uncorrupted input x, which is
- computed as the cross-entropy :
- - \sum_{k=1}^d[ x_k \log z_k + (1-x_k) \log( 1-z_k)]
- References :
- - P. Vincent, H. Larochelle, Y. Bengio, P.A. Manzagol: Extracting and
- Composing Robust Features with Denoising Autoencoders, ICML'08, 1096-1103,
- 2008
- - Y. Bengio, P. Lamblin, D. Popovici, H. Larochelle: Greedy Layer-Wise
- Training of Deep Networks, Advances in Neural Information Processing
- Systems 19, 2007
- """
- import os
- import sys
- import time
- import numpy
- import theano
- import theano.tensor as T
- from theano.tensor.shared_randomstreams import RandomStreams
- from logistic_sgd import load_data
- from utils import tile_raster_images
- try:
- import PIL.Image as Image
- except ImportError:
- import Image
- # start-snippet-1
- class dA(object):
- """Denoising Auto-Encoder class (dA)
- A denoising autoencoders tries to reconstruct the input from a corrupted
- version of it by projecting it first in a latent space and reprojecting
- it afterwards back in the input space. Please refer to Vincent et al.,2008
- for more details. If x is the input then equation (1) computes a partially
- destroyed version of x by means of a stochastic mapping q_D. Equation (2)
- computes the projection of the input into the latent space. Equation (3)
- computes the reconstruction of the input, while equation (4) computes the
- reconstruction error.
- .. math::
- \tilde{x} ~ q_D(\tilde{x}|x) (1)
- y = s(W \tilde{x} + b) (2)
- x = s(W' y + b') (3)
- L(x,z) = -sum_{k=1}^d [x_k \log z_k + (1-x_k) \log( 1-z_k)] (4)
- """
- def __init__(
- self,
- numpy_rng,
- theano_rng=None,
- input=None,
- n_visible=784,
- n_hidden=500,
- W=None,
- bhid=None,
- bvis=None
- ):
- """
- Initialize the dA class by specifying the number of visible units (the
- dimension d of the input ), the number of hidden units ( the dimension
- d' of the latent or hidden space ) and the corruption level. The
- constructor also receives symbolic variables for the input, weights and
- bias. Such a symbolic variables are useful when, for example the input
- is the result of some computations, or when weights are shared between
- the dA and an MLP layer. When dealing with SdAs this always happens,
- the dA on layer 2 gets as input the output of the dA on layer 1,
- and the weights of the dA are used in the second stage of training
- to construct an MLP.
- :type numpy_rng: numpy.random.RandomState
- :param numpy_rng: number random generator used to generate weights
- :type theano_rng: theano.tensor.shared_randomstreams.RandomStreams
- :param theano_rng: Theano random generator; if None is given one is
- generated based on a seed drawn from `rng`
- :type input: theano.tensor.TensorType
- :param input: a symbolic description of the input or None for
- standalone dA
- :type n_visible: int
- :param n_visible: number of visible units
- :type n_hidden: int
- :param n_hidden: number of hidden units
- :type W: theano.tensor.TensorType
- :param W: Theano variable pointing to a set of weights that should be
- shared between the dA and another architecture; if the dA should
- be standalone set this to None
- :type bhid: theano.tensor.TensorType
- :param bhid: Theano variable pointing to a set of bias values (for
- hidden units) that should be shared between the dA and another
- architecture; if the dA should be standalone set this to None
- :type bvis: theano.tensor.TensorType
- :param bvis: Theano variable pointing to a set of bias values (for
- visible units) that should be shared between the dA and another
- architecture; if the dA should be standalone set this to None
- """
- self.n_visible = n_visible
- self.n_hidden = n_hidden
- # create a Theano random generator that gives symbolic random values
- if not theano_rng:
- theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))
- # note : W' was written as `W_prime` and b' as `b_prime`
- if not W:
- # W is initialized with `initial_W`, which is uniformly sampled
- # from -4*sqrt(6./(n_visible+n_hidden)) to
- # 4*sqrt(6./(n_hidden+n_visible)); the output of uniform is
- # converted using asarray to dtype theano.config.floatX
- # so that the code is runnable on GPU
- initial_W = numpy.asarray(
- numpy_rng.uniform(
- low=-4 * numpy.sqrt(6. / (n_hidden + n_visible)),
- high=4 * numpy.sqrt(6. / (n_hidden + n_visible)),
- size=(n_visible, n_hidden)
- ),
- dtype=theano.config.floatX
- )
- W = theano.shared(value=initial_W, name='W', borrow=True)
- if not bvis:
- bvis = theano.shared(
- value=numpy.zeros(
- n_visible,
- dtype=theano.config.floatX
- ),
- borrow=True
- )
- if not bhid:
- bhid = theano.shared(
- value=numpy.zeros(
- n_hidden,
- dtype=theano.config.floatX
- ),
- name='b',
- borrow=True
- )
- self.W = W
- # b corresponds to the bias of the hidden
- self.b = bhid
- # b_prime corresponds to the bias of the visible
- self.b_prime = bvis
- # tied weights, therefore W_prime is W transpose
- self.W_prime = self.W.T
- self.theano_rng = theano_rng
- # if no input is given, generate a variable representing the input
- if input is None:
- # we use a matrix because we expect a minibatch of several
- # examples, each example being a row
- self.x = T.dmatrix(name='input')
- else:
- self.x = input
- self.params = [self.W, self.b, self.b_prime]
- # end-snippet-1
- def get_corrupted_input(self, input, corruption_level):
- """This function keeps ``1-corruption_level`` entries of the inputs the
- same and zeroes out a randomly selected subset of size ``corruption_level``
- Note : first argument of theano.rng.binomial is the shape(size) of
- random numbers that it should produce
- second argument is the number of trials
- third argument is the probability of success of any trial
- this will produce an array of 0s and 1s where 1 has a
- probability of 1 - ``corruption_level`` and 0 with
- ``corruption_level``
- The binomial function returns int64 by default. int64
- multiplied by the input type (floatX) always returns
- float64. To keep all data in floatX when floatX is
- float32, we set the dtype of the binomial to floatX. As
- the value of the binomial is always 0 or 1 in our case,
- this doesn't change the result. This is needed to allow
- the GPU to work correctly, as it only supports float32
- for now.
- """
- return self.theano_rng.binomial(size=input.shape, n=1,
- p=1 - corruption_level,
- dtype=theano.config.floatX) * input
- def get_hidden_values(self, input):
- """ Computes the values of the hidden layer """
- return T.nnet.sigmoid(T.dot(input, self.W) + self.b)
- def get_reconstructed_input(self, hidden):
- """Computes the reconstructed input given the values of the
- hidden layer
- """
- return T.nnet.sigmoid(T.dot(hidden, self.W_prime) + self.b_prime)
- def get_cost_updates(self, corruption_level, learning_rate):
- """ This function computes the cost and the updates for one trainng
- step of the dA """
- tilde_x = self.get_corrupted_input(self.x, corruption_level)
- y = self.get_hidden_values(tilde_x)
- z = self.get_reconstructed_input(y)
- # note : we sum over the size of a datapoint; if we are using
- # minibatches, L will be a vector, with one entry per
- # example in minibatch
- L = - T.sum(self.x * T.log(z) + (1 - self.x) * T.log(1 - z), axis=1)
- # note : L is now a vector, where each element is the
- # cross-entropy cost of the reconstruction of the
- # corresponding example of the minibatch. We need to
- # compute the average of all these to get the cost of
- # the minibatch
- cost = T.mean(L)
- # compute the gradients of the cost of the `dA` with respect
- # to its parameters
- gparams = T.grad(cost, self.params)
- # generate the list of updates
- updates = [
- (param, param - learning_rate * gparam)
- for param, gparam in zip(self.params, gparams)
- ]
- return (cost, updates)
- def test_dA(learning_rate=0.1, training_epochs=15,
- dataset='mnist.pkl.gz',
- batch_size=20, output_folder='dA_plots'):
- """
- This demo is tested on MNIST
- :type learning_rate: float
- :param learning_rate: learning rate used for training the DeNosing
- AutoEncoder
- :type training_epochs: int
- :param training_epochs: number of epochs used for training
- :type dataset: string
- :param dataset: path to the picked dataset
- """
- datasets = load_data(dataset)
- train_set_x, train_set_y = datasets[0]
- # compute number of minibatches for training, validation and testing
- n_train_batches = train_set_x.get_value(borrow=True).shape[0] / batch_size
- # allocate symbolic variables for the data
- index = T.lscalar() # index to a [mini]batch
- x = T.matrix('x') # the data is presented as rasterized images
- if not os.path.isdir(output_folder):
- os.makedirs(output_folder)
- os.chdir(output_folder)
- ####################################
- # BUILDING THE MODEL NO CORRUPTION #
- ####################################
- rng = numpy.random.RandomState(123)
- theano_rng = RandomStreams(rng.randint(2 ** 30))
- da = dA(
- numpy_rng=rng,
- theano_rng=theano_rng,
- input=x,
- n_visible=28 * 28,
- n_hidden=500
- )
- cost, updates = da.get_cost_updates(
- corruption_level=0.,
- learning_rate=learning_rate
- )
- train_da = theano.function(
- [index],
- cost,
- updates=updates,
- givens={
- x: train_set_x[index * batch_size: (index + 1) * batch_size]
- }
- )
- start_time = time.clock()
- ############
- # TRAINING #
- ############
- # go through training epochs
- for epoch in xrange(training_epochs):
- # go through trainng set
- c = []
- for batch_index in xrange(n_train_batches):
- c.append(train_da(batch_index))
- print 'Training epoch %d, cost ' % epoch, numpy.mean(c)
- end_time = time.clock()
- training_time = (end_time - start_time)
- print >> sys.stderr, ('The no corruption code for file ' +
- os.path.split(__file__)[1] +
- ' ran for %.2fm' % ((training_time) / 60.))
- image = Image.fromarray(
- tile_raster_images(X=da.W.get_value(borrow=True).T,
- img_shape=(28, 28), tile_shape=(10, 10),
- tile_spacing=(1, 1)))
- image.save('filters_corruption_0.png')
- #####################################
- # BUILDING THE MODEL CORRUPTION 30% #
- #####################################
- rng = numpy.random.RandomState(123)
- theano_rng = RandomStreams(rng.randint(2 ** 30))
- da = dA(
- numpy_rng=rng,
- theano_rng=theano_rng,
- input=x,
- n_visible=28 * 28,
- n_hidden=500
- )
- cost, updates = da.get_cost_updates(
- corruption_level=0.3,
- learning_rate=learning_rate
- )
- train_da = theano.function(
- [index],
- cost,
- updates=updates,
- givens={
- x: train_set_x[index * batch_size: (index + 1) * batch_size]
- }
- )
- start_time = time.clock()
- ############
- # TRAINING #
- ############
- # go through training epochs
- for epoch in xrange(training_epochs):
- # go through trainng set
- c = []
- for batch_index in xrange(n_train_batches):
- c.append(train_da(batch_index))
- print 'Training epoch %d, cost ' % epoch, numpy.mean(c)
- end_time = time.clock()
- training_time = (end_time - start_time)
- print >> sys.stderr, ('The 30% corruption code for file ' +
- os.path.split(__file__)[1] +
- ' ran for %.2fm' % (training_time / 60.))
- image = Image.fromarray(tile_raster_images(
- X=da.W.get_value(borrow=True).T,
- img_shape=(28, 28), tile_shape=(10, 10),
- tile_spacing=(1, 1)))
- image.save('filters_corruption_30.png')
- os.chdir('../')
- if __name__ == '__main__':
- test_dA()
- ... loading data
- Training epoch 0, cost 63.2891694201
- Training epoch 1, cost 55.7866565443
- Training epoch 2, cost 54.7631168984
- Training epoch 3, cost 54.2420533514
- Training epoch 4, cost 53.888670659
- Training epoch 5, cost 53.6203505434
- Training epoch 6, cost 53.4037459012
- Training epoch 7, cost 53.2219976788
- Training epoch 8, cost 53.0658010178
- Training epoch 9, cost 52.9295596873
- Training epoch 10, cost 52.8094163525
- Training epoch 11, cost 52.7024367362
- Training epoch 12, cost 52.606310148
- Training epoch 13, cost 52.5191693641
- Training epoch 14, cost 52.4395240004
- The no corruption code for file dA.py ran for 10.21m
- Training epoch 0, cost 81.7714190632
- Training epoch 1, cost 73.4285756365
- Training epoch 2, cost 70.8632686268
- Training epoch 3, cost 69.3396642015
- Training epoch 4, cost 68.4134660704
- Training epoch 5, cost 67.723705304
- Training epoch 6, cost 67.2401360252
- Training epoch 7, cost 66.849303071
- Training epoch 8, cost 66.5663948395
- Training epoch 9, cost 66.3591257941
- Training epoch 10, cost 66.1336658308
- Training epoch 11, cost 65.9893924612
- Training epoch 12, cost 65.8344131768
- Training epoch 13, cost 65.7185348901
- Training epoch 14, cost 65.6010749532
- The 30% corruption code for file dA.py ran for 10.37m
Visualization of the weights learned with no corruption:
Visualization of the weights learned with 30% corruption:
Source: http://deeplearning.net/tutorial/dA.html#daa