1. Autoencoders

An autoencoder first maps the input $x\in[0,1]^{d}$ to a hidden representation $y\in[0,1]^{d^{'}}$ through the following mapping (the encoder):

$y=s(Wx+b)$

where $s$ is a non-linear function such as the sigmoid. The hidden representation $y$, i.e. the code, is then mapped back (via the decoder) to a reconstruction $z$ with the same shape as the input $x$:

$z=s(W^{'}y+b^{'})$

Here $W^{'}$ does not necessarily denote the transpose of $W$; $z$ should be seen as a prediction of $x$ given the code $y$. Optionally, the weight matrix $W^{'}$ of the reverse mapping may be constrained to be the transpose of the forward mapping, $W^{'}=W^{T}$, which is referred to as tied weights. The parameters of this model ($W$, $b$, $b^{'}$) are then optimized to minimize the average reconstruction error (a short numpy sketch of the two mappings follows).
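
A tiny numpy sketch of these two mappings with tied weights (illustrative only, variable names are made up and not part of the tutorial code):

```python
# Illustrative numpy sketch of the encoder/decoder with tied weights
# (W' = W.T, so the parameters are only W, b and b').
import numpy

def sigmoid(a):
    return 1. / (1. + numpy.exp(-a))

rng = numpy.random.RandomState(0)
d, d_prime = 8, 3                            # input and code dimensions
W = rng.uniform(-0.1, 0.1, size=(d_prime, d))
b = numpy.zeros(d_prime)
b_prime = numpy.zeros(d)

x = rng.uniform(0., 1., size=d)              # an input in [0, 1]^d
y = sigmoid(numpy.dot(W, x) + b)             # encoder: y = s(Wx + b)
z = sigmoid(numpy.dot(W.T, y) + b_prime)     # decoder: z = s(W'y + b') with W' = W.T
```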

The reconstruction error can be measured in many ways, depending on the distributional assumptions made about the input given the code. The traditional squared error $L(x,z)=||x-z||^{2}$ can be used. If the input can be interpreted as bit vectors or vectors of bit probabilities, the cross-entropy of the reconstruction can be used instead:

$L_{H}(x,z)=-\sum_{k=1}^{d}[x_{k}\log z_{k}+(1-x_{k})\log(1-z_{k})]$
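
For a single example this loss is just a sum over the $d$ components; a quick numpy check (illustrative values, not from the tutorial):

```python
# Illustrative numpy check of the cross-entropy reconstruction loss
# for one example (the values of x and z are made up).
import numpy

x = numpy.array([1., 0., 1., 1., 0.])        # input treated as a bit vector
z = numpy.array([0.9, 0.2, 0.8, 0.6, 0.1])   # reconstruction, entries in (0, 1)

L_H = -numpy.sum(x * numpy.log(z) + (1 - x) * numpy.log(1 - z))
print L_H   # about 1.17; training averages this quantity over a minibatch
```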

The hope is that the code $y$ is a distributed representation that captures the coordinates along the main factors of variation in the data, somewhat like PCA.

Indeed, if there is a single linear hidden layer and the mean squared error criterion is used to train the network, then the $k$ hidden units learn to project the input onto the span of the first $k$ principal components of the data (a small numpy illustration follows). If the hidden layer is non-linear, the autoencoder behaves differently from PCA and can capture different aspects of the input distribution.
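
As a point of reference, a small numpy-only sketch (not tutorial code) of that PCA projection: the best rank-$k$ reconstruction under squared error is obtained by projecting the centred data onto its first $k$ principal components, which is the subspace a linear autoencoder ends up spanning:

```python
# Illustrative numpy-only sketch: projection onto the first k principal
# components, the subspace a linear autoencoder trained with squared
# error learns to span.
import numpy

rng = numpy.random.RandomState(0)
data = rng.randn(1000, 50)            # toy data, one example per row
data = data - data.mean(axis=0)       # centre the data

k = 10
U, S, Vt = numpy.linalg.svd(data, full_matrices=False)
components = Vt[:k]                   # first k principal directions

codes = numpy.dot(data, components.T)    # k-dimensional codes
recon = numpy.dot(codes, components)     # best rank-k reconstruction
print numpy.mean((data - recon) ** 2)    # squared reconstruction error
```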

Because $y$ can be viewed as a lossy compression of $x$, it cannot be a good (small-loss) compression for every possible input. Optimization makes it a good compression for the training examples, and hopefully for other inputs as well, but not for arbitrary inputs: "other inputs" here means inputs drawn from the same distribution as the training set. This is the sense in which an autoencoder generalizes: it gives low reconstruction error on test examples from the same distribution as the training examples, but generally high reconstruction error on samples drawn at random from the input space.

We first implement the autoencoder in Theano:

```python
import numpy

import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams


class dA(object):
    """Auto-Encoder class

    A denoising autoencoder reconstructs the true input by mapping a
    corrupted version of it to a hidden space and then mapping that back
    to the input space:
    (1) corrupt the true input
    (2) map the corrupted input to the hidden space
    (3) reconstruct the true input
    (4) compute the reconstruction error

    .. math::

        \tilde{x} ~ q_D(\tilde{x}|x)                                     (1)

        y = s(W \tilde{x} + b)                                           (2)

        z = s(W' y + b')                                                 (3)

        L(x, z) = -sum_{k=1}^d [x_k \log z_k + (1-x_k) \log(1-z_k)]      (4)

    """

    def __init__(
        self,
        numpy_rng,
        theano_rng=None,
        input=None,
        n_visible=784,
        n_hidden=500,
        W=None,
        bhid=None,
        bvis=None
    ):
        """
        Initialize the dA class: number of visible units, number of hidden
        units and the corruption level. The constructor also receives the
        input, the weights and the biases needed for reconstruction.

        :type numpy_rng: numpy.random.RandomState
        :param numpy_rng: number random generator used to generate weights

        :type theano_rng: theano.tensor.shared_randomstreams.RandomStreams
        :param theano_rng: Theano random generator; if None is given one is
                           generated based on a seed drawn from `rng`

        :type input: theano.tensor.TensorType
        :param input: a symbolic description of the input or None for
                      standalone dA

        :type n_visible: int
        :param n_visible: number of visible units

        :type n_hidden: int
        :param n_hidden: number of hidden units

        :type W: theano.tensor.TensorType
        :param W: Theano variable pointing to a set of weights that should be
                  shared between the dA and another architecture; if dA should
                  be standalone set this to None

        :type bhid: theano.tensor.TensorType
        :param bhid: Theano variable pointing to a set of biases values (for
                     hidden units) that should be shared between dA and another
                     architecture; if dA should be standalone set this to None

        :type bvis: theano.tensor.TensorType
        :param bvis: Theano variable pointing to a set of biases values (for
                     visible units) that should be shared between dA and another
                     architecture; if dA should be standalone set this to None

        """
        self.n_visible = n_visible
        self.n_hidden = n_hidden

        # create a Theano random generator that gives symbolic random values
        if not theano_rng:
            theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))

        # note : W' was written as `W_prime` and b' as `b_prime`
        if not W:
            # W is initialized with `initial_W` which is uniformly sampled
            # from -4*sqrt(6./(n_visible+n_hidden)) and
            # 4*sqrt(6./(n_hidden+n_visible)); the output of uniform is
            # converted using asarray to dtype theano.config.floatX so
            # that the code is runnable on GPU
            initial_W = numpy.asarray(
                numpy_rng.uniform(
                    low=-4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                    high=4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                    size=(n_visible, n_hidden)
                ),
                dtype=theano.config.floatX
            )
            W = theano.shared(value=initial_W, name='W', borrow=True)

        if not bvis:
            bvis = theano.shared(
                value=numpy.zeros(
                    n_visible,
                    dtype=theano.config.floatX
                ),
                borrow=True
            )

        if not bhid:
            bhid = theano.shared(
                value=numpy.zeros(
                    n_hidden,
                    dtype=theano.config.floatX
                ),
                name='b',
                borrow=True
            )

        self.W = W
        # b corresponds to the bias of the hidden units
        self.b = bhid
        # b_prime corresponds to the bias of the visible units
        self.b_prime = bvis
        # tied weights, therefore W_prime is W transpose
        self.W_prime = self.W.T
        self.theano_rng = theano_rng
        # if no input is given, generate a variable representing the input
        if input is None:
            # we use a matrix because we expect a minibatch of several
            # examples, each example being a row
            self.x = T.dmatrix(name='input')
        else:
            self.x = input

        self.params = [self.W, self.b, self.b_prime]
```

In a stacked autoencoder, the output of one layer serves as the input of the layer above it (a short sketch of this wiring follows the code below).

Now we compute the hidden representation and the reconstructed signal:

```python
    def get_hidden_values(self, input):
        """ Computes the values of the hidden layer """
        return T.nnet.sigmoid(T.dot(input, self.W) + self.b)

    def get_reconstructed_input(self, hidden):
        """Computes the reconstructed input given the values of the
        hidden layer

        """
        return T.nnet.sigmoid(T.dot(hidden, self.W_prime) + self.b_prime)
```
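
As mentioned above, in a stacked autoencoder the code of one dA becomes the input of the next. A minimal sketch of that wiring (assuming `rng` and `theano_rng` are the numpy and Theano random generators used in the training code below; the layer sizes are illustrative):

```python
# Minimal sketch of stacking two dA layers: the hidden values of the first
# dA are the symbolic input of the second.
x = T.matrix('x')

da1 = dA(numpy_rng=rng, theano_rng=theano_rng, input=x,
         n_visible=28 * 28, n_hidden=500)

da2 = dA(numpy_rng=rng, theano_rng=theano_rng,
         input=da1.get_hidden_values(x),   # layer 2 sees layer 1's code
         n_visible=500, n_hidden=250)
```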

Using these functions, we compute the cost and the parameter updates:

```python
    def get_cost_updates(self, corruption_level, learning_rate):
        """ This function computes the cost and the updates for one training
        step of the dA """

        tilde_x = self.get_corrupted_input(self.x, corruption_level)
        y = self.get_hidden_values(tilde_x)
        z = self.get_reconstructed_input(y)
        # note : we sum over the size of a datapoint; if we are using
        #        minibatches, L will be a vector, with one entry per
        #        example in minibatch
        L = - T.sum(self.x * T.log(z) + (1 - self.x) * T.log(1 - z), axis=1)
        # note : L is now a vector, where each element is the
        #        cross-entropy cost of the reconstruction of the
        #        corresponding example of the minibatch. We need to
        #        compute the average of all these to get the cost of
        #        the minibatch
        cost = T.mean(L)

        # compute the gradients of the cost of the `dA` with respect
        # to its parameters
        gparams = T.grad(cost, self.params)
        # generate the list of updates
        updates = [
            (param, param - learning_rate * gparam)
            for param, gparam in zip(self.params, gparams)
        ]

        return (cost, updates)
```

We can now build a dA instance and a Theano function that updates the parameters so as to minimize the reconstruction error (the training loop itself is sketched right after):

```python
da = dA(
    numpy_rng=rng,
    theano_rng=theano_rng,
    input=x,
    n_visible=28 * 28,
    n_hidden=500
)

cost, updates = da.get_cost_updates(
    corruption_level=0.,
    learning_rate=learning_rate
)

train_da = theano.function(
    [index],
    cost,
    updates=updates,
    givens={
        x: train_set_x[index * batch_size: (index + 1) * batch_size]
    }
)
```
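
With `train_da` compiled, minimizing the reconstruction error is just a loop over minibatch indices; the complete script below does exactly this:

```python
# Training loop: one SGD step per minibatch, averaged cost reported per epoch
# (training_epochs and n_train_batches are defined in the full script below).
for epoch in xrange(training_epochs):
    c = []
    for batch_index in xrange(n_train_batches):
        c.append(train_da(batch_index))
    print 'Training epoch %d, cost ' % epoch, numpy.mean(c)
```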

If there were no constraint other than minimizing the reconstruction error, one might expect an autoencoder whose code has the same dimension as the input simply to learn the identity mapping and reproduce its input exactly.

However, the experiments reported in [Bengio07] suggest that, in practice, non-linear autoencoders with more hidden units than inputs (called overcomplete) yield more useful representations, where "useful" means achieving lower classification error.

To obtain good reconstructions of continuous inputs, a single-hidden-layer autoencoder with non-linear units needs small weights in the first (encoding) layer, so that the hidden units operate in the near-linear regime of the activation function, and very large weights in the second (decoding) layer.

For binary inputs, large weights are likewise needed to fully minimize the reconstruction error. Since explicit or implicit regularization makes it hard to reach such large-weight solutions, the optimization algorithm settles for encodings that only work well for examples similar to those in the training set, which is exactly what we want.

2. Denoising Autoencoders (dA)

The idea behind the dA is simple: to force the hidden layer to learn a more robust representation and keep it from simply learning the identity, we train the autoencoder to reconstruct the clean input from a corrupted version of it.

The dA therefore does two things: encode the input (preserve the information it carries) and undo the effect of the corruption.

In [Vincent08], the stochastic corruption randomly sets some of the inputs to zero, so the denoising autoencoder tries to predict the corrupted (i.e. missing) values from the uncorrupted values.

To turn the autoencoder above into a dA, all we need to add is an operation that stochastically corrupts the input. The corruption can take many forms; here we simply set a random subset of the inputs to zero.

This is done by generating binomial random numbers with the same shape as the input and multiplying them element-wise with the input:

```python
from theano.tensor.shared_randomstreams import RandomStreams

    def get_corrupted_input(self, input, corruption_level):
        """This function keeps ``1-corruption_level`` entries of the inputs the
        same and zero-out randomly selected subset of size ``corruption_level``
        Note : first argument of theano.rng.binomial is the shape(size) of
               random numbers that it should produce
               second argument is the number of trials
               third argument is the probability of success of any trial

               this will produce an array of 0s and 1s where 1 has a
               probability of 1 - ``corruption_level`` and 0 with
               ``corruption_level``
        """
        return self.theano_rng.binomial(size=input.shape, n=1,
                                        p=1 - corruption_level) * input
```

The complete dA class then becomes:

```python
class dA(object):
    """Denoising Auto-Encoder class (dA)

    A denoising autoencoder tries to reconstruct the input from a corrupted
    version of it by projecting it first in a latent space and reprojecting
    it afterwards back in the input space. Please refer to Vincent et al., 2008
    for more details. If x is the input then equation (1) computes a partially
    destroyed version of x by means of a stochastic mapping q_D. Equation (2)
    computes the projection of the input into the latent space. Equation (3)
    computes the reconstruction of the input, while equation (4) computes the
    reconstruction error.

    .. math::

        \tilde{x} ~ q_D(\tilde{x}|x)                                     (1)

        y = s(W \tilde{x} + b)                                           (2)

        z = s(W' y + b')                                                 (3)

        L(x, z) = -sum_{k=1}^d [x_k \log z_k + (1-x_k) \log(1-z_k)]      (4)

    """

    def __init__(self, numpy_rng, theano_rng=None, input=None, n_visible=784, n_hidden=500,
                 W=None, bhid=None, bvis=None):
        """
        Initialize the dA class by specifying the number of visible units (the
        dimension d of the input), the number of hidden units (the dimension
        d' of the latent or hidden space) and the corruption level. The
        constructor also receives symbolic variables for the input, weights and
        bias. Such symbolic variables are useful when, for example, the input is
        the result of some computations, or when weights are shared between the
        dA and an MLP layer. When dealing with SdAs this always happens:
        the dA on layer 2 gets as input the output of the dA on layer 1,
        and the weights of the dA are used in the second stage of training
        to construct an MLP.

        :type numpy_rng: numpy.random.RandomState
        :param numpy_rng: number random generator used to generate weights

        :type theano_rng: theano.tensor.shared_randomstreams.RandomStreams
        :param theano_rng: Theano random generator; if None is given one is generated
                           based on a seed drawn from `rng`

        :type input: theano.tensor.TensorType
        :param input: a symbolic description of the input or None for standalone
                      dA

        :type n_visible: int
        :param n_visible: number of visible units

        :type n_hidden: int
        :param n_hidden: number of hidden units

        :type W: theano.tensor.TensorType
        :param W: Theano variable pointing to a set of weights that should be
                  shared between the dA and another architecture; if dA should
                  be standalone set this to None

        :type bhid: theano.tensor.TensorType
        :param bhid: Theano variable pointing to a set of biases values (for
                     hidden units) that should be shared between dA and another
                     architecture; if dA should be standalone set this to None

        :type bvis: theano.tensor.TensorType
        :param bvis: Theano variable pointing to a set of biases values (for
                     visible units) that should be shared between dA and another
                     architecture; if dA should be standalone set this to None

        """
        self.n_visible = n_visible
        self.n_hidden = n_hidden

        # create a Theano random generator that gives symbolic random values
        if not theano_rng:
            theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))

        # note : W' was written as `W_prime` and b' as `b_prime`
        if not W:
            # W is initialized with `initial_W` which is uniformly sampled
            # from -4.*sqrt(6./(n_visible+n_hidden)) and 4.*sqrt(6./(n_hidden+n_visible));
            # the output of uniform is converted using asarray to dtype
            # theano.config.floatX so that the code is runnable on GPU
            initial_W = numpy.asarray(numpy_rng.uniform(
                low=-4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                high=4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                size=(n_visible, n_hidden)), dtype=theano.config.floatX)
            W = theano.shared(value=initial_W, name='W')

        if not bvis:
            bvis = theano.shared(value=numpy.zeros(n_visible,
                                                   dtype=theano.config.floatX), name='bvis')

        if not bhid:
            bhid = theano.shared(value=numpy.zeros(n_hidden,
                                                   dtype=theano.config.floatX), name='bhid')

        self.W = W
        # b corresponds to the bias of the hidden units
        self.b = bhid
        # b_prime corresponds to the bias of the visible units
        self.b_prime = bvis
        # tied weights, therefore W_prime is W transpose
        self.W_prime = self.W.T
        self.theano_rng = theano_rng
        # if no input is given, generate a variable representing the input
        if input is None:
            # we use a matrix because we expect a minibatch of several examples,
            # each example being a row
            self.x = T.dmatrix(name='input')
        else:
            self.x = input

        self.params = [self.W, self.b, self.b_prime]

    def get_corrupted_input(self, input, corruption_level):
        """This function keeps ``1-corruption_level`` entries of the inputs the
        same and zero-out randomly selected subset of size ``corruption_level``
        Note : first argument of theano.rng.binomial is the shape(size) of
               random numbers that it should produce
               second argument is the number of trials
               third argument is the probability of success of any trial

               this will produce an array of 0s and 1s where 1 has a probability
               of 1 - ``corruption_level`` and 0 with ``corruption_level``
        """
        return self.theano_rng.binomial(size=input.shape, n=1,
                                        p=1 - corruption_level) * input

    def get_hidden_values(self, input):
        """ Computes the values of the hidden layer """
        return T.nnet.sigmoid(T.dot(input, self.W) + self.b)

    def get_reconstructed_input(self, hidden):
        """ Computes the reconstructed input given the values of the hidden layer """
        return T.nnet.sigmoid(T.dot(hidden, self.W_prime) + self.b_prime)

    def get_cost_updates(self, corruption_level, learning_rate):
        """ This function computes the cost and the updates for one training
        step of the dA """

        tilde_x = self.get_corrupted_input(self.x, corruption_level)
        y = self.get_hidden_values(tilde_x)
        z = self.get_reconstructed_input(y)
        # note : we sum over the size of a datapoint; if we are using minibatches,
        #        L will be a vector, with one entry per example in minibatch
        L = -T.sum(self.x * T.log(z) + (1 - self.x) * T.log(1 - z), axis=1)
        # note : L is now a vector, where each element is the cross-entropy cost
        #        of the reconstruction of the corresponding example of the
        #        minibatch. We need to compute the average of all these to get
        #        the cost of the minibatch
        cost = T.mean(L)

        # compute the gradients of the cost of the `dA` with respect
        # to its parameters
        gparams = T.grad(cost, self.params)
        # generate the list of updates
        updates = []
        for param, gparam in zip(self.params, gparams):
            updates.append((param, param - learning_rate * gparam))

        return (cost, updates)
```

3. Training

To visualize what the learned weights look like, we will use the following plotting utilities:

```python
import numpy


def scale_to_unit_interval(ndar, eps=1e-8):
    """ Scales all values in the ndarray ndar to be between 0 and 1 """
    ndar = ndar.copy()
    ndar -= ndar.min()
    ndar *= 1.0 / (ndar.max() + eps)
    return ndar


def tile_raster_images(X, img_shape, tile_shape, tile_spacing=(0, 0),
                       scale_rows_to_unit_interval=True,
                       output_pixel_vals=True):
    """
    Transform an array with one flattened image per row, into an array in
    which images are reshaped and layed out like tiles on a floor.

    This function is useful for visualizing datasets whose rows are images,
    and also columns of matrices for transforming those rows
    (such as the first layer of a neural net).

    :type X: a 2-D ndarray or a tuple of 4 channels, elements of which can
             be 2-D ndarrays or None;
    :param X: a 2-D array in which every row is a flattened image.

    :type img_shape: tuple; (height, width)
    :param img_shape: the original shape of each image

    :type tile_shape: tuple; (rows, cols)
    :param tile_shape: the number of images to tile (rows, cols)

    :param output_pixel_vals: if output should be pixel values (i.e. int8
                              values) or floats

    :param scale_rows_to_unit_interval: if the values need to be scaled before
                                        being plotted to [0,1] or not

    :returns: array suitable for viewing as an image.
              (See:`Image.fromarray`.)
    :rtype: a 2-d array with same dtype as X.

    """

    assert len(img_shape) == 2
    assert len(tile_shape) == 2
    assert len(tile_spacing) == 2

    # The expression below can be re-written in a more C style as
    # follows :
    #
    # out_shape = [0,0]
    # out_shape[0] = (img_shape[0]+tile_spacing[0])*tile_shape[0] -
    #                tile_spacing[0]
    # out_shape[1] = (img_shape[1]+tile_spacing[1])*tile_shape[1] -
    #                tile_spacing[1]
    out_shape = [
        (ishp + tsp) * tshp - tsp
        for ishp, tshp, tsp in zip(img_shape, tile_shape, tile_spacing)
    ]

    if isinstance(X, tuple):
        assert len(X) == 4
        # Create an output numpy ndarray to store the image
        if output_pixel_vals:
            out_array = numpy.zeros((out_shape[0], out_shape[1], 4),
                                    dtype='uint8')
        else:
            out_array = numpy.zeros((out_shape[0], out_shape[1], 4),
                                    dtype=X.dtype)

        # colors default to 0, alpha defaults to 1 (opaque)
        if output_pixel_vals:
            channel_defaults = [0, 0, 0, 255]
        else:
            channel_defaults = [0., 0., 0., 1.]

        for i in xrange(4):
            if X[i] is None:
                # if channel is None, fill it with zeros of the correct
                # dtype
                dt = out_array.dtype
                if output_pixel_vals:
                    dt = 'uint8'
                out_array[:, :, i] = numpy.zeros(
                    out_shape,
                    dtype=dt
                ) + channel_defaults[i]
            else:
                # use a recurrent call to compute the channel and store it
                # in the output
                out_array[:, :, i] = tile_raster_images(
                    X[i], img_shape, tile_shape, tile_spacing,
                    scale_rows_to_unit_interval, output_pixel_vals)
        return out_array

    else:
        # if we are dealing with only one channel
        H, W = img_shape
        Hs, Ws = tile_spacing

        # generate a matrix to store the output
        dt = X.dtype
        if output_pixel_vals:
            dt = 'uint8'
        out_array = numpy.zeros(out_shape, dtype=dt)

        for tile_row in xrange(tile_shape[0]):
            for tile_col in xrange(tile_shape[1]):
                if tile_row * tile_shape[1] + tile_col < X.shape[0]:
                    this_x = X[tile_row * tile_shape[1] + tile_col]
                    if scale_rows_to_unit_interval:
                        # if we should scale values to be between 0 and 1
                        # do this by calling the `scale_to_unit_interval`
                        # function
                        this_img = scale_to_unit_interval(
                            this_x.reshape(img_shape))
                    else:
                        this_img = this_x.reshape(img_shape)
                    # add the slice to the corresponding position in the
                    # output array
                    c = 1
                    if output_pixel_vals:
                        c = 255
                    out_array[
                        tile_row * (H + Hs): tile_row * (H + Hs) + H,
                        tile_col * (W + Ws): tile_col * (W + Ws) + W
                    ] = this_img * c
        return out_array
```

The complete training script is as follows:

  1. """
  2. This tutorial introduces denoising auto-encoders (dA) using Theano.
  3.  
  4. Denoising autoencoders are the building blocks for SdA.
  5. They are based on auto-encoders as the ones used in Bengio et al. 2007.
  6. An autoencoder takes an input x and first maps it to a hidden representation
  7. y = f_{\theta}(x) = s(Wx+b), parameterized by \theta={W,b}. The resulting
  8. latent representation y is then mapped back to a "reconstructed" vector
  9. z \in [0,1]^d in input space z = g_{\theta'}(y) = s(W'y + b'). The weight
  10. matrix W' can optionally be constrained such that W' = W^T, in which case
  11. the autoencoder is said to have tied weights. The network is trained such
  12. that to minimize the reconstruction error (the error between x and z).
  13.  
  14. For the denosing autoencoder, during training, first x is corrupted into
  15. \tilde{x}, where \tilde{x} is a partially destroyed version of x by means
  16. of a stochastic mapping. Afterwards y is computed as before (using
  17. \tilde{x}), y = s(W\tilde{x} + b) and z as s(W'y + b'). The reconstruction
  18. error is now measured between z and the uncorrupted input x, which is
  19. computed as the cross-entropy :
  20. - \sum_{k=1}^d[ x_k \log z_k + (1-x_k) \log( 1-z_k)]
  21.  
  22. References :
  23. - P. Vincent, H. Larochelle, Y. Bengio, P.A. Manzagol: Extracting and
  24. Composing Robust Features with Denoising Autoencoders, ICML'08, 1096-1103,
  25. 2008
  26. - Y. Bengio, P. Lamblin, D. Popovici, H. Larochelle: Greedy Layer-Wise
  27. Training of Deep Networks, Advances in Neural Information Processing
  28. Systems 19, 2007
  29.  
  30. """
  31.  
  32. import os
  33. import sys
  34. import time
  35.  
  36. import numpy
  37.  
  38. import theano
  39. import theano.tensor as T
  40. from theano.tensor.shared_randomstreams import RandomStreams
  41.  
  42. from logistic_sgd import load_data
  43. from utils import tile_raster_images
  44.  
  45. try:
  46. import PIL.Image as Image
  47. except ImportError:
  48. import Image
  49.  
  50. # start-snippet-1
  51. class dA(object):
  52. """Denoising Auto-Encoder class (dA)
  53.  
  54. A denoising autoencoders tries to reconstruct the input from a corrupted
  55. version of it by projecting it first in a latent space and reprojecting
  56. it afterwards back in the input space. Please refer to Vincent et al.,2008
  57. for more details. If x is the input then equation (1) computes a partially
  58. destroyed version of x by means of a stochastic mapping q_D. Equation (2)
  59. computes the projection of the input into the latent space. Equation (3)
  60. computes the reconstruction of the input, while equation (4) computes the
  61. reconstruction error.
  62.  
  63. .. math::
  64.  
  65. \tilde{x} ~ q_D(\tilde{x}|x) (1)
  66.  
  67. y = s(W \tilde{x} + b) (2)
  68.  
  69. x = s(W' y + b') (3)
  70.  
  71. L(x,z) = -sum_{k=1}^d [x_k \log z_k + (1-x_k) \log( 1-z_k)] (4)
  72.  
  73. """
  74.  
  75. def __init__(
  76. self,
  77. numpy_rng,
  78. theano_rng=None,
  79. input=None,
  80. n_visible=784,
  81. n_hidden=500,
  82. W=None,
  83. bhid=None,
  84. bvis=None
  85. ):
  86. """
  87. Initialize the dA class by specifying the number of visible units (the
  88. dimension d of the input ), the number of hidden units ( the dimension
  89. d' of the latent or hidden space ) and the corruption level. The
  90. constructor also receives symbolic variables for the input, weights and
  91. bias. Such a symbolic variables are useful when, for example the input
  92. is the result of some computations, or when weights are shared between
  93. the dA and an MLP layer. When dealing with SdAs this always happens,
  94. the dA on layer 2 gets as input the output of the dA on layer 1,
  95. and the weights of the dA are used in the second stage of training
  96. to construct an MLP.
  97.  
  98. :type numpy_rng: numpy.random.RandomState
  99. :param numpy_rng: number random generator used to generate weights
  100.  
  101. :type theano_rng: theano.tensor.shared_randomstreams.RandomStreams
  102. :param theano_rng: Theano random generator; if None is given one is
  103. generated based on a seed drawn from `rng`
  104.  
  105. :type input: theano.tensor.TensorType
  106. :param input: a symbolic description of the input or None for
  107. standalone dA
  108.  
  109. :type n_visible: int
  110. :param n_visible: number of visible units
  111.  
  112. :type n_hidden: int
  113. :param n_hidden: number of hidden units
  114.  
  115. :type W: theano.tensor.TensorType
  116. :param W: Theano variable pointing to a set of weights that should be
  117. shared belong the dA and another architecture; if dA should
  118. be standalone set this to None
  119.  
  120. :type bhid: theano.tensor.TensorType
  121. :param bhid: Theano variable pointing to a set of biases values (for
  122. hidden units) that should be shared belong dA and another
  123. architecture; if dA should be standalone set this to None
  124.  
  125. :type bvis: theano.tensor.TensorType
  126. :param bvis: Theano variable pointing to a set of biases values (for
  127. visible units) that should be shared belong dA and another
  128. architecture; if dA should be standalone set this to None
  129.  
  130. """
  131. self.n_visible = n_visible
  132. self.n_hidden = n_hidden
  133.  
  134. # create a Theano random generator that gives symbolic random values
  135. if not theano_rng:
  136. theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))
  137.  
  138. # note : W' was written as `W_prime` and b' as `b_prime`
  139. if not W:
  140. # W is initialized with `initial_W` which is uniformely sampled
  141. # from -4*sqrt(6./(n_visible+n_hidden)) and
  142. # 4*sqrt(6./(n_hidden+n_visible))the output of uniform if
  143. # converted using asarray to dtype
  144. # theano.config.floatX so that the code is runable on GPU
  145. initial_W = numpy.asarray(
  146. numpy_rng.uniform(
  147. low=-4 * numpy.sqrt(6. / (n_hidden + n_visible)),
  148. high=4 * numpy.sqrt(6. / (n_hidden + n_visible)),
  149. size=(n_visible, n_hidden)
  150. ),
  151. dtype=theano.config.floatX
  152. )
  153. W = theano.shared(value=initial_W, name='W', borrow=True)
  154.  
  155. if not bvis:
  156. bvis = theano.shared(
  157. value=numpy.zeros(
  158. n_visible,
  159. dtype=theano.config.floatX
  160. ),
  161. borrow=True
  162. )
  163.  
  164. if not bhid:
  165. bhid = theano.shared(
  166. value=numpy.zeros(
  167. n_hidden,
  168. dtype=theano.config.floatX
  169. ),
  170. name='b',
  171. borrow=True
  172. )
  173.  
  174. self.W = W
  175. # b corresponds to the bias of the hidden
  176. self.b = bhid
  177. # b_prime corresponds to the bias of the visible
  178. self.b_prime = bvis
  179. # tied weights, therefore W_prime is W transpose
  180. self.W_prime = self.W.T
  181. self.theano_rng = theano_rng
  182. # if no input is given, generate a variable representing the input
  183. if input is None:
  184. # we use a matrix because we expect a minibatch of several
  185. # examples, each example being a row
  186. self.x = T.dmatrix(name='input')
  187. else:
  188. self.x = input
  189.  
  190. self.params = [self.W, self.b, self.b_prime]
  191. # end-snippet-1
  192.  
  193. def get_corrupted_input(self, input, corruption_level):
  194. """This function keeps ``1-corruption_level`` entries of the inputs the
  195. same and zero-out randomly selected subset of size ``coruption_level``
  196. Note : first argument of theano.rng.binomial is the shape(size) of
  197. random numbers that it should produce
  198. second argument is the number of trials
  199. third argument is the probability of success of any trial
  200.  
  201. this will produce an array of 0s and 1s where 1 has a
  202. probability of 1 - ``corruption_level`` and 0 with
  203. ``corruption_level``
  204.  
  205. The binomial function return int64 data type by
  206. default. int64 multiplicated by the input
  207. type(floatX) always return float64. To keep all data
  208. in floatX when floatX is float32, we set the dtype of
  209. the binomial to floatX. As in our case the value of
  210. the binomial is always 0 or 1, this don't change the
  211. result. This is needed to allow the gpu to work
  212. correctly as it only support float32 for now.
  213.  
  214. """
  215. return self.theano_rng.binomial(size=input.shape, n=1,
  216. p=1 - corruption_level,
  217. dtype=theano.config.floatX) * input
  218.  
  219. def get_hidden_values(self, input):
  220. """ Computes the values of the hidden layer """
  221. return T.nnet.sigmoid(T.dot(input, self.W) + self.b)
  222.  
  223. def get_reconstructed_input(self, hidden):
  224. """Computes the reconstructed input given the values of the
  225. hidden layer
  226.  
  227. """
  228. return T.nnet.sigmoid(T.dot(hidden, self.W_prime) + self.b_prime)
  229.  
  230. def get_cost_updates(self, corruption_level, learning_rate):
  231. """ This function computes the cost and the updates for one trainng
  232. step of the dA """
  233.  
  234. tilde_x = self.get_corrupted_input(self.x, corruption_level)
  235. y = self.get_hidden_values(tilde_x)
  236. z = self.get_reconstructed_input(y)
  237. # note : we sum over the size of a datapoint; if we are using
  238. # minibatches, L will be a vector, with one entry per
  239. # example in minibatch
  240. L = - T.sum(self.x * T.log(z) + (1 - self.x) * T.log(1 - z), axis=1)
  241. # note : L is now a vector, where each element is the
  242. # cross-entropy cost of the reconstruction of the
  243. # corresponding example of the minibatch. We need to
  244. # compute the average of all these to get the cost of
  245. # the minibatch
  246. cost = T.mean(L)
  247.  
  248. # compute the gradients of the cost of the `dA` with respect
  249. # to its parameters
  250. gparams = T.grad(cost, self.params)
  251. # generate the list of updates
  252. updates = [
  253. (param, param - learning_rate * gparam)
  254. for param, gparam in zip(self.params, gparams)
  255. ]
  256.  
  257. return (cost, updates)
  258.  
  259. def test_dA(learning_rate=0.1, training_epochs=15,
  260. dataset='mnist.pkl.gz',
  261. batch_size=20, output_folder='dA_plots'):
  262.  
  263. """
  264. This demo is tested on MNIST
  265.  
  266. :type learning_rate: float
  267. :param learning_rate: learning rate used for training the DeNosing
  268. AutoEncoder
  269.  
  270. :type training_epochs: int
  271. :param training_epochs: number of epochs used for training
  272.  
  273. :type dataset: string
  274. :param dataset: path to the picked dataset
  275.  
  276. """
  277. datasets = load_data(dataset)
  278. train_set_x, train_set_y = datasets[0]
  279.  
  280. # compute number of minibatches for training, validation and testing
  281. n_train_batches = train_set_x.get_value(borrow=True).shape[0] / batch_size
  282.  
  283. # allocate symbolic variables for the data
  284. index = T.lscalar() # index to a [mini]batch
  285. x = T.matrix('x') # the data is presented as rasterized images
  286.  
  287. if not os.path.isdir(output_folder):
  288. os.makedirs(output_folder)
  289. os.chdir(output_folder)
  290. ####################################
  291. # BUILDING THE MODEL NO CORRUPTION #
  292. ####################################
  293.  
  294. rng = numpy.random.RandomState(123)
  295. theano_rng = RandomStreams(rng.randint(2 ** 30))
  296.  
  297. da = dA(
  298. numpy_rng=rng,
  299. theano_rng=theano_rng,
  300. input=x,
  301. n_visible=28 * 28,
  302. n_hidden=500
  303. )
  304.  
  305. cost, updates = da.get_cost_updates(
  306. corruption_level=0.,
  307. learning_rate=learning_rate
  308. )
  309.  
  310. train_da = theano.function(
  311. [index],
  312. cost,
  313. updates=updates,
  314. givens={
  315. x: train_set_x[index * batch_size: (index + 1) * batch_size]
  316. }
  317. )
  318.  
  319. start_time = time.clock()
  320.  
  321. ############
  322. # TRAINING #
  323. ############
  324.  
  325. # go through training epochs
  326. for epoch in xrange(training_epochs):
  327. # go through trainng set
  328. c = []
  329. for batch_index in xrange(n_train_batches):
  330. c.append(train_da(batch_index))
  331.  
  332. print 'Training epoch %d, cost ' % epoch, numpy.mean(c)
  333.  
  334. end_time = time.clock()
  335.  
  336. training_time = (end_time - start_time)
  337.  
  338. print >> sys.stderr, ('The no corruption code for file ' +
  339. os.path.split(__file__)[1] +
  340. ' ran for %.2fm' % ((training_time) / 60.))
  341. image = Image.fromarray(
  342. tile_raster_images(X=da.W.get_value(borrow=True).T,
  343. img_shape=(28, 28), tile_shape=(10, 10),
  344. tile_spacing=(1, 1)))
  345. image.save('filters_corruption_0.png')
  346.  
  347. #####################################
  348. # BUILDING THE MODEL CORRUPTION 30% #
  349. #####################################
  350.  
  351. rng = numpy.random.RandomState(123)
  352. theano_rng = RandomStreams(rng.randint(2 ** 30))
  353.  
  354. da = dA(
  355. numpy_rng=rng,
  356. theano_rng=theano_rng,
  357. input=x,
  358. n_visible=28 * 28,
  359. n_hidden=500
  360. )
  361.  
  362. cost, updates = da.get_cost_updates(
  363. corruption_level=0.3,
  364. learning_rate=learning_rate
  365. )
  366.  
  367. train_da = theano.function(
  368. [index],
  369. cost,
  370. updates=updates,
  371. givens={
  372. x: train_set_x[index * batch_size: (index + 1) * batch_size]
  373. }
  374. )
  375.  
  376. start_time = time.clock()
  377.  
  378. ############
  379. # TRAINING #
  380. ############
  381.  
  382. # go through training epochs
  383. for epoch in xrange(training_epochs):
  384. # go through trainng set
  385. c = []
  386. for batch_index in xrange(n_train_batches):
  387. c.append(train_da(batch_index))
  388.  
  389. print 'Training epoch %d, cost ' % epoch, numpy.mean(c)
  390.  
  391. end_time = time.clock()
  392.  
  393. training_time = (end_time - start_time)
  394.  
  395. print >> sys.stderr, ('The 30% corruption code for file ' +
  396. os.path.split(__file__)[1] +
  397. ' ran for %.2fm' % (training_time / 60.))
  398.  
  399. image = Image.fromarray(tile_raster_images(
  400. X=da.W.get_value(borrow=True).T,
  401. img_shape=(28, 28), tile_shape=(10, 10),
  402. tile_spacing=(1, 1)))
  403. image.save('filters_corruption_30.png')
  404.  
  405. os.chdir('../')
  406.  
  407. if __name__ == '__main__':
  408. test_dA()

Running the script produces the following output:

```
... loading data
Training epoch 0, cost 63.2891694201
Training epoch 1, cost 55.7866565443
Training epoch 2, cost 54.7631168984
Training epoch 3, cost 54.2420533514
Training epoch 4, cost 53.888670659
Training epoch 5, cost 53.6203505434
Training epoch 6, cost 53.4037459012
Training epoch 7, cost 53.2219976788
Training epoch 8, cost 53.0658010178
Training epoch 9, cost 52.9295596873
Training epoch 10, cost 52.8094163525
Training epoch 11, cost 52.7024367362
Training epoch 12, cost 52.606310148
Training epoch 13, cost 52.5191693641
Training epoch 14, cost 52.4395240004
The no corruption code for file dA.py ran for 10.21m
Training epoch 0, cost 81.7714190632
Training epoch 1, cost 73.4285756365
Training epoch 2, cost 70.8632686268
Training epoch 3, cost 69.3396642015
Training epoch 4, cost 68.4134660704
Training epoch 5, cost 67.723705304
Training epoch 6, cost 67.2401360252
Training epoch 7, cost 66.849303071
Training epoch 8, cost 66.5663948395
Training epoch 9, cost 66.3591257941
Training epoch 10, cost 66.1336658308
Training epoch 11, cost 65.9893924612
Training epoch 12, cost 65.8344131768
Training epoch 13, cost 65.7185348901
Training epoch 14, cost 65.6010749532
The 30% corruption code for file dA.py ran for 10.37m
```

Visualization of the weights learned with no corruption (filters_corruption_0.png):

Visualization of the weights learned with 30% corruption (filters_corruption_30.png):

Source: http://deeplearning.net/tutorial/dA.html#daa
