The last note got as far as the Deep Networks chapter. I figured the exercise would be fairly simple and wouldn't take much time, but the training really did take a long while... and at one point while debugging I carelessly ran clear, so I had to start the whole run over from scratch. It did come together in the end, though. Sigh~

The previous post covered the basics of deep networks. The exercise in this part of the lecture notes again uses the MNIST handwritten digit set. The main steps: train two sparse autoencoders stacked one on top of the other, train a softmax classifier on the second autoencoder's features, and finally fine-tune the whole network end to end.

Training the autoencoders and the softmax classifier simply reuses the code written for the earlier exercises. The only genuinely new piece is the fine-tuning, and that is essentially a single backpropagation pass through the whole stack.
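Concretely, in the notation of the lecture notes (P is the softmax output, 1{y} the ground-truth indicator matrix, m the number of training examples, layer n the last hidden layer, δ_i the i-th column of δ, and ∘ the element-wise product), the fine-tuning gradient comes from the usual delta recursion:

\[
\delta^{(n)} = -\bigl(\theta_{\mathrm{softmax}}^{T}(1\{y\}-P)\bigr)\circ a^{(n)}\circ\bigl(1-a^{(n)}\bigr),
\qquad
\delta^{(l)} = \bigl(W^{(l)\,T}\delta^{(l+1)}\bigr)\circ a^{(l)}\circ\bigl(1-a^{(l)}\bigr),
\]
\[
\nabla_{W^{(l)}}J = \frac{1}{m}\,\delta^{(l+1)}a^{(l)\,T},
\qquad
\nabla_{b^{(l)}}J = \frac{1}{m}\sum_{i=1}^{m}\delta^{(l+1)}_{i},
\qquad
\nabla_{\theta_{\mathrm{softmax}}}J = -\frac{1}{m}(1\{y\}-P)\,a^{(n)\,T} + \lambda\,\theta_{\mathrm{softmax}}.
\]

These are exactly the quantities that stackedAECost.m below computes.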

The code follows.

Main script:

stackedAEExercise.m

% For the purpose of completing the assignment, you do not need to
% change the code in this file.
%
%%======================================================================
%% STEP 0: Here we provide the relevant parameters values that will
%  allow your sparse autoencoder to get good filters; you do not need to
%  change the parameters below.

DISPLAY = true;
inputSize  = 28 * 28;
numClasses = 10;
hiddenSizeL1 = 200;     % Layer 1 Hidden Size
hiddenSizeL2 = 200;     % Layer 2 Hidden Size
sparsityParam = 0.1;    % desired average activation of the hidden units
                        % (denoted by the Greek letter rho in the lecture notes)
lambda = 3e-3;          % weight decay parameter
beta = 3;               % weight of sparsity penalty term

%%======================================================================
%% STEP 1: Load data from the MNIST database
%
%  This loads our training data from the MNIST database files.

% Load MNIST database files
trainData   = loadMNISTImages('mnist/train-images-idx3-ubyte');
trainLabels = loadMNISTLabels('mnist/train-labels-idx1-ubyte');

trainLabels(trainLabels == 0) = 10; % Remap 0 to 10 since our labels need to start from 1

%%======================================================================
%% STEP 2: Train the first sparse autoencoder
%  This trains the first sparse autoencoder on the unlabelled STL training
%  images.
%  If you've correctly implemented sparseAutoencoderCost.m, you don't need
%  to change anything here.

% Randomly initialize the parameters
sae1Theta = initializeParameters(hiddenSizeL1, inputSize);

%% ---------------------- YOUR CODE HERE ---------------------------------
%  Instructions: Train the first layer sparse autoencoder; this layer has
%                a hidden size of "hiddenSizeL1".
%                You should store the optimal parameters in sae1OptTheta.

% Use minFunc to minimize the function
addpath minFunc/
options.Method = 'lbfgs';  % Use L-BFGS to optimize the cost function.
                           % Generally, for minFunc to work, you need a function
                           % pointer with two outputs: the function value and the
                           % gradient. sparseAutoencoderCost.m satisfies this.
options.maxIter = 400;     % Maximum number of iterations of L-BFGS to run
options.display = 'on';

[sae1OptTheta, cost] = minFunc( @(p) sparseAutoencoderCost(p, ...
                                     inputSize, hiddenSizeL1, ...
                                     lambda, sparsityParam, ...
                                     beta, trainData), ...
                                sae1Theta, options);

%-------------------------------------------------------------------------

%%======================================================================
%% STEP 2: Train the second sparse autoencoder
%  This trains the second sparse autoencoder on the first autoencoder
%  features.
%  If you've correctly implemented sparseAutoencoderCost.m, you don't need
%  to change anything here.

[sae1Features] = feedForwardAutoencoder(sae1OptTheta, hiddenSizeL1, ...
                                        inputSize, trainData);

% Randomly initialize the parameters
sae2Theta = initializeParameters(hiddenSizeL2, hiddenSizeL1);

%% ---------------------- YOUR CODE HERE ---------------------------------
%  Instructions: Train the second layer sparse autoencoder; this layer has
%                a hidden size of "hiddenSizeL2" and an input size of
%                "hiddenSizeL1".
%
%                You should store the optimal parameters in sae2OptTheta.

[sae2OptTheta, cost] = minFunc( @(p) sparseAutoencoderCost(p, ...
                                     hiddenSizeL1, hiddenSizeL2, ...
                                     lambda, sparsityParam, ...
                                     beta, sae1Features), ...
                                sae2Theta, options);

%-------------------------------------------------------------------------

%%======================================================================
%% STEP 3: Train the softmax classifier
%  This trains the softmax classifier on the second autoencoder features.
%  If you've correctly implemented softmaxCost.m, you don't need
%  to change anything here.

[sae2Features] = feedForwardAutoencoder(sae2OptTheta, hiddenSizeL2, ...
                                        hiddenSizeL1, sae1Features);

% Randomly initialize the parameters
saeSoftmaxTheta = 0.005 * randn(hiddenSizeL2 * numClasses, 1);

%% ---------------------- YOUR CODE HERE ---------------------------------
%  Instructions: Train the softmax classifier; the classifier takes in
%                input of dimension "hiddenSizeL2", corresponding to the
%                hidden layer size of the 2nd layer.
%
%                You should store the optimal parameters in saeSoftmaxOptTheta.
%
%  NOTE: If you used softmaxTrain to complete this part of the exercise,
%        set saeSoftmaxOptTheta = softmaxModel.optTheta(:);

options.maxIter = 100;
softmax_lambda = 1e-4;

numLabels = 10;
softmaxModel = softmaxTrain(hiddenSizeL2, numLabels, softmax_lambda, ...
                            sae2Features, trainLabels, options);
saeSoftmaxOptTheta = softmaxModel.optTheta(:);

%-------------------------------------------------------------------------

%%======================================================================
%% STEP 5: Finetune softmax model

%  Implement the stackedAECost to give the combined cost of the whole model,
%  then run this cell.

% Initialize the stack using the parameters learned above.
% Each autoencoder's theta is laid out as [W1(:); W2(:); b1(:); b2(:)],
% so W1 and b1 are sliced out of it for the stack.
inputSize = 28*28;
stack = cell(2,1);
stack{1}.w = reshape(sae1OptTheta(1:hiddenSizeL1*inputSize), ...
                     hiddenSizeL1, inputSize);
stack{1}.b = sae1OptTheta(2*hiddenSizeL1*inputSize+1 : 2*hiddenSizeL1*inputSize+hiddenSizeL1);
stack{2}.w = reshape(sae2OptTheta(1:hiddenSizeL2*hiddenSizeL1), ...
                     hiddenSizeL2, hiddenSizeL1);
stack{2}.b = sae2OptTheta(2*hiddenSizeL2*hiddenSizeL1+1 : 2*hiddenSizeL2*hiddenSizeL1+hiddenSizeL2);

% Initialize the parameters for the deep model
[stackparams, netconfig] = stack2params(stack);
stackedAETheta = [ saeSoftmaxOptTheta ; stackparams ];

%% ---------------------- YOUR CODE HERE ---------------------------------
%  Instructions: Train the deep network; "hidden size" here refers to the
%                dimension of the input to the classifier, which corresponds
%                to "hiddenSizeL2".

% Note: options.maxIter is still 100 from the softmax step above, so the
% fine-tuning runs 100 iterations of L-BFGS.
[stackedAEOptTheta, cost] = minFunc( @(p) stackedAECost(p, inputSize, hiddenSizeL2, ...
                                          numClasses, netconfig, ...
                                          lambda, trainData, trainLabels), ...
                                     stackedAETheta, options);

% -------------------------------------------------------------------------

%%======================================================================
%% STEP 6: Test
%  Instructions: You will need to complete the code in stackedAEPredict.m
%                before running this part of the code.
%

% Get labelled test images
% Note that we apply the same kind of preprocessing as the training set
testData   = loadMNISTImages('mnist/t10k-images-idx3-ubyte');
testLabels = loadMNISTLabels('mnist/t10k-labels-idx1-ubyte');

testLabels(testLabels == 0) = 10; % Remap 0 to 10

[pred] = stackedAEPredict(stackedAETheta, inputSize, hiddenSizeL2, ...
                          numClasses, netconfig, testData);

acc = mean(testLabels(:) == pred(:));
fprintf('Before Finetuning Test Accuracy: %0.3f%%\n', acc * 100);

[pred] = stackedAEPredict(stackedAEOptTheta, inputSize, hiddenSizeL2, ...
                          numClasses, netconfig, testData);

acc = mean(testLabels(:) == pred(:));
fprintf('After Finetuning Test Accuracy: %0.3f%%\n', acc * 100);

% Accuracy is the proportion of correctly classified images.
% The results for our implementation were:
%
% Before Finetuning Test Accuracy: 87.7%
% After Finetuning Test Accuracy:  97.6%
%
% If your values are too low (accuracy less than 95%), you should check
% your code for errors, and make sure you are training on the
% entire data set of 60000 28x28 training images
% (unless you modified the loading code, this should be the case).
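Given how long each minFunc run above takes (and how easy it is to wipe the workspace with a stray clear, as I did), it is worth checkpointing each expensive stage so a re-run can pick up where it left off. The lines below are my own addition, not part of the starter code, and the .mat file names are arbitrary:

% Drop one of these right after the corresponding training stage finishes:
save('sae1OptTheta.mat',       'sae1OptTheta');        % after the first autoencoder
save('sae2OptTheta.mat',       'sae2OptTheta');        % after the second autoencoder
save('saeSoftmaxOptTheta.mat', 'saeSoftmaxOptTheta');  % after softmaxTrain
save('stackedAEOptTheta.mat',  'stackedAEOptTheta');   % after fine-tuning

% To resume later without retraining, load whichever stages already exist:
% load('sae1OptTheta.mat'); load('sae2OptTheta.mat');
% load('saeSoftmaxOptTheta.mat'); load('stackedAEOptTheta.mat');

A quick dry run on a small subset (say trainData(:, 1:2000) with options.maxIter = 20) is also a cheap way to confirm the whole pipeline executes before committing to the full 400-iteration runs.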

Cost function for the fine-tuning step:

stackedAECost.m

function [ cost, grad ] = stackedAECost(theta, inputSize, hiddenSize, ...
                                        numClasses, netconfig, ...
                                        lambda, data, labels)

% stackedAECost: Takes a trained softmaxTheta and a training data set with labels,
% and returns cost and gradient using a stacked autoencoder model. Used for
% finetuning.

% theta:       trained weights from the autoencoder
% visibleSize: the number of input units
% hiddenSize:  the number of hidden units *at the 2nd layer*
% numClasses:  the number of categories
% netconfig:   the network configuration of the stack
% lambda:      the weight regularization penalty
% data:        our matrix containing the training data as columns, so data(:,i) is the i-th training example
% labels:      a vector containing labels, where labels(i) is the label for the i-th training example

%% Unroll softmaxTheta parameter

% We first extract the part which corresponds to the softmax classifier
softmaxTheta = reshape(theta(1:hiddenSize*numClasses), numClasses, hiddenSize);

% Extract out the "stack"
stack = params2stack(theta(hiddenSize*numClasses+1:end), netconfig);

% You will need to compute the following gradients
softmaxThetaGrad = zeros(size(softmaxTheta));
stackgrad = cell(size(stack));
for d = 1:numel(stack)
    stackgrad{d}.w = zeros(size(stack{d}.w));
    stackgrad{d}.b = zeros(size(stack{d}.b));
end

cost = 0; % You need to compute this

% You might find these variables useful
M = size(data, 2);
groundTruth = full(sparse(labels, 1:M, 1));

%% --------------------------- YOUR CODE HERE -----------------------------
%  Instructions: Compute the cost function and gradient vector for
%                the stacked autoencoder.
%
%                You are given a stack variable which is a cell-array of
%                the weights and biases for every layer. In particular, you
%                can refer to the weights of Layer d, using stack{d}.w and
%                the biases using stack{d}.b . To get the total number of
%                layers, you can use numel(stack).
%
%                The last layer of the network is connected to the softmax
%                classification layer, softmaxTheta.
%
%                You should compute the gradients for the softmaxTheta,
%                storing that in softmaxThetaGrad. Similarly, you should
%                compute the gradients for each layer in the stack, storing
%                the gradients in stackgrad{d}.w and stackgrad{d}.b
%                Note that the size of the matrices in stackgrad should
%                match exactly that of the size of the matrices in stack.
%

%---------- Forward pass: compute a and z ----------
d = numel(stack);    % depth of the stack
n = d + 1;           % number of layers up to (and including) the last hidden layer
a = cell(n,1);
z = cell(n,1);
a{1} = data;         % a{1} is the input data
for l = 2:n          % fill in a{2..n} and z{2..n}
    z{l} = stack{l-1}.w * a{l-1} + repmat(stack{l-1}.b, [1, size(a{l-1},2)]);
    a{l} = sigmoid(z{l});
end
%----------------------------------------------------

%---------- Softmax cost and gradient ----------
Ma   = softmaxTheta * a{n};
NorM = bsxfun(@minus, Ma, max(Ma, [], 1));  % subtract each column's max for numerical stability, so exp() does not overflow
ExpM = exp(NorM);
P    = bsxfun(@rdivide, ExpM, sum(ExpM));   % class probabilities
cost = -1/M * (groundTruth(:)' * log(P(:))) + lambda/2 * (softmaxTheta(:)' * softmaxTheta(:)); % cost
softmaxThetaGrad = -1/M * ((groundTruth - P) * a{n}') + lambda * softmaxTheta;                 % gradient
%------------------------------------------------

%---------- Deltas for each layer ----------
delta = cell(n, 1);
delta{n} = -softmaxTheta' * (groundTruth - P) .* (a{n}) .* (1 - a{n}); % cf. the backpropagation derivation in the earlier lecture notes
for l = n-1:-1:2   % delta{1} (the input layer) is never needed, so stop at layer 2
    delta{l} = stack{l}.w' * delta{l+1} .* (a{l}) .* (1 - a{l});
end
%--------------------------------------------

%---------- Gradients of each layer's w and b ----------
for l = n-1:-1:1
    stackgrad{l}.w = (1/M) * delta{l+1} * a{l}';
    stackgrad{l}.b = (1/M) * sum(delta{l+1}, 2);
end
%--------------------------------------------------------

% -------------------------------------------------------------------------

%% Roll gradient vector
grad = [softmaxThetaGrad(:) ; stack2params(stackgrad)];

end

% You might find this useful
function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end
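Before running the full fine-tuning, it is worth numerically checking stackedAECost on a tiny random network, the same way the earlier exercises checked sparseAutoencoderCost. Below is a minimal sketch of my own; it assumes computeNumericalGradient.m from the sparse autoencoder exercise is still on the path, and the debug sizes are arbitrary:

% Gradient check for stackedAECost on a tiny random network.
dbgInput   = 8;                            % debug input dimension
dbgHidden  = 5;                            % debug hidden size (both layers)
dbgClasses = 3;
dbgData    = rand(dbgInput, 10);           % 10 random "examples"
dbgLabels  = mod(0:9, dbgClasses)' + 1;    % labels 1..dbgClasses, all classes present

dbgStack = cell(2,1);
dbgStack{1}.w = 0.1 * randn(dbgHidden, dbgInput);
dbgStack{1}.b = zeros(dbgHidden, 1);
dbgStack{2}.w = 0.1 * randn(dbgHidden, dbgHidden);
dbgStack{2}.b = zeros(dbgHidden, 1);
[dbgStackParams, dbgNetconfig] = stack2params(dbgStack);
dbgTheta = [ 0.005 * randn(dbgHidden * dbgClasses, 1) ; dbgStackParams ];

costFun = @(p) stackedAECost(p, dbgInput, dbgHidden, dbgClasses, ...
                             dbgNetconfig, 1e-4, dbgData, dbgLabels);
[c0, grad] = costFun(dbgTheta);
numGrad = computeNumericalGradient(costFun, dbgTheta);
fprintf('Gradient check diff: %g\n', norm(numGrad - grad) / norm(numGrad + grad));
% The diff should be on the order of 1e-9 or smaller if the implementation is right.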

Prediction function:

stackedAEPredict.m

function [pred] = stackedAEPredict(theta, inputSize, hiddenSize, numClasses, netconfig, data)

% stackedAEPredict: Takes a trained theta and a test data set,
% and returns the predicted labels for each example.

% theta:       trained weights from the autoencoder
% visibleSize: the number of input units
% hiddenSize:  the number of hidden units *at the 2nd layer*
% numClasses:  the number of categories
% data:        our matrix containing the data as columns, so data(:,i) is the i-th example

% Your code should produce the prediction matrix
% pred, where pred(i) is argmax_c P(y(c) | x(i)).

%% Unroll theta parameter

% We first extract the part which corresponds to the softmax classifier
softmaxTheta = reshape(theta(1:hiddenSize*numClasses), numClasses, hiddenSize);

% Extract out the "stack"
stack = params2stack(theta(hiddenSize*numClasses+1:end), netconfig);

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute pred using theta assuming that the labels start
%                from 1.
%

%---------- Forward pass: compute a and z ----------
d = numel(stack);    % depth of the stack
n = d + 1;           % number of layers up to (and including) the last hidden layer
a = cell(n,1);
z = cell(n,1);
a{1} = data;         % a{1} is the input data
for l = 2:n          % fill in a{2..n} and z{2..n}
    z{l} = stack{l-1}.w * a{l-1} + repmat(stack{l-1}.b, [1, size(a{l-1},2)]);
    a{l} = sigmoid(z{l});
end
%----------------------------------------------------

M = softmaxTheta * a{n};    % unnormalized class scores; the argmax is unaffected by the softmax normalization
[Y, pred] = max(M, [], 1);  % Y (the max scores) is not needed, only the argmax

% -----------------------------------------------------------

end

% You might find this useful
function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end
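As a small sanity check on the predictions (an illustrative add-on of mine, not required by the exercise), the per-digit accuracy of the fine-tuned model on the test set can be printed like this, reusing the variables from the main script:

% Per-class accuracy on the test set, using the fine-tuned model.
% Note: class 10 corresponds to digit 0 (remapped during loading).
pred = stackedAEPredict(stackedAEOptTheta, inputSize, hiddenSizeL2, ...
                        numClasses, netconfig, testData);
predCol = pred(:);   % make it a column, same orientation as testLabels
for c = 1:numClasses
    idx = (testLabels == c);
    fprintf('class %2d: %6.2f%% correct (%d test samples)\n', ...
            c, 100 * mean(predCol(idx) == c), sum(idx));
end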

Final results:

They differ a bit from the lecture notes and from the comments in the starter code, mainly in the accuracy before fine-tuning: the notes say it should be a little under 90%, whereas I got roughly 94%. The accuracy after fine-tuning, however, came out essentially the same as reported.

PS: The lecture notes are at http://deeplearning.stanford.edu/wiki/index.php/Exercise:_Implement_deep_networks_for_digit_classification
