Parts of this article come from zouxy09's blog. Thanks. http://blog.csdn.net/zouxy09/article/details/9993371

As well as the Stanford deep learning tutorial (UFLDL): http://ufldl.stanford.edu/wiki/index.php/UFLDL教程

Because of weight sharing, a CNN has far more connections than weights. A CNN learns a set of filters in a data-driven way, and these filters serve as one way of extracting features from the input.
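To make the connection/weight count concrete, here is a minimal sketch with hypothetical sizes (not taken from the toolbox below): a single 5×5 kernel slides over a 28×28 input, so every output position adds 25 connections, yet they all reuse the same 25 weights.

    % Minimal sketch of weight sharing (hypothetical sizes).
    k = rand(5, 5) - 0.5;      % one learnable 5x5 kernel: 25 weights in total
    x = rand(28, 28);          % a 28x28 input image
    y = conv2(x, k, 'valid');  % 24x24 output feature map
    % Every output unit is wired to a 5x5 patch of the input,
    % so there are 24*24*25 = 14400 connections but only 25 shared weights.
    numel(y) * numel(k)        % connections: 14400
    numel(k)                   % weights: 25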

In a typical CNN, the first several layers alternate between convolution and subsampling, followed at the end by some fully connected layers.
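In the DeepLearnToolbox code listed below, that alternation is written out directly as a cell array of layer structs; this is the exact configuration that test_example_CNN.m builds at the end of this post:

    cnn.layers = {
        struct('type', 'i')                                     % input layer
        struct('type', 'c', 'outputmaps', 6, 'kernelsize', 5)   % convolution layer
        struct('type', 's', 'scale', 2)                         % subsampling layer
        struct('type', 'c', 'outputmaps', 12, 'kernelsize', 5)  % convolution layer
        struct('type', 's', 'scale', 2)                         % subsampling layer
    };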

By the time the fully connected layers are reached, all of the two-dimensional feature maps have been flattened into a single one-dimensional input vector.
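A minimal sketch of that flattening, with hypothetical sizes (cnnff.m at the end of this post does the same thing with reshape):

    % Flatten twelve 4x4 feature maps (over a batch of 50 samples) into one vector per sample.
    maps = rand(4, 4, 50, 12);     % hypothetical: 12 maps of size 4x4 for 50 samples
    fv = [];
    for j = 1 : size(maps, 4)
        fv = [fv; reshape(maps(:, :, :, j), 4 * 4, 50)];  % stack each map as a 16x50 block
    end
    size(fv)                       % 192x50: the fully connected layer's input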

1. Forward propagation

Suppose the network handles a c-class classification problem and there are N training samples in total.

Define the squared-error cost function:
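Writing $t_k^n$ for the $k$-th component of the $n$-th sample's label and $y_k^n$ for the corresponding network output, the standard form is

$$E^N = \frac{1}{2}\sum_{n=1}^{N}\sum_{k=1}^{c}\left(t_k^n - y_k^n\right)^2 .$$

(cnnbp.m below computes the same quantity as net.L, averaged over the batch.)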

2. Backpropagation


Backpropagation is used to adjust the parameters. Batch gradient descent is a commonly used method for optimizing an objective function: differentiate the objective with respect to the parameters and move the parameters along the negative gradient, so that the objective value rapidly approaches a minimum. Each iteration therefore updates the parameters according to the following rule.
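With learning rate $\eta$ (opts.alpha in the code below), every weight $W$ and bias $b$ is updated as

$$W \leftarrow W - \eta\,\frac{\partial E}{\partial W}, \qquad b \leftarrow b - \eta\,\frac{\partial E}{\partial b} .$$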

The idea of the backpropagation algorithm is as follows:
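In the standard fully connected formulation, with pre-activations $u^l = W^l x^{l-1} + b^l$ and activations $x^l = f(u^l)$, the steps are: (1) run a forward pass to obtain all activations; (2) compute the output-layer sensitivity from the error; (3) propagate sensitivities backwards layer by layer; (4) read off the gradients:

$$\delta^L = (x^L - t) \circ f'(u^L), \qquad \delta^l = (W^{l+1})^\top \delta^{l+1} \circ f'(u^l),$$

$$\frac{\partial E}{\partial W^l} = \delta^l\,(x^{l-1})^\top, \qquad \frac{\partial E}{\partial b^l} = \delta^l .$$

For the sigmoid used in this toolbox, $f'(u) = x\,(1-x)$, which is exactly the net.e .* (net.o .* (1 - net.o)) term in cnnbp.m.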

3. What differs when training the parameters of a convolutional neural network

3.1 Convolution layer

This subsection covers the BP update for the convolution layers of a CNN.

In a convolution layer, the feature maps of the previous layer are convolved with a learnable kernel and then passed through an activation function to produce the output feature maps.

Each output map may combine convolutions over several input maps. The forward pass computes:
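In the notation of Bouvrie's "Notes on Convolutional Neural Networks", which this derivation follows, with $M_j$ the set of input maps combined into output map $j$:

$$x_j^l = f\Big(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l\Big),$$

where $*$ is a 'valid' 2D convolution and $f$ is the sigmoid in this toolbox (see cnnff.m below).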

For adjusting the parameters of a convolution layer: when computing the sensitivities (residuals), what matters is the connection between the convolution layer and the subsampling layer that follows it; when adjusting the parameters, what matters is the connection between the previous layer and the convolution layer.
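Concretely, again in Bouvrie-style notation (cnnbp.m implements exactly this, averaged over the batch): the conv-layer delta is the upsampled delta of the following subsampling layer, and the kernel and bias gradients are sums over the delta map,

$$\delta_j^l = f'(u_j^l) \circ \mathrm{up}\big(\delta_j^{l+1}\big), \qquad \frac{\partial E}{\partial b_j^l} = \sum_{u,v}\big(\delta_j^l\big)_{uv}, \qquad \frac{\partial E}{\partial k_{ij}^l} = \sum_{u,v}\big(\delta_j^l\big)_{uv}\,\big(p_i^{l-1}\big)_{uv},$$

where $\mathrm{up}(\cdot)$ replicates each entry over its scale×scale block and $(p_i^{l-1})_{uv}$ is the input patch that produced output position $(u,v)$. Because the pooling here is mean pooling, each replicated value is further divided by scale², which is what cnnbp.m's expand(...) / scale^2 computes. (Bouvrie's notes also include a multiplicative pooling weight $\beta$; this toolbox fixes it to 1.)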

3.2 Subsampling layer

A subsampling layer takes N input feature maps and produces N output maps; each output map is simply smaller. The forward pass computes:
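$$x_j^l = f\big(\beta_j^l\,\mathrm{down}(x_j^{l-1}) + b_j^l\big),$$

where $\mathrm{down}(\cdot)$ pools each scale×scale block into a single value. In this toolbox the pooling is plain averaging with $\beta_j = 1$, $b_j = 0$ and $f$ the identity (see cnnff.m below). When a subsampling layer is followed by a convolution layer, its deltas are gathered by a 'full' convolution with the rotated kernels, exactly as cnnbp.m implements:

$$\delta_i^l = \sum_j \delta_j^{l+1} * \mathrm{rot180}\big(k_{ij}^{l+1}\big).$$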


4. Code implementation of the convolutional neural network

This code requires the MNIST data, which is very easy to find online, so it is not provided here. The complete CNN implementation can be downloaded from: https://github.com/rasmusbergpalm/DeepLearnToolbox
1. cnnapplygrads.m

function net = cnnapplygrads(net, opts)
    % Apply the gradients computed by cnnbp, using a fixed learning rate opts.alpha.
    for l = 2 : numel(net.layers)
        if strcmp(net.layers{l}.type, 'c')
            for j = 1 : numel(net.layers{l}.a)
                for ii = 1 : numel(net.layers{l - 1}.a)
                    % update each convolution kernel
                    net.layers{l}.k{ii}{j} = net.layers{l}.k{ii}{j} - opts.alpha * net.layers{l}.dk{ii}{j};
                end
                % update the bias of output map j
                net.layers{l}.b{j} = net.layers{l}.b{j} - opts.alpha * net.layers{l}.db{j};
            end
        end
    end

    % update the weights and biases of the final fully connected layer
    net.ffW = net.ffW - opts.alpha * net.dffW;
    net.ffb = net.ffb - opts.alpha * net.dffb;
end

2. cnnbp.m

function net = cnnbp(net, y)
    n = numel(net.layers);

    % error
    net.e = net.o - y;
    % loss function (mean squared error over the batch)
    net.L = 1/2 * sum(net.e(:) .^ 2) / size(net.e, 2);

    %% backprop deltas
    net.od = net.e .* (net.o .* (1 - net.o));   % output delta
    net.fvd = (net.ffW' * net.od);              % feature vector delta
    if strcmp(net.layers{n}.type, 'c')          % only conv layers have a sigm nonlinearity
        net.fvd = net.fvd .* (net.fv .* (1 - net.fv));
    end

    % reshape feature vector deltas into output map style
    sa = size(net.layers{n}.a{1});
    fvnum = sa(1) * sa(2);
    for j = 1 : numel(net.layers{n}.a)
        net.layers{n}.d{j} = reshape(net.fvd(((j - 1) * fvnum + 1) : j * fvnum, :), sa(1), sa(2), sa(3));
    end

    for l = (n - 1) : -1 : 1
        if strcmp(net.layers{l}.type, 'c')
            % conv layer: upsample the deltas of the following subsampling layer
            for j = 1 : numel(net.layers{l}.a)
                net.layers{l}.d{j} = net.layers{l}.a{j} .* (1 - net.layers{l}.a{j}) .* (expand(net.layers{l + 1}.d{j}, [net.layers{l + 1}.scale net.layers{l + 1}.scale 1]) / net.layers{l + 1}.scale ^ 2);
            end
        elseif strcmp(net.layers{l}.type, 's')
            % subsampling layer: full convolution with the rotated kernels of the next conv layer
            for i = 1 : numel(net.layers{l}.a)
                z = zeros(size(net.layers{l}.a{1}));
                for j = 1 : numel(net.layers{l + 1}.a)
                    z = z + convn(net.layers{l + 1}.d{j}, rot180(net.layers{l + 1}.k{i}{j}), 'full');
                end
                net.layers{l}.d{i} = z;
            end
        end
    end

    %% calc gradients
    for l = 2 : n
        if strcmp(net.layers{l}.type, 'c')
            for j = 1 : numel(net.layers{l}.a)
                for i = 1 : numel(net.layers{l - 1}.a)
                    net.layers{l}.dk{i}{j} = convn(flipall(net.layers{l - 1}.a{i}), net.layers{l}.d{j}, 'valid') / size(net.layers{l}.d{j}, 3);
                end
                net.layers{l}.db{j} = sum(net.layers{l}.d{j}(:)) / size(net.layers{l}.d{j}, 3);
            end
        end
    end
    net.dffW = net.od * (net.fv)' / size(net.od, 2);
    net.dffb = mean(net.od, 2);

    function X = rot180(X)
        X = flipdim(flipdim(X, 1), 2);
    end
end

3. cnnff.m

function net = cnnff(net, x)
    n = numel(net.layers);
    net.layers{1}.a{1} = x;
    inputmaps = 1;

    for l = 2 : n   % for each layer
        if strcmp(net.layers{l}.type, 'c')
            % !!below can probably be handled by insane matrix operations
            for j = 1 : net.layers{l}.outputmaps   % for each output map
                % create temp output map
                z = zeros(size(net.layers{l - 1}.a{1}) - [net.layers{l}.kernelsize - 1 net.layers{l}.kernelsize - 1 0]);
                for i = 1 : inputmaps   % for each input map
                    % convolve with corresponding kernel and add to temp output map
                    z = z + convn(net.layers{l - 1}.a{i}, net.layers{l}.k{i}{j}, 'valid');
                end
                % add bias, pass through nonlinearity
                net.layers{l}.a{j} = sigm(z + net.layers{l}.b{j});
            end
            % set number of input maps to this layer's number of output maps
            inputmaps = net.layers{l}.outputmaps;
        elseif strcmp(net.layers{l}.type, 's')
            % downsample
            for j = 1 : inputmaps
                z = convn(net.layers{l - 1}.a{j}, ones(net.layers{l}.scale) / (net.layers{l}.scale ^ 2), 'valid');   % !! replace with variable
                net.layers{l}.a{j} = z(1 : net.layers{l}.scale : end, 1 : net.layers{l}.scale : end, :);
            end
        end
    end

    % concatenate all end layer feature maps into vector
    net.fv = [];
    for j = 1 : numel(net.layers{n}.a)
        sa = size(net.layers{n}.a{j});
        net.fv = [net.fv; reshape(net.layers{n}.a{j}, sa(1) * sa(2), sa(3))];
    end
    % feedforward into output perceptrons
    net.o = sigm(net.ffW * net.fv + repmat(net.ffb, 1, size(net.fv, 2)));

end
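One design choice worth noting in cnnff.m: mean pooling is implemented as a convolution with a uniform ones(scale) / scale^2 kernel, followed by strided indexing to keep one value per block. A standalone sketch of the same trick, with a hypothetical 8×8 input:

    x = rand(8, 8);                            % hypothetical input map
    s = 2;                                     % pooling scale
    z = convn(x, ones(s) / (s ^ 2), 'valid');  % every entry is the mean of an s-by-s window
    a = z(1 : s : end, 1 : s : end);           % keep every s-th window: a 4x4 mean-pooled map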

4. cnnnumgradcheck.m
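This routine validates the analytic gradients from cnnbp against central finite differences: each parameter $\theta$ is perturbed by $\pm\epsilon$ and the two resulting losses are differenced,

$$\frac{\partial L}{\partial \theta} \approx \frac{L(\theta + \epsilon) - L(\theta - \epsilon)}{2\epsilon},$$

raising an error whenever the analytic and numerical values disagree by more than the tolerance (here $\epsilon = 10^{-4}$ and tolerance $10^{-8}$).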

function cnnnumgradcheck(net, x, y)
    epsilon = 1e-4;
    er = 1e-8;
    n = numel(net.layers);
    % check output-layer biases
    for j = 1 : numel(net.ffb)
        net_m = net; net_p = net;
        net_p.ffb(j) = net_m.ffb(j) + epsilon;
        net_m.ffb(j) = net_m.ffb(j) - epsilon;
        net_m = cnnff(net_m, x); net_m = cnnbp(net_m, y);
        net_p = cnnff(net_p, x); net_p = cnnbp(net_p, y);
        d = (net_p.L - net_m.L) / (2 * epsilon);
        e = abs(d - net.dffb(j));
        if e > er
            error('numerical gradient checking failed');
        end
    end

    % check output-layer weights
    for i = 1 : size(net.ffW, 1)
        for u = 1 : size(net.ffW, 2)
            net_m = net; net_p = net;
            net_p.ffW(i, u) = net_m.ffW(i, u) + epsilon;
            net_m.ffW(i, u) = net_m.ffW(i, u) - epsilon;
            net_m = cnnff(net_m, x); net_m = cnnbp(net_m, y);
            net_p = cnnff(net_p, x); net_p = cnnbp(net_p, y);
            d = (net_p.L - net_m.L) / (2 * epsilon);
            e = abs(d - net.dffW(i, u));
            if e > er
                error('numerical gradient checking failed');
            end
        end
    end

    % check convolution-layer biases and kernels
    for l = n : -1 : 2
        if strcmp(net.layers{l}.type, 'c')
            for j = 1 : numel(net.layers{l}.a)
                net_m = net; net_p = net;
                net_p.layers{l}.b{j} = net_m.layers{l}.b{j} + epsilon;
                net_m.layers{l}.b{j} = net_m.layers{l}.b{j} - epsilon;
                net_m = cnnff(net_m, x); net_m = cnnbp(net_m, y);
                net_p = cnnff(net_p, x); net_p = cnnbp(net_p, y);
                d = (net_p.L - net_m.L) / (2 * epsilon);
                e = abs(d - net.layers{l}.db{j});
                if e > er
                    error('numerical gradient checking failed');
                end
                for i = 1 : numel(net.layers{l - 1}.a)
                    for u = 1 : size(net.layers{l}.k{i}{j}, 1)
                        for v = 1 : size(net.layers{l}.k{i}{j}, 2)
                            net_m = net; net_p = net;
                            net_p.layers{l}.k{i}{j}(u, v) = net_p.layers{l}.k{i}{j}(u, v) + epsilon;
                            net_m.layers{l}.k{i}{j}(u, v) = net_m.layers{l}.k{i}{j}(u, v) - epsilon;
                            net_m = cnnff(net_m, x); net_m = cnnbp(net_m, y);
                            net_p = cnnff(net_p, x); net_p = cnnbp(net_p, y);
                            d = (net_p.L - net_m.L) / (2 * epsilon);
                            e = abs(d - net.layers{l}.dk{i}{j}(u, v));
                            if e > er
                                error('numerical gradient checking failed');
                            end
                        end
                    end
                end
            end
        elseif strcmp(net.layers{l}.type, 's')
            % subsampling layers have no trainable parameters in this toolbox
            % for j = 1 : numel(net.layers{l}.a)
            %     net_m = net; net_p = net;
            %     net_p.layers{l}.b{j} = net_m.layers{l}.b{j} + epsilon;
            %     net_m.layers{l}.b{j} = net_m.layers{l}.b{j} - epsilon;
            %     net_m = cnnff(net_m, x); net_m = cnnbp(net_m, y);
            %     net_p = cnnff(net_p, x); net_p = cnnbp(net_p, y);
            %     d = (net_p.L - net_m.L) / (2 * epsilon);
            %     e = abs(d - net.layers{l}.db{j});
            %     if e > er
            %         error('numerical gradient checking failed');
            %     end
            % end
        end
    end
    % keyboard
end

5. cnnsetup.m

function net = cnnsetup(net, x, y)
    assert(~isOctave() || compare_versions(OCTAVE_VERSION, '3.8.0', '>='), ['Octave 3.8.0 or greater is required for CNNs as there is a bug in convolution in previous versions. See http://savannah.gnu.org/bugs/?39314. Your version is ' myOctaveVersion]);
    inputmaps = 1;
    mapsize = size(squeeze(x(:, :, 1)));

    for l = 1 : numel(net.layers)   % layer
        if strcmp(net.layers{l}.type, 's')
            mapsize = mapsize / net.layers{l}.scale;
            assert(all(floor(mapsize)==mapsize), ['Layer ' num2str(l) ' size must be integer. Actual: ' num2str(mapsize)]);
            for j = 1 : inputmaps
                net.layers{l}.b{j} = 0;
            end
        end
        if strcmp(net.layers{l}.type, 'c')
            mapsize = mapsize - net.layers{l}.kernelsize + 1;
            fan_out = net.layers{l}.outputmaps * net.layers{l}.kernelsize ^ 2;
            for j = 1 : net.layers{l}.outputmaps   % output map
                fan_in = inputmaps * net.layers{l}.kernelsize ^ 2;
                for i = 1 : inputmaps   % input map
                    net.layers{l}.k{i}{j} = (rand(net.layers{l}.kernelsize) - 0.5) * 2 * sqrt(6 / (fan_in + fan_out));
                end
                net.layers{l}.b{j} = 0;
            end
            inputmaps = net.layers{l}.outputmaps;
        end
    end
    % 'onum' is the number of labels, that's why it is calculated using size(y, 1). If you have 20 labels so the output of the network will be 20 neurons.
    % 'fvnum' is the number of output neurons at the last layer, the layer just before the output layer.
    % 'ffb' is the biases of the output neurons.
    % 'ffW' is the weights between the last layer and the output neurons. Note that the last layer is fully connected to the output layer, that's why the size of the weights is (onum * fvnum)
    fvnum = prod(mapsize) * inputmaps;
    onum = size(y, 1);

    net.ffb = zeros(onum, 1);
    net.ffW = (rand(onum, fvnum) - 0.5) * 2 * sqrt(6 / (onum + fvnum));
end
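The kernel and ffW initialization above matches the "normalized initialization" of Glorot & Bengio (2010): weights are drawn uniformly from

$$W \sim U\!\left[-\sqrt{\frac{6}{fan_{in} + fan_{out}}},\; \sqrt{\frac{6}{fan_{in} + fan_{out}}}\right],$$

which is exactly what (rand(...) - 0.5) * 2 * sqrt(6 / (fan_in + fan_out)) samples; the aim is to keep activation and gradient magnitudes roughly constant across layers.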

6. cnntest.m

function [er, bad] = cnntest(net, x, y)
    % feedforward
    net = cnnff(net, x);
    [~, h] = max(net.o);
    [~, a] = max(y);
    bad = find(h ~= a);

    er = numel(bad) / size(y, 2);
end

7. cnntrain.m

function net = cnntrain(net, x, y, opts)
    m = size(x, 3);
    numbatches = m / opts.batchsize;
    if rem(numbatches, 1) ~= 0
        error('numbatches not integer');
    end
    net.rL = [];
    for i = 1 : opts.numepochs
        disp(['epoch ' num2str(i) '/' num2str(opts.numepochs)]);
        tic;
        kk = randperm(m);   % shuffle the training samples each epoch
        for l = 1 : numbatches
            batch_x = x(:, :, kk((l - 1) * opts.batchsize + 1 : l * opts.batchsize));
            batch_y = y(:, kk((l - 1) * opts.batchsize + 1 : l * opts.batchsize));

            net = cnnff(net, batch_x);        % forward pass
            net = cnnbp(net, batch_y);        % backward pass: compute gradients
            net = cnnapplygrads(net, opts);   % gradient descent step
            if isempty(net.rL)
                net.rL(1) = net.L;
            end
            % exponentially smoothed training loss
            net.rL(end + 1) = 0.99 * net.rL(end) + 0.01 * net.L;
        end
        toc;
    end

end
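net.rL is an exponentially weighted moving average of the batch loss, which is what the plot at the end of test_example_CNN.m shows:

$$rL_t = 0.99\,rL_{t-1} + 0.01\,L_t,$$

so the curve tracks the training loss with most of the batch-to-batch noise smoothed out.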

8. test_example_CNN.m

function test_example_CNN
    load mnist_uint8;

    train_x = double(reshape(train_x',28,28,60000))/255;
    test_x = double(reshape(test_x',28,28,10000))/255;
    train_y = double(train_y');
    test_y = double(test_y');

    %% ex1 Train a 6c-2s-12c-2s Convolutional neural network
    % will run 1 epoch in about 200 seconds and get around 11% error.
    % With 100 epochs you'll get around 1.2% error

    rand('state', 0)

    cnn.layers = {
        struct('type', 'i') % input layer
        struct('type', 'c', 'outputmaps', 6, 'kernelsize', 5) % convolution layer
        struct('type', 's', 'scale', 2) % sub sampling layer
        struct('type', 'c', 'outputmaps', 12, 'kernelsize', 5) % convolution layer
        struct('type', 's', 'scale', 2) % subsampling layer
    };

    opts.alpha = 1;
    opts.batchsize = 50;
    opts.numepochs = 1;

    cnn = cnnsetup(cnn, train_x, train_y);
    cnn = cnntrain(cnn, train_x, train_y, opts);

    [er, bad] = cnntest(cnn, test_x, test_y);
    er
    % plot mean squared error
    figure; plot(cnn.rL);
    assert(er < 0.12, 'Too big error');

Note: For the detailed MATLAB implementation of this CNN with full explanations, see zouxy09's blog: http://blog.csdn.net/zouxy09/article/details/9993743/ The explanations there are very thorough, and the author has also written many other excellent notes on deep learning. Many thanks and respect to the author.
