CheeseZH: Stanford University: Machine Learning Ex4:Training Neural Network(Backpropagation Algorithm)

1. Feedforward and cost function:

$J(\theta)= \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}[-y_{k}^{(i)}log((h_{\theta}(x^{(i)}))_{k})-(1-y_{k}^{(i)})log(1-(h_{\theta}(x^{(i)}))_{k})]$

2. Regularized cost function:

$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}[-y_{k}^{(i)}log((h_{\theta}(x^{(i)}))_{k})-(1-y_{k}^{(i)})log(1-(h_{\theta}(x^{(i)}))_{k})]+\frac{\lambda}{2m}[\sum_{j=1}^{25}\sum_{k=1}^{400}(\Theta_{j,k}^{(1)})^{2}+\sum_{j=1}^{10}\sum_{k=1}^{25}(\Theta_{j,k}^{(2)})^{2}]$
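As a quick sanity check of the two formulas above, here is a minimal Octave sketch on made-up numbers. The values of `Y`, `h`, `Theta1`, `Theta2` and `lambda` are hypothetical and chosen only so the result can be verified by hand; the first column of each weight matrix is treated as the bias column and left out of the regularization term.

```octave
% Hypothetical toy values, for illustration only.
Y = [1 0; 0 1];            % one-hot labels: m = 2 examples, K = 2 classes
h = [0.5 0.5; 0.5 0.5];    % assumed network outputs h_theta(x)
m = size(Y, 1);
lambda = 1;

% Unregularized cost: every one of the 4 terms contributes log(2).
J = (1 / m) * sum(sum(-Y .* log(h) - (1 - Y) .* log(1 - h)));

% Regularization term: skip the first (bias) column of each matrix.
Theta1 = [0.1 1 -1];       % 1 hidden unit, 2 inputs + bias
Theta2 = [0.2 2; 0.3 0];   % 2 outputs, 1 hidden unit + bias
reg = (lambda / (2 * m)) * (sum(sum(Theta1(:, 2:end) .^ 2)) + ...
                            sum(sum(Theta2(:, 2:end) .^ 2)));
J_reg = J + reg;
```

Here `J` equals 2*log(2) and `reg` equals (1/4)*(1 + 1 + 4 + 0) = 1.5.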

3. Sigmoid gradient

The gradient for the sigmoid function can be computed as:

$g'(z)=\frac{d}{dz}g(z)=g(z)(1-g(z))$

where:

$sigmoid(z)=g(z)=\frac{1}{1+e^{-z}}$
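The formula translates almost directly into Octave. The sketch below uses anonymous functions so that it can be pasted into a session as-is; the exercise itself keeps these in separate files (`sigmoid.m` and `sigmoidGradient.m`), so treat this as an illustration rather than the submitted solution.

```octave
% Element-wise sigmoid and its gradient; both work on scalars, vectors and matrices.
sigmoid = @(z) 1 ./ (1 + exp(-z));
sigmoidGradient = @(z) sigmoid(z) .* (1 - sigmoid(z));

g0 = sigmoidGradient(0);            % the gradient peaks at z = 0 with value 0.25
gv = sigmoidGradient([-10 0 10]);   % and is close to 0 for large |z|
```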

4. Random initialization

randInitializeWeights.m

 function W = randInitializeWeights(L_in, L_out)
 %RANDINITIALIZEWEIGHTS Randomly initialize the weights of a layer with L_in
 %incoming connections and L_out outgoing connections
 %   W = RANDINITIALIZEWEIGHTS(L_in, L_out) randomly initializes the weights
 %   of a layer with L_in incoming connections and L_out outgoing
 %   connections.
 %
 %   Note that W should be set to a matrix of size(L_out, 1 + L_in) as
 %   the first column of W handles the "bias" terms
 %
 % You need to return the following variables correctly
 W = zeros(L_out, 1 + L_in);
 % ====================== YOUR CODE HERE ======================
 % Instructions: Initialize W randomly so that we break the symmetry while
 %               training the neural network.
 %
 % Note: The first column of W corresponds to the parameters for the bias units
 %
 epsilon_init = 0.12;
 % Uniform random values in [-epsilon_init, epsilon_init]
 W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;
 % =========================================================================
 end
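The value 0.12 is not arbitrary. The exercise handout suggests choosing epsilon_init as sqrt(6)/sqrt(L_in + L_out), which for the first layer of this network (L_in = 400, L_out = 25) comes out at roughly 0.12. A minimal sketch, with the initialization inlined instead of calling the function above:

```octave
L_in = 400;    % input layer size used in this exercise
L_out = 25;    % hidden layer size used in this exercise

% Heuristic from the handout: scale the range by the layer sizes.
epsilon_init = sqrt(6) / sqrt(L_in + L_out);   % roughly 0.1188

% Uniform values in [-epsilon_init, epsilon_init]; the first column holds the bias weights.
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;
```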

5. Backpropagation

Use a for-loop over t = 1:m and place steps (1)-(4) below inside the loop, with the t-th iteration performing the calculation on the t-th training example (x^(t), y^(t)). Step (5) then divides the accumulated gradients by m to obtain the gradients for the neural network cost function.

(1) Set the input layer's values (a⁽¹⁾) to the t-th training example x^(t). Perform a feedforward pass, computing the activations (z⁽²⁾, a⁽²⁾, z⁽³⁾, a⁽³⁾) for layers 2 and 3.

(2) For each output unit k in layer 3 (the output layer), set:

$\delta_{k}^{(3)}=(a_{k}^{(3)}-y_{k})$

where y_k = 1 if the current training example belongs to class k, and y_k = 0 otherwise.

(3) For the hidden layer l = 2, set:

$\delta^{(2)}=(\Theta^{(2)})^{T}\delta^{(3)}.*g'(z^{(2)})$

(4) Accumulate the gradient from this example using the following formula. Note that you should skip or remove $\delta_{0}^{(2)}$, the error term of the bias unit.

$\Delta^{(l)}=\Delta^{(l)}+\delta^{(l+1)}(a^{(l)})^{T}$

(5) Obtain the (unregularized) gradient for the neural network cost function by multiplying the accumulated gradients by 1/m:

$\frac{\partial}{\partial\Theta_{ij}^{(l)}}J(\Theta)=D_{ij}^{(l)}=\frac{1}{m}\Delta_{ij}^{(l)}$

nnCostFunction.m

 function [J grad] = nnCostFunction(nn_params, ...
                                    input_layer_size, ...
                                    hidden_layer_size, ...
                                    num_labels, ...
                                    X, y, lambda)
 %NNCOSTFUNCTION Implements the neural network cost function for a two layer
 %neural network which performs classification
 %   [J grad] = NNCOSTFUNCTION(nn_params, hidden_layer_size, num_labels, ...
 %   X, y, lambda) computes the cost and gradient of the neural network. The
 %   parameters for the neural network are "unrolled" into the vector
 %   nn_params and need to be converted back into the weight matrices.
 %
 %   The returned parameter grad should be a "unrolled" vector of the
 %   partial derivatives of the neural network.
 %
 % Reshape nn_params back into the parameters Theta1 and Theta2, the weight matrices
 % for our 2 layer neural network
 Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                  hidden_layer_size, (input_layer_size + 1));
 Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                  num_labels, (hidden_layer_size + 1));
 % Setup some useful variables
 m = size(X, 1);
 % You need to return the following variables correctly
 J = 0;
 Theta1_grad = zeros(size(Theta1));
 Theta2_grad = zeros(size(Theta2));
 % ====================== YOUR CODE HERE ======================
 % Instructions: You should complete the code by working through the
 %               following parts.
 %
 % Part 1: Feedforward the neural network and return the cost in the
 %         variable J. After implementing Part 1, you can verify that your
 %         cost function computation is correct by verifying the cost
 %         computed in ex4.m
 %
 % Part 2: Implement the backpropagation algorithm to compute the gradients
 %         Theta1_grad and Theta2_grad. You should return the partial derivatives of
 %         the cost function with respect to Theta1 and Theta2 in Theta1_grad and
 %         Theta2_grad, respectively. After implementing Part 2, you can check
 %         that your implementation is correct by running checkNNGradients
 %
 %         Note: The vector y passed into the function is a vector of labels
 %               containing values from 1..K. You need to map this vector into a
 %               binary vector of 1's and 0's to be used with the neural network
 %               cost function.
 %
 %         Hint: We recommend implementing backpropagation using a for-loop
 %               over the training examples if you are implementing it for the
 %               first time.
 %
 % Part 3: Implement regularization with the cost function and gradients.
 %
 %         Hint: You can implement this around the code for
 %               backpropagation. That is, you can compute the gradients for
 %               the regularization separately and then add them to Theta1_grad
 %               and Theta2_grad from Part 2.
 %
 %Part 1
 %Theta1 has size 25*401
 %Theta2 has size 10*26
 %y has size 5000*1
 K = num_labels;
 Y = eye(K)(y,:); %[5000 10] one-hot labels (chained indexing is Octave-only)
 a1 = [ones(m,1),X];%[5000 401]
 a2 = sigmoid(a1*Theta1'); %[5000 25]
 a2 = [ones(m,1),a2];%[5000 26]
 h = sigmoid(a2*Theta2');%[5000 10]
 costPositive = -Y.*log(h);
 costNegative = (1-Y).*log(1-h);
 cost = costPositive - costNegative;
 J = (1/m)*sum(cost(:));
 %Regularized (the bias column is excluded)
 Theta1Filtered = Theta1(:,2:end); %[25 400]
 Theta2Filtered = Theta2(:,2:end); %[10 25]
 reg = (lambda/(2*m))*(sumsq(Theta1Filtered(:))+sumsq(Theta2Filtered(:)));
 J = J + reg;
 %Part 2
 Delta1 = 0;
 Delta2 = 0;
 for t=1:m,
   %step 1
   a1 = [1 X(t,:)]; %[1 401]
   z2 = a1*Theta1'; %[1 25]
   a2 = [1 sigmoid(z2)];%[1 26]
   z3 = a2*Theta2'; %[1 10]
   a3 = sigmoid(z3); %[1 10]
   %step 2
   yt = Y(t,:);%[1 10]
   d3 = a3-yt; %[1 10]
   %step 3
   %   [1 10]  [10 25]           [1 25]
   d2 = (d3*Theta2Filtered).*sigmoidGradient(z2); %[1 25]
   %step 4
   Delta1 = Delta1 + (d2'*a1);%[25 401]
   Delta2 = Delta2 + (d3'*a2);%[10 26]
 end;
 %step 5
 Theta1_grad = (1/m)*Delta1;
 Theta2_grad = (1/m)*Delta2;
 %Part 3
 Theta1_grad(:,2:end) = Theta1_grad(:,2:end) + ((lambda/m)*Theta1Filtered);
 Theta2_grad(:,2:end) = Theta2_grad(:,2:end) + ((lambda/m)*Theta2Filtered);
 % -------------------------------------------------------------
 % =========================================================================
 % Unroll gradients
 grad = [Theta1_grad(:) ; Theta2_grad(:)];
 end
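The for-loop over the training examples can also be replaced by matrix operations on the whole batch. The sketch below is one possible vectorized variant, not the code from the post. It runs on a tiny made-up network (3 inputs, 4 hidden units, 2 classes, 5 examples), and all sizes, labels and random weights are assumptions for illustration only.

```octave
% Hypothetical toy problem, for illustration only.
sigmoid = @(z) 1 ./ (1 + exp(-z));
m = 5; n = 3; hid = 4; K = 2; lambda = 1;
X = rand(m, n);
y = [1; 2; 1; 2; 2];
Theta1 = rand(hid, n + 1) - 0.5;
Theta2 = rand(K, hid + 1) - 0.5;

% Feedforward for all examples at once.
I = eye(K);
Y = I(y, :);                          % one-hot labels, [m K]
A1 = [ones(m, 1) X];                  % [m n+1]
Z2 = A1 * Theta1';                    % [m hid]
A2 = [ones(m, 1) sigmoid(Z2)];        % [m hid+1]
A3 = sigmoid(A2 * Theta2');           % [m K]

% Backpropagation: each row holds the delta of one training example.
D3 = A3 - Y;                                                          % [m K]
D2 = (D3 * Theta2(:, 2:end)) .* (sigmoid(Z2) .* (1 - sigmoid(Z2)));   % [m hid]

% The matrix products sum the per-example outer products.
Theta1_grad = (D2' * A1) / m;         % [hid n+1]
Theta2_grad = (D3' * A2) / m;         % [K hid+1]

% Regularize every column except the bias column.
Theta1_grad(:, 2:end) = Theta1_grad(:, 2:end) + (lambda / m) * Theta1(:, 2:end);
Theta2_grad(:, 2:end) = Theta2_grad(:, 2:end) + (lambda / m) * Theta2(:, 2:end);
```

Dropping the bias column of Theta2 before computing D2 plays the same role as removing the bias error term in step (4), so no extra slicing of the deltas is needed.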

6. Gradient checking

Let

$\theta^{(i+)}=\theta+\left[\begin{array}{c}0\\\vdots\\\epsilon\\\vdots\end{array}\right]$

and

$\theta^{(i-)}=\theta-\left[\begin{array}{c}0\\\vdots\\\epsilon\\\vdots\end{array}\right]$

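where ε sits in the i-th position. Then check, for each i, that $f_{i}(\theta)$, the function that purportedly computes $\frac{\partial}{\partial\theta_{i}}J(\theta)$, satisfies: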

$f_{i}(\theta)\thickapprox\frac{J(\theta^{(i+)})-J(\theta^{(i-)})}{2\epsilon}$

computeNumericalGradient.m

 function numgrad = computeNumericalGradient(J, theta)
 %COMPUTENUMERICALGRADIENT Computes the gradient using "finite differences"
 %and gives us a numerical estimate of the gradient.
 %   numgrad = COMPUTENUMERICALGRADIENT(J, theta) computes the numerical
 %   gradient of the function J around theta. Calling y = J(theta) should
 %   return the function value at theta.
 % Notes: The following code implements numerical gradient checking, and
 %        returns the numerical gradient. It sets numgrad(i) to (a numerical
 %        approximation of) the partial derivative of J with respect to the
 %        i-th input argument, evaluated at theta. (i.e., numgrad(i) should
 %        be (approximately) the partial derivative of J with respect
 %        to theta(i).)
 %
 numgrad = zeros(size(theta));
 perturb = zeros(size(theta));
 e = 1e-4;
 for p = 1:numel(theta)
     % Set perturbation vector
     perturb(p) = e;
     loss1 = J(theta - perturb);
     loss2 = J(theta + perturb);
     % Compute Numerical Gradient (central difference)
     numgrad(p) = (loss2 - loss1) / (2*e);
     % Reset the perturbation before moving to the next parameter
     perturb(p) = 0;
 end
 end
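To see the check in action without a full network, the same central-difference loop can be applied to a function whose gradient is known. In the sketch below, J(θ) = Σ θ_i², with gradient 2θ, is a hypothetical test function chosen only for illustration, and the loop is inlined so the snippet runs on its own. The exercise handout states that a correct backpropagation implementation should give a relative difference below 1e-9 in `checkNNGradients`.

```octave
% Hypothetical test function with a known analytic gradient (2 * theta).
J = @(theta) sum(theta .^ 2);
theta = [1; -2; 0.5];
grad = 2 * theta;                    % analytic gradient to be checked

% Central-difference estimate, one parameter at a time.
e = 1e-4;
numgrad = zeros(size(theta));
for p = 1:numel(theta)
    perturb = zeros(size(theta));
    perturb(p) = e;
    numgrad(p) = (J(theta + perturb) - J(theta - perturb)) / (2 * e);
end

% Relative difference between the numerical and the analytic gradient.
rel_diff = norm(numgrad - grad) / norm(numgrad + grad);
```

Gradient checking evaluates the cost twice per parameter, so it is normally run on a small network and turned off before training.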

7. Regularized Neural Networks

For j = 0 (the bias column, which is not regularized):

$\frac{\partial}{\partial\Theta_{ij}^{(l)}}J(\Theta)=D_{ij}^{(l)}=\frac{1}{m}\Delta_{ij}^{(l)}$

For j ≥ 1:

$\frac{\partial}{\partial\Theta_{ij}^{(l)}}J(\Theta)=D_{ij}^{(l)}=\frac{1}{m}\Delta_{ij}^{(l)}+\frac{\lambda}{m}\Theta_{ij}^{(l)}$

Someone else's code:

https://github.com/jcgillespie/Coursera-Machine-Learning/tree/master/ex4
