(6) 6.16 Neural Networks: Linear Decoders and Their Implementation

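A sparse autoencoder is a three-layer network consisting of an input layer, a hidden layer, and an output layer. In the autoencoder described earlier, every neuron in the network uses the same activation function. A linear decoder changes this definition by using different activation functions for the hidden layer and the output layer. The resulting model is easier to apply and is more robust to changes in its parameters.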

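In forward propagation, the output layer of the network computes:

z⁽³⁾ = W⁽²⁾ a⁽²⁾ + b⁽²⁾
a⁽³⁾ = f(z⁽³⁾)

where a⁽³⁾ is the output. In an autoencoder, a⁽³⁾ approximately reconstructs the input x = a⁽¹⁾.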

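When the last layer of an autoencoder uses a sigmoid activation function, its outputs are confined to [0,1] (with tanh, to [-1,1]). So when f(z⁽³⁾) is a sigmoid, the inputs must also be restricted or scaled to lie in [0,1], because the network is trained to reproduce them. Some datasets, such as MNIST, fit this range naturally, but for other inputs x the requirement is hard to satisfy. For example, inputs that have been PCA-whitened are no longer constrained to [0,1].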
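To make the range problem concrete, here is a minimal MATLAB/Octave sketch. It assumes a tiny hand-picked data matrix `X` (hypothetical values, not from any dataset) and applies the same mean subtraction and ZCA whitening steps used in the exercise script, with the same `epsilon = 0.1`. It only illustrates that whitened data leaves [0,1].

```matlab
% Toy data: 2 features x 5 examples, all values inside [0,1] (hypothetical)
X = [0.1 0.4 0.9 0.3 0.7;
     0.2 0.5 0.8 0.1 0.6];
epsilon = 0.1;                           % regularization constant for whitening

X0 = bsxfun(@minus, X, mean(X, 2));      % subtract the mean of each feature
sigma = X0 * X0' / size(X0, 2);          % covariance matrix of the centered data
[u, s, ~] = svd(sigma);                  % eigenvectors u, eigenvalues diag(s)
ZCAWhite = u * diag(1 ./ sqrt(diag(s) + epsilon)) * u';
Xw = ZCAWhite * X0;                      % whitened data

fprintf('range before whitening: [%g, %g]\n', min(X(:)), max(X(:)));
fprintf('range after whitening:  [%g, %g]\n', min(Xw(:)), max(Xw(:)));
```

Because each feature of the whitened data has zero mean, some entries are necessarily negative, so a sigmoid output layer could never reproduce them exactly.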

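Setting a⁽³⁾ = z⁽³⁾ solves this problem simply. That is, the output layer uses the identity function f(z) = z as its activation function, so that a⁽³⁾ = f(z⁽³⁾) = z⁽³⁾. This particular activation function is called the linear (identity) activation function.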

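In a linear decoder, the hidden-layer neurons still use a sigmoid (or tanh) activation function. The hidden unit activations are a⁽²⁾ = σ(W⁽¹⁾ x + b⁽¹⁾), where σ(·) is the sigmoid function, x is the input, and W⁽¹⁾ and b⁽¹⁾ are the weights and bias terms of the hidden units. Only the output layer uses the linear activation function. An autoencoder built this way, with a sigmoid or tanh hidden layer and a linear output layer, is called a linear decoder.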

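In a linear decoder, x̂ = a⁽³⁾ = z⁽³⁾ = W⁽²⁾ a⁽²⁾ + b⁽²⁾. Because the output is a linear function of the hidden unit activations, varying W⁽²⁾ lets the output a⁽³⁾ take values greater than 1 or less than 0. The inputs therefore no longer have to be scaled into [0,1] to match the range of a sigmoid output layer.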
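The difference between the two output layers can be sketched in a few lines of MATLAB/Octave. All parameter values below (`x`, `W1`, `b1`, `W2`, `b2`) are hypothetical numbers chosen for illustration. They are not learned weights and do not come from the exercise.

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Toy parameters (hypothetical values chosen for illustration)
x  = [-2.0; 0.5; 3.0];                 % input outside [0,1], e.g. after whitening
W1 = [0.5 -0.3 0.8; -0.6 0.1 0.4];     % 2 hidden units x 3 inputs
b1 = [0.1; -0.2];
W2 = [4 -3; 1 2; -2 5];                % 3 outputs x 2 hidden units
b2 = [-1; 0; 1];

a2 = sigmoid(W1 * x + b1);             % hidden layer: sigmoid activation
z3 = W2 * a2 + b2;                     % pre-activation of the output layer

a3_sigmoid = sigmoid(z3);              % sigmoid output: always inside (0,1)
a3_linear  = z3;                       % linear decoder: a3 = z3, unbounded

disp([a3_sigmoid a3_linear]);          % columns: sigmoid output, linear output
```

With these values the linear output contains entries below 0 and above 1, while the sigmoid output stays strictly between 0 and 1 whatever W2 is.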

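Changing the activation function of the output units also changes their gradients. Previously, the error term of each output unit was defined as:

δᵢ⁽³⁾ = ∂/∂zᵢ⁽³⁾ ½‖y − x̂‖² = −(yᵢ − x̂ᵢ) · f′(zᵢ⁽³⁾)

where y = x is the desired output, x̂ is the output of the autoencoder, and f(·) is the activation function. Because the output layer uses f(z) = z, we have f′(z) = 1, so the formula above simplifies to:

δᵢ⁽³⁾ = −(yᵢ − x̂ᵢ)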

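The error term of the hidden layer is then computed by backpropagation as usual:

δ⁽²⁾ = ((W⁽²⁾)ᵀ δ⁽³⁾) • f′(z⁽²⁾)

Because the hidden layer uses a sigmoid (or tanh) activation function f, the factor f′(z⁽²⁾) in this formula is still the derivative of the sigmoid (or tanh) function. When a sparsity penalty is used, its gradient term is added to (W⁽²⁾)ᵀ δ⁽³⁾ before multiplying by f′(z⁽²⁾), exactly as in the sparse autoencoder. In other words, only the output-layer error term of a linear decoder differs from that of the standard autoencoder.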
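The two error terms can be written out for a single example as follows. This is a minimal sketch assuming a plain squared-error cost J = ½‖y − a⁽³⁾‖² with no weight decay and no sparsity penalty. All numeric values are hypothetical.

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Toy parameters (hypothetical values chosen for illustration)
x  = [0.2; -1.5; 2.0];
W1 = [0.5 -0.3 0.8; -0.6 0.1 0.4];     % 2 hidden units x 3 inputs
b1 = [0.1; -0.2];
W2 = [0.4 -0.3; 0.1 0.2; -0.2 0.5];    % 3 outputs x 2 hidden units
b2 = [-0.1; 0; 0.1];

% Forward pass of the linear decoder
a2 = sigmoid(W1 * x + b1);
a3 = W2 * a2 + b2;                     % linear output: a3 = z3
y  = x;                                % an autoencoder reconstructs its input

% Backward pass
delta3 = -(y - a3);                          % output layer: f'(z3) = 1
delta2 = (W2' * delta3) .* a2 .* (1 - a2);   % hidden layer: sigmoid derivative

% Gradients of J = 0.5 * ||y - a3||^2 for this one example
W2grad = delta3 * a2';
W1grad = delta2 * x';
```

For a sigmoid output layer, `delta3` would carry the extra factor `a3 .* (1 - a3)`. Dropping that factor is the only change the linear decoder makes to backpropagation.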

Linear decoder code (the exercise script, followed by sparseAutoencoderLinearCost.m):

%% CS294A/CS294W Linear Decoder Exercise

%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the
%  linear decoder exercise. For this exercise, you will only need to modify
%  the code in sparseAutoencoderLinearCost.m. You will not need to modify
%  any code in this file.

%%======================================================================
%% STEP 0: Initialization
%  Here we initialize some parameters used for the exercise.

imageChannels = 3;     % number of channels (rgb, so 3)

patchDim   = 8;        % patch dimension (8x8 patches)
numPatches = 100000;   % number of patches

% Each 8 x 8 x 3 (RGB) patch is flattened into one input vector, so the
% visible layer has patchDim * patchDim * imageChannels units
visibleSize = patchDim * patchDim * imageChannels;  % number of input units
outputSize  = visibleSize;   % number of output units
hiddenSize  = 400;           % number of hidden units

sparsityParam = 0.035; % desired average activation of the hidden units.
lambda = 3e-3;         % weight decay parameter
beta = 5;              % weight of sparsity penalty term

epsilon = 0.1;         % epsilon for ZCA whitening

%%======================================================================
%% STEP 1: Create and modify sparseAutoencoderLinearCost.m to use a linear decoder,
%          and check gradients
%  You should copy sparseAutoencoderCost.m from your earlier exercise
%  and rename it to sparseAutoencoderLinearCost.m.
%  Then you need to rename the function from sparseAutoencoderCost to
%  sparseAutoencoderLinearCost, and modify it so that the sparse autoencoder
%  uses a linear decoder instead. Once that is done, you should check
%  your gradients to verify that they are correct.

% NOTE: Modify sparseAutoencoderCost first!

% To speed up gradient checking, we will use a reduced network and some
% dummy patches

debugHiddenSize = 5;
debugvisibleSize = 8;
patches = rand([8 10]);
theta = initializeParameters(debugHiddenSize, debugvisibleSize);

[cost, grad] = sparseAutoencoderLinearCost(theta, debugvisibleSize, debugHiddenSize, ...
                                           lambda, sparsityParam, beta, ...
                                           patches);

% Check gradients
numGrad = computeNumericalGradient( @(x) sparseAutoencoderLinearCost(x, debugvisibleSize, debugHiddenSize, ...
                                                  lambda, sparsityParam, beta, ...
                                                  patches), theta);

% Use this to visually compare the gradients side by side
disp([numGrad grad]);

diff = norm(numGrad-grad)/norm(numGrad+grad);
% Should be small. In our implementation, these values are usually less than 1e-9.
disp(diff);

assert(diff < 1e-9, 'Difference too large. Check your gradient computation again');

% NOTE: Once your gradients check out, you should run step 0 again to
%       reinitialize the parameters

%%======================================================================
%% STEP 2: Learn features on small patches
%  In this step, you will use your sparse autoencoder (which now uses a
%  linear decoder) to learn features on small patches sampled from related
%  images.

%% STEP 2a: Load patches
%  In this step, we load 100k patches sampled from the STL10 dataset and
%  visualize them. Note that these patches have been scaled to [0,1]

load stlSampledPatches.mat

displayColorNetwork(patches(:, 1:100));

%% STEP 2b: Apply preprocessing
%  In this sub-step, we preprocess the sampled patches, in particular,
%  ZCA whitening them.
%
%  In a later exercise on convolution and pooling, you will need to replicate
%  exactly the preprocessing steps you apply to these patches before
%  using the autoencoder to learn features on them. Hence, we will save the
%  ZCA whitening and mean image matrices together with the learned features
%  later on.

% Subtract mean patch (hence zeroing the mean of the patches)
meanPatch = mean(patches, 2);
patches = bsxfun(@minus, patches, meanPatch);   % subtract the mean

% Apply ZCA whitening
sigma = patches * patches' / numPatches;
[u, s, v] = svd(sigma);
% Build the ZCA whitening matrix that will be applied to the data
ZCAWhite = u * diag(1 ./ sqrt(diag(s) + epsilon)) * u';
% Apply the ZCA whitening transform
patches = ZCAWhite * patches;

displayColorNetwork(patches(:, 1:100));

%% STEP 2c: Learn features
%  You will now use your sparse autoencoder (with linear decoder) to learn
%  features on the preprocessed patches. This should take around 45 minutes.

theta = initializeParameters(hiddenSize, visibleSize);

% Use minFunc to minimize the function
addpath minFunc/

options = struct;
options.Method = 'lbfgs';
options.maxIter = 400;
options.display = 'on';

[optTheta, cost] = minFunc( @(p) sparseAutoencoderLinearCost(p, ...
                                   visibleSize, hiddenSize, ...
                                   lambda, sparsityParam, ...
                                   beta, patches), ...
                              theta, options);

% Save the learned features and the preprocessing matrices for use in
% the later exercise on convolution and pooling
fprintf('Saving learned features and preprocessing matrices...\n');
save('STL10Features.mat', 'optTheta', 'ZCAWhite', 'meanPatch');
fprintf('Saved\n');

%% STEP 2d: Visualize learned features
%  Why display (W*ZCAWhite)'? Every mean-subtracted patch x is whitened
%  before it enters the network, so the hidden layer effectively computes
%  W*ZCAWhite*x. Each row of W*ZCAWhite is therefore the filter of one
%  hidden unit in the original patch space. displayColorNetwork shows each
%  column as one small image patch, so the matrix has to be transposed.

W = reshape(optTheta(1:visibleSize * hiddenSize), hiddenSize, visibleSize);
b = optTheta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);
displayColorNetwork( (W*ZCAWhite)');

function [cost,grad,features] = sparseAutoencoderLinearCost(theta, visibleSize, hiddenSize, ...
                                                            lambda, sparsityParam, beta, data)
% -------------------- YOUR CODE HERE --------------------
% Instructions:
%   Copy sparseAutoencoderCost in sparseAutoencoderCost.m from your
%   earlier exercise onto this file, renaming the function to
%   sparseAutoencoderLinearCost, and changing the autoencoder to use a
%   linear decoder.
% -------------------- YOUR CODE HERE --------------------

% Unpack the parameter vector into weight matrices and bias vectors
W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
W2 = reshape(theta(hiddenSize*visibleSize+1:2*hiddenSize*visibleSize), visibleSize, hiddenSize);
b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);
b2 = theta(2*hiddenSize*visibleSize+hiddenSize+1:end);

% Number of examples
m = size(data, 2);

%%%%%%%%%%% forward %%%%%%%%%%%
z2 = W1*data + repmat(b1, [1,m]);
a2 = sigmoid(z2);                  % hidden layer: sigmoid activation
z3 = W2*a2   + repmat(b2, [1,m]);
a3 = z3;                           % output layer: linear (identity) activation

% Average activation of each hidden unit over the training examples
rho_hat = mean(a2, 2);
rho = sparsityParam;
% KL divergence, summed over all hidden units
KL_Divergence = sum(rho * log(rho ./ rho_hat) + (1-rho) * log((1-rho) ./ (1-rho_hat)));

squares = (a3 - data).^2;
J_square_err = (1/2)*(1/m)* sum(squares(:));
J_weight_decay = (lambda/2)*(sum(W1(:).^2) + sum(W2(:).^2));
J_sparsity = beta * KL_Divergence;
cost = J_square_err + J_weight_decay + J_sparsity;

%%%%%%%%%%% backward %%%%%%%%%%%
delta3 = -(data-a3);   % linear decoder: f'(z3) = 1, so no derivative factor
beta_term = beta * (- rho ./ rho_hat + (1-rho) ./ (1-rho_hat));
% The sparsity term is added to the backpropagated error, then the sum is
% multiplied elementwise by the sigmoid derivative a2 .* (1-a2)
delta2 = (W2' * delta3 + repmat(beta_term, [1,m])) .* a2 .* (1-a2);

W2grad = (1/m) * delta3 * a2' + lambda * W2;
b2grad = (1/m) * sum(delta3, 2);
W1grad = (1/m) * delta2 * data' + lambda * W1;
b1grad = (1/m) * sum(delta2, 2);

%-------------------------------------------------------------------
% Convert weights and bias gradients to a compressed form
% This step will concatenate and flatten all your gradients to a vector
% which can be used in the optimization method.
grad = [W1grad(:) ; W2grad(:) ; b1grad(:) ; b2grad(:)];

end

%-------------------------------------------------------------------
% We are giving you the sigmoid function, you may find this function
% useful in your computation of the loss and the gradients.
function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end
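The script above calls computeNumericalGradient, which comes from an earlier exercise and is not listed here. As a rough idea of what such a helper does, the following is a minimal central-difference sketch. The function name `numericalGradientSketch` and the step size `h = 1e-4` are assumptions for illustration, and the actual file from the earlier exercise may differ.

```matlab
function numgrad = numericalGradientSketch(J, theta)
% Central-difference estimate of the gradient of J at theta.
%   J     : function handle whose first output is a scalar cost
%   theta : vector of parameters
    h = 1e-4;                          % step size (assumed value)
    numgrad = zeros(size(theta));
    for i = 1:numel(theta)
        e = zeros(size(theta));
        e(i) = h;                      % perturb only the i-th parameter
        numgrad(i) = (J(theta + e) - J(theta - e)) / (2 * h);
    end
end
```

For instance, with `J = @(t) sum(t.^2)` and `theta = [1; 2; 3]`, the estimate should agree closely with the exact gradient `2*theta`. The loop evaluates J twice per parameter, which is why the script checks gradients on a reduced network with only a few hidden units.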
