A sparse autoencoder is a three-layer network consisting of an input layer, a hidden layer, and an output layer. As described for the autoencoder earlier, all neurons in that network use the same activation function. The Linear Decoder modifies this definition by using different activation functions for the output layer and the hidden layer, which makes the resulting model easier to apply and more robust to changes in the model's parameters.

The forward propagation through the network is:

z(2) = W(1) x + b(1)
a(2) = f(z(2))
z(3) = W(2) a(2) + b(2)
a(3) = f(z(3))

where a(3) is the output. In an autoencoder, a(3) approximately reconstructs the input x = a(1).

When the last layer of the autoencoder uses a sigmoid (or tanh) activation, its outputs are squashed into [0,1] (or [-1,1] for tanh), so when f(z(3)) is a sigmoid (or tanh) the input must be constrained or scaled to lie in the same range. This is easy for some data (for example, MNIST pixel intensities can simply be scaled into [0,1]) but hard to guarantee in general; in particular, PCA-whitened inputs do not satisfy the [0,1] requirement.
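As a quick illustration of this point (not part of the original exercise; the sizes and the epsilon value below are arbitrary), whitening zero-means the data, so some values necessarily fall below 0 and a sigmoid output could never reproduce them exactly:

% Illustrative check only: whitened data cannot lie in [0,1].
x = rand(192, 1000);                                 % fake patches with values in [0,1]
x = bsxfun(@minus, x, mean(x, 2));                   % zero the mean of every dimension
sigma = x * x' / size(x, 2);                         % covariance estimate
[u, s, ~] = svd(sigma);
xw = u * diag(1 ./ sqrt(diag(s) + 0.1)) * u' * x;    % ZCA whitening (epsilon = 0.1)
fprintf('min = %.3f, max = %.3f\n', min(xw(:)), max(xw(:)));  % min is well below 0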

Letting a(3) = z(3) solves this problem in a simple way: use the identity function f(z) = z as the activation at the output layer, so that a(3) = f(z(3)) = z(3). This special activation is called the linear (identity) activation function.

The hidden layer of a Linear Decoder still uses a sigmoid (or tanh) activation. The hidden units' activations are a(2) = σ(W(1) x + b(1)), where σ is the sigmoid function, x is the input, and W(1) and b(1) are the hidden layer's weights and bias term. In other words, the linear activation is used only in the output layer. An autoencoder built from a sigmoid (or tanh) hidden layer and a linear output layer is called a linear decoder.

In the linear decoder, a(3) = z(3) = W(2) a(2) + b(2). Because the output a(3) is a linear function of the hidden-unit activations, changing W(2) can make a(3) larger than 1 or smaller than 0. This avoids having to rescale or constrain the targets to [0,1], as a sigmoid output layer would require.
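As a minimal sketch of the forward pass just described (illustrative only; the sizes, random weights, and the inline sigmoid are assumptions, not part of the exercise code):

% Forward pass of a linear decoder (sketch).
visibleSize = 8*8*3; hiddenSize = 400; m = 10;   % assumed sizes
x  = rand(visibleSize, m);                       % input patches, one per column
W1 = 0.01 * randn(hiddenSize, visibleSize);  b1 = zeros(hiddenSize, 1);
W2 = 0.01 * randn(visibleSize, hiddenSize);  b2 = zeros(visibleSize, 1);
a2 = 1 ./ (1 + exp(-(W1*x + repmat(b1, 1, m)))); % sigmoid hidden layer
a3 = W2*a2 + repmat(b2, 1, m);                   % linear (identity) output layer
% a3 is not confined to [0,1]; with a sigmoid output layer it would be.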

When the output activation changes, the gradient of the output units changes accordingly. Previously, the error term of each output unit was defined as

δ(3) = -(y - a(3)) .* f'(z(3))

where y = x is the desired output, a(3) is the autoencoder's output, and f is the activation function. Because the output layer now uses f(z) = z, we have f'(z) = 1, so the formula above simplifies to

δ(3) = -(y - a(3))

The hidden-layer error term computed by backpropagation is unchanged:

δ(2) = ((W(2))' δ(3)) .* f'(z(2))

Because the hidden layer uses a sigmoid (or tanh) activation f, the factor f'(z(2)) in the formula above is still the derivative of the sigmoid (or tanh). That is, only the output-layer residual of the Linear Decoder differs from that of the ordinary autoencoder.
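A self-contained sketch of the residual computation (illustrative only; the sparsity term is omitted and all sizes and values are stand-ins):

% Residuals (deltas) for a linear decoder, sparsity term omitted (sketch).
visibleSize = 8*8*3; hiddenSize = 400; m = 10;   % assumed sizes
x  = rand(visibleSize, m);                       % targets (y = x for an autoencoder)
W2 = 0.01 * randn(visibleSize, hiddenSize);
a2 = rand(hiddenSize, m);                        % stand-in sigmoid hidden activations
a3 = W2 * a2;                                    % linear output, a3 = z3
delta3 = -(x - a3);                              % f'(z3) = 1, so no extra factor
% With a sigmoid output the residual would instead be -(x - a3) .* a3 .* (1 - a3).
delta2 = (W2' * delta3) .* a2 .* (1 - a2);       % hidden residual, same as before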

Linear Decoder code:

%% CS294A/CS294W Linear Decoder Exercise

%  Instructions
% ------------
%
% This file contains code that helps you get started on the
% linear decoder exercise. For this exercise, you will only need to modify
% the code in sparseAutoencoderLinearCost.m. You will not need to modify
% any code in this file.

%%======================================================================
%% STEP 0: Initialization
%  Here we initialize some parameters used for the exercise.

imageChannels = 3;      % number of channels (rgb, so 3)
patchDim      = 8;      % patch dimension (8*8 patches)
numPatches    = 100000; % number of patches

% An 8 * 8 * imageChannels patch, flattened, gives the number of visible units
visibleSize = patchDim * patchDim * imageChannels; % number of input units
outputSize  = visibleSize;                         % number of output units
hiddenSize  = 400;                                 % number of hidden units

sparsityParam = 0.035;  % desired average activation of the hidden units
lambda        = 3e-3;   % weight decay parameter
beta          = 5;      % weight of sparsity penalty term
epsilon       = 0.1;    % epsilon for ZCA whitening

%%======================================================================
%% STEP 1: Create and modify sparseAutoencoderLinearCost.m to use a linear decoder,
%  and check gradients
%  You should copy sparseAutoencoderCost.m from your earlier exercise
%  and rename it to sparseAutoencoderLinearCost.m.
%  Then you need to rename the function from sparseAutoencoderCost to
%  sparseAutoencoderLinearCost, and modify it so that the sparse autoencoder
%  uses a linear decoder instead. Once that is done, you should check
%  your gradients to verify that they are correct.

% NOTE: Modify sparseAutoencoderCost first!

% To speed up gradient checking, we will use a reduced network and some
% dummy patches
debugHiddenSize  = 5;
debugvisibleSize = 8;
patches = rand([8 10]);
theta   = initializeParameters(debugHiddenSize, debugvisibleSize);

[cost, grad] = sparseAutoencoderLinearCost(theta, debugvisibleSize, debugHiddenSize, ...
                                           lambda, sparsityParam, beta, ...
                                           patches);

% Check gradients
numGrad = computeNumericalGradient( @(x) sparseAutoencoderLinearCost(x, debugvisibleSize, debugHiddenSize, ...
                                                                     lambda, sparsityParam, beta, ...
                                                                     patches), theta);

% Use this to visually compare the gradients side by side
disp([numGrad grad]);

diff = norm(numGrad-grad)/norm(numGrad+grad);
% Should be small. In our implementation, these values are usually less than 1e-9.
disp(diff);

assert(diff < 1e-9, 'Difference too large. Check your gradient computation again');

% NOTE: Once your gradients check out, you should run step 0 again to
%       reinitialize the parameters
%%======================================================================
%% STEP 2: Learn features on small patches
%  In this step, you will use your sparse autoencoder (which now uses a
%  linear decoder) to learn features on small patches sampled from related
%  images.

%% STEP 2a: Load patches
%  In this step, we load 100k patches sampled from the STL10 dataset and
%  visualize them. Note that these patches have been scaled to [0,1]

load stlSampledPatches.mat

displayColorNetwork(patches(:, 1:100));   % show the first 100 patches

%% STEP 2b: Apply preprocessing
% In this sub-step, we preprocess the sampled patches, in particular,
% ZCA whitening them.
%
% In a later exercise on convolution and pooling, you will need to replicate
% exactly the preprocessing steps you apply to these patches before
% using the autoencoder to learn features on them. Hence, we will save the
% ZCA whitening and mean image matrices together with the learned features
% later on.

% Subtract mean patch (hence zeroing the mean of the patches)
meanPatch = mean(patches, 2);
patches = bsxfun(@minus, patches, meanPatch);

% Apply ZCA whitening
sigma = patches * patches' / numPatches;
[u, s, v] = svd(sigma);
% The matrix below is the transform that ZCA-whitens the data
ZCAWhite = u * diag(1 ./ sqrt(diag(s) + epsilon)) * u';
% Apply the ZCA transform
patches = ZCAWhite * patches;

displayColorNetwork(patches(:, 1:100));

%% STEP 2c: Learn features
%  You will now use your sparse autoencoder (with linear decoder) to learn
%  features on the preprocessed patches. This should take around 25 minutes.

theta = initializeParameters(hiddenSize, visibleSize);

% Use minFunc to minimize the function
addpath minFunc/

options = struct;
options.Method = 'lbfgs';
options.maxIter = 400;
options.display = 'on';

[optTheta, cost] = minFunc( @(p) sparseAutoencoderLinearCost(p, ...
                                                             visibleSize, hiddenSize, ...
                                                             lambda, sparsityParam, ...
                                                             beta, patches), ...
                            theta, options);

% Save the learned features and the preprocessing matrices for use in
% the later exercise on convolution and pooling
fprintf('Saving learned features and preprocessing matrices...\n');
save('STL10Features.mat', 'optTheta', 'ZCAWhite', 'meanPatch');
fprintf('Saved\n');

%% STEP 2d: Visualize learned features
% Why (W*ZCAWhite)'? For every input sample x, the network's hidden response is
% equivalent to applying W*ZCAWhite*x, so each row of W*ZCAWhite is the filter of
% one hidden unit. displayColorNetwork expects one image patch per column, hence
% the transpose.
W = reshape(optTheta(1:visibleSize * hiddenSize), hiddenSize, visibleSize);
b = optTheta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);
displayColorNetwork( (W*ZCAWhite)');

% -------------------- sparseAutoencoderLinearCost.m --------------------
function [cost,grad,features] = sparseAutoencoderLinearCost(theta, visibleSize, hiddenSize, ...
lambda, sparsityParam, beta, data)
% -------------------- YOUR CODE HERE --------------------
% Instructions:
% Copy sparseAutoencoderCost in sparseAutoencoderCost.m from your
% earlier exercise onto this file, renaming the function to
% sparseAutoencoderLinearCost, and changing the autoencoder to use a
% linear decoder.
% -------------------- YOUR CODE HERE --------------------

% Unpack the parameter vector into weight matrices and bias vectors:
W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
W2 = reshape(theta(hiddenSize*visibleSize+1:2*hiddenSize*visibleSize), visibleSize, hiddenSize);
b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);
b2 = theta(2*hiddenSize*visibleSize+hiddenSize+1:end);

% Number of training samples
m = size(data, 2);

%%%%%%%%%%% forward %%%%%%%%%%%
z2 = W1*data + repmat(b1, [1,m]);
a2 = sigmoid(z2);
z3 = W2*a2 + repmat(b2, [1,m]);
a3 = z3;                                 % linear decoder: identity output activation

% Average activation of each hidden unit over the batch
rho_hat = mean(a2, 2);
rho = sparsityParam;
% Sum the KL divergence over all hidden units
KL_Divergence = sum(rho .* log(rho ./ rho_hat) + (1-rho) .* log((1-rho) ./ (1-rho_hat)));

squares = (a3 - data).^2;
J_square_err   = (1/2)*(1/m)* sum(squares(:));
J_weight_decay = (lambda/2)*(sum(W1(:).^2) + sum(W2(:).^2));
J_sparsity     = beta * KL_Divergence;

cost = J_square_err + J_weight_decay + J_sparsity;

%%%%%%%%%%% backward %%%%%%%%%%%
delta3 = -(data - a3);                   % note: linear decoder, so f'(z3) = 1
beta_term = beta * (- rho ./ rho_hat + (1-rho) ./ (1-rho_hat));
delta2 = (W2' * delta3 + repmat(beta_term, [1,m])) .* a2 .* (1-a2);

W2grad = (1/m) * delta3 * a2' + lambda * W2;
b2grad = (1/m) * sum(delta3, 2);
W1grad = (1/m) * delta2 * data' + lambda * W1;
b1grad = (1/m) * sum(delta2, 2);
%-------------------------------------------------------------------
% Convert weights and bias gradients to a compressed form
% This step will concatenate and flatten all your gradients to a vector
% which can be used in the optimization method.
grad = [W1grad(:) ; W2grad(:) ; b1grad(:) ; b2grad(:)];

end
%-------------------------------------------------------------------
% We are giving you the sigmoid function, you may find this function
% useful in your computation of the loss and the gradients.
function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end
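In the later convolution exercise the saved matrices are applied to new patches in the same order as during training: subtract meanPatch, apply ZCAWhite, then compute the hidden activation. A minimal sketch, assuming the STL10Features.mat saved above and a hypothetical stand-in patch x:

% Illustrative only: extract hidden features for one new 8x8x3 patch.
visibleSize = 8*8*3; hiddenSize = 400;                % assumed sizes, as in the exercise
load('STL10Features.mat');                            % provides optTheta, ZCAWhite, meanPatch
W = reshape(optTheta(1:visibleSize*hiddenSize), hiddenSize, visibleSize);
b = optTheta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);
sig = @(z) 1 ./ (1 + exp(-z));                        % sigmoid, written inline for this sketch
x = rand(visibleSize, 1);                             % stand-in for a real patch (column vector)
features = sig(W * ZCAWhite * (x - meanPatch) + b);   % hiddenSize-by-1 feature vector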
