A sparse autoencoder is a three-layer network: an input layer, a hidden layer, and an output layer. As described in the earlier autoencoder notes, all neurons in that network use the same activation function. A Linear Decoder modifies this definition by giving the output layer a different activation function than the hidden layer, which makes the resulting model easier to apply and more robust to changes in the model's parameters.

The formulas for forward propagation through the network are:

z(2) = W(1) x + b(1)
a(2) = f(z(2))
z(3) = W(2) a(2) + b(2)
a(3) = f(z(3))

where a(3) is the output. In an autoencoder, a(3) is an approximate reconstruction of the input x = a(1).

When the last layer of an autoencoder uses a sigmoid (or tanh) activation, the output is squashed into [0,1], so if f(z(3)) is the sigmoid (tanh) function, the input must be restricted or scaled to lie in [0,1] as well. This is fine for data such as MNIST, whose pixel values already fall in [0,1], but in general the requirement is hard to satisfy; for example, inputs that have been PCA whitened are no longer confined to [0,1].

Setting a(3) = z(3) solves this problem in a simple way: use the identity function f(z) = z as the activation at the output layer, so that a(3) = f(z(3)) = z(3). This particular activation is called the linear (identity) activation function.

In a Linear Decoder the hidden units still use the sigmoid (tanh) activation. The hidden activations are a(2) = σ(W(1) x + b(1)), where σ is the sigmoid function, x is the input, and W(1) and b(1) are the hidden layer's weights and bias term. Only the output layer uses the linear activation. An autoencoder consisting of a sigmoid (or tanh) hidden layer and a linear output layer is called a linear decoder.

In the linear decoder, a(3) = z(3) = W(2) a(2) + b(2). Because the output a(3) is a linear function of the hidden activations, changing W(2) can push a(3) above 1 or below 0, so there is no longer any need to scale the inputs into [0,1] the way a sigmoid output layer would require.
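To make this concrete, here is a minimal sketch of one forward pass (the layer sizes and random weights below are made up for illustration and are not part of the exercise code), contrasting a sigmoid output layer with a linear one:

% minimal sketch of one forward pass; sizes and weights are illustrative only
x  = randn(192, 1);                 % e.g. an 8*8*3 patch after whitening (values fall outside [0,1])
W1 = 0.01*randn(400, 192); b1 = zeros(400, 1);
W2 = 0.01*randn(192, 400); b2 = zeros(192, 1);

a2 = 1 ./ (1 + exp(-(W1*x + b1)));  % hidden layer: sigmoid activation
z3 = W2*a2 + b2;

a3_sigmoid = 1 ./ (1 + exp(-z3));   % ordinary autoencoder: reconstruction forced into [0,1]
a3_linear  = z3;                    % linear decoder: reconstruction free to match x outside [0,1]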

Because the activation of the output units changes, their gradient changes accordingly. Previously, the error term of each output unit was defined as

δ(3)_i = -(y_i - a(3)_i) · f'(z(3)_i)

where y = x is the desired output, a(3) is the autoencoder's output, and f is the activation function. Since the output layer now uses f(z) = z, we have f'(z) = 1, and the formula above simplifies to

δ(3)_i = -(y_i - a(3)_i)

The error terms of the hidden layer are, of course, still computed with backpropagation:

δ(2) = ((W(2))^T δ(3)) .* f'(z(2))

Because the hidden layer uses a sigmoid (or tanh) activation f, f'(z(2)) in this formula is still the derivative of the sigmoid (or tanh). In other words, only the output-layer residual of a Linear Decoder differs from that of an ordinary autoencoder.
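As a quick illustration, here is a minimal sketch of the two residuals, reusing the variables from the forward-pass sketch above (the sparsity penalty term is omitted here for brevity; the full version appears in the exercise code below):

% residuals for the linear decoder (continuing the sketch above; sparsity term omitted)
delta3 = -(x - a3_linear);                  % output layer: f'(z3) = 1, so no extra factor
delta2 = (W2' * delta3) .* a2 .* (1 - a2);  % hidden layer: sigmoid derivative a2 .* (1 - a2)
% with a sigmoid output layer the only change would be:
% delta3 = -(x - a3_sigmoid) .* a3_sigmoid .* (1 - a3_sigmoid);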

Linear Decoder code:

%% CS294A/CS294W Linear Decoder Exercise

%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the
%  linear decoder exercise. For this exercise, you will only need to modify
%  the code in sparseAutoencoderLinearCost.m. You will not need to modify
%  any code in this file.

%%======================================================================
%% STEP 0: Initialization
%  Here we initialize some parameters used for the exercise.

imageChannels = 3;      % number of channels (rgb, so 3)
patchDim = 8;           % patch dimension (8*8 patches)
numPatches = 100000;    % number of patches

% each 8 * 8 * imageChannels patch is unrolled into the visible-layer units
visibleSize = patchDim * patchDim * imageChannels;  % number of input units
outputSize = visibleSize;                           % number of output units
hiddenSize = 400;       % number of hidden units
sparsityParam = 0.035;  % desired average activation of the hidden units
lambda = 3e-3;          % weight decay parameter
beta = 5;               % weight of sparsity penalty term
epsilon = 0.1;          % epsilon for ZCA whitening

%%======================================================================
%% STEP 1: Create and modify sparseAutoencoderLinearCost.m to use a linear decoder,
%  and check gradients
%  You should copy sparseAutoencoderCost.m from your earlier exercise
%  and rename it to sparseAutoencoderLinearCost.m.
%  Then you need to rename the function from sparseAutoencoderCost to
%  sparseAutoencoderLinearCost, and modify it so that the sparse autoencoder
%  uses a linear decoder instead. Once that is done, you should check
%  your gradients to verify that they are correct.

% NOTE: Modify sparseAutoencoderCost first!

% To speed up gradient checking, we will use a reduced network and some
% dummy patches
debugHiddenSize = 5;
debugvisibleSize = 8;
patches = rand([8 10]);
theta = initializeParameters(debugHiddenSize, debugvisibleSize);

[cost, grad] = sparseAutoencoderLinearCost(theta, debugvisibleSize, debugHiddenSize, ...
                                           lambda, sparsityParam, beta, ...
                                           patches);

% Check gradients
numGrad = computeNumericalGradient( @(x) sparseAutoencoderLinearCost(x, debugvisibleSize, debugHiddenSize, ...
                                                                     lambda, sparsityParam, beta, ...
                                                                     patches), theta);

% Use this to visually compare the gradients side by side
disp([numGrad grad]);

diff = norm(numGrad-grad)/norm(numGrad+grad);
% Should be small. In our implementation, these values are usually less than 1e-9.
disp(diff);

assert(diff < 1e-9, 'Difference too large. Check your gradient computation again');

% NOTE: Once your gradients check out, you should run step 0 again to
% reinitialize the parameters

%%======================================================================
%% STEP 2: Learn features on small patches
%  In this step, you will use your sparse autoencoder (which now uses a
%  linear decoder) to learn features on small patches sampled from related
%  images.

%% STEP 2a: Load patches
%  In this step, we load 100k patches sampled from the STL10 dataset and
%  visualize them. Note that these patches have been scaled to [0,1]

load stlSampledPatches.mat

displayColorNetwork(patches(:, 1:100));

%% STEP 2b: Apply preprocessing
%  In this sub-step, we preprocess the sampled patches, in particular,
%  ZCA whitening them.
%
%  In a later exercise on convolution and pooling, you will need to replicate
%  exactly the preprocessing steps you apply to these patches before
%  using the autoencoder to learn features on them. Hence, we will save the
%  ZCA whitening and mean image matrices together with the learned features
%  later on.

% Subtract mean patch (hence zeroing the mean of the patches)
meanPatch = mean(patches, 2);
patches = bsxfun(@minus, patches, meanPatch);

% Apply ZCA whitening
sigma = patches * patches' / numPatches;
[u, s, v] = svd(sigma);
% ZCAWhite is the matrix that applies the ZCA transform to the data
ZCAWhite = u * diag(1 ./ sqrt(diag(s) + epsilon)) * u';
% this line performs the ZCA transform
patches = ZCAWhite * patches;

displayColorNetwork(patches(:, 1:100));

%% STEP 2c: Learn features
%  You will now use your sparse autoencoder (with linear decoder) to learn
%  features on the preprocessed patches. This should take around 45 minutes.

theta = initializeParameters(hiddenSize, visibleSize);

% Use minFunc to minimize the function
addpath minFunc/

options = struct;
options.Method = 'lbfgs';
options.maxIter = 400;
options.display = 'on';

[optTheta, cost] = minFunc( @(p) sparseAutoencoderLinearCost(p, ...
                                   visibleSize, hiddenSize, ...
                                   lambda, sparsityParam, ...
                                   beta, patches), ...
                            theta, options);

% Save the learned features and the preprocessing matrices for use in
% the later exercise on convolution and pooling
fprintf('Saving learned features and preprocessing matrices...\n');
save('STL10Features.mat', 'optTheta', 'ZCAWhite', 'meanPatch');
fprintf('Saved\n');

%% STEP 2d: Visualize learned features
% Why (W*ZCAWhite)'? When a sample x is fed to the network, its hidden
% activations are equivalent to applying W*ZCAWhite*x, so the effective
% filters are the rows of W*ZCAWhite; displayColorNetwork shows one image
% patch per column, hence the transpose.
W = reshape(optTheta(1:visibleSize * hiddenSize), hiddenSize, visibleSize);
b = optTheta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);
displayColorNetwork( (W*ZCAWhite)');
function [cost,grad,features] = sparseAutoencoderLinearCost(theta, visibleSize, hiddenSize, ...
                                                            lambda, sparsityParam, beta, data)
% -------------------- YOUR CODE HERE --------------------
% Instructions:
%   Copy sparseAutoencoderCost in sparseAutoencoderCost.m from your
%   earlier exercise onto this file, renaming the function to
%   sparseAutoencoderLinearCost, and changing the autoencoder to use a
%   linear decoder.
% -------------------- YOUR CODE HERE --------------------

% unpack the parameter vector into matrices
W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
W2 = reshape(theta(hiddenSize*visibleSize+1:2*hiddenSize*visibleSize), visibleSize, hiddenSize);
b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);
b2 = theta(2*hiddenSize*visibleSize+hiddenSize+1:end);

% number of samples
m = size(data, 2);

%%%%%%%%%%% forward %%%%%%%%%%%
z2 = W1*data + repmat(b1, [1,m]);
a2 = sigmoid(z2);
z3 = W2*a2 + repmat(b2, [1,m]);
a3 = z3;                              % linear decoder: identity activation at the output

% average activation of each hidden unit
rho_hat = mean(a2, 2);
rho = sparsityParam;
% sum the KL divergence over all hidden units
KL_Divergence = sum(rho * log(rho ./ rho_hat) + (1-rho) * log((1-rho) ./ (1-rho_hat)));

squares = (a3 - data).^2;
J_square_err = (1/2)*(1/m)*sum(squares(:));
J_weight_decay = (lambda/2)*(sum(W1(:).^2) + sum(W2(:).^2));
J_sparsity = beta * KL_Divergence;

cost = J_square_err + J_weight_decay + J_sparsity;

%%%%%%%%%%% backward %%%%%%%%%%%
delta3 = -(data - a3);                % note: linear decoder, so f'(z3) = 1
beta_term = beta * (- rho ./ rho_hat + (1-rho) ./ (1-rho_hat));
delta2 = (W2' * delta3 + repmat(beta_term, [1,m])) .* a2 .* (1-a2);

W2grad = (1/m) * delta3 * a2' + lambda * W2;
b2grad = (1/m) * sum(delta3, 2);
W1grad = (1/m) * delta2 * data' + lambda * W1;
b1grad = (1/m) * sum(delta2, 2);

%-------------------------------------------------------------------
% Convert weights and bias gradients to a compressed form
% This step will concatenate and flatten all your gradients to a vector
% which can be used in the optimization method.
grad = [W1grad(:) ; W2grad(:) ; b1grad(:) ; b2grad(:)];

end

%-------------------------------------------------------------------
% We are giving you the sigmoid function, you may find this function
% useful in your computation of the loss and the gradients.
function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end
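For reference, here is a minimal sketch of how the saved matrices might be used to extract features from a new patch later on (the variable newPatch is made up for illustration; the convolution and pooling exercise applies the same idea patch by patch):

% hypothetical usage of the saved features; newPatch is a stand-in for a real 8x8x3 patch
load('STL10Features.mat');                               % optTheta, ZCAWhite, meanPatch
W  = reshape(optTheta(1:visibleSize*hiddenSize), hiddenSize, visibleSize);
b1 = optTheta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);

newPatch = rand(visibleSize, 1);                         % unrolled patch, same layout as training data
whitened = ZCAWhite * (newPatch - meanPatch);            % replicate mean subtraction and ZCA whitening
features = 1 ./ (1 + exp(-(W*whitened + b1)));           % hidden activations are the learned features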
