A Classic a Day: Reducing the Dimensionality of Data with Neural Networks [Science 2006]

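The paper is only a few pages long, but in the spirit of rereading classics several times I printed it in color, and it turned out to be much more comfortable to read than on a screen. If I circled the key points in blue ink, almost the whole paper would end up blue. Every sentence counts.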

Reducing the Dimensionality of Data with Neural Networks

G.E. Hinton and R.R. Salakhutdinov

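Abstract

Training a multilayer neural network with a small central layer to reconstruct high-dimensional input vectors converts high-dimensional data into low-dimensional codes. (Original: high-dimensional data can be converted to low-dimensional codes.) In such an autoencoder network, gradient descent is normally used to fine-tune the weights, but this only works when the initial weights are already good enough. (Original: this works well only if the initial weights are close to a good solution.) The paper proposes an effective way of initializing the weights, which lets a deep autoencoder network learn low-dimensional codes that work better than those produced by PCA.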

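Background

PCA (Principal Components Analysis) is a widely used dimensionality reduction method. It finds the directions of greatest variance in the data set and uses those directions as coordinate axes to represent each data point. (Original: PCA finds the directions of greatest variance in the data set and represents each data point by its coordinates along each of these directions.)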
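
As a rough illustration of the PCA description above, here is a minimal NumPy sketch. The function names `pca_codes` and `pca_reconstruct` and the toy data are made up for this note and do not come from the paper.

```python
import numpy as np

def pca_codes(X, k):
    """Project the rows of X onto the k directions of greatest variance."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    codes = Xc @ components.T
    return codes, components, mean

def pca_reconstruct(codes, components, mean):
    """Map low-dimensional codes back to the original space."""
    return codes @ components + mean

rng = np.random.default_rng(0)
# Toy data (hypothetical): 200 points lying close to a line in 3-D.
t = rng.normal(size=(200, 1))
X = t @ np.array([[2.0, 1.0, -1.0]]) + 0.01 * rng.normal(size=(200, 3))

codes, components, mean = pca_codes(X, k=1)
X_hat = pca_reconstruct(codes, components, mean)
mse = float(np.mean((X - X_hat) ** 2))
```

Because the toy data is almost one-dimensional, a single coordinate per point is enough for a small reconstruction error. PCA can only find linear directions like this one, which is the limitation the autoencoder is meant to lift.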

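Autoencoder: the paper describes a nonlinear generalization of PCA. It uses an adaptive multilayer encoder network to transform high-dimensional data into a low-dimensional code, and a similar decoder network to recover the data from the code. Both networks start from random weights and are trained together by minimizing the discrepancy between the original data and its reconstruction. The required gradients are obtained by using the chain rule to backpropagate error derivatives, first through the decoder and then through the encoder, and the weights are updated accordingly.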
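
The training loop described here can be sketched in a few lines of NumPy. This is a toy example under my own assumptions (one logistic encoder layer, one logistic decoder layer, squared error, made-up data and hyperparameters), not the architecture used in the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
# Toy data (hypothetical): 8-dimensional points generated from 2 hidden factors.
z = rng.normal(size=(100, 2))
X = sigmoid(z @ rng.normal(size=(2, 8)) + 1.0)

n_in, n_code = X.shape[1], 2
W_enc = 0.1 * rng.normal(size=(n_in, n_code))   # encoder weights
b_enc = np.zeros(n_code)
W_dec = 0.1 * rng.normal(size=(n_code, n_in))   # decoder weights
b_dec = np.zeros(n_in)
lr = 0.5
losses = []

for step in range(500):
    # Forward pass: data -> code -> reconstruction.
    code = sigmoid(X @ W_enc + b_enc)
    X_hat = sigmoid(code @ W_dec + b_dec)
    err = X_hat - X
    losses.append(float(0.5 * np.mean(np.sum(err ** 2, axis=1))))

    # Backward pass: chain rule through the decoder, then the encoder.
    d_out = err * X_hat * (1.0 - X_hat) / len(X)
    d_code = (d_out @ W_dec.T) * code * (1.0 - code)

    # Gradient descent step on all weights and biases.
    W_dec -= lr * (code.T @ d_out)
    b_dec -= lr * d_out.sum(axis=0)
    W_enc -= lr * (X.T @ d_code)
    b_enc -= lr * d_code.sum(axis=0)
```

With a single hidden layer, random initialization is workable. The difficulty discussed next only shows up once several hidden layers are stacked.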

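Here is the catch

For an autoencoder with several hidden layers, the initial weights matter a great deal. If they are too large, training typically ends up in a poor local minimum; if they are too small, the gradients in the early layers are tiny and training becomes impractical. Gradient descent only converges to a good solution when it starts from weights that are already close to one. Finding such initial weights takes a very different kind of algorithm, one that learns one layer of features at a time, and this is why the paper introduces a pretraining procedure. (Worth reading carefully: Finding such initial weights requires a very different type of algorithm that learns one layer of features at a time.)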

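Method overview

The pretraining procedure is explained using binary data as the example.

Binary data is modeled with an RBM (Restricted Boltzmann Machine), which consists of a visible layer and a hidden layer. Taking images as the example, the pixels correspond to the visible units $v$ of the RBM and the feature detectors correspond to its hidden units $h$. The energy of a joint configuration of visible and hidden units is defined as: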

$E(v,h)=-\sum_{i\in \text{pixels}}b_iv_i-\sum_{j\in\text{features}}b_jh_j-\sum_{i,j}v_ih_jw_{ij}$

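Here $v_i$ and $h_j$ are the binary states of pixel $i$ and feature $j$, $b_i$ and $b_j$ are their biases, and $w_{ij}$ is the weight between them. Through this energy function the network assigns a probability to every possible image. Adjusting the weights and biases lowers the energy of real images and raises the energy of "confabulated" images, so the network comes to prefer the real data. The RBM structure is convenient: given $h$, the visible units are conditionally independent of each other, and given $v$, the hidden units are conditionally independent of each other. This gives the following formulas: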

$\begin{aligned} &P(v_i=1|h)=\sigma (b_i+\sum_jh_jw_{ij})\\ &P(h_j=1|v)=\sigma (b_j+\sum_iv_iw_{ij}) \end{aligned}$
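
The two conditionals translate almost directly into code. The sketch below is only an illustration: the helper names, the layer sizes, and the convention that `W` has shape `(n_visible, n_hidden)` so that `W[i, j]` plays the role of $w_{ij}$ are my own choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_given_visible(v, W, b_hid, rng):
    """P(h_j = 1 | v) and one binary sample drawn from it."""
    p_h = sigmoid(b_hid + v @ W)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    return p_h, h

def visible_given_hidden(h, W, b_vis, rng):
    """P(v_i = 1 | h) and one binary sample drawn from it."""
    p_v = sigmoid(b_vis + h @ W.T)
    v = (rng.random(p_v.shape) < p_v).astype(float)
    return p_v, v

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3                      # hypothetical sizes
W = 0.1 * rng.normal(size=(n_visible, n_hidden))
b_vis = np.zeros(n_visible)
b_hid = np.zeros(n_hidden)

v0 = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0]])   # one toy binary "image"
p_h, h = hidden_given_visible(v0, W, b_hid, rng)
p_v, v1 = visible_given_hidden(h, W, b_vis, rng)
```

Conditional independence is what makes this cheap: every unit in a layer can be sampled in parallel with a single matrix product.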

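Weight update rule:

$\Delta w_{ij}=\varepsilon \left ( \left \langle v_ih_j \right \rangle_{\text{data}}-\left \langle v_ih_j \right \rangle_{\text{recon}} \right )$

where $\varepsilon$ is the learning rate, $\langle v_ih_j\rangle_{\text{data}}$ is the fraction of times that pixel $i$ and feature detector $j$ are on together when the feature detectors are driven by real images, and $\langle v_ih_j\rangle_{\text{recon}}$ is the corresponding fraction for the confabulated (reconstructed) images. Note that this update does not exactly follow the gradient of the log probability that the energy function defines for the training data.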
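
A minimal sketch of one such update step follows, assuming a single reconstruction per step. Using activation probabilities rather than sampled binary states when collecting the statistics is a common noise-reducing simplification and an assumption on my part; the function names, toy patterns, and hyperparameters are also made up.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v_data, W, b_vis, b_hid, lr, rng):
    """One update of W and the biases from data and reconstruction statistics."""
    n = v_data.shape[0]
    # Drive the feature detectors with the data.
    p_h_data = sigmoid(b_hid + v_data @ W)
    h_data = (rng.random(p_h_data.shape) < p_h_data).astype(float)
    # Produce a reconstruction, then drive the feature detectors again.
    p_v_recon = sigmoid(b_vis + h_data @ W.T)
    p_h_recon = sigmoid(b_hid + p_v_recon @ W)
    # <v_i h_j>_data and <v_i h_j>_recon, averaged over the batch.
    data_stats = v_data.T @ p_h_data / n
    recon_stats = p_v_recon.T @ p_h_recon / n
    W = W + lr * (data_stats - recon_stats)
    b_vis = b_vis + lr * (v_data - p_v_recon).mean(axis=0)
    b_hid = b_hid + lr * (p_h_data - p_h_recon).mean(axis=0)
    return W, b_vis, b_hid

def reconstruction_error(v, W, b_vis, b_hid):
    """Mean squared error of a deterministic up-down pass."""
    p_h = sigmoid(b_hid + v @ W)
    p_v = sigmoid(b_vis + p_h @ W.T)
    return float(np.mean((v - p_v) ** 2))

rng = np.random.default_rng(0)
# Toy data (hypothetical): two complementary binary patterns, 10 copies each.
patterns = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]], dtype=float)
data = np.repeat(patterns, 10, axis=0)

W = 0.01 * rng.normal(size=(6, 4))
b_vis = np.zeros(6)
b_hid = np.zeros(4)
err_before = reconstruction_error(data, W, b_vis, b_hid)
for epoch in range(1000):
    W, b_vis, b_hid = cd1_step(data, W, b_vis, b_hid, 0.1, rng)
err_after = reconstruction_error(data, W, b_vis, b_hid)
```

Comparing `err_before` with `err_after` gives a quick sanity check that the rule is pulling reconstructions toward the data.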

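The pretraining procedure: learn one layer of feature detectors, then treat that layer's outputs as the input data for learning a second layer of features (original: treat activities as data for learning a second layer of features). In other words, the first layer of feature detectors becomes the visible units of the second RBM. (Original: The first layer of feature detectors then become the visible units for learning the next RBM.) This layer-by-layer learning can be repeated as many times as desired.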
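
The stacking procedure can be sketched as a loop over layer sizes. Everything named here (`train_rbm`, `pretrain_stack`, the sizes `[6, 4, 2]`, the random binary data, the learning rate and epoch count) is a hypothetical choice for illustration; the inner update uses one reconstruction per step, as in the weight update rule above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, rng, lr=0.1, epochs=200):
    """Train one RBM on `data` and return (W, b_vis, b_hid)."""
    n_visible = data.shape[1]
    W = 0.01 * rng.normal(size=(n_visible, n_hidden))
    b_vis = np.zeros(n_visible)
    b_hid = np.zeros(n_hidden)
    for _ in range(epochs):
        p_h = sigmoid(b_hid + data @ W)
        h = (rng.random(p_h.shape) < p_h).astype(float)
        p_v = sigmoid(b_vis + h @ W.T)
        p_h_recon = sigmoid(b_hid + p_v @ W)
        W += lr * (data.T @ p_h - p_v.T @ p_h_recon) / len(data)
        b_vis += lr * (data - p_v).mean(axis=0)
        b_hid += lr * (p_h - p_h_recon).mean(axis=0)
    return W, b_vis, b_hid

def pretrain_stack(data, layer_sizes, rng):
    """Greedy layer-wise pretraining: each RBM is trained on the layer below."""
    stack = []
    layer_input = data
    for n_hidden in layer_sizes:
        W, b_vis, b_hid = train_rbm(layer_input, n_hidden, rng)
        stack.append((W, b_vis, b_hid))
        # The feature activities become the "data" for the next RBM.
        layer_input = sigmoid(b_hid + layer_input @ W)
    return stack

rng = np.random.default_rng(0)
data = (rng.random((50, 8)) < 0.5).astype(float)   # toy binary data
stack = pretrain_stack(data, [6, 4, 2], rng)
shapes = [W.shape for W, _, _ in stack]
```

Each RBM only ever sees the layer directly below it, so adding a layer never requires revisiting the layers already trained.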

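It can be shown that adding an extra layer always improves a lower bound on the log probability the model assigns to the training data, provided the number of units per layer does not decrease and the weights are initialized correctly. (Original: Adding an extra layer always improves a lower bound on the log probability that the model assigns to the training data, provided the number of feature detectors per layer does not decrease and their weights are initialized correctly.)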

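Layer-by-layer training is a very effective way to pretrain a deep autoencoder. (Original: The layer-by-layer learning algorithm is a very effective way to pretrain the weights of a deep autoencoder.) Each layer of features captures strong, high-order correlations between the activities of the units in the layer below. For a wide variety of data sets, this is an efficient way to progressively reveal low-dimensional, nonlinear structure.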

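Overall, building a deep autoencoder has three stages: pretraining, unfolding (called "unrolling" in the paper), and fine-tuning. Pretraining is the layer-by-layer training described above. Unfolding takes the weights learned during pretraining and uses them for both the encoder and the decoder, which yields the actual autoencoder architecture. Fine-tuning uses backpropagation to adjust the weights of the whole autoencoder.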
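
The unfolding step can be sketched as follows. The random weights stand in for a pretrained stack, the names `unroll` and `forward` are hypothetical, and using logistic units in every layer (including the code layer) is a simplification of mine. Fine-tuning itself is not shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def unroll(stack):
    """Turn a stack of RBMs (W, b_vis, b_hid) into encoder and decoder layers."""
    encoder = [(W, b_hid) for W, _, b_hid in stack]
    # The decoder reuses the same weights, transposed and in reverse order.
    # Copies are made so later fine-tuning could adjust both halves separately.
    decoder = [(W.T.copy(), b_vis) for W, b_vis, _ in reversed(stack)]
    return encoder, decoder

def forward(x, layers):
    """Propagate x through a list of (weights, biases) logistic layers."""
    for W, b in layers:
        x = sigmoid(x @ W + b)
    return x

rng = np.random.default_rng(0)
sizes = [8, 6, 4, 2]   # hypothetical layer sizes, 2-D code at the top
# Random stand-ins for pretrained RBM parameters.
stack = [
    (0.1 * rng.normal(size=(m, n)), np.zeros(m), np.zeros(n))
    for m, n in zip(sizes[:-1], sizes[1:])
]

encoder, decoder = unroll(stack)
x = rng.random((5, 8))
code = forward(x, encoder)      # high-dimensional data -> low-dimensional code
x_hat = forward(code, decoder)  # code -> reconstruction
```

After unfolding, the network is an ordinary feed-forward autoencoder, so the backpropagation loop used for fine-tuning needs nothing RBM-specific.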

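Generalizing to continuous data

The visible units, originally stochastic binary variables, are replaced by linear units with Gaussian noise, while the hidden units of the first-level RBM remain binary. According to the paper, the visible units of every RBM in the experiments had real-valued activities (in the range [0, 1] for logistic units).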
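
For intuition, here is a sketch of how the visible layer would be reconstructed with linear units and Gaussian noise. It assumes unit-variance noise, and the function name and sizes are made up; the hidden units are still driven through the logistic function as before.

```python
import numpy as np

def gaussian_visible_given_hidden(h, W, b_vis, rng):
    """Linear visible units: mean b_vis + h W^T plus unit-variance Gaussian noise."""
    mean = b_vis + h @ W.T
    sample = mean + rng.normal(size=mean.shape)
    return mean, sample

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3                        # hypothetical sizes
W = 0.1 * rng.normal(size=(n_visible, n_hidden))
b_vis = np.zeros(n_visible)
h = (rng.random((4, n_hidden)) < 0.5).astype(float)   # binary hidden states

mean, sample = gaussian_visible_given_hidden(h, W, b_vis, rng)
```

The only change from the binary case is that the logistic function on the visible side is replaced by a linear mean plus noise.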

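Experiments:

Synthetic data

MNIST data set

Reuters corpus: The autoencoder clearly outperformed latent semantic analysis, a well known document retrieval method based on PCA. (Presumably "based on PCA" refers to the relationship between PCA and LSA in terms of the objective they optimize.)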

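Summary:

Benefit of pretraining: most of the information in the weights comes from modeling the raw data itself, so pretraining helps generalization.

The very limited information in the labels is only used to slightly adjust the weights.

Past experimental experience suggests this is exactly right.

It has been clear since the 1980s that backpropagation through deep autoencoders would be very effective for nonlinear dimensionality reduction, but only under three conditions that have only now been met: (1) computers are fast enough; (2) data sets are big enough; (3) the initial weights are close enough to a good solution.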

Autoencoders give mappings in both directions between data and code spaces.
