What is an intuitive explanation of the relation between PCA and SVD?
Mike Tamir, CSO - GalvanizeU accredited Masters program creating top tier Data Scientists...
The result of this process is a ranked list of "directions" in the feature space, ordered from most variance to least. The directions along which there is the greatest variance are referred to as the "principal components" (of variation in the data). The common wisdom is that by focusing exclusively on how the data is distributed along these dimensions, one can capture most of the information represented in the original feature space without having to deal with such a high number of dimensions, which can be of great benefit in statistical modeling and Data Science applications (see: When and where do we use SVD?).
What is the Formal Relation between SVD and PCA?
Let the matrix M be our data matrix, where the m rows represent our data points and the n columns represent the features of each data point. The data may already have been mean-centered and normalized by the column-wise standard deviations (most off-the-shelf implementations provide these options).
SVD: Because in most cases a data matrix M will not have exactly the same number of data points as features (i.e. m≠n), the matrix M will not be square, and a diagonalization of the form M = UΣU^T (where U is an m×m orthogonal matrix of the eigenvectors of M and Σ is the m×m diagonal matrix of its eigenvalues) will not exist. However, even when n≠m, an analogue of this decomposition is possible, and M can be factored as M = UΣV^T, where
- U is an m×m orthogonal matrix of the "left singular-vectors" of M.
- V is an n×n orthogonal matrix of the "right singular-vectors" of M.
- And Σ is an m×n matrix whose non-zero entries Σ_ii are referred to as the "singular-values" of M.
- Note: u⃗, v⃗, and σ form a left singular-vector, right singular-vector, and singular-value triple for a given matrix M if they satisfy the following equations:
- Mv⃗ = σu⃗ and
- M^Tu⃗ = σv⃗
PCA: PCA sidesteps the problem of M not being diagonalizable by working directly with the n×n "covariance matrix" M^TM. Because M^TM is symmetric, it is guaranteed to be diagonalizable. So PCA works by finding the eigenvectors of the covariance matrix and ranking them by their respective eigenvalues. The eigenvectors with the greatest eigenvalues are the Principal Components of the data matrix.
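To make this concrete, here is a minimal NumPy sketch of the covariance-eigendecomposition route just described; the random data matrix and the chosen sizes are hypothetical stand-ins, not anything from the original answer:

```python
import numpy as np

# Hypothetical data matrix: m = 100 samples, n = 5 features
rng = np.random.default_rng(0)
M = rng.normal(size=(100, 5))

# Mean-center each column (optionally also scale by column standard deviations)
Mc = M - M.mean(axis=0)

# The n x n "covariance matrix" M^T M is symmetric, so eigh applies
C = Mc.T @ Mc

# eigh returns the eigenvalues of a symmetric matrix in ascending order
eigvals, eigvecs = np.linalg.eigh(C)

# Rank directions from largest eigenvalue to smallest: the principal components
order = np.argsort(eigvals)[::-1]
principal_components = eigvecs[:, order]   # columns are the principal directions
ranked_eigvals = eigvals[order]
```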
Now, a little bit of matrix algebra can be done to show that the Principal Components found by diagonalizing the covariance matrix M^TM in PCA are exactly the right singular-vectors found through SVD (i.e. the columns of the matrix V):
From SVD we have M = UΣV^T, so...
- M^TM = (UΣV^T)^T(UΣV^T)
- M^TM = (VΣ^TU^T)(UΣV^T)
- but since U is orthogonal, U^TU = I,
so
- M^TM = VΣ²V^T
where Σ² = Σ^TΣ is an n×n diagonal matrix with diagonal elements Σ²_ii taken from the matrix Σ. So the matrix of eigenvectors V in PCA is the same as the matrix of right singular-vectors from SVD, and the eigenvalues generated in PCA are just the squares of the singular values from SVD.
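This identity is easy to check numerically. The following is a hedged sketch in the same hypothetical NumPy setting as above; note that eigenvectors are only determined up to sign, so the directions are compared on absolute values:

```python
import numpy as np

rng = np.random.default_rng(0)
Mc = rng.normal(size=(100, 5))
Mc -= Mc.mean(axis=0)                      # centered data matrix

# PCA route: eigendecomposition of the covariance matrix M^T M
eigvals, eigvecs = np.linalg.eigh(Mc.T @ Mc)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# SVD route: factor M directly
U, s, Vt = np.linalg.svd(Mc, full_matrices=False)

# Both prints should show True for generic data with distinct eigenvalues
print(np.allclose(s**2, eigvals))                  # eigenvalues are the squared singular values
print(np.allclose(np.abs(Vt.T), np.abs(eigvecs)))  # same directions, up to sign
```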
So is it ever better to use SVD over PCA?
Yes. While formally both approaches can be used to calculate the same principal components and their corresponding eigen/singular values, the extra step of forming the covariance matrix M^TM can lead to numerical rounding errors when calculating the eigenvalues/eigenvectors.
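One hedged way to see why: forming M^TM squares the condition number of the problem, so the small singular values are much closer to the limits of floating-point precision. A hypothetical NumPy illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data whose feature scales span six orders of magnitude
M = rng.normal(size=(1000, 3)) * np.array([1.0, 1e-3, 1e-6])
Mc = M - M.mean(axis=0)

s = np.linalg.svd(Mc, compute_uv=False)   # singular values of M
w = np.linalg.eigvalsh(Mc.T @ Mc)         # eigenvalues of M^T M

print(s.max() / s.min())                  # condition number of M, roughly 1e6
print(w.max() / w.min())                  # condition number of M^T M, roughly 1e12 (squared)
```

This is also why common implementations (for example, scikit-learn's PCA) compute the decomposition via SVD of the centered data matrix rather than by eigendecomposing the covariance matrix.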
Another answer:
I'll be assuming your data matrix is an m×d matrix organized such that rows are data samples (m samples) and columns are features (d features).
The first point is that SVD performs low-rank matrix approximation.
Your input to SVD is a number k (smaller than m or d), and the SVD procedure will return a set of k vectors of d dimensions each (which can be organized in a k×d matrix) and a set of k coefficients for each data sample (there are m data samples, so these can be organized in an m×k matrix), such that for each sample the linear combination of its k coefficients with the k vectors best reconstructs that data sample (in the Euclidean distance sense), and this holds simultaneously for all data samples.
So in a sense, the SVD procedure finds the optimal k vectors that together span a subspace in which most of the data samples lie (up to a small reconstruction error).
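A hedged sketch of this low-rank reading of SVD (the data matrix and the choice k = 3 are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))        # m = 200 samples, d = 10 features
k = 3                                 # hypothetical target rank

U, s, Vt = np.linalg.svd(X, full_matrices=False)

coeffs = U[:, :k] * s[:k]             # m x k: k coefficients per sample
basis = Vt[:k, :]                     # k x d: k basis vectors of d dimensions

X_approx = coeffs @ basis             # best rank-k reconstruction in the least-squares sense
print(np.linalg.norm(X - X_approx))   # total reconstruction error
```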
PCA on the other hand is:
1) subtract the mean sample from each row of the data matrix.
2) perform SVD on the resulting matrix, as in the sketch below.
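In code, this two-step description amounts to the following hedged sketch (the data matrix is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(loc=1000.0, size=(200, 10))   # samples with a large nonzero mean

# 1) subtract the mean sample from each row
Xc = X - X.mean(axis=0)

# 2) perform SVD on the resulting matrix; rows of Vt are the principal directions
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
principal_directions = Vt                    # ranked by the singular values in s
```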
So, the second point is that PCA gives you as output the subspace that spans the deviations from the mean data sample, while SVD provides you with a subspace that spans the data samples themselves (or, you can view this as a subspace that spans the deviations from zero).
Note that these two subspaces are usually NOT the same, and will be the same only if the mean data sample is zero.
In order to understand a little better why they are not the same, let's think of a data set where all feature values for all data samples are in the range 999-1001, and each feature's mean is 1000.
From the SVD point of view, the main way in which these samples deviate from zero is along the vector (1,1,1,...,1).
From the PCA point of view, on the other hand, the main way in which these data samples deviate from the mean data sample depends on the precise distribution of the data around the mean data sample...
In short, we can think of SVD as "something that compactly summarizes the main ways in which my data is deviating from zero" and PCA as "something that compactly summarizes the main ways in which my data is deviating from the mean data sample".
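A hedged numerical sketch of the 999-1001 example above (the data is randomly generated for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
X = 1000.0 + rng.uniform(-1.0, 1.0, size=(500, 5))   # every feature value is near 1000

# SVD of the raw data: the dominant direction is close to (1,1,...,1) normalized
_, _, Vt_raw = np.linalg.svd(X, full_matrices=False)
print(Vt_raw[0])            # entries are roughly equal (up to an overall sign)

# PCA (SVD after mean-centering): the dominant direction reflects variation around the mean
Xc = X - X.mean(axis=0)
_, _, Vt_centered = np.linalg.svd(Xc, full_matrices=False)
print(Vt_centered[0])       # depends on how the data is distributed around the mean sample
```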