Five Ways to Derive the Normal Equation

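Introduction:

The normal equation is the most basic least-squares method. Andrew Ng's course presents its derivation in matrix form; this article collects several different derivations to give machine learning learners a more complete picture.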

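Notation:

RSS (residual sum of squares): the sum of squared residuals, written S(β) below

β: the parameter column vector

X: an N×p matrix in which each row is one input sample vector

y: the label column vector, i.e. the target column vector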

Method 1. Vector Projection onto the Column Space
This is the most intuitive interpretation: the optimization of linear regression is equivalent to finding the projection of the vector y onto the column space of X. Since the projection is written Xβ, the optimal β is the one for which the error vector y−Xβ is orthogonal to the column space of X, that is,

X^T(y−Xβ) = 0.    (1)

Solving this for β (assuming X^TX is invertible) gives:

β=(X^TX)⁻¹X^Ty.
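
The orthogonality condition can be checked numerically. The following is a minimal sketch, assuming NumPy and randomly generated data; the sizes `N` and `p`, the seed, and all variable names are illustrative choices, not part of the original article.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 50, 3                      # hypothetical sample count and feature count
X = rng.normal(size=(N, p))       # each row is one input sample
y = rng.normal(size=N)            # target vector

# Solve X^T X beta = X^T y directly instead of forming an explicit inverse.
beta = np.linalg.solve(X.T @ X, X.T @ y)

residual = y - X @ beta           # error vector y - X beta
orthogonality = X.T @ residual    # X^T (y - X beta), expected to vanish up to rounding

print("max |X^T (y - X beta)| =", np.abs(orthogonality).max())
```

Using `np.linalg.solve` rather than `np.linalg.inv` is a common numerical choice; both correspond to the same formula when X^TX is invertible.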

Method 2. Direct Matrix Differentiation

Expanding S(β) into a simpler form gives the most straightforward derivation:

S(β)=(y−Xβ)^T(y−Xβ)=y^Ty−β^TX^Ty−y^TXβ+β^TX^TXβ=y^Ty−2β^TX^Ty+β^TX^TXβ.

Differentiating S(β) w.r.t. β and setting the result to zero:

−2y^TX+β^T(X^TX+(X^TX)^T)=−2y^TX+2β^TX^TX=0,

Solving for β gives:

β=(X^TX)⁻¹X^Ty.
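
As a sanity check on the derivative, one can compare it against central finite differences. This is a rough sketch under assumed random data; the helper names `S` and `grad_S`, the step size `eps`, and the problem sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 4))      # hypothetical design matrix
y = rng.normal(size=40)           # hypothetical targets

def S(beta):
    """Residual sum of squares (y - X beta)^T (y - X beta)."""
    r = y - X @ beta
    return r @ r

def grad_S(beta):
    """Analytic gradient: the transpose of -2 y^T X + 2 beta^T X^T X."""
    return -2 * X.T @ y + 2 * X.T @ X @ beta

beta0 = rng.normal(size=4)        # arbitrary point at which to test the gradient
eps = 1e-6
numeric = np.array([
    (S(beta0 + eps * e) - S(beta0 - eps * e)) / (2 * eps)
    for e in np.eye(4)            # perturb one coordinate at a time
])

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print("gradient mismatch:", np.abs(numeric - grad_S(beta0)).max())
print("gradient norm at beta_hat:", np.linalg.norm(grad_S(beta_hat)))
```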

Method 3. Matrix Differentiation with the Chain Rule
This is the simplest method for a lazy person, as it takes very little effort to reach the solution. The key is to apply the chain rule:

∂S(β)/∂β = [∂(y−Xβ)^T(y−Xβ)/∂(y−Xβ)] · [∂(y−Xβ)/∂β] = 2(y−Xβ)^T(−X) = −2(y−Xβ)^TX = 0,

Solving for β gives:

β=(X^TX)⁻¹X^Ty.

This method requires knowing the derivative of a quadratic form: ∂(x^TWx)/∂x = x^T(W+W^T). With W = I and x = y−Xβ, this gives the factor 2(y−Xβ)^T used above.
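
The quadratic-form identity can be verified numerically for a non-symmetric W. A small sketch, assuming NumPy; the dimension `n`, the random `W` and `x`, and the function name `q` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
W = rng.normal(size=(n, n))       # deliberately non-symmetric
x = rng.normal(size=n)

def q(v):
    """Quadratic form v^T W v."""
    return v @ W @ v

analytic = x @ (W + W.T)          # row-vector derivative x^T (W + W^T)

eps = 1e-6
numeric = np.array([
    (q(x + eps * e) - q(x - eps * e)) / (2 * eps)
    for e in np.eye(n)            # central difference along each coordinate
])

# Special case W = I, the one used in the chain-rule derivation: derivative is 2 x^T.
identity_case = x @ (np.eye(n) + np.eye(n).T)

print("mismatch:", np.abs(analytic - numeric).max())
```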

Method 4. Without Matrix Differentiation
We can rewrite S(β) as follows:

S(β)=⟨β,β⟩−2⟨β,(X^TX)⁻¹X^Ty⟩+⟨(X^TX)⁻¹X^Ty,(X^TX)⁻¹X^Ty⟩+C,

where ⟨⋅,⋅⟩ is the inner product defined by

⟨x,y⟩=x^T(X^TX)y.

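The idea is to rewrite S(β) in the form S(β)=(x−a)²+b, so that the minimizer x=a can be read off directly. Here the three inner-product terms combine into ⟨β−(X^TX)⁻¹X^Ty, β−(X^TX)⁻¹X^Ty⟩, which is nonnegative and vanishes exactly at β=(X^TX)⁻¹X^Ty, provided X^TX is positive definite. The constant C = y^Ty − y^TX(X^TX)⁻¹X^Ty does not depend on β.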
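
The completed-square form can be checked on random data. This is a sketch under stated assumptions: NumPy, a full-column-rank random X, and hypothetical names `G`, `inner`, and `beta_hat`; the value of C used here is y^Ty − ⟨β̂,β̂⟩ with β̂=(X^TX)⁻¹X^Ty, obtained by matching terms against the expansion of S(β).

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 3))      # hypothetical full-column-rank design matrix
y = rng.normal(size=30)
G = X.T @ X                       # Gram matrix defining <u, v> = u^T (X^T X) v

def inner(u, v):
    """Inner product induced by X^T X."""
    return u @ G @ v

def S(beta):
    """Residual sum of squares."""
    r = y - X @ beta
    return r @ r

beta_hat = np.linalg.solve(G, X.T @ y)
C = y @ y - inner(beta_hat, beta_hat)   # constant term, independent of beta

beta = rng.normal(size=3)               # arbitrary test point
lhs = S(beta)
rhs = inner(beta - beta_hat, beta - beta_hat) + C

print("S(beta) =", lhs, " completed square =", rhs)
```

Because the squared term is nonnegative, S(β) can never fall below C, which is the value attained at `beta_hat`.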

Method 5. Statistical Learning Theory
An alternative derivation of the normal equation comes from statistical learning theory. The aim is to minimize the expected prediction error, given by:
EPE(β)=∫(y−x^Tβ)² Pr(dx,dy),
where x is a column vector of random variables, y is the target random variable, and β is a column vector of parameters (note that these definitions differ from the notation used earlier).
Differentiating EPE(β) w.r.t. β gives:

∂EPE(β)/∂β = ∫ 2(y−x^Tβ)(−1)x^T Pr(dx,dy).

Before we proceed, let's check the dimensions to make sure the partial derivative is correct. EPE is the expected error, a scalar (1×1). β is an N×1 column vector. According to the Jacobian convention in vector calculus, the partial derivative should take the form

∂EPE/∂β = (∂EPE/∂β_1, ∂EPE/∂β_2, …, ∂EPE/∂β_N),

which is a 1×N row vector. Looking back at the right-hand side of the equation above, 2(y−x^Tβ)(−1) is a scalar and x^T is a row vector, so the product has the same 1×N dimension. Thus the partial derivative above has the correct shape. The derivative also shows how the parameters should be adjusted to reduce the expected error. To see why, think of 2(y−x^Tβ)(−1) as the error incurred by the current parameter configuration β, and of x^T as the values of the input attributes. The derivative equals the error multiplied by the value of each input attribute. Put another way, the contribution of each parameter β_i to the error grows monotonically with both the error term 2(y−x^Tβ)(−1) and the input value x_i that multiplies β_i.

Now, let's go back to the derivation. Because 2(y−x^Tβ)(−1) is 1×1, we can replace it with its transpose:

∂EPE(β)/∂β = ∫ 2(y−x^Tβ)^T(−1)x^T Pr(dx,dy).

Solving ∂EPE(β)/∂β = 0 gives:

E[y^Tx^T − β^Txx^T] = 0
E[β^Txx^T] = E[y^Tx^T]
E[xx^Tβ] = E[xy]
β = E[xx^T]⁻¹E[xy].
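
Replacing the expectations with sample averages recovers the familiar least-squares solution. The sketch below assumes NumPy and a synthetic linear model; `true_beta`, the noise level, and the sample size are made-up values for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n_samples, dim = 100_000, 3
true_beta = np.array([1.5, -2.0, 0.5])          # hypothetical ground-truth parameters

x = rng.normal(size=(n_samples, dim))           # each row is one draw of the random vector x
y = x @ true_beta + 0.1 * rng.normal(size=n_samples)

Exx = (x.T @ x) / n_samples                     # sample estimate of E[x x^T]
Exy = (x.T @ y) / n_samples                     # sample estimate of E[x y]

beta_mc = np.linalg.solve(Exx, Exy)             # E[x x^T]^{-1} E[x y] with sample moments
beta_ls = np.linalg.lstsq(x, y, rcond=None)[0]  # ordinary least-squares fit for comparison

print("moment-based estimate:", beta_mc)
print("least-squares estimate:", beta_ls)
```

The factor 1/n_samples cancels between the two moment estimates, so the moment-based estimate coincides with (X^TX)⁻¹X^Ty computed on the stacked samples.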
