Regularization (Solving the Problem of Overfitting)

Underfitting (High Bias) vs. Overfitting (High Variance)

Underfitting, or high bias, is when the form of our hypothesis function h maps poorly to the trend of the data.

It is usually caused by a function that is too simple or uses too few features.

In short, underfitting (high bias) means the model does not fit even the training data well.

At the other extreme, overfitting, or high variance, is caused by a hypothesis function that fits the available data but does not generalize well to predict new data.

It is usually caused by a complicated function that creates a lot of unnecessary curves and angles unrelated to the data.

In short, overfitting (high variance) means the model fits the training data very well, but the hypothesis is too large, with too many parameters and too little data to constrain them (e.g., m < n), so it cannot generalize to new examples.
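
To make the contrast concrete, here is a minimal Octave sketch (the data points are invented for illustration): a straight line underfits a curved trend, while a degree-4 polynomial passes through all five samples exactly and swings wildly between them.

x = (1:5)'; y = [1.0; 3.9; 9.2; 15.8; 25.1];  % roughly quadratic data with noise
p_under = polyfit(x, y, 1);  % too simple: high bias, misses the curvature
p_over  = polyfit(x, y, 4);  % degree 4 through 5 points: zero training error,
                             % but wild oscillation between samples (high variance)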

This terminology is applied to both linear and logistic regression. There are two main options to address the issue of overfitting:

  1. Reduce the number of features
  • Manually select which features to keep
  • Use a model selection algorithm (studied later in the course)
  2. Regularization
  • Keep all the features, but reduce the magnitude of parameters \(\theta_j\).
  • Regularization works well when we have a lot of slightly useful features.

Regularization: The Linear Regression Cost Function

Regularization never includes the \(\theta_0\) term.

\(J(\theta)=\frac{1}{2m} \Bigg[ \sum\limits_{i=1}^m \Big( h_\theta(x^{(i)}) - y^{(i)} \Big)^2 + \lambda \sum\limits_{j=1}^n \theta_j^2 \Bigg]\)

A vectorized implementation is:

\(\overrightarrow{h}=X \overrightarrow{\theta}\)

\(J(\theta)=\frac{1}{2m} \cdot \Bigg[ (\overrightarrow{h}-\overrightarrow{y})^T \cdot (\overrightarrow{h}-\overrightarrow{y}) + \lambda \cdot (\overrightarrow{l} \cdot \overrightarrow{\theta}^{.2}) \Bigg]\)

\(\overrightarrow{l} = [0, 1, 1, \dots, 1]\) is a row vector that masks out \(\theta_0\); \(\overrightarrow{\theta}^{.2}\) denotes element-wise squaring.

Code implementation:

m = length(y);

l = ones(1, length(theta)); l(:,1) = 0;  % row mask that zeros out the theta_0 term
J = 1/(2*m) * ((X*theta - y)' * (X*theta - y) + lambda * (l * (theta.^2)));
% Equivalent form without the mask vector:
J = 1/(2*m) * ((X*theta - y)' * (X*theta - y) + lambda * (theta'*theta - theta(1,:).^2));
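
As a quick sanity check, the cost can be evaluated on a tiny hypothetical dataset (all numbers below are made up for illustration):

X = [1 1; 1 2; 1 3];  % m = 3 examples: intercept column plus one feature
y = [1; 2; 3];
theta = [0; 1];       % fits the data exactly, so the squared-error term is zero
lambda = 1; m = length(y);
l = ones(1, length(theta)); l(:,1) = 0;
J = 1/(2*m) * ((X*theta - y)' * (X*theta - y) + lambda * (l * (theta.^2)))
% Only the penalty remains: J = lambda/(2*m) * theta_1^2 = 1/6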

Regularization: The Logistic Regression Cost Function

Regularization never includes the \(\theta_0\) term.

\(J(\theta)=-\frac{1}{m} \sum\limits_{i=1}^m \Bigg[ y^{(i)} \log \Big(h_\theta(x^{(i)}) \Big) + (1-y^{(i)}) \log \Big(1-h_\theta(x^{(i)}) \Big) \Bigg] + \frac{\lambda}{2m} \sum\limits_{j=1}^n \theta_j^2\)

A vectorized implementation is:

\(\overrightarrow{h}=g(X \overrightarrow{\theta})\)

\(J(\theta)=\frac{1}{m} \cdot \Big( -\overrightarrow{y}^T \cdot \log(\overrightarrow{h}) - (1- \overrightarrow{y})^T \cdot \log(1- \overrightarrow{h}) \Big) + \frac{\lambda}{2m} (\overrightarrow{l} \cdot \overrightarrow{\theta}^{.2})\)

\(\overrightarrow{l} = [0, 1, 1, \dots, 1]\) (the same mask row vector as above)

Code implementation:

m = length(y);

l = ones(1, length(theta)); l(:,1) = 0;  % row mask that zeros out the theta_0 term
J = (1/m)*(-y'*log(sigmoid(X*theta)) - (1 - y)'*log(1 - sigmoid(X*theta))) + ...
    (lambda/(2*m))*(l*(theta.^2));
% Equivalent form without the mask vector:
J = (1/m)*(-y'*log(sigmoid(X*theta)) - (1 - y)'*log(1 - sigmoid(X*theta))) + ...
    (lambda/(2*m))*(theta'*theta - theta(1,:).^2);
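
The code above calls sigmoid, which the course exercises define; if you need it standalone, a minimal definition is:

function g = sigmoid(z)
  % Element-wise logistic function 1 / (1 + e^(-z)).
  g = 1 ./ (1 + exp(-z));
end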

Gradient Descent for Regularized Linear and Logistic Regression

Regularization never includes the \(\theta_0\) term.

\(\begin{cases} \theta_0:=\theta_0 - \alpha \frac{1}{m} \sum\limits_{i=1}^m \Big( h_\theta(x^{(i)}) - y^{(i)} \Big) \cdot x_0^{(i)} \\ \\ \theta_j:=\theta_j - \alpha \Bigg[ \frac{1}{m} \sum\limits_{i=1}^m \Big( h_\theta(x^{(i)}) - y^{(i)} \Big) \cdot x_j^{(i)} + \frac{\lambda}{m} \cdot \theta_j \Bigg] \end{cases}\)

A vectorized implementation of the gradient is:

\(\nabla J(\theta) = \frac{1}{m} \cdot \Big( X^T \cdot (\overrightarrow{h} - \overrightarrow{y}) \Big) + \frac{\lambda}{m} \cdot \theta^{'}\)

\(\theta^{'} = \begin{bmatrix} 0\\[0.3em]\theta_1\\[0.3em]\theta_2\\[0.3em]\vdots\\[0.3em]\theta_n \end{bmatrix}\)

Code implementation:

reg_theta = theta; reg_theta(1, :) = 0;  % copy of theta with theta_0 zeroed out
grad = (1/m)*(X'*(sigmoid(X*theta) - y)) + (lambda/m)*reg_theta;  % logistic regression; for linear regression use X*theta in place of sigmoid(X*theta)
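
Putting the cost and gradient together, here is a minimal sketch in the style of the course exercises (the name costFunctionReg and the [J, grad] return convention follow the ex2 assignment; treat the exact interface as an assumption):

function [J, grad] = costFunctionReg(theta, X, y, lambda)
  % Regularized logistic regression cost and gradient.
  m = length(y);
  h = sigmoid(X * theta);
  reg_theta = theta; reg_theta(1) = 0;  % exclude theta_0 from the penalty
  J = (1/m) * (-y' * log(h) - (1 - y)' * log(1 - h)) ...
      + (lambda/(2*m)) * (reg_theta' * reg_theta);
  grad = (1/m) * (X' * (h - y)) + (lambda/m) * reg_theta;
end

A function in this form can be handed directly to an optimizer such as fminunc.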

Final form: with some manipulation, the update rule for \(\theta_j\) can also be represented as:

\(\begin{cases} \theta_0:=\theta_0 - \alpha \frac{1}{m} \sum\limits_{i=1}^m \Big( h_\theta(x^{(i)}) - y^{(i)} \Big) \cdot x_0^{(i)} \\ \\ \theta_j:=\theta_j (1- \alpha \frac{\lambda}{m}) - \alpha \frac{1}{m} \sum\limits_{i=1}^m \Big( h_\theta(x^{(i)}) - y^{(i)} \Big) \cdot x_j^{(i)} \end{cases}\)
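
A minimal Octave sketch of this shrink-then-step update (alpha and num_iters are assumed values for illustration):

alpha = 0.01; num_iters = 400;      % assumed learning rate and iteration count
for iter = 1:num_iters
  h = X * theta;                    % for logistic regression use sigmoid(X*theta)
  grad = (1/m) * (X' * (h - y));    % unregularized gradient term
  shrink = (1 - alpha*lambda/m) * ones(size(theta));
  shrink(1) = 1;                    % theta_0 is never shrunk
  theta = shrink .* theta - alpha * grad;
end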

\(1 - \alpha\frac{\lambda}{m}\) will always be less than 1. Intuitively, you can see it as shrinking the value of \(\theta_j\) by some amount on every update; notice that the second term is exactly the same as it was before regularization.

Regularizing the Normal Equation for Linear Regression

Regularization never includes the \(\theta_0\) term.

Now let's approach regularization using the alternate method of the non-iterative normal equation.

To add in regularization, the equation is the same as our original, except that we add another term inside the parentheses:

Original form: \(\overrightarrow{\theta} = (X^TX)^{-1}X^T \overrightarrow{y}\)

With regularization: \(\overrightarrow{\theta} = (X^TX + \lambda L)^{-1}X^T \overrightarrow{y}\)

\(L = \begin{bmatrix} 0 & & & & \\[0.3em] & 1 & & & \\[0.3em] & & 1 & & \\[0.3em] & & & \ddots & \\[0.3em] & & & & 1 \end{bmatrix}\)

L is a matrix with 0 at the top left and 1's down the diagonal, with 0's everywhere else. It should have dimension (n+1)×(n+1).

Intuitively, this is the identity matrix (though excluding \(x_0\)) multiplied by a single real number \(\lambda\).

Recall that if m < n, then \(X^TX\) is non-invertible. However, when we add the term \(\lambda \cdot L\) (with \(\lambda > 0\)), \(X^TX + \lambda \cdot L\) becomes invertible.
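
A minimal Octave sketch of the regularized normal equation (this assumes X already includes the leading column of ones):

n = size(X, 2);                      % number of columns, including the intercept
L = eye(n); L(1, 1) = 0;             % identity with the top-left entry zeroed
theta = (X'*X + lambda*L) \ (X'*y);  % solve the linear system rather than inverting

Using the backslash operator to solve the system is numerically preferable to forming the inverse explicitly.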

Code

The regularization machinery has been added to the code for all the other exercises, such as linear regression, logistic regression, and neural networks. You can find it there; to run without regularization, simply set lambda = 0.

To get the source code and other files, click Fork me on GitHub at the top right and clone the repository.
