Linear Regression

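Study notes from a MOOC machine learning course

1 Linear Regression with One Variable

1.1 Model Representation

Given a real-world problem, we can build a model of it from data. In machine learning the model function is usually called the hypothesis. Here the hypothesis h is:

$$h_\theta(x) = \theta_0 + \theta_1 x$$

We start with this simple single-variable linear regression model.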

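1.2 Cost Function

There are many kinds of cost function. The one below is the squared error function:

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

Here m is the number of training examples. This cost function is very common in linear regression. The goal is to minimize the error between the values predicted by the hypothesis and the actual values in the training set.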
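To make the squared error cost concrete, here is a minimal Octave/MATLAB sketch that evaluates it on a tiny data set. The data and the anonymous function `cost` are made up for illustration and are not part of the course material.

```matlab
% Toy training set (made up for illustration): y = 2x exactly.
x = [1; 2; 3];
y = [2; 4; 6];
m = numel(y);                       % number of training examples

% Hypothetical helper: squared error cost for h(x) = theta0 + theta1 * x.
cost = @(theta0, theta1) sum((theta0 + theta1 * x - y) .^ 2) / (2 * m);

J_perfect = cost(0, 2);   % the line y = 2x fits every point, so the cost is 0
J_flat    = cost(0, 0);   % predicting 0 everywhere gives (4 + 16 + 36) / 6
```

A parameter pair that fits the data better yields a smaller cost, which is the quantity the rest of the notes try to minimize.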

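Next we simplify the hypothesis so that it has only one parameter, $\theta_1$ (that is, $\theta_0 = 0$). From the formulas for the hypothesis and the cost function we can plot both. Note that the hypothesis $h_\theta(x)$ is a function of $x$, whereas the cost function $J(\theta_1)$ is a function of $\theta_1$.

If we keep both parameters, the plot of the cost function becomes a three-dimensional surface whose height is the value of $J(\theta_0, \theta_1)$.

Usually we draw it as a contour plot (also called a contour figure) instead.

In the contour figure, the three marked points, which lie on the same contour, have the same value of J, and the point at the very center has the smallest value, much like contour lines on a map.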

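1.3 Gradient Descent

Next we look at gradient descent, an algorithm for learning the parameters of the hypothesis. Each step updates every parameter by

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$$

Note that a correct implementation updates all parameters simultaneously: compute every new value from the current parameters first, then assign them together.

Simplifying to a single parameter makes the behavior easy to see: from the current point, the update moves along the tangent (the derivative) with step size $\alpha$, the learning rate, which moves the parameter toward the minimum.

The learning rate has a large effect on the result. If $\alpha$ is too small, descent is slow and many steps are needed to reach the minimum. If $\alpha$ is too large, the updates may oscillate and fail to converge.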

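Now we apply gradient descent to linear regression. Recall the hypothesis and cost function of linear regression:

$$h_\theta(x) = \theta_0 + \theta_1 x, \qquad J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

Taking the partial derivatives of the cost function and substituting them into the gradient descent formula gives the parameter updates for linear regression:

$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$$

$$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$$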
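As a worked step, the partial derivatives used in the linear regression updates can be obtained with the chain rule. This sketch assumes the single-variable hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$ and the squared error cost $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$:

```latex
\begin{aligned}
\frac{\partial J}{\partial \theta_1}
  &= \frac{1}{2m} \sum_{i=1}^{m} 2 \left( h_\theta(x^{(i)}) - y^{(i)} \right)
     \frac{\partial h_\theta(x^{(i)})}{\partial \theta_1}
   = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}, \\
\frac{\partial J}{\partial \theta_0}
  &= \frac{1}{2m} \sum_{i=1}^{m} 2 \left( h_\theta(x^{(i)}) - y^{(i)} \right)
     \frac{\partial h_\theta(x^{(i)})}{\partial \theta_0}
   = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right).
\end{aligned}
```

The factor 2 produced by differentiating the square cancels the 1/2 in the cost, which is the usual motivation for writing the cost with $\frac{1}{2m}$ rather than $\frac{1}{m}$.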

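In linear regression the cost function is convex, so it has no local optima other than the global one, and gradient descent with a suitable learning rate reaches the global optimum. Also, each parameter update here sums over the entire training set. This variant is called batch gradient descent; other variants exist.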
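Batch gradient descent for the single-variable case can be sketched as follows in Octave/MATLAB. The data, the learning rate and the iteration count are assumptions chosen for illustration; the point is that both gradients are computed from the current parameters before either parameter is changed, which is the simultaneous update described earlier.

```matlab
% Toy training set (made up for illustration): y = 1 + 2x exactly.
x = [1; 2; 3; 4];
y = [3; 5; 7; 9];
m = numel(y);

alpha  = 0.1;      % learning rate (assumed value)
theta0 = 0;
theta1 = 0;

for iter = 1:5000
    err = theta0 + theta1 * x - y;     % h(x) - y for every training example

    % Both gradients use the current theta0 and theta1 (batch: sum over all data).
    grad0 = sum(err) / m;
    grad1 = sum(err .* x) / m;

    % Simultaneous update: assign only after both gradients are known.
    theta0 = theta0 - alpha * grad0;
    theta1 = theta1 - alpha * grad1;
end
% theta0 and theta1 end up close to 1 and 2 for this data.
```

Updating theta0 first and then using the new theta0 to compute the gradient for theta1 would be a different algorithm, which is why the gradients are stored in temporaries.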

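2 Linear Regression with Multiple Variables

2.1 Multiple Features

In real problems there is usually more than one variable, so we extend single-variable linear regression to multiple variables. First consider the model representation, that is, the hypothesis. Suppose we have 4 variables (features). We use the following notation: $n$ is the number of features (here $n = 4$), $x^{(i)}$ is the feature vector of the $i$-th training example, and $x_j^{(i)}$ is the value of feature $j$ in that example. The hypothesis is then $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4$.

Generalizing to n variables, the hypothesis is $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$. Defining $x_0 = 1$, it can be written in vector form:

$$h_\theta(x) = \theta^T x$$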
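If a constant feature $x_0 = 1$ is prepended to every example, the vector form lets us evaluate the hypothesis for a whole training set with one matrix product. The following Octave/MATLAB sketch uses made-up numbers; `X` holds one example per row and `theta` is a column vector.

```matlab
% Two examples with two features each; the first column is x0 = 1.
X = [1 2 3;
     1 4 5];
theta = [1; 2; 3];        % [theta0; theta1; theta2], made-up values

% Row i of X times theta is theta' * x for example i.
h = X * theta;            % 2-by-1 vector of predictions: [14; 24]
```

This is the same expression, `X*theta`, that the exercise code further down uses inside the cost and gradient computations.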

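2.2 Gradient Descent for Multiple Variables

Taking partial derivatives of the cost function with the multivariate hypothesis gives the gradient descent update for every parameter:

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \qquad (j = 0, 1, \dots, n,\ \text{updated simultaneously})$$

With multiple features, some features may be on very different scales from others. For example, one feature may range over [0, 1] while another ranges over [1000, 2000]. Running gradient descent directly on such data converges very slowly.

To speed up convergence we can use feature scaling. A commonly used method is mean normalization:

$$x_j := \frac{x_j - \mu_j}{s_j}$$

Here $\mu_j$ is the mean of feature $j$ over the training set, and $s_j$ can be either the range (max - min) or the standard deviation of that feature. The scaling does not need to be exact; it is enough that the features end up in similar ranges, since its only purpose is to make gradient descent converge faster.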
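As an illustration of mean normalization with the range (max - min) as the scale, here is a short Octave/MATLAB sketch. The feature values are made up to mirror the [0, 1] versus [1000, 2000] example; `bsxfun` is used so the column-wise arithmetic also works on MATLAB versions without implicit expansion.

```matlab
% Three examples, two features on very different scales (made-up values).
X = [0.2 1000;
     0.5 1500;
     0.8 2000];

mu = mean(X);               % 1-by-n row vector of per-feature means
s  = max(X) - min(X);       % 1-by-n row vector of per-feature ranges

% Subtract each column's mean, then divide each column by its range.
X_scaled = bsxfun(@rdivide, bsxfun(@minus, X, mu), s);
% Every column of X_scaled now has mean 0 and spans a range of 1.
```

Replacing `s` with `std(X)` gives the standard-deviation variant, which is the one used in the feature scaling exercise code later in these notes.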

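Another very important hyperparameter in gradient descent is the learning rate $\alpha$. It affects not only the speed of convergence but can also cause gradient descent to fail to converge, so choosing it well matters. When debugging, we should plot the cost function against the number of iterations.

If the cost does not decrease at every step and instead increases, we should choose a smaller learning rate. If the learning rate is too small, convergence is slow. A rule of thumb for picking it is to try values that grow by a factor of about 3 (for example 0.001, 0.003, 0.01, 0.03, 0.1), plot the cost curve for each, and choose a suitable one.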
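The search over learning rates can be sketched like this in Octave/MATLAB. The data, the candidate values and the iteration count are assumptions for illustration; the last candidate is deliberately too large so that its cost rises instead of falling.

```matlab
% Toy data (made up): one already-centered feature plus the intercept column.
X = [ones(4, 1), [-1.5; -0.5; 0.5; 1.5]];
y = [1; 2; 3; 4];
m = numel(y);

alphas    = [0.01 0.03 0.1 0.3 1 3];     % candidates growing by roughly 3x
num_iters = 50;
J_hist    = zeros(num_iters, numel(alphas));   % one cost curve per column

for k = 1:numel(alphas)
    theta = zeros(2, 1);
    for iter = 1:num_iters
        % Vectorized batch gradient step.
        theta = theta - (alphas(k) / m) * (X' * (X * theta - y));
        r = X * theta - y;
        J_hist(iter, k) = (r' * r) / (2 * m);  % cost after this step
    end
end
% Columns 1 to 5 decrease step by step; column 6 (alpha = 3) grows, so it diverges.
```

Calling `plot(1:num_iters, J_hist(:, 1:5))` draws the converging curves on one figure; a curve that rises, like the one for the largest candidate here, signals that the learning rate should be reduced.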

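2.3 Features and Polynomial Regression

Because real problems are complex, a single straight line often cannot fit the data well, and different features affect the model differently. Feature selection is covered in later material. The point here is that by studying the problem closely and understanding the features and the shapes of different functions, we can choose a different model to fit the data, for example a polynomial in an existing feature.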
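Polynomial regression can reuse the linear regression machinery by treating powers of a feature as additional features. The Octave/MATLAB sketch below uses made-up data that follows a quadratic, and solves for the parameters in closed form with `pinv` rather than by gradient descent, purely to keep the example short.

```matlab
% Made-up data that follows y = 1 + x^2, which no straight line fits exactly.
x = [1; 2; 3; 4; 5];
y = 1 + x .^ 2;

% Build polynomial features: intercept, x, and x^2 as separate columns.
X_poly = [ones(numel(x), 1), x, x .^ 2];

% Least-squares fit; theta comes out close to [1; 0; 1].
theta = pinv(X_poly' * X_poly) * X_poly' * y;
```

Because x, x^2 and higher powers have very different ranges, feature scaling becomes more important when such features are fed to gradient descent.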

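2.4 Normal Equation

Gradient descent is not the only way to find the parameters that minimize the cost function. The normal equation computes the parameters directly, without iteration.

Normal equation:

$$\theta = (X^T X)^{-1} X^T y$$

Compared with gradient descent, the normal equation needs no learning rate and no iterations, but it requires computing the inverse of $X^T X$, an $(n+1) \times (n+1)$ matrix, which costs roughly $O(n^3)$. When the number of features is very large, the normal equation becomes very slow. So we can choose between gradient descent and the normal equation according to the number of features n in the actual problem.

In linear regression $X^T X$ is rarely non-invertible, but it can happen. The usual causes are redundant features (features that are linearly dependent, such as the same quantity expressed in two different units) and too many features relative to the number of training examples ($m \le n$).

When programming in MATLAB, we can use the pinv function instead of inv. The main difference is that pinv computes the pseudo-inverse, so it still returns a result even when the inverse does not exist.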
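The main script later calls `normalEqn(X, y)`, whose body is not shown in these notes. A minimal sketch of what it might contain, assuming it simply evaluates the normal equation with `pinv`, is given below as an anonymous function; in the exercise it would live in its own file `normalEqn.m`. The data is made up.

```matlab
% Hypothetical stand-in for normalEqn.m: theta = pinv(X'X) X'y.
normalEqn = @(X, y) pinv(X' * X) * X' * y;

% Toy data (made up): y = 1 + 2x, with the intercept column already added.
X = [1 1;
     1 2;
     1 3;
     1 4];
y = [3; 5; 7; 9];

theta = normalEqn(X, y);   % close to [1; 2], with no learning rate or iterations
```

No feature scaling is needed here, since the closed-form solution does not depend on how quickly an iterative method converges.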

Exercise code

1 Feature scaling

 function [X_norm, mu, sigma] = featureNormalize(X)
 %FEATURENORMALIZE Normalizes the features in X
 %   FEATURENORMALIZE(X) returns a normalized version of X where
 %   the mean value of each feature is 0 and the standard deviation
 %   is 1. This is often a good preprocessing step to do when
 %   working with learning algorithms.

 % You need to set these values correctly
 X_norm = X;
 mu = zeros(1, size(X, 2));
 sigma = zeros(1, size(X, 2));
 num_fea = size(X, 2);   % number of features (columns of X)

 % ====================== YOUR CODE HERE ======================
 % Instructions: First, for each feature dimension, compute the mean
 %               of the feature and subtract it from the dataset,
 %               storing the mean value in mu. Next, compute the
 %               standard deviation of each feature and divide
 %               each feature by its standard deviation, storing
 %               the standard deviation in sigma.
 %
 %               Note that X is a matrix where each column is a
 %               feature and each row is an example. You need
 %               to perform the normalization separately for
 %               each feature.
 %
 % Hint: You might find the 'mean' and 'std' functions useful.
 %
 for i = 1:num_fea
     mu(i) = mean(X(:, i));                          % mean of feature i
     sigma(i) = std(X(:, i));                        % standard deviation of feature i
     X_norm(:, i) = (X(:, i) - mu(i)) ./ sigma(i);   % center, then scale
 end
 % ============================================================

 end

2 Computing the cost function

 function J = computeCostMulti(X, y, theta)
 %COMPUTECOSTMULTI Compute cost for linear regression with multiple variables
 %   J = COMPUTECOSTMULTI(X, y, theta) computes the cost of using theta as the
 %   parameter for linear regression to fit the data points in X and y

 % Initialize some useful values
 m = length(y); % number of training examples

 % You need to return the following variables correctly
 J = 0;

 % ====================== YOUR CODE HERE ======================
 % Instructions: Compute the cost of a particular choice of theta
 %               You should set J to the cost.
 temp = X * theta - y;               % m-by-1 vector of residuals h(x) - y
 J = 1 / (2 * m) * (temp' * temp);   % sum of squared residuals over 2m
 % =========================================================================

 end

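3 Gradient descent
Note the vectorized form of the update, $\theta := \theta - \frac{\alpha}{m} X^T (X\theta - y)$.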

 function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
 %GRADIENTDESCENTMULTI Performs gradient descent to learn theta
 %   theta = GRADIENTDESCENTMULTI(x, y, theta, alpha, num_iters) updates theta by
 %   taking num_iters gradient steps with learning rate alpha

 % Initialize some useful values
 m = length(y); % number of training examples
 J_history = zeros(num_iters, 1);

 for iter = 1:num_iters
     % ====================== YOUR CODE HERE ======================
     % Instructions: Perform a single gradient step on the parameter vector
     %               theta.
     %
     % Hint: While debugging, it can be useful to print out the values
     %       of the cost function (computeCostMulti) and gradient here.
     %
     h_error = X * theta - y;                 % m-by-1 residuals h(x) - y
     delta = (alpha / m) .* (h_error' * X);   % 1-by-(n+1) scaled gradient
     theta = theta - delta';                  % update all parameters at once
     % ============================================================

     % Save the cost J in every iteration
     J_history(iter) = computeCostMulti(X, y, theta);
 end

 end

4 Main script

Note that if feature scaling was applied during training, the same feature scaling must also be applied at test time.

%% Machine Learning Online Class
%  Exercise 1: Linear regression with multiple variables
%
%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the
%  linear regression exercise.
%
%  You will need to complete the following functions in this
%  exercise:
%
%     warmUpExercise.m
%     plotData.m
%     gradientDescent.m
%     computeCost.m
%     gradientDescentMulti.m
%     computeCostMulti.m
%     featureNormalize.m
%     normalEqn.m
%
%  For this part of the exercise, you will need to change some
%  parts of the code below for various experiments (e.g., changing
%  learning rates).
%

%% Initialization

%% ================ Part 1: Feature Normalization ================

%% Clear and Close Figures
clear ; close all; clc

fprintf('Loading data ...\n');

%% Load Data
data = load('ex1data2.txt');
X = data(:, 1:2);   % features: columns 1 and 2
y = data(:, 3);     % target: column 3
m = length(y);

% Print out some data points
fprintf('First 10 examples from the dataset: \n');
fprintf(' x = [%.0f %.0f], y = %.0f \n', [X(1:10,:) y(1:10,:)]');

fprintf('Program paused. Press enter to continue.\n');
pause;

% Scale features and set them to zero mean
fprintf('Normalizing Features ...\n');

[X, mu, sigma] = featureNormalize(X);

% Add intercept term to X
X = [ones(m, 1) X];

%% ================ Part 2: Gradient Descent ================

% ====================== YOUR CODE HERE ======================
% Instructions: We have provided you with the following starter
%               code that runs gradient descent with a particular
%               learning rate (alpha).
%
%               Your task is to first make sure that your functions -
%               computeCost and gradientDescent already work with
%               this starter code and support multiple variables.
%
%               After that, try running gradient descent with
%               different values of alpha and see which one gives
%               you the best result.
%
%               Finally, you should complete the code at the end
%               to predict the price of a 1650 sq-ft, 3 br house.
%
% Hint: By using the 'hold on' command, you can plot multiple
%       graphs on the same figure.
%
% Hint: At prediction, make sure you do the same feature normalization.
%

fprintf('Running gradient descent ...\n');

% Choose some alpha value
alpha = 0.01;
num_iters = 400;

% Init Theta and Run Gradient Descent
theta = zeros(3, 1);   % intercept plus two features
[theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters);

% Plot the convergence graph
figure;
plot(1:numel(J_history), J_history, '-b', 'LineWidth', 2);
xlabel('Number of iterations');
ylabel('Cost J');
hold on;

% Display gradient descent's result
fprintf('Theta computed from gradient descent: \n');
fprintf(' %f \n', theta);
fprintf('\n');

% Estimate the price of a 1650 sq-ft, 3 br house
% ====================== YOUR CODE HERE ======================
% Recall that the first column of X is all-ones. Thus, it does
% not need to be normalized.
x = [1 1650 3];
x(2) = (x(2) - mu(1)) / sigma(1);   % scale house size with the training mean and std
x(3) = (x(3) - mu(2)) / sigma(2);   % scale bedroom count the same way
price = x * theta; % Gradient descent was trained on scaled features, so the test input must be scaled identically.
% ============================================================

fprintf(['Predicted price of a 1650 sq-ft, 3 br house ' ...
         '(using gradient descent):\n $%f\n'], price);

fprintf('Program paused. Press enter to continue.\n');
pause;

%% ================ Part 3: Normal Equations ================

fprintf('Solving with normal equations...\n');

% ====================== YOUR CODE HERE ======================
% Instructions: The following code computes the closed form
%               solution for linear regression using the normal
%               equations. You should complete the code in
%               normalEqn.m
%
%               After doing so, you should complete this code
%               to predict the price of a 1650 sq-ft, 3 br house.
%

%% Load Data
data = csvread('ex1data2.txt');
X = data(:, 1:2);
y = data(:, 3);
m = length(y);

% Add intercept term to X
X = [ones(m, 1) X];

% Calculate the parameters from the normal equation
theta = normalEqn(X, y);

% Display normal equation's result
fprintf('Theta computed from the normal equations: \n');
fprintf(' %f \n', theta);
fprintf('\n');

% Estimate the price of a 1650 sq-ft, 3 br house
% ====================== YOUR CODE HERE ======================
price = [1 1650 3] * theta; % X was not scaled in this part, so the raw input is used
% ============================================================

fprintf(['Predicted price of a 1650 sq-ft, 3 br house ' ...
         '(using normal equations):\n $%f\n'], price);
