Before you read

 This is a demo or practice about how to use Simple-Linear-Regression in scikit-learn with python. Following is the package version that I use below:

The Python version: 3.6.2

The Numpy version: 1.8.0rc1

The Scikit-Learn version: 0.19.0

The Matplotlib version: 2.0.2

Training Data

Here is the training data about the Relationship between Pizza and Diameter below:

training data Diameter(inch) Price($)
1 6 7
2 8 9
3 10 13
4 14 17.5
5 18 18

Now, we can plot the figure about the diameter and price first:

import matplotlib as plt

def run_plt():
plt.figure()
plt.title('Pizza Price with diameter.')
plt.xlabel('diameter(inch)')
plt.ylabel('price($)')
plt.axis([0, 25, 0, 25])
plt.grid(True)
return plt X = [[6], [8], [10], [14], [18]]
y = [[7], [9], [13], [17.5], [18]] plt = run_plt()
plt.plot(X, y, 'k.')
plt.show()

Now we get the figure here.

Next, we use linear regression to fit this model.

from scikit.linear_model import LinearRegression

model = LinearRegression()
# X and y is the data in previous code.
model.fit(X, y)
# To predict the 12inch pizza price.
price = model.predict([12][0])
print('The 12 Pizza price: % .2f' % price)
# The 12 Pizza price: 13.68

The Simple Linear Regression define:

Simple linear regression assumes that a linear relationship exists between the response variable and explanatory variable; it models this relationship with a linear surface called a hyperplane. A hyperplane is a subspace that has one dimension less than the ambient space that contains it. In simple linear regression, there is one dimension for the response variable and another dimension for the explanatory variable, making a total of two dimensions. The regression hyperplane therefore, has one dimension; a hyperplane with one dimension is a line.

The Simple Linear Regression model that scikit-learn use is below:

\(y = \alpha + \beta * x\)

\(y\) is the predicted value of the response variable. \(x\) is the explanatory variable. \(alpha\) and \(beta\) are learned by the learning algorithm.

If we have a data \(X_{2}\) like that,

\(X_{2}\) = [[0], [10], [14], [25]]

We want to use Linear Regression to Predict the Prize Price and Print the Figure. There are two steps:

  1. Use \(x\), \(y\) previous to fit the model.
  2. Predict the Prize price.
model = LinearRegression()
# X, y is the prevoius data
model.fit(X,y) X2 = [[0], [10], [14], [25]]
y2 = model.predict(X2) plt.plot(X2, y2, 'g-')

The figure is following:

Summarize

The function previous that I used is called ordinary least squares. The process is :

  1. Define the cost function and fit the training data.
  2. Get the predict data.

Evaluating the fitness of a model with a cost function

There are serveral line created by different parmeters, and we got a question is that which one is the best-fitting regression line ?

plt = run_plt()
plt.plot(X, y, 'k.')
y3 = [14.25, 14.25, 14.25, 14.25]
y4 = y2 * 0.5 + 5 model.fit(X[1:-1], y[1:-1]) y5 = model.predict(X2) plt.plot(X2, y2, 'g-.')
plt.plot(X2, y3, 'r-.')
plt.plot(X2, y4, 'y-.')
plt.plot(X2, y5, 'o-')
plt.show()

The Define of cost function

A cost function, also called a loss function, is used to de ne and measure the

error of a model. The differences between the prices predicted by the model andthe observed prices of the pizzas in the training set are called residuals or training errors. Later, we will evaluate a model on a separate set of test data; the differences between the predicted and observed values in the test data are called prediction errors or test errors.

The figure is like that:

The original data is black point, as we can see, the green line is the best-fitting regression line. But how computer know!!!!

So we should use some mathematic method to tell the computer which one is best-fitting.

model.fit(X, y)
yr = model.predict(X) for idx, x in enumerate(X)
plt.plot([x, x], [y[idx], yr[idx]], 'r-')

Next we plot the residuals figure.

We can use residual sum of squares to measure the fitness.

\(SS_{res} = \sum _{i =1}^n(y_{i} - f(x_{i}))^{2}\)

Use Numpy package to calculate the \(SS_{res}\) value is 1.75

import numpy as np
SSres = np.mean((model.predict(X) - y)** 2)

Solving ordinary least squares for simple linear regression

Recall that simple linear regression is that:

\(y = \alpha + \beta * x\)

Our goal is to get the value of \(alpha\) and \(beta\). We will solve \(beta\) first, we should calculate the variance of \(x\) and covariance of \(x\) and \(y\).

Variance is a measure of how far a set of values is spread out. If all of the numbers in the set are equal, the variance of the set is zero.

\(var(x) = \frac{\sum_{i=1}^n(x_{i} - \overline{x})^{2}}{n-1}\)

\(\overline{x}\) is the mean of x .

var = np.var(X, ddof =1)
# var = 23.2

Convariance is a measure of how much two variales change to together. If the value of variables increase together. their convariace is positive. If one variable tends to increase while the other decreases, their convariace is negative. If their is no linear relationship between the two variables, their convariance will be equals to zero.

\(cov(x,y) = \frac{\sum_{i=1}^n(x_{i}-\overline{x})(y_{i}-\overline{y})}{n-1}\)

import numpy as np
cov = np.cov([6, 8, 10, 14, 18], [7, 9, 13, 17.5, 18])[0][1]

Their is a formula solve \(\beta\)

\(\beta = \frac{cov(x,y)}{var(x)}\)

\(\beta = \frac{22.65}{23.2} = 0.9762\)

We can solve \(\alpha\) as the following formula:

\(\alpha = \overline{y} - \beta * \overline{x}\)

\(\alpha = 12.9 - 0.9762 * 11.2 =1.9655\)

Summarize

The Regression formula is like following:

\(y = 1.9655 + 0.9762 * x\)

Linear Regression with Scikit Learn的更多相关文章

  1. (原创)(三)机器学习笔记之Scikit Learn的线性回归模型初探

    一.Scikit Learn中使用estimator三部曲 1. 构造estimator 2. 训练模型:fit 3. 利用模型进行预测:predict 二.模型评价 模型训练好后,度量模型拟合效果的 ...

  2. [Sklearn] Linear regression models to fit noisy data

    Ref: [Link] sklearn各种回归和预测[各线性模型对噪声的反应] Ref: Linear Regression 实战[循序渐进思考过程] Ref: simple linear regre ...

  3. Machine Learning #Lab1# Linear Regression

    Machine Learning Lab1 打算把Andrew Ng教授的#Machine Learning#相关的6个实验一一实现了贴出来- 预计时间长度战线会拉的比較长(毕竟JOS的7级浮屠还没搞 ...

  4. 斯坦福机器学习视频笔记 Week1 Linear Regression and Gradient Descent

    最近开始学习Coursera上的斯坦福机器学习视频,我是刚刚接触机器学习,对此比较感兴趣:准备将我的学习笔记写下来, 作为我每天学习的签到吧,也希望和各位朋友交流学习. 这一系列的博客,我会不定期的更 ...

  5. 转载 Deep learning:二(linear regression练习)

    前言 本文是多元线性回归的练习,这里练习的是最简单的二元线性回归,参考斯坦福大学的教学网http://openclassroom.stanford.edu/MainFolder/DocumentPag ...

  6. (原创)(四)机器学习笔记之Scikit Learn的Logistic回归初探

    目录 5.3 使用LogisticRegressionCV进行正则化的 Logistic Regression 参数调优 一.Scikit Learn中有关logistics回归函数的介绍 1. 交叉 ...

  7. Linear Regression with machine learning methods

    Ha, it's English time, let's spend a few minutes to learn a simple machine learning example in a sim ...

  8. 二、Linear Regression 练习(转载)

    转载链接:http://www.cnblogs.com/tornadomeet/archive/2013/03/15/2961660.html 前言 本文是多元线性回归的练习,这里练习的是最简单的二元 ...

  9. CheeseZH: Stanford University: Machine Learning Ex5:Regularized Linear Regression and Bias v.s. Variance

    源码:https://github.com/cheesezhe/Coursera-Machine-Learning-Exercise/tree/master/ex5 Introduction: In ...

随机推荐

  1. ssm中iReport报表使用json数据源过程体会

    前言:做这个一定要有耐心,因为报表本就是数据杂糅到规整的过程,这篇心得会细讲每一步操作,如果只想着一眼到位,建议close tab 在公司中遇到项目,大概是一个这样的需求,有一个列表和一个标题,需要把 ...

  2. Ditto在教学上的应用

    Ditto在教学上的应用 我喜欢iOS和macOS生态的一个原因是,你在iphone上看到一段好文字,复制一下,到macbook中粘贴一下就可以了,这体验太爽了. 大家可能相信大家都听过这样一则笑话: ...

  3. TensorFlow实现Softmax Regression识别手写数字中"TimeoutError: [WinError 10060] 由于连接方在一段时间后没有正确答复或连接的主机没有反应,连接尝试失败”问题

    出现问题: 在使用TensorFlow实现MNIST手写数字识别时,出现"TimeoutError: [WinError 10060] 由于连接方在一段时间后没有正确答复或连接的主机没有反应 ...

  4. 201621123068 Week04-面向对象设计与继承

    1. 本周学习总结 1.1 写出你认为本周学习中比较重要的知识点关键词 答:继承.多态.重载.关键字.父类与子类 1.2 尝试使用思维导图将这些关键词组织起来. 2. 书面作业 1. 面向对象设计(大 ...

  5. tornado web高级开发项目

    抽屉官网:http://dig.chouti.com/ 一.配置(settings) settings = { 'template_path': 'views', #模板文件路径 'static_pa ...

  6. openlayers调用瓦片地图分析

    网上有诸多资料介绍openlayers如何调用百度地图或者是天地图等常见互联网地图,本文作者使用的是不是常见的互联网瓦片,现将调用过程进行整理与大家分享. 首先,openlayers就不赘述了(官网: ...

  7. 第一章 创建WEB项目

    第一章   创建WEB项目 一.Eclipse创建WEB项目 方法/步骤1 首先,你要先打开Eclipse软件,打开后在工具栏依次点击[File]>>>[New]>>&g ...

  8. 20165226 2017-2018-4 《Java程序设计》第6周学习总结

    20165226 2017-2018-4 <Java程序设计>第6周学习总结 教材学习内容总结 第八章 常用实用类 string类 并置 两个常量进行并置,得到的仍是常量. public ...

  9. emqtt 试用(三)mqtt 知识

    一.概念 MQTT 协议客户端库: https://github.com/mqtt/mqtt.github.io/wiki/libraries 例如,mosquitto_sub/pub 命令行发布订阅 ...

  10. 离线Chrome插件安装文件(crx)的安装方法

    离线Chrome插件安装文件(crx)的安装方法 一.正常安装方法 1.开发谷歌浏览器,设置->扩展程序 在打开的谷歌浏览器的扩展管理器中用户可以看到一些已经安装程序的Chrome插件,或者一个 ...