Sklearn (SVR) Regression Study Notes

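Today I'll introduce a machine learning package, sklearn. Its functional modules are regression, classification, clustering, dimensionality reduction, data preprocessing, and model selection.

For me, the two parts I use most are regression (SVR) and classification (SVC).

First, here is how to do regression with sklearn, working up to sklearn.svm.SVR:

1) Multiple linear regression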

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(10)  # local random generator with a fixed seed
x = 100 * rng.rand(50, 3)  # 50x3 matrix of uniform random values, each multiplied by 100

# split the three feature columns into (50, 1) column vectors
x1 = x[:, 0].reshape(-1, 1)
x2 = x[:, 1].reshape(-1, 1)
x3 = x[:, 2].reshape(-1, 1)

# randn draws standard normal noise; the known coefficients let us check the result
y = 1.25 * x1 + 2.5 * x2 + 3 * x3 + 10 + rng.randn(50, 1)

model = LinearRegression(fit_intercept=True)
model.fit(x, y)

a = np.linspace(0, 50, 1000)  # 1000 evenly spaced values from 0 to 50, used to check the model
x1_fit = a[:, np.newaxis]  # reshape a into a column
x2_fit = a[:, np.newaxis]
x3_fit = a[:, np.newaxis]
x_fit = np.hstack((x1_fit, x2_fit, x3_fit))  # stack x1, x2, x3 side by side
y_fit = model.predict(x_fit)  # predict y

print("Model slope: ", model.coef_[0])
print("Model intercept:", model.intercept_)
print('Coefficient of determination (R^2): %.2f' % model.score(x, y))  # R^2 score
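Because the data is generated from known coefficients (1.25, 2.5, 3) and an intercept of 10, the fit can be checked against them directly. The following is a minimal sketch under the same synthetic setup; the names `true_coef` and `true_intercept` are introduced here for illustration, and the target is kept 1-D so that `coef_` comes back as a flat array.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(10)
x = 100 * rng.rand(50, 3)

# Hypothetical names for the coefficients used to generate y
true_coef = np.array([1.25, 2.5, 3.0])
true_intercept = 10.0
y = x @ true_coef + true_intercept + rng.randn(50)  # 1-D target with standard normal noise

model = LinearRegression().fit(x, y)

# With a 1-D target, coef_ has shape (3,) and intercept_ is a scalar
coef_error = np.abs(model.coef_ - true_coef)
intercept_error = abs(model.intercept_ - true_intercept)

print("estimated slopes:", model.coef_)
print("largest slope error:", coef_error.max())
print("intercept error:", intercept_error)
```

The slope errors should be small relative to the coefficients, while the intercept is usually estimated less precisely because the features sit far from zero.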

2) Polynomial regression

import random
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x_data, y_data = [], []

# generate 30 noisy points
for x in range(-10, 20):
    y = -x ** 2 + 5 * x - 10 + random.random() * 20
    x_data.append([x])
    y_data.append([y])

# feature construction
poly_reg = PolynomialFeatures(degree=2)  # build polynomial features
x_poly = poly_reg.fit_transform(x_data)

# create the linear model
linear_reg = LinearRegression()
linear_reg.fit(x_poly, y_data)

plt.plot(x_data, y_data, 'b.')
# predict from the constructed features (transform only; poly_reg is already fitted)
plt.plot(x_data, linear_reg.predict(poly_reg.transform(x_data)), 'r')
plt.show()
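The two steps (feature construction, then a linear fit) can also be chained so that new inputs go through the same transformation automatically. This is a sketch rather than the approach used in these notes: it assumes `make_pipeline` from `sklearn.pipeline`, a seeded NumPy generator in place of `random`, and `include_bias=False` so the constant term is left to the regression intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
x = np.arange(-10, 20).reshape(-1, 1)  # 30 points, one feature
# Same curve as in the notes, with uniform noise in [0, 20)
y = -x.ravel() ** 2 + 5 * x.ravel() - 10 + 20 * rng.rand(30)

# include_bias=False: the constant term is handled by LinearRegression's intercept
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
model.fit(x, y)

# make_pipeline names each step after its lowercased class name
linear_coef, quadratic_coef = model.named_steps["linearregression"].coef_
print("coefficient of x:  ", linear_coef)
print("coefficient of x^2:", quadratic_coef)

# New inputs are expanded to [x, x^2] inside the pipeline before prediction
x_new = np.array([[0.0], [2.5]])
y_new = model.predict(x_new)
```

The recovered coefficients should land near 5 and -1, the values used to generate the data.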

3) Nonlinear regression (single-variable example)

from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV  # automatic selection of the best model
from sklearn.tree import DecisionTreeRegressor  # decision tree
from sklearn.ensemble import RandomForestRegressor  # random forest
import numpy as np
import matplotlib.pyplot as plt

x = np.array([68.67, 54.351, 92.991, 80.39, 64.46]).reshape(-1, 1)  # reshape to (-1, 1): a column like [[1], [2], ...]
y = np.array([68.67, 54.351, 92.991, 80.39, 64.46])  # the target stays 1-D, as SVR expects

# choose a model
# model = SVR(kernel='rbf')
# model = DecisionTreeRegressor()
# model = RandomForestRegressor()
# With only 5 samples, the default 5-fold CV leaves one point per test fold,
# where the default R^2 score is undefined, so score by mean squared error instead.
model = GridSearchCV(SVR(), param_grid={"kernel": ("linear", "rbf", "sigmoid"), "C": np.logspace(-3, 3, 7), "gamma": np.logspace(-3, 3, 7)}, scoring="neg_mean_squared_error")
model.fit(x, y)

xneed = np.array([[1.2], [3.6]])
y_pre = model.predict(xneed)  # predict
plt.scatter(x, y, c='k', label='data', zorder=1)
plt.plot(xneed, y_pre, c='r', label='SVR_fit')
plt.legend()
plt.show()
print(model.best_params_)
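The commented-out alternatives (SVR, decision tree, random forest) can be compared on equal terms with cross-validation. The sketch below is illustrative only: the sine-shaped toy data, the `candidates` dictionary, and the specific hyperparameters are assumptions made here, not values from the notes.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
# Unsorted inputs, so the unshuffled default folds still cover the whole range
x = rng.uniform(0, 5, size=(80, 1))
y = np.sin(x).ravel() + 0.1 * rng.randn(80)  # assumed toy target

# Hypothetical candidate set mirroring the models listed in the notes
candidates = {
    "SVR": SVR(kernel="rbf", C=10.0),
    "DecisionTree": DecisionTreeRegressor(random_state=0),
    "RandomForest": RandomForestRegressor(n_estimators=50, random_state=0),
}

mean_r2 = {}
for name, estimator in candidates.items():
    scores = cross_val_score(estimator, x, y, cv=5)  # regressors are scored by R^2 by default
    mean_r2[name] = scores.mean()
    print(f"{name}: mean R^2 = {mean_r2[name]:.3f}")
```

Comparing mean cross-validated scores like this is a rough first filter; the ranking can change with the data and with tuned hyperparameters.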

Supplementary notes:

1. To split the data into training and test sets:

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.3)  # hold out 30% as the test set
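A split is only useful if the model is fitted on one part and scored on the other. Here is a minimal sketch, assuming the synthetic linear data from the first example; `random_state=0` is added here only to make the split reproducible.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(10)
x = 100 * rng.rand(50, 3)
y = x @ np.array([1.25, 2.5, 3.0]) + 10 + rng.randn(50)

# random_state is an added assumption that fixes which rows land in each part
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0
)

model = LinearRegression().fit(x_train, y_train)  # fit on the training part only
train_r2 = model.score(x_train, y_train)
test_r2 = model.score(x_test, y_test)  # R^2 on rows the model has not seen

print("train rows:", len(x_train), "test rows:", len(x_test))
print("train R^2: %.4f, test R^2: %.4f" % (train_r2, test_r2))
```

A test score far below the training score is the usual sign of overfitting.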

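2. To put the features on a comparable scale, the data is usually preprocessed, for example by standardization.

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
x_std = scaler.fit_transform(x)  # standardize to zero mean and unit variance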
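When the data is split, the scaler is normally fitted on the training part and then reused unchanged on the test part. The sketch below assumes two made-up features on very different scales; the array names are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
# Two assumed features with very different means and spreads
x_train = rng.normal(loc=[50.0, 0.5], scale=[20.0, 0.1], size=(40, 2))
x_test = rng.normal(loc=[50.0, 0.5], scale=[20.0, 0.1], size=(10, 2))

scaler = StandardScaler()
x_train_std = scaler.fit_transform(x_train)  # learn mean_ and scale_ from the training data
x_test_std = scaler.transform(x_test)        # reuse those statistics; do not refit on test data

x_back = scaler.inverse_transform(x_train_std)  # undo the scaling

print("train means after scaling:", x_train_std.mean(axis=0))
print("train stds after scaling: ", x_train_std.std(axis=0))
```

The test columns will not have exactly zero mean and unit variance, which is expected: they are scaled with the training statistics.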

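3. GridSearchCV can select the best model automatically.

from sklearn.model_selection import GridSearchCV

# svc is the estimator to tune (e.g. SVC()); param_grid maps parameter names to candidate values
grid = GridSearchCV(svc, param_grid, cv=3, n_jobs=-1)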
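For a complete run, `svc` and `param_grid` have to be defined first. The following sketch fills them in with assumed values: an `SVC` classifier, a small illustrative grid, and the iris dataset bundled with scikit-learn. `n_jobs` is left at its default here; `n_jobs=-1` would use all available cores.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

svc = SVC()
# Illustrative grid; the candidate values are assumptions, not recommendations
param_grid = {
    "kernel": ["linear", "rbf"],
    "C": [0.1, 1, 10],
    "gamma": ["scale", 0.1],
}

grid = GridSearchCV(svc, param_grid, cv=3)
grid.fit(X, y)

print(grid.best_params_)  # best parameter combination found
print(grid.best_score_)   # its mean cross-validated accuracy

# By default the best combination is refit on all the data
best_model = grid.best_estimator_
pred = best_model.predict(X[:5])
```

After fitting, `grid` itself can also be used for prediction, since it delegates to the refit best estimator.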

4. Saving the model

import joblib  # used to save and load the model as a .pkl file

joblib.dump(model, 'svr.pkl')  # save the model

svr = joblib.load('svr.pkl')  # load the model
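A quick way to confirm that a saved model survives the round trip is to compare predictions before and after loading. This is a sketch with assumed toy data; it writes to a temporary directory so that no file is left behind.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
x = rng.uniform(0, 5, size=(40, 1))
y = np.sin(x).ravel()  # assumed toy target

model = SVR(kernel="rbf").fit(x, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "svr.pkl")
    joblib.dump(model, path)       # save the fitted model
    restored = joblib.load(path)   # load it back as a new object

# The restored model should reproduce the original predictions
same_predictions = np.allclose(model.predict(x), restored.predict(x))
print(same_predictions)
```

Pickled models are generally only safe to load with the same scikit-learn version that saved them, and only from sources you trust.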

I'll add notes on sklearn.svm.SVC in a couple of days...
