scikit-learn Study Notes: Generalized Linear Models (Part 3)

Bayesian regression

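The linear models introduced so far were all built from the least-squares, mean-squared-error point of view, starting with plain least squares and moving on to regularized variants such as lasso and ridge. Bayesian regression instead starts from a Bayesian probabilistic model, although in the end it is also converted into an energy function to be minimized.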

In the linear models above, we always assumed the following relationship:

$$y = wx$$

This relationship is written directly in terms of values. We can instead assume:

$$y = wx + \epsilon$$

Here $\epsilon$ denotes an error, or noise, term. If the estimate is exact, then $\epsilon = 0$; otherwise $\epsilon$ is a random variable.

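Given a set of training samples, each observation $y$ has its own $\epsilon$, and we assume the $\epsilon$ are independent and identically distributed (i.i.d.). Each observation can then be described probabilistically: given $x$ and $w$, $y$ follows the distribution of $\epsilon$ shifted by $wx$.

For a whole training set, independence means the likelihood is the product of the per-sample probabilities.

Finally, maximum likelihood estimation turns this expression into an energy-minimization problem (for Gaussian noise, this is least squares). That is how the coefficients are obtained from the maximum likelihood point of view.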
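The likelihood described above can be written out explicitly. The following is a sketch of the standard derivation, assuming the noise is zero-mean Gaussian with variance $\sigma^2$; both the Gaussian form and the symbol $\sigma^2$ are assumptions made here for illustration and are not stated in the text.

```latex
\begin{align}
p(y_i \mid x_i, w) &= \mathcal{N}(y_i \mid w x_i, \sigma^2) \\
p(y \mid X, w) &= \prod_{i=1}^{n} \mathcal{N}(y_i \mid w x_i, \sigma^2) \\
-\log p(y \mid X, w) &= \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - w x_i)^2 + \frac{n}{2} \log(2\pi\sigma^2)
\end{align}
```

Under this assumption, maximizing the likelihood over $w$ is the same as minimizing the sum of squared errors, since the last term does not depend on $w$.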

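Next, consider the maximum a posteriori (MAP) point of view. By Bayes' rule, the posterior is proportional to the likelihood times the priors:

$$p(w, \alpha \mid y) \propto p(y \mid w)\, p(w \mid \alpha)\, p(\alpha)$$

The prior on the weights is a zero-mean isotropic Gaussian with precision $\alpha$:

$$p(w \mid \alpha) = \mathcal{N}(w \mid 0, \alpha^{-1} I)$$

The hyperprior $p(\alpha)$ is itself a gamma distribution.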
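To make the link to an energy function concrete: if the prior precision $\alpha$ is held fixed and the noise is assumed Gaussian with variance $\sigma^2$ (an assumption, as is the symbol), the negative log posterior is, up to constants, $\|y - Xw\|^2 + \alpha\sigma^2 \|w\|^2$, which is the ridge objective. The sketch below checks this numerically on synthetic data; all variable names and the chosen values are illustrative only.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_samples, n_features = 50, 5
X = rng.normal(size=(n_samples, n_features))
w_true = rng.normal(size=n_features)

sigma = 0.5   # assumed noise standard deviation
alpha = 4.0   # assumed fixed precision of the Gaussian prior on w
y = X @ w_true + sigma * rng.normal(size=n_samples)

# MAP estimate in closed form: (X^T X + alpha * sigma^2 * I)^{-1} X^T y
penalty = alpha * sigma**2
w_map = np.linalg.solve(X.T @ X + penalty * np.eye(n_features), X.T @ y)

# Ridge regression with the same penalty and no intercept
ridge = Ridge(alpha=penalty, fit_intercept=False, solver="cholesky").fit(X, y)

max_abs_diff = np.max(np.abs(w_map - ridge.coef_))
print("largest coefficient difference:", max_abs_diff)
```

This only covers the fixed-$\alpha$ case. With a gamma hyperprior, $\alpha$ is estimated from the data as well, which is what `BayesianRidge` does.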

sklearn also provides an example:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

from sklearn.linear_model import BayesianRidge, LinearRegression

# #############################################################################
# Generating simulated data with Gaussian weights
np.random.seed(0)
n_samples, n_features = 100, 100
X = np.random.randn(n_samples, n_features)  # Create Gaussian data
# Create weights with a precision lambda_ of 4.
lambda_ = 4.
w = np.zeros(n_features)
# Only keep 10 weights of interest
relevant_features = np.random.randint(0, n_features, 10)
for i in relevant_features:
    w[i] = stats.norm.rvs(loc=0, scale=1. / np.sqrt(lambda_))
# Create noise with a precision alpha of 50.
alpha_ = 50.
noise = stats.norm.rvs(loc=0, scale=1. / np.sqrt(alpha_), size=n_samples)
# Create the target
y = np.dot(X, w) + noise

# #############################################################################
# Fit the Bayesian Ridge Regression and an OLS for comparison
clf = BayesianRidge(compute_score=True)
clf.fit(X, y)

ols = LinearRegression()
ols.fit(X, y)

# #############################################################################
# Plot true weights, estimated weights, histogram of the weights, and
# predictions with standard deviations
lw = 2
plt.figure(figsize=(6, 5))
plt.title("Weights of the model")
plt.plot(clf.coef_, color='lightgreen', linewidth=lw,
         label="Bayesian Ridge estimate")
plt.plot(w, color='gold', linewidth=lw, label="Ground truth")
plt.plot(ols.coef_, color='navy', linestyle='--', label="OLS estimate")
plt.xlabel("Features")
plt.ylabel("Values of the weights")
plt.legend(loc="best", prop=dict(size=12))

plt.figure(figsize=(6, 5))
plt.title("Histogram of the weights")
plt.hist(clf.coef_, bins=n_features, color='gold', log=True,
         edgecolor='black')
plt.scatter(clf.coef_[relevant_features], 5 * np.ones(len(relevant_features)),
            color='navy', label="Relevant features")
plt.ylabel("Features")
plt.xlabel("Values of the weights")
plt.legend(loc="upper left")

plt.figure(figsize=(6, 5))
plt.title("Marginal log-likelihood")
plt.plot(clf.scores_, color='navy', linewidth=lw)
plt.ylabel("Score")
plt.xlabel("Iterations")


# Plotting some predictions for polynomial regression
def f(x, noise_amount):
    y = np.sqrt(x) * np.sin(x)
    noise = np.random.normal(0, 1, len(x))
    return y + noise_amount * noise


degree = 10
X = np.linspace(0, 10, 100)
y = f(X, noise_amount=0.1)
clf_poly = BayesianRidge()
clf_poly.fit(np.vander(X, degree), y)

X_plot = np.linspace(0, 11, 25)
y_plot = f(X_plot, noise_amount=0)
y_mean, y_std = clf_poly.predict(np.vander(X_plot, degree), return_std=True)
plt.figure(figsize=(6, 5))
plt.errorbar(X_plot, y_mean, y_std, color='navy',
             label="Polynomial Bayesian Ridge Regression", linewidth=lw)
plt.plot(X_plot, y_plot, color='gold', linewidth=lw,
         label="Ground Truth")
plt.ylabel("Output y")
plt.xlabel("Feature X")
plt.legend(loc="lower left")
plt.show()
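The example above is mostly plotting. As a smaller companion, the sketch below fits `BayesianRidge` on a tiny synthetic problem and prints the learned quantities directly. The data, the `_demo` names, and the chosen noise level are illustrative assumptions. Note that in scikit-learn's naming, `alpha_` is the estimated precision of the noise and `lambda_` is the estimated precision of the weights, which differs from the use of $\alpha$ in the prior formula earlier in this post.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
w_demo = np.array([1.5, 0.0, -2.0])   # hypothetical true weights
noise_std = 0.1                       # hypothetical noise level (precision 100)
y_demo = X_demo @ w_demo + noise_std * rng.normal(size=200)

model = BayesianRidge(compute_score=True).fit(X_demo, y_demo)

print("estimated weights:", model.coef_)
print("noise precision (alpha_):", model.alpha_)
print("weight precision (lambda_):", model.lambda_)

# Predictive mean and standard deviation for the first five samples
y_mean_demo, y_std_demo = model.predict(X_demo[:5], return_std=True)
print("predictive std:", y_std_demo)
```

The predictive standard deviation returned with `return_std=True` is what the error bars in the polynomial plot above are drawn from.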
