This semester I'm teaching from Hastie, Tibshirani, and Friedman's book, The Elements of Statistical Learning, 2nd Edition. The authors provide a Mixture Simulation data set that has two continuous predictors and a binary outcome. These data are used to demonstrate classification procedures by plotting classification boundaries in the two predictors. For example, the figure below is a reproduction of Figure 2.5 in the book:

The solid line represents the Bayes decision boundary (i.e., {x: Pr("orange"|x) = 0.5}), which is computed from the model used to simulate these data. The Bayes decision boundary and other boundaries are determined by one or more surfaces (e.g., Pr("orange"|x)), which are generally omitted from the graphics. In class, we decided to use the R package rgl to create a 3D representation of this surface. Below is the code and graphic (well, a 2D projection) associated with the Bayes decision boundary:

library(rgl)
load(url("http://statweb.stanford.edu/~tibs/ElemStatLearn/datasets/ESL.mixture.rda"))
dat <- ESL.mixture

## create 3D graphic, rotate to view 2D x1/x2 projection
par3d(FOV=1, userMatrix=diag(4))
plot3d(dat$xnew[,1], dat$xnew[,2], dat$prob, type="n",
       xlab="x1", ylab="x2", zlab="",
       axes=FALSE, box=TRUE, aspect=1)

## plot points and bounding box
x1r <- range(dat$px1)
x2r <- range(dat$px2)
pts <- plot3d(dat$x[,1], dat$x[,2], 1,
              type="p", radius=0.5, add=TRUE,
              col=ifelse(dat$y, "orange", "blue"))
lns <- lines3d(x1r[c(1,2,2,1,1)], x2r[c(1,1,2,2,1)], 1)

## draw Bayes (True) decision boundary; provided by authors
dat$probm <- with(dat, matrix(prob, length(px1), length(px2)))
dat$cls <- with(dat, contourLines(px1, px2, probm, levels=0.5))
pls <- lapply(dat$cls, function(p) lines3d(p$x, p$y, z=1))

## plot marginal (w.r.t mixture) probability surface and decision plane
sfc <- surface3d(dat$px1, dat$px2, dat$prob, alpha=1.0,
                 color="gray", specular="gray")
qds <- quads3d(x1r[c(1,2,2,1)], x2r[c(1,1,2,2)], 0.5, alpha=0.4,
               color="gray", lit=FALSE)

In the above graphic, the probability surface is represented in gray, and the Bayes decision boundary occurs where the plane f(x) = 0.5 (in light gray) intersects with the probability surface.
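To make the "surface meets plane" idea concrete without the ESL data, here is a minimal sketch on a toy model. Everything in it is an assumption for illustration (two unit-variance Gaussian classes with equal priors, and the names g1, g2, dens, post, postm, bayes); it is not the mixture the authors simulated from. It computes Pr("orange"|x) on a grid by Bayes' rule and extracts the 0.5 contour with contourLines, the same function used above.

```r
## Toy model (an assumption, not the ESL mixture): two classes with equal
## priors and unit-variance Gaussian densities centered at (-1, 0) and (1, 0)
g1 <- seq(-3, 3, length.out = 60)
g2 <- seq(-3, 3, length.out = 41)
grid <- expand.grid(x1 = g1, x2 = g2)   # x1 varies fastest

## class-conditional densities (independent unit-variance coordinates)
dens <- function(x1, x2, m1, m2) dnorm(x1, m1) * dnorm(x2, m2)
d.orange <- with(grid, dens(x1, x2,  1, 0))
d.blue   <- with(grid, dens(x1, x2, -1, 0))

## Pr("orange" | x) by Bayes' rule with equal priors
post <- d.orange / (d.orange + d.blue)

## reshape to a length(g1) x length(g2) matrix, as contourLines() expects
postm <- matrix(post, length(g1), length(g2))

## Bayes decision boundary: the level set {x: Pr("orange" | x) = 0.5}
bayes <- contourLines(g1, g2, postm, levels = 0.5)
range(bayes[[1]]$x)
```

By symmetry, the boundary for this toy model is the vertical line x1 = 0, so the range printed at the end should be essentially zero. With the ESL mixture the same recipe gives the wiggly curve in Figure 2.5, because the class densities there are themselves mixtures.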

Of course, the classification task is to estimate a decision boundary given the data. Chapter 5 presents two multidimensional spline approaches, in conjunction with binary logistic regression, to estimate a decision boundary. The upper panel of Figure 5.11 in the book shows the decision boundary associated with additive natural cubic splines in x1 and x2 (4 df in each direction; 1+(4-1)+(4-1) = 7 parameters), and the lower panel shows the corresponding tensor product splines (4x4 = 16 parameters), which are much more flexible, of course. The code and graphics below reproduce the decision boundaries shown in Figure 5.11, and additionally illustrate the estimated probability surface (note: the code below should only be executed after the above code, since the 3D graphic is modified, rather than created anew):

Reproducing Figure 5.11 (top):

## clear the surface, decision plane, and decision boundary
par3d(userMatrix=diag(4)); pop3d(id=sfc); pop3d(id=qds)
for(pl in pls) pop3d(id=pl)

## fit additive natural cubic spline model
library(splines)
ddat <- data.frame(y=dat$y, x1=dat$x[,1], x2=dat$x[,2])
form.add <- y ~ ns(x1, df=3) +
                ns(x2, df=3)
fit.add <- glm(form.add, data=ddat, family=binomial(link="logit"))

## compute probabilities, plot classification boundary
probs.add <- predict(fit.add, type="response",
                     newdata = data.frame(x1=dat$xnew[,1], x2=dat$xnew[,2]))
dat$probm.add <- with(dat, matrix(probs.add, length(px1), length(px2)))
dat$cls.add <- with(dat, contourLines(px1, px2, probm.add, levels=0.5))
pls <- lapply(dat$cls.add, function(p) lines3d(p$x, p$y, z=1))

## plot probability surface and decision plane
sfc <- surface3d(dat$px1, dat$px2, probs.add, alpha=1.0,
                 color="gray", specular="gray")
qds <- quads3d(x1r[c(1,2,2,1)], x2r[c(1,1,2,2)], 0.5, alpha=0.4,
               color="gray", lit=FALSE)
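One detail in the code above is easy to miss: matrix(probs.add, length(px1), length(px2)) only lines up with contourLines if the grid rows are ordered with x1 varying fastest, because matrix() fills column by column. The code above relies on dat$xnew having that ordering. The sketch below checks the same fit, predict, and reshape pipeline on simulated stand-in data; the data frame toy, the grid vectors g1 and g2, and the model used to simulate y are all assumptions for illustration, not part of the ESL data.

```r
library(splines)

## Simulated stand-in data (an assumption; not the ESL mixture data)
set.seed(42)
n <- 200
toy <- data.frame(x1 = runif(n, -2, 2), x2 = runif(n, -2, 2))
toy$y <- rbinom(n, 1, plogis(2 * toy$x1 - toy$x2^2 + 1))

## additive natural cubic spline logistic regression, as in the post
fit <- glm(y ~ ns(x1, df = 3) + ns(x2, df = 3),
           data = toy, family = binomial(link = "logit"))

## prediction grid; expand.grid() varies its first argument fastest
g1 <- seq(-2, 2, length.out = 30)
g2 <- seq(-2, 2, length.out = 20)
grid <- expand.grid(x1 = g1, x2 = g2)
p <- predict(fit, newdata = grid, type = "response")

## column-major fill: rows index g1, columns index g2
pm <- matrix(p, length(g1), length(g2))

## spot check: entry [i, j] should be the prediction at (g1[i], g2[j])
i <- 7; j <- 13
p.ij <- predict(fit, newdata = data.frame(x1 = g1[i], x2 = g2[j]),
                type = "response")
all.equal(unname(pm[i, j]), unname(p.ij))
```

If the grid were built the other way around (x2 fastest), the matrix would need byrow=TRUE or a transpose before being passed to contourLines.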

Reproducing Figure 5.11 (bottom):

## clear the surface, decision plane, and decision boundary
par3d(userMatrix=diag(4)); pop3d(id=sfc); pop3d(id=qds)
for(pl in pls) pop3d(id=pl)

## fit tensor product natural cubic spline model
form.tpr <- y ~ 0 + ns(x1, df=4, intercept=TRUE):
                    ns(x2, df=4, intercept=TRUE)
fit.tpr <- glm(form.tpr, data=ddat, family=binomial(link="logit"))

## compute probabilities, plot classification boundary
probs.tpr <- predict(fit.tpr, type="response",
                     newdata = data.frame(x1=dat$xnew[,1], x2=dat$xnew[,2]))
dat$probm.tpr <- with(dat, matrix(probs.tpr, length(px1), length(px2)))
dat$cls.tpr <- with(dat, contourLines(px1, px2, probm.tpr, levels=0.5))
pls <- lapply(dat$cls.tpr, function(p) lines3d(p$x, p$y, z=1))

## plot probability surface and decision plane
sfc <- surface3d(dat$px1, dat$px2, probs.tpr, alpha=1.0,
                 color="gray", specular="gray")
qds <- quads3d(x1r[c(1,2,2,1)], x2r[c(1,1,2,2)], 0.5, alpha=0.4,
               color="gray", lit=FALSE)
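The parameter counts quoted for Figure 5.11 (7 for the additive model, 16 for the tensor product) can be checked by counting columns of the design matrices that the two formulas generate. This is a small sketch on simulated stand-in predictors; the data frame toy and the names X.add and X.tpr are assumptions for illustration. Note that ns(x, df=3) without an intercept contributes 3 columns, which together with the model intercept corresponds to the "4 df in each direction" in the text.

```r
library(splines)

## Simulated stand-in predictors (an assumption; not the ESL mixture data)
set.seed(1)
toy <- data.frame(x1 = rnorm(200), x2 = rnorm(200))

## additive model: intercept + 3 basis columns for x1 + 3 for x2
X.add <- model.matrix(~ ns(x1, df = 3) + ns(x2, df = 3), data = toy)

## tensor product: all pairwise products of two 4-column bases, no intercept
X.tpr <- model.matrix(~ 0 + ns(x1, df = 4, intercept = TRUE):
                            ns(x2, df = 4, intercept = TRUE), data = toy)

ncol(X.add)   # 1 + 3 + 3
ncol(X.tpr)   # 4 * 4
```

The tensor product basis drops the separate intercept (the "0 +" in the formula) because each marginal basis already includes one.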

Although the graphics above are static, it is possible to embed an interactive 3D version within a web page (e.g., see the rgl vignette; best with Google Chrome), using the rgl function writeWebGL. I gave up on trying to embed such a graphic into this WordPress blog post, but I have created a separate page for the interactive 3D version of Figure 5.11b. Duncan Murdoch's work with this package is really nice!

This entry was posted in Technical and tagged data, graphics, programming, R, statistics on February 1, 2015.

Reposted from: http://biostatmatt.com/archives/2659
