Some 3D Graphics (rgl) for Classification with Splines and Logistic Regression (from The Elements of Statistical Learning) (repost)
This semester I'm teaching from Hastie, Tibshirani, and Friedman's book, The Elements of Statistical Learning, 2nd Edition. The authors provide a mixture simulation data set that has two continuous predictors and a binary outcome. These data are used to demonstrate classification procedures by plotting classification boundaries in the two predictors. For example, the figure below is a reproduction of Figure 2.5 in the book:
The solid line represents the Bayes decision boundary (i.e., {x: Pr("orange"|x) = 0.5}), which is computed from the model used to simulate these data. The Bayes decision boundary and other boundaries are determined by one or more surfaces (e.g., Pr("orange"|x)), which are generally omitted from the graphics. In class, we decided to use the R package rgl to create a 3D representation of this surface. Below is the code and graphic (well, a 2D projection) associated with the Bayes decision boundary:
library(rgl)
load(url("http://statweb.stanford.edu/~tibs/ElemStatLearn/datasets/ESL.mixture.rda"))
dat <- ESL.mixture
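## (added aside, not in the original post) components of ESL.mixture used below:
##   x        -- the 200 training inputs (columns x1 and x2)
##   y        -- the corresponding 0/1 class labels (blue/orange)
##   xnew     -- a fine lattice of (x1, x2) values, stacked as rows
##   px1, px2 -- the x1 and x2 margins of that lattice
##   prob     -- the true Pr("orange" | x) at each lattice point
str(dat)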
## create 3D graphic, rotate to view 2D x1/x2 projection
par3d(FOV=1, userMatrix=diag(4))
plot3d(dat$xnew[,1], dat$xnew[,2], dat$prob, type="n",
       xlab="x1", ylab="x2", zlab="",
       axes=FALSE, box=TRUE, aspect=1)

## plot points and bounding box
x1r <- range(dat$px1)
x2r <- range(dat$px2)
pts <- plot3d(dat$x[,1], dat$x[,2], 1,
              type="p", radius=0.5, add=TRUE,
              col=ifelse(dat$y, "orange", "blue"))
lns <- lines3d(x1r[c(1,2,2,1,1)], x2r[c(1,1,2,2,1)], 1)

## draw Bayes (True) decision boundary; provided by authors
dat$probm <- with(dat, matrix(prob, length(px1), length(px2)))
dat$cls <- with(dat, contourLines(px1, px2, probm, levels=0.5))
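## (added note) contourLines() returns a list of contour segments, each with $x and $y
## coordinates tracing the requested level -- here the 0.5 level set, i.e. the Bayes boundary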
pls <- lapply(dat$cls, function(p) lines3d(p$x, p$y, z=1))

## plot marginal (w.r.t. mixture) probability surface and decision plane
sfc <- surface3d(dat$px1, dat$px2, dat$prob, alpha=1.0,
                 color="gray", specular="gray")
qds <- quads3d(x1r[c(1,2,2,1)], x2r[c(1,1,2,2)], 0.5, alpha=0.4,
               color="gray", lit=FALSE)
In the graphic above, the probability surface is shown in gray, and the Bayes decision boundary occurs where the horizontal plane at height 0.5 (light gray) intersects the probability surface.
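As a brief aside not in the original post: the grid probabilities can also be used to approximate the Bayes error rate, assuming the ESL.mixture list carries a marginal component giving the marginal density of x over the same lattice (as described in the ElemStatLearn package documentation). The sketch below weights the pointwise error min(p, 1-p) by that marginal density and normalizes; the result should be close to the Bayes error of roughly 0.21 reported in the book for this example.
## approximate Bayes error: expected value of min(p, 1-p) under the marginal of x,
## computed as a weighted average over the lattice (guarded in case marginal is absent)
if(!is.null(dat$marginal)) {
  bayes.err <- sum(dat$marginal * pmin(dat$prob, 1 - dat$prob)) / sum(dat$marginal)
  print(bayes.err)
}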
Of course, the classification task is to estimate a decision boundary from the data. Chapter 5 presents two multidimensional spline approaches, in conjunction with binary logistic regression, for estimating a decision boundary. The upper panel of Figure 5.11 in the book shows the decision boundary associated with additive natural cubic splines in x1 and x2 (4 df in each direction; 1+(4-1)+(4-1) = 7 parameters), and the lower panel shows the corresponding tensor-product splines (4x4 = 16 parameters), which are of course much more flexible. The code and graphics below reproduce the decision boundaries shown in Figure 5.11, and additionally illustrate the estimated probability surfaces (note: the code below should only be run after the code above, since the existing 3D graphic is modified rather than created anew):
Reproducing Figure 5.11 (top):
## clear the surface, decision plane, and decision boundary
par3d(userMatrix=diag(4)); pop3d(id=sfc); pop3d(id=qds)
for(pl in pls) pop3d(id=pl)

## fit additive natural cubic spline model
library(splines)
ddat <- data.frame(y=dat$y, x1=dat$x[,1], x2=dat$x[,2])
form.add <- y ~ ns(x1, df=3) +
                ns(x2, df=3)
fit.add <- glm(form.add, data=ddat, family=binomial(link="logit"))
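## (added check, not in the original post) the additive fit should have an overall
## intercept plus three natural-spline columns per coordinate: 1 + 3 + 3 = 7 coefficients
length(coef(fit.add))  ## expect 7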
## compute probabilities, plot classification boundary
probs.add <- predict(fit.add, type="response",
                     newdata=data.frame(x1=dat$xnew[,1], x2=dat$xnew[,2]))
dat$probm.add <- with(dat, matrix(probs.add, length(px1), length(px2)))
dat$cls.add <- with(dat, contourLines(px1, px2, probm.add, levels=0.5))
pls <- lapply(dat$cls.add, function(p) lines3d(p$x, p$y, z=1))

## plot probability surface and decision plane
sfc <- surface3d(dat$px1, dat$px2, probs.add, alpha=1.0,
                 color="gray", specular="gray")
qds <- quads3d(x1r[c(1,2,2,1)], x2r[c(1,1,2,2)], 0.5, alpha=0.4,
               color="gray", lit=FALSE)
Reproducing Figure 5.11 (bottom):
## clear the surface, decision plane, and decision boundary
par3d(userMatrix=diag(4)); pop3d(id=sfc); pop3d(id=qds)
for(pl in pls) pop3d(id=pl)

## fit tensor product natural cubic spline model
form.tpr <- y ~ 0 + ns(x1, df=4, intercept=TRUE):
                    ns(x2, df=4, intercept=TRUE)
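## (added note) `0 +` drops the usual model intercept, while intercept=TRUE keeps the
## constant column inside each ns() basis, so the interaction of the two 4-column bases
## spans the full 4 x 4 = 16-dimensional tensor-product space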
fit.tpr <- glm(form.tpr, data=ddat, family=binomial(link="logit"))
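## (added check, not in the original post) the tensor-product fit should have 4 x 4 = 16 coefficients
length(coef(fit.tpr))  ## expect 16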
## compute probabilities, plot classification boundary
probs.tpr <- predict(fit.tpr, type="response",
                     newdata=data.frame(x1=dat$xnew[,1], x2=dat$xnew[,2]))
dat$probm.tpr <- with(dat, matrix(probs.tpr, length(px1), length(px2)))
dat$cls.tpr <- with(dat, contourLines(px1, px2, probm.tpr, levels=0.5))
pls <- lapply(dat$cls.tpr, function(p) lines3d(p$x, p$y, z=1))

## plot probability surface and decision plane
sfc <- surface3d(dat$px1, dat$px2, probs.tpr, alpha=1.0,
                 color="gray", specular="gray")
qds <- quads3d(x1r[c(1,2,2,1)], x2r[c(1,1,2,2)], 0.5, alpha=0.4,
               color="gray", lit=FALSE)
Although the graphics above are static, it is possible to embed an interactive 3D version within a web page (e.g., see the rgl vignette; best with Google Chrome) using the rgl function writeWebGL. I gave up on trying to embed such a graphic into this WordPress blog post, but I have created a separate page for the interactive 3D version of Figure 5.11b. Duncan Murdoch's work with this package is really nice!
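For reference, a minimal sketch of the export step might look like the following; it assumes the older writeWebGL interface that was current when the post was written (recent versions of rgl deprecate it in favor of rglwidget() and the htmlwidgets framework), and the output directory name is arbitrary.
## write the current rgl scene to a self-contained WebGL page (older rgl API);
## the directory "webGL-fig-5-11b" is just an example name
writeWebGL(dir="webGL-fig-5-11b", width=700)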
Reposted from: http://biostatmatt.com/archives/2659