predict.glm -> which class does it predict?
Hi,
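I have a question about logistic regression in R.

Suppose I have a small list of proteins P1, P2, P3 that predict a two-class target T, say cancer/noncancer. Let's further say I know that I can build a simple logistic regression model in R:

model <- glm(T ~ ., data=d.f(Y), family=binomial)   (Y is the dataset of the proteins)

This works fine. T is a factor with levels cancer, noncancer. The proteins are numeric.

Now I want to use predict.glm to predict new data:

predict(model, newdata=testsamples, type="response")   (testsamples is a small set of new samples)

The result is a vector of probabilities, one for each sample in testsamples. But the probability of WHAT? Of belonging to the first level of T, or to the second level of T?

Is the following expression

factor(predict(model, newdata=testsamples, type="response") >= 0.5)

TRUE when the new sample is classified as cancer, or when it is classified as noncancer? And why not the other way around?

Thank you,

Peter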
Re: predict.glm -> which class does it predict?
On Jul 10, 2009, at 9:46 AM, Peter Schüffler wrote:
> Hi,
>
> I have a question about logistic regression in R.
>
> Suppose I have a small list of proteins P1, P2, P3 that predict a
> two-class target T, say cancer/noncancer. Let's further say I know
> that I can build a simple logistic regression model in R
>
> model <- glm(T ~ ., data=d.f(Y), family=binomial) (Y is the
> dataset of the Proteins).
>
> This works fine. T is a factored vector with levels cancer,
> noncancer. Proteins are numeric.
>
> Now, I want to use predict.glm to predict new data.
>
> predict(model, newdata=testsamples, type="response") (testsamples
> is a small set of new samples).
>
> The result is a vector of the probabilities for each sample in
> testsamples. But probability WHAT for? To belong to the first level
> in T? To belong to second level in T?
>
> Is this following expression
> factor(predict(model, newdata=testsamples, type="response") >= 0.5)
> TRUE, when the new sample is classified to Cancer or when it's
> classified to Noncancer? And why not the other way around?
>
> Thank you,
>
> Peter
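As per the Details section of ?glm:

  A typical predictor has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response. For binomial and quasibinomial families the response can also be specified as a factor (when the first level denotes failure and all others success) or as a two-column matrix with the columns giving the numbers of successes and failures.

So, given your description above, you are predicting "noncancer", the second level of the factor.

If you want to predict "cancer", alter the factor levels thusly:

T <- factor(T, levels = c("noncancer", "cancer"))

By default, R will alpha sort the factor levels, so "cancer" would be first.

Think of it in terms of using a 0,1 integer code for absence,presence, [...]

BTW, using 'T' as the name of the response vector is not a good habit:

> T
[1] TRUE

'T' is shorthand for the built in R constant TRUE. R is generally [...]

HTH,

Marc Schwartz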
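The level ordering described above can be checked directly. The following is a minimal sketch on made-up data; the vector name `status` and its values are hypothetical and only stand in for the poster's `T`. It also shows `relevel()`, a base R alternative to rebuilding the factor with an explicit `levels` argument.

```r
# Hypothetical toy response; 'status' stands in for 'T' from the post.
status <- factor(c("cancer", "noncancer", "noncancer", "cancer", "noncancer"))

# Levels are sorted alphabetically by default, so "cancer" comes first
# (treated as "failure") and "noncancer" second (treated as "success").
levels(status)

# Reorder so that "cancer" becomes the second level, i.e. the modeled event.
status_cancer <- factor(status, levels = c("noncancer", "cancer"))
levels(status_cancer)

# relevel() reaches the same ordering by choosing the first (reference) level.
status_ref <- relevel(status, ref = "noncancer")
identical(levels(status_cancer), levels(status_ref))
```

Either form changes only which level glm() treats as the event; the data values themselves are untouched.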
Re: predict.glm -> which class does it predict?
In reply to this post by Peter Schüffler-2
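It's the probability of the 2nd level of a factor response (termed "success" in the documentation, even when you're modeling the probability of disease or death...), just like when interpreting the logistic regression itself.

I find it easiest to sort out this kind of issue by experimentation in simplified situations. E.g.

> x <- sample(c("A","B"),10,replace=TRUE)
> x
 [1] "B" "A" "B" "B" "A" "B" "B" "A" "B" "A"
> table(x)
x
A B
4 6

(notice that the relative frequency of B is 0.6)

> glm(x~1,binomial)
Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1
In addition: Warning message:
In model.matrix.default(mt, mf, contrasts) :
  variable 'x' converted to a factor

(OK, so it won't go without conversion to factor. This is a good thing.)

> glm(factor(x)~1,binomial)

Call:  glm(formula = factor(x) ~ 1, family = binomial)

Coefficients:
(Intercept)
     0.4055

Degrees of Freedom: 9 Total (i.e. Null);  9 Residual
Null Deviance:      13.46
Residual Deviance: 13.46        AIC: 15.46

(The intercept is positive, corresponding to log odds for a probability > 0.5; i.e., it must be that of "B": 0.4055==log(6/4))

> predict(glm(factor(x)~1,binomial))
        1         2         3         4         5         6         7         8
0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651
        9        10
0.4054651 0.4054651
> predict(glm(factor(x)~1,binomial),type="response")
  1   2   3   4   5   6   7   8   9  10
0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6

As for why it's not the other way around, well, if it had been, then you could have asked the same question....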
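To connect this back to the `>= 0.5` expression in the question, here is a rough sketch of turning the predicted probabilities into class labels without hard-coding which level is which. Everything in it is an assumption made for illustration: the data are simulated, and the names `status`, `train` and `predicted` do not come from the thread.

```r
set.seed(1)  # simulated, hypothetical data; not the poster's proteins

n <- 200
P1 <- rnorm(n)
# In this simulation a higher P1 makes "noncancer" more likely.
status <- factor(ifelse(runif(n) < plogis(2 * P1), "noncancer", "cancer"))
train <- data.frame(status = status, P1 = P1)

model <- glm(status ~ P1, data = train, family = binomial)

testsamples <- data.frame(P1 = c(-3, 0, 3))
p <- predict(model, newdata = testsamples, type = "response")

# p is the probability of the SECOND factor level, so look the labels up
# from the factor itself rather than assuming their order.
lev <- levels(train$status)
predicted <- factor(ifelse(p >= 0.5, lev[2], lev[1]), levels = lev)

data.frame(testsamples, prob_second_level = p, predicted)
```

With the default alphabetical levels, `p >= 0.5` being TRUE therefore corresponds to "noncancer" here, which is the answer to the original question.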
Re: predict.glm -> which class does it predict?
2009/7/10 Peter Dalgaard <[hidden email]>:
> It's the probability of the 2nd level of a factor response (termed "success"
> in the documentation, even when you're modeling the probability of disease or
> death...), just like when interpreting the logistic regression itself.
>
> I find it easiest to sort out this kind of issue by experimentation in
> simplified situations. E.g.
>
>> x <- sample(c("A","B"),10,replace=TRUE)
>> x
>  [1] "B" "A" "B" "B" "A" "B" "B" "A" "B" "A"
>> table(x)
> x
> A B
> 4 6
>
> (notice that the relative frequency of B is 0.6)
>
>> glm(x~1,binomial)
> Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1
> In addition: Warning message:
> In model.matrix.default(mt, mf, contrasts) :
>   variable 'x' converted to a factor
>
> (OK, so it won't go without conversion to factor. This is a good thing.)
>
>> glm(factor(x)~1,binomial)
>
> Call:  glm(formula = factor(x) ~ 1, family = binomial)
>
> Coefficients:
> (Intercept)
>      0.4055
>
> Degrees of Freedom: 9 Total (i.e. Null);  9 Residual
> Null Deviance:      13.46
> Residual Deviance: 13.46        AIC: 15.46
>
> (The intercept is positive, corresponding to log odds for a probability
> > 0.5; i.e., it must be that of "B": 0.4055==log(6/4))
>
>> predict(glm(factor(x)~1,binomial))
>         1         2         3         4         5         6         7         8
> 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651
>         9        10
> 0.4054651 0.4054651
>> predict(glm(factor(x)~1,binomial),type="response")
>   1   2   3   4   5   6   7   8   9  10
> 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6
>
> As for why it's not the other way around, well, if it had been, then you
> could have asked the same question....
Or more specifically:

> resp <- factor(c("cancer", "noncancer", "noncancer", "noncancer"))
[...]

and since noncancer occurs 75% of the time in the sample clearly [...]
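The rest of this example did not survive in the archived message. One plausible way to finish the check, offered only as an assumption about what was intended, is to fit an intercept-only model to `resp` and compare the fitted probability with the sample proportions. The object names `fit` and `p` are introduced here and are not from the thread.

```r
resp <- factor(c("cancer", "noncancer", "noncancer", "noncancer"))
levels(resp)  # "cancer" is the first level, "noncancer" the second

# Intercept-only logistic regression: the fitted probability is a sample
# proportion, which makes it easy to see which level is being modeled.
fit <- glm(resp ~ 1, family = binomial)
p <- predict(fit, type = "response")

unname(p)                  # approximately 0.75 for every observation
mean(resp == "noncancer")  # 0.75, the proportion of the second level
mean(resp == "cancer")     # 0.25, so the first level is not what is modeled
```

Since 0.75 matches the share of "noncancer" and not of "cancer", the fitted probability refers to the second level.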
Re: predict.glm -> which class does it predict?
In reply to this post by Peter Dalgaard
> As for why it's not the other way around, well, if it had been, then you
> could have asked the same question....

...and come to think about it, it is rather convenient that it meshes [...]