Jul 10, 2009; 10:46pm

predict.glm -> which class does it predict?

Hi,

I have a question about logistic regression in R.

Suppose I have a small list of proteins P1, P2, P3 that predict a 
two-class target T, say cancer/noncancer. Let's further say I know that I 
can build a simple logistic regression model in R

model <- glm(T ~ ., data=d.f(Y), family=binomial)   (Y is the dataset of 
the Proteins).

This works fine. T is a factored vector with levels cancer, noncancer. 
Proteins are numeric.

Now, I want to use predict.glm to predict new data.

predict(model, newdata=testsamples, type="response")    (testsamples is 
a small set of new samples).

The result is a vector of the probabilities for each sample in 
testsamples. But the probability of WHAT? Of belonging to the first level 
in T? Or to the second level in T?

Is the following expression 
factor(predict(model, newdata=testsamples, type="response") >= 0.5) 
TRUE when the new sample is classified as cancer, or when it is 
classified as noncancer? And why not the other way around?

Thank you,

Peter

______________________________________________ 
[hidden email] mailing list 
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code. 

Jul 10, 2009; 11:37pm

Re: predict.glm -> which class does it predict?

On Jul 10, 2009, at 9:46 AM, Peter Schüffler wrote:

> The result is a vector of the probabilities for each sample in
> testsamples. But the probability of WHAT? Of belonging to the first
> level in T? Or to the second level in T?

As per the Details section of ?glm:

A typical predictor has the form response ~ terms where response is   
the (numeric) response vector and terms is a series of terms which   
specifies a linear predictor for response. ***For binomial and   
quasibinomial families the response can also be specified as a factor   
(when the first level denotes failure and all others success)*** or as   
a two-column matrix with the columns giving the numbers of successes   
and failures. A terms specification of the form first + second   
indicates all the terms in first together with all the terms in second   
with any duplicates removed.

So, given your description above, you are predicting   
"noncancer"...that is, you are predicting the probability of the   
second level of the factor ("success"), given the covariates.

If you want to predict "cancer", alter the factor levels thusly:

T <- factor(T, levels = c("noncancer", "cancer"))

By default, R sorts the factor levels alphabetically, so "cancer" would   
be first.
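
The effect of reordering the levels can be checked on a toy response. The sketch below is only an illustration: the four-element vector is made up and is not the poster's data. It fits an intercept-only model with the default (alphabetical) level order and again with the order given above, so that the fitted probability switches from "noncancer" to "cancer".

```r
# Hypothetical toy response (not the poster's data): 1 "cancer", 3 "noncancer"
resp <- factor(c("cancer", "noncancer", "noncancer", "noncancer"))
levels(resp)    # default alphabetical order, so "cancer" is the first level

# Default ordering: the fitted probability refers to the 2nd level, "noncancer"
p_default <- unname(predict(glm(resp ~ 1, family = binomial),
                            type = "response"))

# Reordered so that "cancer" is the 2nd level, i.e. the modelled "success"
resp2 <- factor(resp, levels = c("noncancer", "cancer"))
p_cancer <- unname(predict(glm(resp2 ~ 1, family = binomial),
                           type = "response"))

p_default[1]    # sample proportion of "noncancer"
p_cancer[1]     # sample proportion of "cancer"
```

relevel(resp, ref = "noncancer") should give the same ordering as the explicit levels= call for a two-level factor.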

Think of it in terms of using a 0/1 integer code for absence/presence,   
where you are predicting the probability of a '1', that is, the presence   
of the event or characteristic of interest.

BTW, using 'T' as the name of the response vector is not a good habit:

> T 
[1] TRUE

'T' is shorthand for the built in R constant TRUE. R is generally   
smart enough to know the difference, but it is better to avoid getting   
into trouble by not using it.
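
A small sketch of how the name can cause trouble (the variable names before, masked and after are only for illustration): T is an ordinary variable that happens to hold TRUE, so assigning to it masks the shorthand, whereas TRUE itself is a reserved word.

```r
before <- T          # T is the built-in shorthand for TRUE

T <- 0               # T is an ordinary variable, so it can be reassigned
masked <- isTRUE(T)  # FALSE: code that relies on T meaning TRUE now misbehaves

rm(T)                # remove the masking variable
after <- T           # the built-in T is visible again

c(before = before, masked = masked, after = after)
```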

HTH,

Marc Schwartz

Jul 10, 2009; 11:48pm

Re: predict.glm -> which class does it predict?

Peter Schüffler wrote:

> Is the following expression
> factor(predict(model, newdata=testsamples, type="response") >= 0.5)
> TRUE when the new sample is classified as cancer, or when it is
> classified as noncancer? And why not the other way around?

It's the probability of the 2nd level of a factor response (termed 
"success" in the documentation, even when you're modeling the probability 
of disease or death...), just like when interpreting the logistic 
regression itself.

I find it easiest to sort out this kind of issue by experimentation in 
simplified situations. E.g.

> x <- sample(c("A","B"),10,replace=TRUE)
> x
 [1] "B" "A" "B" "B" "A" "B" "B" "A" "B" "A"
> table(x)
x
A B
4 6

(notice that the relative frequency of B is 0.6)

> glm(x~1,binomial) 
Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1 
In addition: Warning message: 
In model.matrix.default(mt, mf, contrasts) : 
   variable 'x' converted to a factor

(OK, so it won't go without conversion to factor. This is a good thing.)

> glm(factor(x)~1,binomial)

Call:  glm(formula = factor(x) ~ 1, family = binomial)

Coefficients: 
(Intercept) 
      0.4055

Degrees of Freedom: 9 Total (i.e. Null);  9 Residual 
Null Deviance:    13.46 
Residual Deviance: 13.46 AIC: 15.46

(The intercept is positive, corresponding to the log odds of a probability 
greater than 0.5; i.e., it must be the log odds of "B": 0.4055 == log(6/4).)

> predict(glm(factor(x)~1,binomial))
        1         2         3         4         5         6         7         8
0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651
        9        10
0.4054651 0.4054651
> predict(glm(factor(x)~1,binomial),type="response")
  1   2   3   4   5   6   7   8   9  10
0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6
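
The two predict() outputs above are the same quantity on two scales: the default is the linear predictor (log odds) and type="response" applies the inverse logit. A minimal sketch of that arithmetic, using only the counts from the table(x) output (the names n_A, n_B, log_odds_B and prob_B are illustrative):

```r
# Counts taken from the table(x) output above: 4 "A" and 6 "B"
n_A <- 4
n_B <- 6

log_odds_B <- log(n_B / n_A)      # log odds of "B", the second level
prob_B     <- plogis(log_odds_B)  # inverse logit: exp(eta) / (1 + exp(eta))

log_odds_B   # matches the intercept, about 0.4055
prob_B       # matches the type = "response" prediction, 0.6
```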

As for why it's not the other way around, well, if it had been, then you 
could have asked the same question....

-- 
    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B 
   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K 
  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918 
~~~~~~~~~~ - ([hidden email])              FAX: (+45) 35327907

Jul 11, 2009; 1:27am

Re: predict.glm -> which class does it predict?

2009/7/10 Peter Dalgaard <[hidden email]>:

> It's the probability of the 2nd level of a factor response (termed "success"
> in the documentation, even when you're modeling the probability of disease or
> death...), just like when interpreting the logistic regression itself.
>
> I find it easiest to sort out this kind of issue by experimentation in
> simplified situations.

Or more specifically:

> resp <- factor(c("cancer", "noncancer", "noncancer", "noncancer")) 
> mod <- glm(resp ~ 1, family = binomial) 
> predict(mod, type = "response") 
   1    2    3    4 
0.75 0.75 0.75 0.75

and since noncancer occurs 75% of the time in the sample, it is clearly 
predicting the probability of noncancer.
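
As for the thresholding in the original question: since the probability refers to the second level, a value >= 0.5 corresponds to levels(resp)[2]. One way to turn the probabilities into class labels without hard-coding the level names might look like the sketch below (the 0.5 cutoff is the one from the question, and pred_class is an illustrative name):

```r
# Same toy response as above: 1 "cancer", 3 "noncancer"
resp <- factor(c("cancer", "noncancer", "noncancer", "noncancer"))
mod  <- glm(resp ~ 1, family = binomial)
p    <- predict(mod, type = "response")   # P(second level of resp)

# TRUE for p >= 0.5 means the second level; FALSE means the first level
pred_class <- factor(ifelse(p >= 0.5, levels(resp)[2], levels(resp)[1]),
                     levels = levels(resp))

table(pred_class)
```

Indexing levels(resp) keeps the labels correct even if the level order is changed later.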

Jul 11, 2009; 2:10am

Re: predict.glm -> which class does it predict?

> As for why it's not the other way around, well, if it had been, then you 
> could have asked the same question....

...and come to think about it, it is rather convenient that it meshes 
with the default ordering of levels in factor(x) if x is 0/1 or FALSE/TRUE.

-- 
    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B 
   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K 
  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918 
~~~~~~~~~~ - ([hidden email])              FAX: (+45) 35327907
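
The remark about the default level ordering can be checked directly. In this sketch (toy data, illustrative names) the 0/1 and FALSE/TRUE codings put the "event" in the second level, so a numeric 0/1 response and its factor version lead to the same fitted probability:

```r
levels(factor(c(1, 0, 1)))        # "0" "1": 1 is the second level
levels(factor(c(TRUE, FALSE)))    # "FALSE" "TRUE": TRUE is the second level

# Hypothetical 0/1 response with three events out of four
y <- c(0, 1, 1, 1)

p_numeric <- unname(predict(glm(y ~ 1, family = binomial),
                            type = "response"))
p_factor  <- unname(predict(glm(factor(y) ~ 1, family = binomial),
                            type = "response"))

all.equal(p_numeric, p_factor)    # both model P(y == 1)
```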
