Stat2.3x Inference(统计推断)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授。

PDF笔记下载(Academia.edu)

Summary

Chi-square test

  • Random sample or not / Good or bad

    • $$H_0: \text{Good model}$$ $$H_A: \text{Not good model}$$
    • Based on the expected proportion to calculate the expected values
    • $\chi^2$ statistic is $$\chi^2=\sum{\frac{(o-e)^2}{e}}$$ where $o$ is observed values, $e$ is expected values.
    • The degree of freedom is the number of categories minus one
    • Follows approximately the $\chi^2$ distribution, we can calculate its P-value by using R function:
      1-pchisq(chi, df)
  • Independent or not
    • $$H_0: \text{Independent}$$ $$H_A: \text{not Independent}$$
    • Contingency table
    • Under $H_0$, in each cell of the table $$\text{expected count}=\frac{\text{row total}\times\text{column total}}{\text{grand total}}$$ That is, $P(A\cap B)=P(A)\cdot P(B)$ under the independent assumption.
    • $\chi^2$ statistic is $$\chi^2=\sum{\frac{(o-e)^2}{e}}$$ where $o$ is observed values, $e$ is expected values.
    • The degree of freedom is $(\text{row}-1)\times(\text{column}-1)$
    • Follows approximately the $\chi^2$ distribution, we can calculate its P-value by using R function:
      1-pchisq(chi, df)

ADDITIONAL PRACTICE PROBLEMS FOR WEEK 5

The population is all patients at a large system of hospitals; each sampled patient was classified by the type of room he/she was in, and his/her level of satisfaction with the care received. The question is whether type of room is independent of level of satisfaction.

1. What are the null and alternative hypotheses?

2. Under the null, what is the estimated expected number of patients in the "shared room, somewhat satisfied" cell?

3. Degrees of freedom = ( )

4. The chi-square statistic is about 13.8. Roughly what is the P-value, and what is the conclusion of the test?

Solution

1. Null: The two variables are independent; Alternative: The two variables are not independent.

2. We need to expand the original table:

Thus the estimated expected number of patients in the shared room, somewhat satisfied is $$784\times\frac{322}{784}\times\frac{255}{784}=104.7321$$

3. Degree of freedom is $(3-1)\times(3-1)=4$

4. P-value is 0.007961505 which is smaller than 0.05, so we reject $H_0$. That is, the conclusion is the two variables are not independent. R code:

1 - pchisq(13.8, 4)
[1] 0.007961505

UNGRADED EXERCISE SET A PROBLEM 1

According to a genetics model, plants of a particular species occur in the categories A, B, C, and D, in the ratio 9:3:3:1. The categories of different plants are mutually independent. At a lab that grows these plants, 218 are in Category A, 69 in Category B, 84 in Category C, and 29 in Category D. Does the model look good? Follow the steps in Problems 1A-1F.

1A The null hypothesis is:

a. The model is good.

b. The model isn't good.

c. Too many of the plants are in Category C.

d. The proportion of plants in Category A is expected to be 9/16; the difference in the sample is due to chance.

1B The alternative hypothesis is:

a. The model is good.

b. The model isn't good.

c. Too many of the plants are in Category C.

d. The proportion of plants in Category A is expected to be 9/16; the difference in the sample is due to chance.

1C Under the null, the expected number of plants in Category D is( ).

1D The chi-square statistic is closest to

a. 1 b. 1.5 c. 2 d. 2.5 e. 3 f. 3.5 g. 4 h. 4.5

1E Degrees of freedom = ( ).

1F Based on this test, does the model look good? Yes No

Solution

1A) The null hypothesis is "the model is good". (a) is correct.

1B) The alternative hypothesis is "the model is not good". (b) is correct.

1C) The expected number of plants in Category D is $$(218+69+84+29)\times\frac{1}{9+3+3+1}=25$$

1D) (d) is correct. We can use the following table

R code:

o = c(218, 69, 84, 29)
e = c(225, 75, 75, 25)
chi = sum((o - e)^2 / e); chi
[1] 2.417778

1E) Degree of freedom is $4-1=3$.

1F) P-value is 0.4903339 which is larger than 0.05, so we reject $H_A$. The conclusion is "the model is good". R code:

1 - pchisq(chi, 3)
[1] 0.4903339

PROBLEM 2

A simple random sample of cars in a city was categorized according to fuel type and place of manufacture.

Are place of manufacture and fuel type independent? Follow the steps in Problems 2A-2D.

2A If the two variables were independent, the chance that a sampled car is a domestic gasoline fueled car would be estimated to be about

0.0362 0.0499 0.2775 0.3820 0.5

2B If the two variables were independent, the expected number of foreign gas/electric hybrids would be estimated to be ( ). (Please keep at least two decimal places; by now you should understand why you should not round off to an integer.)

2C Degrees of freedom =( )

1 2 3 4

2D The chi-square statistic is 0.6716. The test therefore concludes that the two variables are independent not independent

Solution

2A) Expand the table:

If the two variables were independent, then $$P(\text{domestic gasoline})=P(\text{domestic})\cdot P(\text{gasoline})=\frac{215}{511}\times\frac{337}{511}=0.2774767\doteq 0.2775$$

2B) If the two variables were independent, then $$511\times P(\text{foreign gasoline/electricity})=511\times\frac{296}{511}\times\frac{130}{511}=75.30333$$

2C) Degree of freedom is $(2-1)\times(3-1)=2$.

2D) The P-value is 0.714766 which is larger than 0.05, so we reject $H_A$. That is, the conclusion is independent. R code:

1 - pchisq(0.6716, 2)
[1] 0.714766

We can calculate $\chi^2$ statistic by using R built-in function

chisq.test()
data = matrix(c(146, 18, 51, 191, 26, 79), ncol = 2)
chisq.test(data) Pearson's Chi-squared test data: data
X-squared = 0.6716, df = 2, p-value = 0.7148

加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: Section 5 Window to a Wider World的更多相关文章

  1. 加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: Section 4 Dependent Samples

    Stat2.3x Inference(统计推断)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...

  2. 加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: Section 3 One-sample and two-sample tests

    Stat2.3x Inference(统计推断)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...

  3. 加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: Section 2 Testing Statistical Hypotheses

    Stat2.3x Inference(统计推断)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...

  4. 加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: Section 1 Estimating unknown parameters

    Stat2.3x Inference(统计推断)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...

  5. 加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: FINAL

    Stat2.3x Inference(统计推断)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...

  6. 加州大学伯克利分校Stat2.2x Probability 概率初步学习笔记: Final

    Stat2.2x Probability(概率)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...

  7. 加州大学伯克利分校Stat2.2x Probability 概率初步学习笔记: Section 5 The accuracy of simple random samples

    Stat2.2x Probability(概率)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...

  8. 加州大学伯克利分校Stat2.2x Probability 概率初步学习笔记: Section 4 The Central Limit Theorem

    Stat2.2x Probability(概率)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...

  9. 加州大学伯克利分校Stat2.2x Probability 概率初步学习笔记: Section 3 The law of averages, and expected values

    Stat2.2x Probability(概率)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...

随机推荐

  1. vbs keys

    其使用格式为: object.SendKeys string "object":表示WshShell对象 "string":表示要发送的按键指令字符串,需要放在 ...

  2. 塔吊力矩限制器,塔吊黑匣子,塔吊电脑,tower crane

    塔机力矩限制器,tower crane 适用于各种类型的固定臂塔机和可变臂塔机 塔机力矩限制器是塔式起重机机械的安全保护装置,本产品采用32位高性能微处理器为硬件平台,软件算法采用国内最先进的三滑轮取 ...

  3. 使用Spring Sleuth和Zipkin跟踪微服务

    随着微服务数量不断增长,需要跟踪一个请求从一个微服务到下一个微服务的传播过程, Spring Cloud Sleuth 正是解决这个问题,它在日志中引入唯一ID,以保证微服务调用之间的一致性,这样你就 ...

  4. js的Object和Function

    自己闲的没事干,自己想通过js的了解写一个Function和Object之间的关系,可以肯定的是我写错了,但是希望可以有所启发. Function和Object Function.__proto__ ...

  5. [BZOJ 1260][CQOI2007]染色(DP)

    题目:http://www.lydsy.com:808/JudgeOnline/problem.php?id=1260 分析: f[i][j]表示i~j刷成s[i]~s[j]这个样子需要的最小次数 则 ...

  6. Go--避免SQL注入

    避免SQL注入 什么是SQL注入 SQL注入攻击(SQL Injection),简称注入攻击,是Web开发中最常见的一种安全漏洞.可以用它来从数据库获取敏感信息,或者利用数据库的特性执行添加用户,导出 ...

  7. Android ListView 详解

    我做Android已经有一段时间了,想想之前在学习Android基础知识的时候看到了许许多多博主的博文 和许多的论坛.网站.那时候就非常感谢那些博主们能吧自己的知识分享在互联网上,那时候我就想 如果我 ...

  8. 【BZOJ 3809】Gty的二逼妹子序列

    这个莫队如果用线段树来维护的话,复杂度是$O(n\sqrt{n}logn+qlogn)$ 很明显,可以看出来莫队每次$O(1)$的移动因为套上了线段树变成了$O(logn)$,但莫队移动的总数是非常大 ...

  9. mysql优化基础

    唯一索引(unique index)强调唯一,就是索引值必须唯一. create unique index [索引名] on 表名 (列名);alter table 表名 add unique ind ...

  10. 【CodeForces 589F】Gourmet and Banquet(二分+贪心或网络流)

    F. Gourmet and Banquet time limit per test 2 seconds memory limit per test 512 megabytes input stand ...