The Stat2.3x Inference course was taught by the University of California, Berkeley on the edX platform in 2014.

Download the notes as a PDF (Academia.edu)

Summary

Chi-square test

  • Goodness of fit (random sample or not / good model or not)

    • $$H_0: \text{Good model}$$ $$H_A: \text{Not good model}$$
    • Calculate the expected counts from the proportions that the model specifies
    • The $\chi^2$ statistic is $$\chi^2=\sum{\frac{(o-e)^2}{e}}$$ where $o$ is the observed count and $e$ is the expected count in each category.
    • The degrees of freedom are the number of categories minus one
    • Under $H_0$ the statistic approximately follows the $\chi^2$ distribution, so we can calculate its P-value with the R function:
      1-pchisq(chi, df)
  • Independent or not
    • $$H_0: \text{Independent}$$ $$H_A: \text{Not independent}$$
    • The data are arranged in a contingency table
    • Under $H_0$, in each cell of the table $$\text{expected count}=\frac{\text{row total}\times\text{column total}}{\text{grand total}}$$ This follows from $P(A\cap B)=P(A)\cdot P(B)$ under the independence assumption, with each probability estimated by the corresponding proportion of the grand total.
    • The $\chi^2$ statistic is $$\chi^2=\sum{\frac{(o-e)^2}{e}}$$ where $o$ is the observed count and $e$ is the expected count in each cell.
    • The degrees of freedom are $(\text{rows}-1)\times(\text{columns}-1)$
    • Under $H_0$ the statistic approximately follows the $\chi^2$ distribution, so we can calculate its P-value with the R function:
      1-pchisq(chi, df)
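
The two recipes above can be sketched as small R helpers. This is only a minimal sketch, not code from the course: the function names gof_chisq and indep_chisq are hypothetical, and the toy counts are invented for illustration.

```r
# Goodness of fit: compare observed counts with the proportions a model predicts.
# gof_chisq is a hypothetical helper name, not part of base R.
gof_chisq <- function(observed, model_prop) {
  expected <- sum(observed) * model_prop          # expected counts under H_0
  chi <- sum((observed - expected)^2 / expected)  # chi-square statistic
  df <- length(observed) - 1                      # categories minus one
  list(expected = expected, statistic = chi, df = df,
       p_value = pchisq(chi, df, lower.tail = FALSE))  # same as 1 - pchisq(chi, df)
}

# Independence: expected count = row total * column total / grand total.
# indep_chisq is likewise a hypothetical helper name.
indep_chisq <- function(tab) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  chi <- sum((tab - expected)^2 / expected)
  df <- (nrow(tab) - 1) * (ncol(tab) - 1)
  list(expected = expected, statistic = chi, df = df,
       p_value = pchisq(chi, df, lower.tail = FALSE))
}

# Toy data, invented for illustration only.
fit <- gof_chisq(c(30, 20, 50), c(0.25, 0.25, 0.5))
toy <- matrix(c(10, 20, 30, 40), nrow = 2)
ind <- indep_chisq(toy)
```

Note that for a 2×2 table the built-in chisq.test() applies a continuity correction by default, so its statistic would differ from the uncorrected one computed here unless correct = FALSE is passed.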

ADDITIONAL PRACTICE PROBLEMS FOR WEEK 5

The population is all patients at a large system of hospitals; each sampled patient was classified by the type of room he/she was in, and his/her level of satisfaction with the care received. The question is whether type of room is independent of level of satisfaction.

1. What are the null and alternative hypotheses?

2. Under the null, what is the estimated expected number of patients in the "shared room, somewhat satisfied" cell?

3. Degrees of freedom = ( )

4. The chi-square statistic is about 13.8. Roughly what is the P-value, and what is the conclusion of the test?

Solution

1. Null: The two variables are independent; Alternative: The two variables are not independent.

2. We need to expand the original table:

Thus the estimated expected number of patients in the "shared room, somewhat satisfied" cell is $$784\times\frac{322}{784}\times\frac{255}{784}=104.7321$$

3. The degrees of freedom are $(3-1)\times(3-1)=4$

4. The P-value is 0.007961505, which is smaller than 0.05, so we reject $H_0$. That is, the test concludes that the two variables are not independent. R code:

1 - pchisq(13.8, 4)
[1] 0.007961505
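
The arithmetic in parts 2 to 4 can be checked in R. This is a minimal sketch with variable names of my own choosing; because the original table is not reproduced here, the two marginal totals (322 and 255) are taken from the solution above without assuming which one is the row total and which the column total. The comparison against a 5% critical value is an equivalent way of stating the P-value decision.

```r
grand_total <- 784
marginals <- c(322, 255)            # the two marginal totals of the cell, from the solution
expected <- prod(marginals) / grand_total   # row total * column total / grand total

df <- (3 - 1) * (3 - 1)             # (rows - 1) * (columns - 1)
p_value <- pchisq(13.8, df, lower.tail = FALSE)  # same as 1 - pchisq(13.8, df)

critical <- qchisq(0.95, df)        # 5% cutoff of the chi-square distribution
reject <- 13.8 > critical           # TRUE means reject H_0 at the 5% level
```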

UNGRADED EXERCISE SET A PROBLEM 1

According to a genetics model, plants of a particular species occur in the categories A, B, C, and D, in the ratio 9:3:3:1. The categories of different plants are mutually independent. At a lab that grows these plants, 218 are in Category A, 69 in Category B, 84 in Category C, and 29 in Category D. Does the model look good? Follow the steps in Problems 1A-1F.

1A The null hypothesis is:

a. The model is good.

b. The model isn't good.

c. Too many of the plants are in Category C.

d. The proportion of plants in Category A is expected to be 9/16; the difference in the sample is due to chance.

1B The alternative hypothesis is:

a. The model is good.

b. The model isn't good.

c. Too many of the plants are in Category C.

d. The proportion of plants in Category A is expected to be 9/16; the difference in the sample is due to chance.

1C Under the null, the expected number of plants in Category D is( ).

1D The chi-square statistic is closest to

a. 1 b. 1.5 c. 2 d. 2.5 e. 3 f. 3.5 g. 4 h. 4.5

1E Degrees of freedom = ( ).

1F Based on this test, does the model look good? (Yes / No)

Solution

1A) The null hypothesis is "the model is good". (a) is correct.

1B) The alternative hypothesis is "the model is not good". (b) is correct.

1C) The expected number of plants in Category D is $$(218+69+84+29)\times\frac{1}{9+3+3+1}=25$$

1D) (d) is correct. We can use the following table

R code:

o = c(218, 69, 84, 29)
e = c(225, 75, 75, 25)
chi = sum((o - e)^2 / e); chi
[1] 2.417778

1E) The degrees of freedom are $4-1=3$.

1F) The P-value is 0.4903339, which is larger than 0.05, so we do not reject $H_0$. The data are consistent with the model, so the answer is "Yes, the model looks good". R code:

1 - pchisq(chi, 3)
[1] 0.4903339
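
As a cross-check, the same goodness-of-fit test can be run with the built-in chisq.test(), passing the model proportions through its p argument. This is a sketch rather than part of the original solution; the category names attached to the counts are added only for readability.

```r
# Observed counts from the problem, labelled by category.
o <- c(A = 218, B = 69, C = 84, D = 29)

# Model proportions from the 9:3:3:1 ratio.
model <- c(9, 3, 3, 1) / 16

res <- chisq.test(o, p = model)
res$expected    # expected counts under H_0
res$statistic   # chi-square statistic
res$parameter   # degrees of freedom
res$p.value     # P-value
```

The expected counts, statistic, degrees of freedom and P-value should agree with the hand computation in 1C to 1F.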

PROBLEM 2

A simple random sample of cars in a city was categorized according to fuel type and place of manufacture.

Are place of manufacture and fuel type independent? Follow the steps in Problems 2A-2D.

2A If the two variables were independent, the chance that a sampled car is a domestic gasoline fueled car would be estimated to be about

0.0362 0.0499 0.2775 0.3820 0.5

2B If the two variables were independent, the expected number of foreign gas/electric hybrids would be estimated to be ( ). (Please keep at least two decimal places; by now you should understand why you should not round off to an integer.)

2C Degrees of freedom =( )

1 2 3 4

2D The chi-square statistic is 0.6716. The test therefore concludes that the two variables are (independent / not independent)

Solution

2A) Expand the table:

If the two variables were independent, then $$P(\text{domestic gasoline})=P(\text{domestic})\cdot P(\text{gasoline})=\frac{215}{511}\times\frac{337}{511}=0.2774767\doteq 0.2775$$

2B) If the two variables were independent, then the expected number of foreign gas/electric hybrids would be $$511\times P(\text{foreign gas/electric hybrid})=511\times\frac{296}{511}\times\frac{130}{511}=75.30333$$

2C) The degrees of freedom are $(2-1)\times(3-1)=2$.

2D) The P-value is 0.714766, which is larger than 0.05, so we do not reject $H_0$. That is, the test concludes that the two variables are independent. R code:

1 - pchisq(0.6716, 2)
[1] 0.714766

We can also calculate the $\chi^2$ statistic with R's built-in function chisq.test():

data = matrix(c(146, 18, 51, 191, 26, 79), ncol = 2)
chisq.test(data)

        Pearson's Chi-squared test

data:  data
X-squared = 0.6716, df = 2, p-value = 0.7148
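
The expected counts used in 2A and 2B can also be computed for every cell at once. This is a minimal sketch under one assumption: since the original table is not shown here, the layout of the matrix is inferred from the totals in the solution, with column 1 as domestic (total 215), column 2 as foreign (total 296), row 1 as gasoline (total 337) and row 3 as gas/electric hybrid (total 130).

```r
# Same counts as in the chisq.test() call above; the layout is an inferred assumption.
data <- matrix(c(146, 18, 51, 191, 26, 79), ncol = 2)

# Expected count in each cell: row total * column total / grand total.
expected <- outer(rowSums(data), colSums(data)) / sum(data)

prob_domestic_gasoline <- expected[1, 1] / sum(data)  # 2A: estimated chance
expected_foreign_hybrid <- expected[3, 2]             # 2B: expected count

# Chi-square statistic computed by hand from the observed and expected counts.
chi <- sum((data - expected)^2 / expected)

# Cross-check the hand-computed expected counts against the built-in function.
same_expected <- isTRUE(all.equal(expected, chisq.test(data)$expected,
                                  check.attributes = FALSE))
```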
