University of California, Berkeley Stat2.3x Inference study notes: Section 5 Window to a Wider World
The Stat2.3x Inference course was taught by the University of California, Berkeley on the edX platform in 2014.
Summary
Chi-square test
- Goodness of fit: random sample or not / good model or bad?
    - $$H_0: \text{Good model}$$ $$H_A: \text{Not good model}$$
    - Use the proportions given by the model to calculate the expected counts: expected count = sample size × model proportion.
    - The $\chi^2$ statistic is $$\chi^2=\sum{\frac{(o-e)^2}{e}}$$ where $o$ is the observed count and $e$ the expected count in each category.
    - The degrees of freedom equal the number of categories minus one.
    - Under $H_0$ the statistic approximately follows the $\chi^2$ distribution with these degrees of freedom, so the P-value is the area to the right of the observed statistic, computed in R as `1 - pchisq(chi, df)`.
- Independence: are two categorical variables independent or not?
    - $$H_0: \text{Independent}$$ $$H_A: \text{Not independent}$$
    - Arrange the data in a contingency table.
    - Under $H_0$, in each cell of the table $$\text{expected count}=\frac{\text{row total}\times\text{column total}}{\text{grand total}}$$ That is, under the independence assumption $P(A\cap B)=P(A)\cdot P(B)$, and the expected count is the grand total times this estimated probability.
    - The $\chi^2$ statistic is $$\chi^2=\sum{\frac{(o-e)^2}{e}}$$ summed over all cells, where $o$ is the observed count and $e$ the expected count.
    - The degrees of freedom are $(\text{number of rows}-1)\times(\text{number of columns}-1)$.
    - Under $H_0$ the statistic approximately follows the $\chi^2$ distribution with these degrees of freedom; the P-value is again `1 - pchisq(chi, df)` in R.
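Both recipes above can be written as short R functions. This is a minimal sketch, not code from the course: the helper names `gof_chisq()` and `indep_chisq()` are hypothetical. It uses `pchisq(chi, df, lower.tail = FALSE)`, which gives the same right-tail area as `1 - pchisq(chi, df)` but stays accurate when the P-value is very small.

```r
# Goodness of fit: o = observed counts, p = model proportions (must sum to 1)
gof_chisq <- function(o, p) {
  e <- sum(o) * p                          # expected counts under H0
  chi <- sum((o - e)^2 / e)                # chi-square statistic
  dof <- length(o) - 1                     # number of categories minus one
  c(chi = chi, df = dof, p.value = pchisq(chi, dof, lower.tail = FALSE))
}

# Independence: tab = contingency table of counts (a matrix)
indep_chisq <- function(tab) {
  # expected count in each cell = row total * column total / grand total
  e <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  chi <- sum((tab - e)^2 / e)              # summed over all cells
  dof <- (nrow(tab) - 1) * (ncol(tab) - 1)
  c(chi = chi, df = dof, p.value = pchisq(chi, dof, lower.tail = FALSE))
}

# Data from the exercises below: plant categories and the car table
gof_chisq(c(218, 69, 84, 29), c(9, 3, 3, 1) / 16)
indep_chisq(matrix(c(146, 18, 51, 191, 26, 79), ncol = 2))
```

On the plant and car data these should agree with the hand calculations in the exercises (statistics of about 2.418 and 0.6716).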
ADDITIONAL PRACTICE PROBLEMS FOR WEEK 5
The population is all patients at a large system of hospitals; each sampled patient was classified by the type of room he/she was in, and his/her level of satisfaction with the care received. The question is whether type of room is independent of level of satisfaction.
1. What are the null and alternative hypotheses?
2. Under the null, what is the estimated expected number of patients in the "shared room, somewhat satisfied" cell?
3. Degrees of freedom = ( )
4. The chi-square statistic is about 13.8. Roughly what is the P-value, and what is the conclusion of the test?
Solution
1. Null: The two variables are independent; Alternative: The two variables are not independent.
2. First, add the row and column totals to the original table.
Thus, under independence, the estimated expected number of patients in the "shared room, somewhat satisfied" cell is $$784\times\frac{322}{784}\times\frac{255}{784}=\frac{322\times 255}{784}=104.7321$$
3. The degrees of freedom are $(3-1)\times(3-1)=4$.
4. The P-value is 0.007961505 (about 0.008), which is smaller than 0.05, so we reject $H_0$ and conclude that type of room and level of satisfaction are not independent. R code:
```r
1 - pchisq(13.8, 4)   # right-tail area of the chi-square distribution with 4 df
# [1] 0.007961505
```
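The expected count needs only the two margins of the cell and the grand total quoted in step 2. A minimal R sketch using those totals; the variable names are illustrative, and the order of the two margins does not matter because the product is symmetric. For 4 degrees of freedom the right tail of the $\chi^2$ distribution has the closed form $e^{-x/2}(1+x/2)$, so the P-value can be double-checked without `pchisq()`.

```r
grand_total <- 784
margin_a <- 322   # one margin of the "shared room, somewhat satisfied" cell
margin_b <- 255   # the other margin of that cell

# expected count = row total * column total / grand total
expected_cell <- margin_a * margin_b / grand_total
expected_cell
# [1] 104.7321

chi <- 13.8
dof <- (3 - 1) * (3 - 1)
p_value <- pchisq(chi, dof, lower.tail = FALSE)   # same as 1 - pchisq(chi, dof)
p_value
# [1] 0.007961505

# Closed-form right tail for 4 degrees of freedom
exp(-chi / 2) * (1 + chi / 2)
```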
UNGRADED EXERCISE SET A PROBLEM 1
According to a genetics model, plants of a particular species occur in the categories A, B, C, and D, in the ratio 9:3:3:1. The categories of different plants are mutually independent. At a lab that grows these plants, 218 are in Category A, 69 in Category B, 84 in Category C, and 29 in Category D. Does the model look good? Follow the steps in Problems 1A-1F.
1A The null hypothesis is:
a. The model is good.
b. The model isn't good.
c. Too many of the plants are in Category C.
d. The proportion of plants in Category A is expected to be 9/16; the difference in the sample is due to chance.
1B The alternative hypothesis is:
a. The model is good.
b. The model isn't good.
c. Too many of the plants are in Category C.
d. The proportion of plants in Category A is expected to be 9/16; the difference in the sample is due to chance.
1C Under the null, the expected number of plants in Category D is ( ).
1D The chi-square statistic is closest to
a. 1 b. 1.5 c. 2 d. 2.5 e. 3 f. 3.5 g. 4 h. 4.5
1E Degrees of freedom = ( ).
1F Based on this test, does the model look good? Yes / No
Solution
1A) The null hypothesis is "the model is good". (a) is correct.
1B) The alternative hypothesis is "the model is not good". (b) is correct.
1C) The expected number of plants in Category D is $$(218+69+84+29)\times\frac{1}{9+3+3+1}=25$$
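1D) (d) is correct. Compare the observed and expected counts category by category:

| Category | Observed $o$ | Expected $e$ | $(o-e)^2/e$ |
|---|---|---|---|
| A | 218 | 225 | 0.2178 |
| B | 69 | 75 | 0.4800 |
| C | 84 | 75 | 1.0800 |
| D | 29 | 25 | 0.6400 |
| Total | 400 | 400 | 2.4178 |

The statistic, about 2.42, is closest to 2.5.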
R code:
```r
o = c(218, 69, 84, 29)   # observed counts in categories A, B, C, D
e = c(225, 75, 75, 25)   # expected counts: 400 * c(9, 3, 3, 1) / 16
chi = sum((o - e)^2 / e); chi
# [1] 2.417778
```
1E) There are 4 categories, so the degrees of freedom are $4-1=3$.
1F) The P-value is 0.4903339, which is larger than 0.05, so we do not reject $H_0$: the data are consistent with the 9:3:3:1 model, and the model looks good (answer: Yes). R code:
```r
1 - pchisq(chi, 3)   # right-tail area with 3 df
# [1] 0.4903339
```
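R's `chisq.test()` also handles the goodness-of-fit case: pass the observed counts as a vector and the model proportions through its `p` argument. The sketch below should reproduce the hand calculation (expected counts 225, 75, 75, 25 and a statistic of about 2.418 on 3 degrees of freedom). With `simulate.p.value = TRUE`, R instead estimates the P-value by simulating counts under the model, which is one way to check that the $\chi^2$ approximation is reasonable here; the seed and `B = 10000` are arbitrary choices.

```r
o <- c(A = 218, B = 69, C = 84, D = 29)   # observed counts
p <- c(9, 3, 3, 1) / 16                    # the 9:3:3:1 model as proportions

res <- chisq.test(o, p = p)
res$expected    # 225 75 75 25
res$statistic   # X-squared, about 2.418
res$parameter   # df = 3
res$p.value     # about 0.4903

# Monte Carlo P-value (seed and number of replicates chosen arbitrarily)
set.seed(1)
sim <- chisq.test(o, p = p, simulate.p.value = TRUE, B = 10000)
sim$p.value     # should be close to the approximate P-value above
```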
PROBLEM 2
A simple random sample of cars in a city was categorized according to fuel type and place of manufacture.
Are place of manufacture and fuel type independent? Follow the steps in Problems 2A-2D.
2A If the two variables were independent, the chance that a sampled car is a domestic gasoline fueled car would be estimated to be about
0.0362 0.0499 0.2775 0.3820 0.5
2B If the two variables were independent, the expected number of foreign gas/electric hybrids would be estimated to be ( ). (Please keep at least two decimal places; by now you should understand why you should not round off to an integer.)
2C Degrees of freedom = ( )
1 2 3 4
2D The chi-square statistic is 0.6716. The test therefore concludes that the two variables are: independent / not independent
Solution
2A) First, add the row and column totals to the table.
If the two variables were independent, then $$P(\text{domestic gasoline})=P(\text{domestic})\cdot P(\text{gasoline})=\frac{215}{511}\times\frac{337}{511}=0.2774767\doteq 0.2775$$
2B) If the two variables were independent, the expected number of foreign gas/electric hybrids would be $$511\times P(\text{foreign})\cdot P(\text{gas/electric})=511\times\frac{296}{511}\times\frac{130}{511}=75.30333$$
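2C) The table has 3 fuel types and 2 places of manufacture, so the degrees of freedom are $(2-1)\times(3-1)=2$.
2D) The P-value is 0.714766, which is larger than 0.05, so we do not reject $H_0$: the data are consistent with place of manufacture and fuel type being independent (answer: independent). R code: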
```r
1 - pchisq(0.6716, 2)   # right-tail area with 2 df
# [1] 0.714766
```
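R's built-in function `chisq.test()` computes the $\chi^2$ statistic, the degrees of freedom and the P-value in one call:

```r
# 3 x 2 table, filled column by column: column 1 = domestic, column 2 = foreign
data = matrix(c(146, 18, 51, 191, 26, 79), ncol = 2)
chisq.test(data)
#
#   Pearson's Chi-squared test
#
# data:  data
# X-squared = 0.6716, df = 2, p-value = 0.7148
```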
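The hand calculations in 2A and 2B can be reproduced from the same matrix. A minimal sketch: the column and row positions follow from the totals used above (column 1 sums to 215 domestic cars, column 2 to 296 foreign cars; row 1 sums to 337 gasoline cars, row 3 to 130 gas/electric hybrids). `outer()` forms row total × column total / grand total for every cell, which should agree with the expected counts stored by `chisq.test()`.

```r
data <- matrix(c(146, 18, 51, 191, 26, 79), ncol = 2)
n <- sum(data)                      # grand total, 511

# 2A: estimated P(domestic and gasoline) under independence
p_dom_gas <- (colSums(data)[1] / n) * (rowSums(data)[1] / n)
round(p_dom_gas, 4)
# [1] 0.2775

# Expected count in every cell: row total * column total / grand total
expected <- outer(rowSums(data), colSums(data)) / n
expected[3, 2]                      # 2B: foreign gas/electric hybrids
# [1] 75.30333

# Should agree with the expected counts computed inside chisq.test()
all.equal(unname(expected), unname(chisq.test(data)$expected))

# Chi-square statistic by hand, about 0.6716
sum((data - expected)^2 / expected)
```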