The Stat2.3x Inference course was taught by the University of California, Berkeley on the edX platform in 2014.

Download the notes as a PDF (Academia.edu)

Summary

  • Estimating population means and percents
    Sampling assumptions:

    • Simple Random Sample (SRS)
    • Large enough so that the probability histogram for the sample mean / percents is roughly normal, by Central Limit Theorem.
    • Small enough relative to the population size that the correction factor is close to 1.
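  • Confidence Interval (CI) $$\mu:\ \bar{X}\pm Z\cdot SE$$ $$p:\ \hat{p}\pm Z\cdot SE$$ where $\bar{X}$ and $\mu$ are the sample mean and population mean, and $\hat{p}$ and $p$ are the sample percent and population percent. $Z$ can be computed with the R function qnorm. For a 90% CI, for instance, the call below returns the lower-tail value (about -1.645), whose absolute value is the $Z$ in the formula. Because this z is negative, adding z * se to the estimate gives the lower endpoint and subtracting it gives the upper endpoint.
    z = qnorm((1 - 0.9) / 2)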
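The two interval formulas above can be wrapped in small helper functions. The sketch below is only an illustration and not part of the course material: the names z_for, ci_mean, ci_percent and fpc are hypothetical, and z is taken from the upper tail so that it is positive. The example values are the ones used in the practice problems of these notes.

```r
# Hypothetical helpers (the names are not from the course).

# Positive z for a two-sided interval at the given confidence level
z_for <- function(level) qnorm(1 - (1 - level) / 2)

# CI for a population mean from the sample mean, sample SD and sample size
ci_mean <- function(xbar, sd, n, level = 0.95) {
  se <- sd / sqrt(n)
  xbar + c(-1, 1) * z_for(level) * se
}

# CI for a population percent (as a proportion) from the sample proportion
ci_percent <- function(phat, n, level = 0.95) {
  se <- sqrt(phat * (1 - phat) / n)
  phat + c(-1, 1) * z_for(level) * se
}

# Correction factor for an SRS of size n drawn from N units
fpc <- function(N, n) sqrt((N - n) / (N - 1))

ci_mean(150, 30, 300, level = 0.99)   # lower and upper endpoint
ci_percent(0.21, 400, level = 0.95)
fpc(500000, 400)                      # close to 1 when n is small relative to N
```

For a sample of 400 out of 500,000 the correction factor is about 0.9996, which is why the third sampling assumption lets the notes ignore it.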

ADDITIONAL PRACTICE PROBLEMS FOR EXERCISE SET 1

PROBLEM 1

A simple random sample of size 300 is taken from a population of hundreds of thousands of adults. The average weight of the sampled people is 150 pounds and the SD of their weights is 30 pounds. a) The average weight of the population is estimated to be _______ pounds; the SE for this estimate is about ______ pounds. b) An approximate 99%-confidence interval for the average weight in the population goes from ________ pounds to _________ pounds.

Solution

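a) $$\bar{X}=150, \sigma=30, n=300$$ $$\Rightarrow SE=\frac{\sigma}{\sqrt{n}}=\frac{30}{\sqrt{300}}\doteq1.732051$$ Here the sample SD of 30 pounds is used as an estimate of the population SD $\sigma$. Thus the estimated population mean is 150 pounds and the SE is about 1.73 pounds.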

b) In roughly 99% of all samples, the population mean is in the range $$\bar{X}\pm Z\cdot SE$$ Calculating in R:

x = 150; sd = 30; n = 300
se = sd / sqrt(n); z = qnorm((1 - 0.99) / 2)  # z is negative (lower tail)
x + z * se
[1] 145.5385
x - z * se
[1] 154.4615

Therefore, $$\bar{X}\pm Z\cdot SE=[145.5385, 154.4615]$$

PROBLEM 2

In a simple random sample of size 400 taken from over 500,000 workers, 21% of the sampled workers are in carpools.

a) In the population, the percent of workers in carpools is estimated to be _____%; the SE for this estimate is about _____%.

b) An approximate 95%-confidence interval for the percent of carpooling workers in the population goes from _____ % to _____ %.

Solution

a) $$\hat{p}=0.21, n=400$$ $$\Rightarrow SE=\frac{\sqrt{\hat{p}\cdot(1-\hat{p})}}{\sqrt{n}}=\frac{\sqrt{0.21\times0.79}}{\sqrt{400}}\doteq0.02036541$$

b) $$\hat{p}\pm Z\cdot SE=[0.1700845, 0.2499155]$$ R code:

p = 0.21; n = 400
se = sqrt(p * (1 - p) / n)
z = qnorm((1 - 0.95) / 2)
p + z * se
[1] 0.1700845
p - z * se
[1] 0.2499155

PROBLEM 3

A simple random sample of 150 undergraduates is taken at a large university. The average MSAT score of the sampled students is 528 with an SD of 90. Construct an approximate 90%-confidence interval for the average MSAT score of undergraduates at the university.

Solution $$\bar{X}=528, \sigma=90, n=150\Rightarrow SE=\frac{\sigma}{\sqrt{n}}$$ Thus the 90%-confidence interval for the average is $$\bar{X}\pm Z\cdot SE=[515.9128, 540.0872]$$ R code:

x = 528; sd = 90; n = 150
se = sd / sqrt(n); z = qnorm((1 - 0.9) / 2)
x + z * se
[1] 515.9128
x - z * se
[1] 540.0872

PROBLEM 4

In a simple random sample of 500 students taken at a large university, 180 have undeclared majors. Construct an approximate 85%-confidence interval for the percent of students at the university who have undeclared majors.

Solution $$\hat{p}=\frac{180}{500}=0.36, n=500\Rightarrow SE=\frac{\sqrt{\hat{p}\cdot(1-\hat{p})}}{\sqrt{n}}$$ Thus the 85%-confidence interval of the percentage is $$\hat{p}\pm Z\cdot SE=[0.3290987, 0.3909013]$$ R code:

p = 180 / 500; n = 500
se = sqrt(p * (1 - p) / n); z = qnorm((1 - 0.85) / 2)
p + z * se
[1] 0.3290987
p - z * se
[1] 0.3909013

PROBLEM 5

A simple random sample of 900 households is taken in a city. The average household size in the sample is 2.2 people, with an SD of 2 people.

a) Pick one of the two options: The average household size in the sample is (i) known to be 2.2. (ii) estimated to be 2.2.

b) Pick one of the two options: The average household size in the city is (i) known to be 2.2. (ii) estimated to be 2.2.

c) Pick one of the two options (justify your answer carefully). The distribution of household sizes in the sample (i) is approximately normal. (ii) is not normal, not even approximately.

d) Do you think the distribution of household sizes in the city is approximately normal? Explain.

e) Pick one of the two options (justify your answer carefully). The normal curve (i) can be used (ii) cannot be used to construct an approximate 95%-confidence interval for the average household size in the city. If you picked option (i), construct the interval.

f) True or false (explain): Approximately 95% of the households had sizes in the range 2.07 to 2.33 people.

Solution

a) $$\bar{X}=2.2$$ is known. So (i) is correct.

b) $$\mu=\bar{X}=2.2$$ is estimated. So (ii) is correct.

c) Household sizes cannot be negative, yet 0 is only $\frac{2.2}{2}=1.1$ SDs below the average. If the distribution were normal, about 13.57% of the households would have a size below 0, which is impossible. Thus the distribution is not normal, not even approximately, so (ii) is correct. R code:

1 - pnorm(1.1)
[1] 0.1356661

d) No. A large simple random sample (SRS) resembles the population it is drawn from, so since the distribution in the sample is not normal, the distribution in the city is unlikely to be normal either.

e) The confidence interval is constructed using the probability histogram for the sample average. The sample size is large. By the Central Limit Theorem, this probability histogram will be approximately normal no matter what the shape of the distribution of the population. $$\bar{X}=2.2, \sigma=2, n=900\Rightarrow SE=\frac{\sigma}{\sqrt{n}}$$ Thus the 95%-confidence interval for the average household size in the city is $$\bar{X}\pm Z\cdot SE=[2.069336,2.330664]$$ R code:

mu = 2.2; sd = 2; n = 900
se = sd / sqrt(n); z = qnorm((1 - 0.95) / 2)
mu + z * se
[1] 2.069336
mu - z * se
[1] 2.330664
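The claim in e), that the normal curve can be used for the sample average even though household sizes are not normal, can be checked by simulation. The sketch below is only illustrative: the population model (1 plus a negative binomial count, chosen to give a mean of 2.2 and an SD of 2) is an assumption and not something stated in the problem, and the draws are independent rather than made without replacement.

```r
set.seed(1)  # arbitrary seed, for reproducibility

# Assumed skewed population of household sizes: mean 2.2, SD 2
# (negative binomial with mean 1.2 and variance 1.2 + 1.44 / size = 4)
draw_sizes <- function(n) 1 + rnbinom(n, size = 1.44 / 2.8, mu = 1.2)

true_mean <- 2.2
n <- 900
z <- qnorm(1 - (1 - 0.95) / 2)

# Does one simulated 95% interval cover the true mean?
covers <- function() {
  x <- draw_sizes(n)
  se <- sd(x) / sqrt(n)
  abs(mean(x) - true_mean) <= z * se
}

# Share of 2000 simulated intervals that contain the true mean
coverage <- mean(replicate(2000, covers()))
coverage  # should be close to 0.95
```

If the normal approximation is adequate at this sample size, the observed coverage should land near the nominal 95% despite the skew in the individual values.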

f) False. Household sizes are whole numbers, so no household has between 2.07 and 2.33 people. The interval is an estimate of the average household size in the city; it says nothing about the sizes of individual households.

PROBLEM 6

A survey organization took a simple random sample of 275 units out of all the rental units in a city. The average monthly rent of the sampled units was \$920 and the SD was \$500. There were 964 people living in the sampled units, and there were 120 children among these 964 people. In parts (a)-(c) construct an approximate 68%-confidence interval for the given quantity, if possible. If this is not possible, explain why not.

a) the average monthly rent of the sampled units

b) the average monthly rent of all the rental units in the city

c) the percent of children among all people living in rental units in the city

d) "About 68% of the sampled units had rents in the range \$420 to \$1420." Do you agree with the quoted statement? Why or why not?

Solution

a) Not possible. It is known to be 920.

b) $$\bar{X}=920, n=275, \sigma=500\Rightarrow SE=\frac{\sigma}{\sqrt{n}}$$ Thus the 68%-confidence interval of the average monthly rent of the population is $$\bar{X}\pm Z\cdot SE=[890.016,949.984]$$ R code:

mu = 920; sd = 500; n = 275
se = sd / sqrt(n); z = qnorm((1 - 0.68) / 2)
mu + z * se
[1] 890.016
mu - z * se
[1] 949.984

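c) Not possible. The 964 people are not a simple random sample of the people living in rental units in the city. They form a cluster sample, because the units were sampled and everyone living in a sampled unit was included.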

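d) Disagree. The rents in the sample do not follow the normal curve: two SDs below the average is a negative number, $920-500\times2 < 0$, but rents cannot be negative. So the "average plus or minus one SD" rule for about 68% of the values does not apply.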

EXERCISE SET 1

PROBLEM 1

A simple random sample of 100 people is taken from all the people in a city. The ages of the sampled people have an average of 35 years and an SD of 15 years.

1A 35 years is known to be the average age of the people in the [sample / city].

1B The interval "32 years to 38 years" is an approximate 95% confidence interval for the average age of the people in the [sample / city].

1C In the interval in 1B above, the observed average age of the people in the [sample / city] is being used as an estimate of the unknown average age of the people in the [sample / city].

1D Pick all of the options that will correctly complete the sentence: The normal curve used in the construction of the interval in 1B is an approximation to the

a) histogram of ages in the sample

b) histogram of ages in the city

c) probability histogram of the average age in a simple random sample of 100 people drawn from the city

d) probability histogram of the average age in the city

e) histogram of the averages of all possible simple random samples of 100 people drawn from the city

Solution

1A) in the sample.

1B) in the city. The average age of the people in the city is unknown, so it needs to be estimated. $$\bar{X}=35, \sigma=15, n=100\Rightarrow SE=\frac{\sigma}{\sqrt{n}}=1.5$$ Thus the 95%-confidence interval is $$\bar{X}\pm Z\cdot SE=[32.06, 37.94]\approx[32, 38]$$ R code:

mu = 35; sd = 15; n = 100
se = sd / sqrt(n); z = qnorm((1 - 0.95) / 2)
mu + z * se
[1] 32.06005
mu - z * se
[1] 37.93995

1C) The observed average age of the people in the sample is being used as an estimate of the unknown average age of the people in the city.

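1D) c) and e) are correct. The sample is a large simple random sample, so by the Central Limit Theorem the probability histogram of the sample average is approximately normal, and e) describes that same histogram. The histogram of ages in the sample is not normal because $\bar{X}-3\sigma=-10 < 0$ and ages cannot be negative, and a large sample resembles the population, so a) and b) are incorrect. d) is incorrect because the average age in the city is a fixed number, not a random quantity, so it has no probability histogram to approximate.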

PROBLEM 2

A simple random sample of 400 households is taken in a large city, and the number of cars in each sampled household is noted. The average number of cars in the sampled households is 1.8 and the SD is 1.3. Among the sampled households, 10% had no car.

2A An approximate 90% confidence interval for the average number of cars among households in the city goes from _____ to ____.

2B An approximate 99% confidence interval for the percent of city households that have no car goes from _____% to ____%.

Solution

2A) $$\bar{X}=1.8, \sigma=1.3, n=400\Rightarrow SE=\frac{\sigma}{\sqrt{n}}$$ Thus the 90%-confidence interval for the average number of cars in the population is $$\bar{X}\pm Z\cdot SE=[1.693085, 1.906915]$$ R code:

mu = 1.8; sd = 1.3; n = 400
se = sd / sqrt(n); z = qnorm((1 - 0.9) / 2)
mu + z * se
[1] 1.693085
mu - z * se
[1] 1.906915

2B) $$\hat{p}=0.1, n=400\Rightarrow SE=\frac{\sqrt{\hat{p}\cdot(1-\hat{p})}}{\sqrt{n}}$$ Thus the 99%-confidence interval for the percent of the population is $$\hat{p}\pm Z\cdot SE=[0.06136256, 0.1386374]$$ R code:

p = 0.1; n = 400
se = sqrt(p * (1 - p) / n); z = qnorm((1 - 0.99) / 2)
p + z * se
[1] 0.06136256
p - z * se
[1] 0.1386374

PROBLEM 3

A simple random sample of 400 students is taken at a large university. The average height of the sampled students is 68 inches and the SD is 2 inches. The distribution of heights in the sample follows the normal curve very closely.

3A An approximate 68% confidence interval for the average height of students at the university is 68 inches plus or minus _____ inches.

3B Approximately 68% of the students in the sample have heights in the range 68 inches plus or minus _____ inches.

Solution

3A) $$\sigma=2, n=400\Rightarrow Z\cdot SE=Z\cdot\frac{\sigma}{\sqrt{n}}=0.09944579$$ R code:

sd = 2; n = 400
se = sd / sqrt(n); z = -qnorm((1 - 0.68) / 2)  # negate the lower-tail value so z > 0
z * se
[1] 0.09944579

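3B) The distribution of heights in the sample is close to normal, so about 68% of the sampled heights lie within one SD of the average ("average plus or minus one SD"). That is, the answer is 2 inches. This range describes individual heights and is not a confidence interval.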
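The difference between 3A and 3B is the difference between the SE of the average and the SD of the individual values. A minimal sketch, assuming the 400 heights are drawn from a normal distribution with mean 68 and SD 2 (the simulated data are hypothetical, not the course's data):

```r
set.seed(2)  # arbitrary seed, for reproducibility
heights <- rnorm(400, mean = 68, sd = 2)  # simulated sample, not real data

# 3B: share of sampled heights within one SD of the sample average
within_one_sd <- mean(abs(heights - mean(heights)) <= sd(heights))

# 3A: half-width of an approximate 68% confidence interval for the average
half_width <- -qnorm((1 - 0.68) / 2) * sd(heights) / sqrt(length(heights))

within_one_sd  # roughly 0.68
half_width     # roughly 0.1, far smaller than the SD of 2
```

The first quantity describes the spread of individuals in the sample, while the second describes the uncertainty in the estimated average, which shrinks as the sample size grows.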

PROBLEM 4

A simple random sample of voters is taken from the voters in a large state. Using the methods of our course, researchers construct an approximate 99% confidence interval for the percent of the state’s voters who will vote for Candidate A. The interval goes from 37.3% to 48.7%.

4A In the sample, the percent of voters who will vote for Candidate A is equal to ____%.

4B An approximate 95% confidence interval for the percent of the state’s voters who will vote for Candidate A goes from ___% to __%.

Solution

4A) The sample percent is the center of the interval: $$\hat{p}=(37.3\%+48.7\%)\times\frac{1}{2}=43\%$$

4B) $$\text{CI}_{99\%}=\hat{p}\pm Z_1\cdot SE=[0.373, 0.487]$$ Thus the 95%-confidence interval is $$\hat{p}\pm Z_2\cdot SE=[0.3866284, 0.4733716]$$ R code:

p = 0.43; z1 = qnorm((1 - 0.99) / 2); z2 = qnorm((1 - 0.95) / 2)
p + (0.487 - p) / z1 * z2
[1] 0.4733716
p - (0.487 - p) / z1 * z2
[1] 0.3866284
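The same rescaling works for any pair of confidence levels: recover the SE from the half-width of the given interval, then rebuild the interval with the new Z. A sketch, where rescale_ci is a hypothetical helper name and the interval is assumed to be symmetric about the sample percent:

```r
# Convert a CI at one confidence level to another level (hypothetical helper)
rescale_ci <- function(lower, upper, from, to) {
  center <- (lower + upper) / 2        # sample percent
  z_from <- qnorm(1 - (1 - from) / 2)  # positive z of the original level
  z_to <- qnorm(1 - (1 - to) / 2)      # positive z of the target level
  se <- (upper - center) / z_from      # SE implied by the original interval
  center + c(-1, 1) * z_to * se
}

rescale_ci(0.373, 0.487, from = 0.99, to = 0.95)
```

Using positive z values throughout keeps the lower endpoint first in the returned vector.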

PROBLEM 5

A simple random sample of voters will be taken in a large state. Researchers will use the methods of our course to construct an approximate 95% confidence interval for the percent of the state’s voters who will vote for Candidate X. The minimum sample size needed to ensure that the width of the interval (right end minus left end) is at most 6% is __________. (Fill in the blank with a positive integer; correct to the nearest 50 is OK.)

Solution The sample percent is not known in advance, so use the worst case $\hat{p}=0.5$, which makes the SE (and hence the width) as large as possible. $$\hat{p}=0.5,\text{CI}_{95\%}=\hat{p}\pm Z\cdot SE$$ $$\Rightarrow 2\cdot Z\cdot SE\leq6\%$$ $$\Rightarrow SE\leq\frac{0.06}{2\cdot Z}$$ $$\Rightarrow \frac{\sqrt{\hat{p}\cdot(1-\hat{p})}}{\sqrt{n}}\leq\frac{0.06}{2\cdot Z}$$ $$\Rightarrow\frac{\hat{p}\cdot(1-\hat{p})}{n}\leq\left(\frac{0.06}{2\cdot Z}\right)^2$$ $$\Rightarrow n\geq\frac{\hat{p}\cdot(1-\hat{p})}{\left(\frac{0.06}{2\cdot Z}\right)^2}$$ $$\Rightarrow n\geq1067.072$$ so the minimum sample size is 1068. R code:

p = 0.5; z = -qnorm((1 - 0.95) / 2)
p * (1 - p) / (0.06 / (2 * z))^2
[1] 1067.072
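Since a sample size must be a whole number, the bound is rounded up. A sketch of the calculation as a reusable function (min_sample_size is a hypothetical name, and p = 0.5 is the worst case used in the solution):

```r
# Smallest n for which the CI width is at most `width` (hypothetical helper)
min_sample_size <- function(width, level = 0.95, p = 0.5) {
  z <- qnorm(1 - (1 - level) / 2)             # positive z for the level
  ceiling(p * (1 - p) / (width / (2 * z))^2)  # round up to a whole number
}

min_sample_size(0.06)
```

A narrower target width or a higher confidence level increases the required sample size, because n grows with the square of Z divided by the width.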
