Sample Means (Yale University course material)
The sample mean from a group of observations is an estimate of the population mean $\mu$.
Given a sample of size n, consider n independent random variables X1, X2, ..., Xn, each corresponding to one randomly selected observation. Each of these variables has the distribution of the population, with mean $\mu$ and standard deviation $\sigma$. The sample mean is defined to be
$\bar{X} = \frac{1}{n}(X_1 + X_2 + \cdots + X_n)$.
By the properties of means and variances of random variables, the mean and variance of the sample mean are the following:
$\mu_{\bar{X}} = \mu, \qquad \sigma^2_{\bar{X}} = \frac{\sigma^2}{n}$
Although the mean of the distribution of $\bar{X}$ is identical to the mean of the population distribution, the variance is much smaller for large sample sizes.
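The step behind these two formulas can be written out as a short derivation. It is a sketch under the assumptions already stated in the text: writing $\bar{X}$ for the sample mean, the observations X1, ..., Xn are independent, each with mean $\mu$ and standard deviation $\sigma$. The first line uses linearity of expectation; the second uses the fact that variances of independent variables add and that a constant factor comes out squared.

```latex
\begin{align*}
E(\bar{X}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
            = \frac{1}{n}\cdot n\mu = \mu, \\
\operatorname{Var}(\bar{X}) &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i)
            = \frac{1}{n^2}\cdot n\sigma^2 = \frac{\sigma^2}{n}.
\end{align*}
```

Taking the square root of the variance gives the standard deviation of the sample mean, $\sigma/\sqrt{n}$.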
For example, suppose the random variable X records a randomly selected student's score on a national test, where the population distribution for the score is normal with mean 70 and standard deviation 5 (N(70,5)). Given a simple random sample (SRS) of 200 students, the distribution of the sample mean score has mean 70 and standard deviation 5/sqrt(200) = 5/14.14 = 0.35.
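The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch; the function name `standard_error` is an illustrative choice, not something from the original text.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sample mean of n independent observations."""
    return sigma / math.sqrt(n)


# Test-score example: population standard deviation 5, sample of 200 students.
se_200 = standard_error(5.0, 200)
print(f"n = 200: {se_200:.4f}")

# The standard deviation of the sample mean shrinks as the sample grows.
for n in (25, 100, 400):
    print(f"n = {n}: {standard_error(5.0, n):.4f}")
```

Because the sample size enters through its square root, quadrupling the sample size halves the standard deviation of the sample mean.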
Distribution of the Sample Mean
When the distribution of the population is normal, then the distribution of the sample mean is also normal. For a normal population distribution with mean $\mu$ and standard deviation $\sigma$, the distribution of the sample mean is normal, with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.
This result follows from the fact that any linear combination of independent normal random variables is also normally distributed. This means that for two independent normal random variables X and Y and any constants a and b, aX + bY will be normally distributed. In the case of the sample mean, the linear combination is
$\bar{X} = (1/n)(X_1 + X_2 + \cdots + X_n)$.
For example, consider the distributions of yearly average test scores on a national test in two areas of the country. In the first area, the yearly average test score X is normally distributed with mean 70 and standard deviation 5. In the second area, the yearly average test score Y is normally distributed with mean 65 and standard deviation 8. Assuming X and Y are independent, the difference X - Y between the two areas is normally distributed, with mean 70 - 65 = 5 and variance 5² + 8² = 25 + 64 = 89. The standard deviation is the square root of the variance, 9.43. The probability that area X will have a higher score than area Y may be calculated as follows:
P(X > Y) = P(X - Y > 0)
= P(((X - Y) - 5)/9.43 > (0 - 5)/9.43)
= P(Z > -0.53) = 1 - P(Z < -0.53) = 1 - 0.2981 = 0.7019.
Area X will have a higher average score than area Y about 70% of the time.
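The same calculation can be reproduced without a normal table. The sketch below assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# X ~ N(70, 5) and Y ~ N(65, 8), treated as independent.
mean_diff = 70 - 65                  # mean of X - Y
sd_diff = math.sqrt(5**2 + 8**2)     # variances add for independent variables

# P(X > Y) = P(X - Y > 0) = 1 - P(X - Y <= 0)
diff = NormalDist(mu=mean_diff, sigma=sd_diff)
p_x_higher = 1 - diff.cdf(0)

print(f"sd of X - Y: {sd_diff:.2f}")
print(f"P(X > Y):    {p_x_higher:.4f}")
```

The result should agree with the table-based value of about 0.70 up to rounding of the z-score.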
The Central Limit Theorem
The most important result about sample means is the Central Limit Theorem. Simply stated, this theorem says that for a large enough sample size n, the distribution of the sample mean $\bar{X}$ will approach a normal distribution. This is true for a sample of independent random variables from any population distribution, as long as the population has a finite standard deviation $\sigma$.
A formal statement of the Central Limit Theorem is the following:
If $\bar{X}$ is the mean of a random sample X1, X2, ..., Xn of size n from a distribution with a finite mean $\mu$ and a finite positive variance $\sigma^2$, then the distribution of
$W = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
is N(0,1) in the limit as n approaches infinity.
This means that, for large n, the variable $\bar{X}$ is approximately distributed N($\mu$, $\sigma/\sqrt{n}$).
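A small simulation can illustrate the theorem for a population that is far from normal. The sketch below is an illustration only: the choice of an Exponential(1) population (mean 1, standard deviation 1), the sample size of 200, and the 2000 repetitions are all assumptions made for the example. If the standardized mean W is close to N(0,1), roughly 95% of its simulated values should fall between -1.96 and 1.96.

```python
import math
import random


def standardized_mean(n: int, rng: random.Random) -> float:
    """Draw n Exponential(1) observations and return W = (xbar - mu) / (sigma / sqrt(n))."""
    mu, sigma = 1.0, 1.0  # mean and standard deviation of Exponential(1)
    xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
    return (xbar - mu) / (sigma / math.sqrt(n))


rng = random.Random(0)  # fixed seed so the run is repeatable
ws = [standardized_mean(200, rng) for _ in range(2000)]

# Share of simulated W values inside the central 95% range of N(0,1).
frac_within = sum(abs(w) < 1.96 for w in ws) / len(ws)
print(f"fraction of |W| < 1.96: {frac_within:.3f}")
```

Rerunning with a smaller n (for example 5) should show a poorer match, since the skewness of the exponential population has not yet averaged out.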
One well-known application of this theorem is the normal approximation to the binomial distribution.
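As a rough illustration of that application, the sketch below compares an exact binomial probability with its normal approximation. The parameters (n = 100 trials, success probability 0.5, and the cutoff 55) are assumptions chosen for the example, and the half-unit continuity correction is a common refinement that the text above does not discuss.

```python
import math
from statistics import NormalDist


def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n, p, k = 100, 0.5, 55

exact = binom_cdf(k, n, p)

# A binomial count is a sum of n independent 0/1 variables, so for large n
# it is approximately normal with mean n*p and sd sqrt(n*p*(1-p)).
normal = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
approx = normal.cdf(k + 0.5)  # continuity correction

print(f"exact:  {exact:.4f}")
print(f"approx: {approx:.4f}")
```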
Example
Using the MINITAB "RANDOM" command with the "UNIFORM" subcommand, I generated 100 samples of size 50 each from the Uniform(0,1) distribution, giving 50 rows of 100 observations. The mean of this distribution is 0.5, and its standard deviation is approximately 0.3. I then applied the "RMEAN" command to calculate the sample mean across the rows of my sample, resulting in 50 sample mean values (each of which represents the mean of 100 observations). The MINITAB "DESCRIBE" command gave the following information about the sample mean data:
Descriptive Statistics

Variable    N     Mean   Median  Tr Mean    StDev  SE Mean
C101       50  0.49478  0.49436  0.49450  0.02548  0.00360

Variable      Min      Max       Q1       Q3
C101      0.43233  0.55343  0.47443  0.51216
The mean 0.49 is nearly equal to the population mean 0.5. The desired value for the standard deviation is the population standard deviation divided by the square root of the sample size (sqrt(100) = 10 in this case), approximately 0.3/10 = 0.03. The calculated value for this sample is 0.025. To evaluate the normality of the sample mean data, I used the "NSCORES" and "PLOT" commands to create a normal quantile plot of the data.
The plot indicates that the data follow an approximately normal distribution, lying close to a diagonal line through the main body of the points.
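The experiment can be repeated outside MINITAB. The following is a minimal sketch in Python using only the standard library. It is not the original MINITAB session, and the seed and variable names are arbitrary choices, so the numbers will differ slightly from the output above.

```python
import random
import statistics

rng = random.Random(42)    # arbitrary seed, for repeatability only
n_means, n_obs = 50, 100   # 50 sample means, each from 100 observations

# Each sample mean averages 100 draws from Uniform(0,1).
sample_means = [
    statistics.fmean([rng.random() for _ in range(n_obs)])
    for _ in range(n_means)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Theoretical values: mean 0.5, sd (1/sqrt(12)) / sqrt(100), about 0.029.
print(f"mean of sample means: {mean_of_means:.5f}")
print(f"sd of sample means:   {sd_of_means:.5f}")
```

The exact standard deviation of Uniform(0,1) is 1/sqrt(12), about 0.289, which is the value rounded to 0.3 in the text.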