The Central Limit Theorem (CLT), and the concept of the sampling distribution, are critical for understanding why statistical inference works. There are at least a handful of problems that require you to invoke the Central Limit Theorem on every ASQ Certified Six Sigma Black Belt (CSSBB) exam. The CLT says that if you take many repeated samples from a population, and calculate the averages or sum of each one, the collection of those averages will be normally distributed… and it doesn’t matter what the shape of the source distribution is!

I wrote some R code to help illustrate this principle for my students. This code allows you to choose a sample size (n), a source distribution, and parameters for that source distribution, and generate a plot of the sampling distributions of the mean, sum, and variance. (Note: the sampling distribution for the variance is a Chi-square distribution!)

sdm.sim <- function(n,src.dist=NULL,param1=NULL,param2=NULL) {
r <- 10000 # Number of replications/samples - DO NOT ADJUST
# This produces a matrix of observations with
# n columns and r rows. Each row is one sample:
my.samples <- switch(src.dist,
"E" = matrix(rexp(n*r,param1),r),
"N" = matrix(rnorm(n*r,param1,param2),r),
"U" = matrix(runif(n*r,param1,param2),r),
"P" = matrix(rpois(n*r,param1),r),
"C" = matrix(rcauchy(n*r,param1,param2),r),
"B" = matrix(rbinom(n*r,param1,param2),r),
"G" = matrix(rgamma(n*r,param1,param2),r),
"X" = matrix(rchisq(n*r,param1),r),
"T" = matrix(rt(n*r,param1),r))
all.sample.sums <- apply(my.samples,1,sum)
all.sample.means <- apply(my.samples,1,mean)
all.sample.vars <- apply(my.samples,1,var)
par(mfrow=c(2,2))
hist(my.samples[1,],col="gray",main="Distribution of One Sample")
hist(all.sample.sums,col="gray",main="Sampling Distributionnof
the Sum")
hist(all.sample.means,col="gray",main="Sampling Distributionnof the Mean")
hist(all.sample.vars,col="gray",main="Sampling Distributionnof
the Variance")
}

There are 9 population distributions to choose from: exponential (E), normal (N), uniform (U), Poisson (P), Cauchy (C), binomial (B), gamma (G), Chi-Square (X), and the Student’s t distribution (t). Note also that you have to provide either one or two parameters, depending upon what distribution you are selecting. For example, a normal distribution requires that you specify the mean and standard deviation to describe where it’s centered, and how fat or thin it is (that’s two parameters). A Chi-square distribution requires that you specify the degrees of freedom (that’s only one parameter). You can find out exactly what distributions require what parameters by going here:http://en.wikibooks.org/wiki/R_Programming/Probability_Distributions.

Here is an example that draws from an exponential distribution with a mean of 1/1 (you specify the number you want in the denominator of the mean):

sdm.sim(50,src.dist="E",param1=1)

The code above produces this sequence of plots:

You aren’t allowed to change the number of replications in this simulation because of the nature of the sampling distribution: it’s a theoretical model that describes the distribution of statistics from an infinite number of samples. As a result, if you increase the number of replications, you’ll see the mean of the sampling distribution bounce around until it converges on the mean of the population. This is just an artifact of the simulation process: it’s not a characteristic of the sampling distribution, because to be a sampling distribution, you’ve got to have an infinite number of samples. Watkins et al. have a great description of this effect that all statistics instructors should be aware of. I chose 10,000 for the number of replications because 1) it’s close enough to infinity to ensure that the mean of the sampling distribution is the same as the mean of the population, but 2) it’s far enough away from infinity to not crash your computer, even if you only have 4GB or 8GB of memory.

Here are some more examples to try. You can see that as you increase your sample size (n), the shapes of the sampling distributions become more and more normal, and the variance decreases, constraining your estimates of the population parameters more and more.

sdm.sim(10,src.dist="E",1)
sdm.sim(50,src.dist="E",1)
sdm.sim(100,src.dist="E",1)
sdm.sim(10,src.dist="X",14)
sdm.sim(50,src.dist="X",14)
sdm.sim(100,src.dist="X",14)
sdm.sim(10,src.dist="N",param1=20,param2=3)
sdm.sim(50,src.dist="N",param1=20,param2=3)
sdm.sim(100,src.dist="N",param1=20,param2=3)
sdm.sim(10,src.dist="G",param1=5,param2=5)
sdm.sim(50,src.dist="G",param1=5,param2=5)
sdm.sim(100,src.dist="G",param1=5,param2=5) 转自:http://www.r-bloggers.com/sampling-distributions-and-central-limit-theorem-in-r/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+RBloggers+%28R+bloggers%29

Sampling Distributions and Central Limit Theorem in R(转)的更多相关文章

  1. Sampling Distribution of the Sample Mean|Central Limit Theorem

    7.3 The Sampling Distribution of the Sample Mean population:1000:Scale are normally distributed with ...

  2. 加州大学伯克利分校Stat2.2x Probability 概率初步学习笔记: Section 4 The Central Limit Theorem

    Stat2.2x Probability(概率)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...

  3. 【概率论】6-3:中心极限定理(The Central Limit Theorem)

    title: [概率论]6-3:中心极限定理(The Central Limit Theorem) categories: - Mathematic - Probability keywords: - ...

  4. Appendix 1- LLN and Central Limit Theorem

    1. 大数定律(LLN) 设Y1,Y2,……Yn是独立同分布(iid,independently identically distribution)的随机变量,A = SY /n = (Y1+...+ ...

  5. Law of large numbers and Central limit theorem

    大数定律 Law of large numbers (LLN) 虽然名字是 Law,但其实是严格证明过的 Theorem weak law of large number (Khinchin's la ...

  6. 中心极限定理(Central Limit Theorem)

    中心极限定理:每次从总体中抽取容量为n的简单随机样本,这样抽取很多次后,如果样本容量很大,样本均值的抽样分布近似服从正态分布(期望为  ,标准差为 ). (注:总体数据需独立同分布) 那么样本容量n应 ...

  7. 中心极限定理 | central limit theorem | 大数定律 | law of large numbers

    每个大学教材上都会提到这个定理,枯燥地给出了定义和公式,并没有解释来龙去脉,导致大多数人望而生畏,并没有理解它的美. <女士品茶>有感 待续~ 参考:怎样理解和区分中心极限定理与大数定律?

  8. 【转载】Recommendations with Thompson Sampling (Part II)

    [原文链接:http://engineering.richrelevance.com/recommendations-thompson-sampling/.] [本文链接:http://www.cnb ...

  9. (main)贝叶斯统计 | 贝叶斯定理 | 贝叶斯推断 | 贝叶斯线性回归 | Bayes' Theorem

    2019年08月31日更新 看了一篇发在NM上的文章才又明白了贝叶斯方法的重要性和普适性,结合目前最火的DL,会有意想不到的结果. 目前一些最直觉性的理解: 概率的核心就是可能性空间一定,三体世界不会 ...

随机推荐

  1. AngularJS进入使用前的准备工作

    安装 AngularJS是以JavaScript文件形式发布的,可以通过 script 标签添加到网页中. 下载地址:https://github.com/angular/angular.js/rel ...

  2. 统计web应用程序的访问人数

    新建一个空网站,添加Global.asax全局处理. 文件目录如图: 在global类中添加代码: using System; using System.Collections.Generic; us ...

  3. JavaScript定时器分析

    一.事件循环 JavaScript是单线程,同一个时间只能做一件事情,所以执行任务需要排队.如果前一个耗时很长,那么下一个只能等待. 1)两种任务 为了更好的处理任务,JavaScript语言的设计者 ...

  4. 20170410 --- Linux备课资料 --- vim的使用

    首先我们要了解一下什么是vim? -----> vim是从vi发展出来的一个文本编辑器. 那问题又来了,什么是vi呢? ------> vi 是Unix like (可以理解为linux) ...

  5. C++命名空间【转】

    本讲基本要求 * 掌握:命名空间的作用及定义:如何使用命名空间.     * 了解:使用早期的函数库 重点.难点     ◆命名空间的作用及定义:如何使用命名空间.     在学习本书前面各章时,读者 ...

  6. asp.net SignalR 一对一聊天

    <script src="~/Scripts/jquery-1.8.2.min.js"></script> <script src="~/S ...

  7. Unity属性的封装、继承、方法隐藏

    (一)Unity属性封装.继承.方法隐藏的学习和总结 一.属性的封装 1.属性封装的定义:通过对属性的读和写来保护类中的域. 2.格式例子: private string departname; // ...

  8. webmagic源码学习(一)

    最近工作主要是一些爬虫相关的东西,由于公司需要构建自己的爬虫框架,在调研过程中参考了许多优秀的开源作品,包括webmagic,webcollector,Spiderman等,通过学习这些优秀的源码获益 ...

  9. 请为main函数提供返回值

    很多人甚至市面上的一些书籍,都使用了void main( ) ,其实这是错误的.C/C++ 中从来没有定义过void main( ).C++ 之父 Bjarne Stroustrup 在他的主页上的 ...

  10. Cocos2d-x 3.2 环境搭建

    参考文章地址: 1.Cocos2d-x官方安装说明文档:http://cn.cocos2d-x.org/tutorial/show?id=781 2.CSDN博客:http://blog.csdn.n ...