Two Types of Estimation

One of the major applications of statistics is estimating population parameters from sample statistics. There are two types of estimation:
  • Point Estimate: a single value of a sample statistic (e.g., the sample mean) used as the estimate of the population parameter

Point estimates of average height with multiple samples (Source: Zhihu)

  • Confidence Interval: an interval constructed by a method that captures the population parameter a specified proportion of the time (e.g., in 95% of repeated samples).

95% confidence interval of average height with multiple samples (Source: Zhihu)

Confidence Interval for the Mean

Population Variance is Known

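Suppose that M is the mean of a sample of N observations X_1, X_2, ..., X_N, i.e.

$$M = \frac{1}{N}\sum_{i=1}^{N} X_i$$

According to the Central Limit Theorem, the sampling distribution of the mean M is approximately

$$M \sim N\left(\mu, \frac{\sigma^2}{N}\right)$$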

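where μ and σ² are the mean and variance of the population, respectively. If repeated samples were taken and the 95% confidence interval computed for each sample, 95% of the intervals would contain the population mean. To construct such an interval, start from the interval that is symmetric about μ and covers an area of 0.95 under the normal sampling distribution of M.

That is,

$$P\left(\mu - 1.96\frac{\sigma}{\sqrt{N}} \le M \le \mu + 1.96\frac{\sigma}{\sqrt{N}}\right) = 0.95$$

Since we don't know the population mean, we center the interval on the sample mean M instead, which gives the 95% confidence interval for μ:

$$M - 1.96\frac{\sigma}{\sqrt{N}} \le \mu \le M + 1.96\frac{\sigma}{\sqrt{N}}$$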
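The known-variance interval can be sketched in a few lines of Python. This is only an illustration using the standard library; the function names `mean_ci_known_sigma` and `coverage`, and the population values (mean 170, standard deviation 10), are assumptions made up for the example. The simulation checks the "95% of the intervals would contain the population mean" reading by counting how often the interval captures μ.

```python
import math
import random
from statistics import NormalDist

def mean_ci_known_sigma(sample, sigma, confidence=0.95):
    """Return (lower, upper) for the population mean when sigma is known."""
    n = len(sample)
    m = sum(sample) / n
    # z such that the central `confidence` area of N(0, 1) lies in [-z, z]
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sigma / math.sqrt(n)
    return m - half_width, m + half_width

def coverage(mu, sigma, n, trials, seed=0):
    """Fraction of simulated samples whose interval contains mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lower, upper = mean_ci_known_sigma(sample, sigma)
        hits += lower <= mu <= upper
    return hits / trials

# Hypothetical population: mean height 170, standard deviation 10.
print(coverage(mu=170.0, sigma=10.0, n=25, trials=10_000))
```

The printed fraction should land close to 0.95, although any single run will differ slightly because of sampling noise.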

Population Variance is Unknown

Degrees of Freedom

The degrees of freedom (df) of an estimate is the number of independent pieces of information on which the estimate is based. In general, the degrees of freedom for an estimate is equal to the number of values minus the number of parameters estimated en route to the estimate in question. 

If the variance in a sample is used to estimate the variance in a population, we should not calculate the sample variance as

$$s^2 = \frac{\sum_{i=1}^{N}(X_i - M)^2}{N}$$

That's because the sample mean M is itself estimated from the same data en route to the variance estimate, which uses up one degree of freedom. The degrees of freedom are therefore N-1, and the previous formula underestimates the population variance. Instead, we should use the following formula

$$s^2 = \frac{\sum_{i=1}^{N}(X_i - M)^2}{N-1}$$

where s² is the estimate of the variance and M is the sample mean. The denominator of this formula is the degrees of freedom.
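A small simulation makes the underestimation visible. The sketch below is illustrative only: the function names and the choice of a normal population with variance 4 are assumptions for the example. It averages both estimators over many small samples and compares them with the true variance.

```python
import random

def variance_biased(xs):
    """Sum of squared deviations divided by N."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def variance_unbiased(xs):
    """Sum of squared deviations divided by N - 1 (the degrees of freedom)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def average_estimates(n, trials, sigma=2.0, seed=0):
    """Average both estimators over many samples of size n."""
    rng = random.Random(seed)
    total_biased = total_unbiased = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        total_biased += variance_biased(xs)
        total_unbiased += variance_unbiased(xs)
    return total_biased / trials, total_unbiased / trials

# Assumed population variance is sigma**2 = 4.0.
biased, unbiased = average_estimates(n=5, trials=20_000)
print(biased, unbiased)
```

With N = 5 the N-denominator average should sit near (N-1)/N times the true variance, while the N-1 version should sit near the true variance itself.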

Student's t-Distribution

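Suppose that X is a normally distributed random variable, i.e., X ~ N(μ, σ²), and that X_1, X_2, ..., X_N is a sample of N independent observations of X. Then

$$M = \frac{1}{N}\sum_{i=1}^{N} X_i$$

is the sample mean and

$$s = \sqrt{\frac{\sum_{i=1}^{N}(X_i - M)^2}{N-1}}$$

is the sample standard deviation.

$$Z = \frac{M - \mu}{\sigma/\sqrt{N}}$$

is a random variable with a standard normal distribution.

$$T = \frac{M - \mu}{s/\sqrt{N}}$$

is a random variable with a Student's t distribution with N-1 degrees of freedom.

The probability density function of T is

$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$$

where ν = N-1 is the degrees of freedom and Γ is the gamma function.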

The t distribution is very similar to the normal distribution when the estimate of variance is based on many degrees of freedom, but has relatively more scores in its tails when there are fewer degrees of freedom. Comparing t distributions with 2, 4, and 10 degrees of freedom against the standard normal distribution, the normal distribution has relatively more scores in the center of the distribution and the t distribution has relatively more in the tails.

The t distribution is therefore leptokurtic. The t distribution approaches the normal distribution as the degrees of freedom increase.
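The comparison can be reproduced numerically from the standard Student's t density. The sketch below, with illustrative function names `t_pdf` and `normal_pdf`, evaluates both densities at the center (0) and in the tail (3) for several degrees of freedom.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

# Density at the center (0) and in the tail (3) for each distribution.
for df in (2, 4, 10, 100):
    print("t, df =", df, t_pdf(0.0, df), t_pdf(3.0, df))
print("normal", normal_pdf(0.0), normal_pdf(3.0))
```

For small degrees of freedom the t density is lower at 0 and higher at 3 than the normal density, and the gap narrows as the degrees of freedom grow.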

Confidence Interval of t Distribution

Now consider the case in which you have a normal distribution but you do not know the standard deviation. You sample N values and compute the sample mean (M) and estimate the standard error of the mean (σM) with sM. What is the probability that M will be within 1.96 sM of the population mean (μ)? This is a difficult problem because there are two ways in which M could be more than 1.96 sM from μ: (1) M could, by chance, be either very high or very low and (2) sM could, by chance, be very low. Intuitively, it makes sense that the probability of being within 1.96 standard errors of the mean should be smaller than in the case when the standard deviation is known (and cannot be underestimated).
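This intuition can be checked by simulation. The following sketch is only illustrative (the function name `within_rate` and the standard normal population are assumptions for the example): it estimates how often M falls within 1.96 sM of μ when the standard error is estimated from the sample.

```python
import math
import random

def within_rate(n, trials, mu=0.0, sigma=1.0, seed=0):
    """Fraction of samples where |M - mu| <= 1.96 * s_M, with s_M estimated."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        m = sum(xs) / n
        # Sample standard deviation with the N - 1 denominator
        s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
        s_m = s / math.sqrt(n)  # estimated standard error of the mean
        hits += abs(m - mu) <= 1.96 * s_m
    return hits / trials

for n in (5, 30):
    print(n, within_rate(n, trials=20_000))
```

For small N the rate should fall noticeably below 0.95, and it should move closer to 0.95 as N grows.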

Luckily, however, we can prove that the random variable T = (M - μ) / sM follows a Student's t distribution with N-1 degrees of freedom. So we can use the t distribution to estimate the mean of a normally distributed population in situations where the sample size is small and the population standard deviation is unknown. The 90% confidence interval, for example, can be calculated as

$$M - A\,s_M \le \mu \le M + A\,s_M, \qquad s_M = \frac{s}{\sqrt{N}}$$

where A is the value of T such that the interval from -A to A contains 90% of the area of the t distribution with N-1 degrees of freedom. We can look up A in a t table.
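As a rough sketch of the whole procedure, the code below computes A numerically instead of reading it from a t table, then builds the interval. Everything here is an assumption for illustration: the helper names, the choice of Simpson's rule and bisection to invert the t distribution, and the made-up height data. A statistics library would normally supply the critical value directly.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t) for t >= 0, integrating the density on [0, t] with Simpson's rule."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Value A with P(-A <= T <= A) = confidence, found by bisection."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # grow the bracket until it contains A
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mean_ci_t(sample, confidence=0.90):
    """Confidence interval for the mean when the population variance is unknown."""
    n = len(sample)
    m = sum(sample) / n
    s = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
    half_width = t_critical(confidence, n - 1) * s / math.sqrt(n)
    return m - half_width, m + half_width

# Made-up heights, for illustration only.
heights = [168.2, 171.5, 165.9, 174.3, 169.8, 172.1, 167.4, 170.6, 173.0, 166.7]
print(mean_ci_t(heights))
```

The value returned by `t_critical` can be compared against a printed t table as a sanity check on the numerical integration.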
