Two Types of Estimation

One of the major applications of statistics is estimating population parameters from sample statistics. There are types of estimation:
  • Point Estimate: the value of sample statistics

Point estimates of average height with multiple samples (Source: Zhihu)

  • Confidence Intervals: intervals constructed using a method that contains the population parameter a specified proportion of the time.

95% confidence interval of average height with multiple samples (Source: Zhihu)

Confidence Interval for the Mean

Population Variance is known

Suppose that M is the mean of N samples X1, X2, ......, Xn, i.e.

According to Central Limit Theorem, the the sampling distribution of the mean M is

where μ and σ2 are the mean and variance of the population respectively. If repeated samples were taken and the 95% confidence interval computed for each sample, 95% of the intervals would contain the population mean. So the 95% confidence interval for M is the inverval that is symetric about the point estimate μ so that the area under normal distribution is 0.95.

That is,

Since we don't know the mean of population, we could use the sample mean  instead.

Population Variance is Unknown

Dregree of Freedom

The degrees of freedom (df) of an estimate is the number of independent pieces of information on which the estimate is based. In general, the degrees of freedom for an estimate is equal to the number of values minus the number of parameters estimated en route to the estimate in question. 

If the variance in a sample is used to estimate the variance in a population, we couldn't calculate the sample variace as

That's because we have two parameters to estimate (i.e., sample mean and sample variance). The degree of freedom should be N-1, so the previous formula underestimates the variance. Instead, we should use the following formula

where s2 is the estimate of the variance and M is the sample mean. The denominator of this formula is the degree of freedom.

Student's t-Distribution

Suppose that X is a random variable of normal distribution, i.e., X ~ N(μ, σ2)

is sample mean and

is sample deviation.

is a random variable of normal distribution.

is a random variable of student's t distribution.

The probability density function of T is

where  is the degree of freedom,  is a gamma function.

The t distribution is very similar to the normal distribution when the estimate of variance is based on many degrees of freedom, but has relatively more scores in its tails when there are fewer degrees of freedom. Here are t distributions with 2, 4, and 10 degrees of freedom and the standard normal distribution. Notice that the normal distribution has relatively more scores in the center of the distribution and the t distribution has relatively more in the tails.

The t distribution is therefore leptokurtic. The t distribution approaches the normal distribution as the degrees of freedom increase. 

Confidence Interval of t Distribution

Now consider the case in which you have a normal distribution but you do not know the standard deviation. You sample N values and compute the sample mean (M) and estimate the standard error of the mean (σM) with sM. What is the probability that M will be within 1.96 sM of the population mean (μ)? This is a difficult problem because there are two ways in which M could be more than 1.96 sM from μ: (1) M could, by chance, be either very high or very low and (2) sM could, by chance, be very low. Intuitively, it makes sense that the probability of being within 1.96 standard errors of the mean should be smaller than in the case when the standard deviation is known (and cannot be underestimated).

Luckily, however, we can prove that random variable T will be student's t distribution. So we can use t distribution to estimate the mean of a normal distribution population in situations where the sample size is small and population standard deviation is unknown. For 90% confidence interval, it can be calculated as

where A is value of T that contains 90% of the area of the t distribution for n-1 degree of freedom. We can calculate A through the t table.

[Math Review] Statistics Basic: Estimation的更多相关文章

  1. [Math Review] Statistics Basic: Sampling Distribution

    Inferential Statistics Generalizing from a sample to a population that involves determining how far ...

  2. [Math Review] Statistics Basics: Main Concepts in Hypothesis Testing

    Case Study The case study Physicians' Reactions sought to determine whether physicians spend less ti ...

  3. [Math Review] Linear Algebra for Singular Value Decomposition (SVD)

    Matrix and Determinant Let C be an M × N matrix with real-valued entries, i.e. C={cij}mxn Determinan ...

  4. 统计处理包Statsmodels: statistics in python

    http://blog.csdn.net/pipisorry/article/details/52227580 Statsmodels Statsmodels is a Python package ...

  5. FAQ: Automatic Statistics Collection (文档 ID 1233203.1)

    In this Document   Purpose   Questions and Answers   What kind of statistics do the Automated tasks ...

  6. Machine and Deep Learning with Python

    Machine and Deep Learning with Python Education Tutorials and courses Supervised learning superstiti ...

  7. How do I learn machine learning?

    https://www.quora.com/How-do-I-learn-machine-learning-1?redirected_qid=6578644   How Can I Learn X? ...

  8. 本人AI知识体系导航 - AI menu

    Relevant Readable Links Name Interesting topic Comment Edwin Chen 非参贝叶斯   徐亦达老板 Dirichlet Process 学习 ...

  9. [book]awesome-machine-learning books

    https://github.com/josephmisiti/awesome-machine-learning/blob/master/books.md Machine-Learning / Dat ...

随机推荐

  1. 《Cracking the Coding Interview》——第8章:面向对象设计——题目10

    2014-04-24 00:05 题目:用拉链法设计一个哈希表. 解法:一个简单的哈希表,就看成一个数组就好了,每个元素是一个桶,用来放入元素.当有多个元素落入同一个桶的时候,就用链表把它们连起来.由 ...

  2. Swift 与众不同的地方

    Swift 与众不同的地方 switch(元组) 特点 其他语言中的switch语句只能比较离散的整形数据(字符可以转换成整数) 但是swift中可以比较整数.浮点数.字符.字符串.和元组数据类型,而 ...

  3. Python 爬取网页中JavaScript动态添加的内容(二)

    使用 selenium + phantomjs 实现 1.准备环境 selenium(一个用于web应用程测试的工具)安装:pip install seleniumphantomjs(是一种无界面的浏 ...

  4. Python SimpleHTTPServer简单HTTP服务器

    搭建FTP,或者是搭建网络文件系统,这些方法都能够实现Linux的目录共享.但是FTP和网络文件系统的功能都过于强大,因此它们都有一些不够方便的地方.比如你想快速共享Linux系统的某个目录给整个项目 ...

  5. Cannot create a secure XMLInputFactory --CXF调用出错

    在调用方法前加上下面三句即可调用成功: Properties props = System.getProperties(); props.setProperty("org.apache.cx ...

  6. ubuntu wifi连接出现Network service discovery disabled的解决办法

    修改/etc/default/avahi-daemon,将AVAHI_DAEMON_DETECT_LOCAL从1改为0(关闭avahi) ------------------------------- ...

  7. 使用puTTY或Xshell连接阿里云TimeOut超时

    根据网上很多主流的说法,我依次检查了 ssh: service sshd status 防火墙:service iptables stop (CentOS 7好像已经没有这个iptable了) 都没有 ...

  8. poj 3436 网络流构图经典

    ACM Computer Factory Time Limit: 1000MS   Memory Limit: 65536K Total Submissions: 6012   Accepted: 2 ...

  9. 【bzoj4881】[Lydsy2017年5月月赛]线段游戏 树状数组+STL-set

    题目描述 quailty和tangjz正在玩一个关于线段的游戏.在平面上有n条线段,编号依次为1到n.其中第i条线段的两端点坐标分别为(0,i)和(1,p_i),其中p_1,p_2,...,p_n构成 ...

  10. Delivering Goods UVALive - 7986(最短路+最小路径覆盖)

    Delivering Goods UVALive - 7986(最短路+最小路径覆盖) 题意: 给一张n个点m条边的有向带权图,给出C个关键点,问沿着最短路径走,从0最少需要出发多少次才能能覆盖这些关 ...