Stat2.2x Probability is a course taught by the University of California, Berkeley on the edX platform in 2014.

PDF notes download (Academia.edu)

Summary

  • Standard Error
    The standard error of a random variable $X$ is defined by $$SE(X)=\sqrt{E((X-E(X))^2)}$$ $SE$ measures the rough size of the chance error in $X$: roughly how far off $X$ is from $E(X)$.
  • Standard Deviation
    The standard deviation of a list of numbers is $$SD=\sqrt{E((x-\mu)^2)}$$ where $\mu=E(x)$. $SD$ measures the rough size of the deviations: roughly how far off the numbers are from the average.
  • $SE$ of the Sum of the Draws
    When $n$ draws are made at random with replacement from a box of numbered tickets, the standard error of the sum of the draws is $$SE=\sqrt{\text{number of draws}}\cdot(SD\ \text{of the box})=\sqrt{n}\cdot\sigma$$ where $\sigma=\sqrt{E((x-\mu)^2)}$ is the $SD$ of the box and $\mu$ is the average of the box.
  • Chebychev's Inequality
    The probability that $X$ is $k$ or more $SEs$ away from $E(X)$ is at most $\frac{1}{k^2}$, that is $$P(X\ \text{is outside the interval}\ E(X)\pm k\cdot SE(X))\leq\frac{1}{k^2}$$ For instance, $$P(X\ \text{is inside the interval}\ E(X)\pm2\cdot SE(X))\geq1-\frac{1}{2^2}=\frac{3}{4}$$
  • De Moivre - Laplace Theorem
    Fix any $p$ strictly between $0$ and $1$. As the number of trials $n$ increases, the probability histogram for the binomial distribution looks like the normal curve with mean $\mu=n\cdot p$ and $SD=\sqrt{n\cdot p\cdot(1-p)}$.
  • Central Limit Theorem
    Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed, each with expected value $\mu$ and standard error $\sigma$. Let $S_n=X_1+X_2+\ldots+X_n$. Then for large $n$, the probability distribution of $S_n$ is approximately normal with mean $n\mu$ and standard deviation $\sqrt{n}\sigma$, no matter what the distribution of each $X_i$.
  • Normal Approximation of Binomial Distribution
    $$\mu=n\cdot p, SE=\sqrt{n\cdot p\cdot(1-p)}$$ $$Z_1=\frac{X_1-\mu}{SE}, Z_2=\frac{X_2-\mu}{SE}$$ $$P(X_1\leq X\leq X_2)\doteq\text{Area under the standard normal curve between}\ Z_1\ \text{and}\ Z_2$$ R code:
    mu = n * p; se = sqrt(n * p * (1 - p))
    z1 = (x1 - mu) / se; z2 = (x2 - mu) / se
    pnorm(z2) - pnorm(z1)
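
As a rough illustration of Chebychev's inequality, the sketch below compares the bound $\frac{1}{k^2}$ with an exact tail probability. The binomial example ($n=100$, $p=\frac{1}{2}$, $k=2$) and the variable names are assumptions chosen for illustration; they are not part of the course notes.

```r
# Illustrative example (assumed): X = number of heads in 100 fair coin tosses
n = 100; p = 1/2; k = 2
mu = n * p                      # E(X) = 50
se = sqrt(n * p * (1 - p))      # SE(X) = 5
# exact chance that X is k or more SEs away from E(X)
x = 0:n
tail = sum(dbinom(x[abs(x - mu) >= k * se], size = n, prob = p))
bound = 1 / k^2                 # Chebychev's upper bound
print(c(exact = tail, bound = bound))
```

The exact tail comes out well below the bound. Chebychev's inequality holds for every distribution, so for any particular one it is usually far from tight.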

PRACTICE

PROBLEM 1

In 6000 rolls of a die, approximately what is the chance of getting between 950 and 1050 sixes (inclusive)?

Solution

Binomial distribution $n=6000, k=950:1050, p=1/6$: $$P(\text{between 950 and 1050 sixes})$$ $$=\sum_{k=950}^{1050}C_{6000}^{k}(\frac{1}{6})^k\cdot(\frac{5}{6})^{6000-k}\doteq0.9198021$$ R code:

sum(dbinom(x = 950:1050, size = 6000, prob = 1/6))
[1] 0.9198021

Alternatively, using Normal Approximation: $$\mu=np=6000\times\frac{1}{6}=1000$$ $$SE=\sqrt{n\cdot p\cdot(1-p)}\doteq28.86751$$ $$Z_1=\frac{950-1000}{SE}, Z_2=\frac{1050-1000}{SE}$$ $$P(\text{between 950 and 1050 sixes})$$ $$=\text{Area under the standard normal curve between}\ Z_1\ \text{and}\ Z_2$$ $$=0.9167355$$ R code:

n = 6000; p = 1/6
mu = n * p; se = sqrt(n * p * (1 - p))
z1 = (950 - mu) / se; z2 = (1050 - mu) / se
pnorm(z2) - pnorm(z1)
[1] 0.9167355
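
Much of the gap between 0.9167 and the exact 0.9198 comes from cutting the histogram at 950 and 1050 rather than at the edges of the bars. The sketch below assumes the same continuity correction that PROBLEM 3 uses (move each endpoint outward by 0.5); the variable names are illustrative.

```r
n = 6000; p = 1/6
mu = n * p; se = sqrt(n * p * (1 - p))
exact = sum(dbinom(x = 950:1050, size = n, prob = p))
# plain normal approximation (endpoints 950 and 1050)
plain = pnorm((1050 - mu) / se) - pnorm((950 - mu) / se)
# with continuity correction (endpoints 949.5 and 1050.5)
corrected = pnorm((1050.5 - mu) / se) - pnorm((949.5 - mu) / se)
print(c(exact = exact, plain = plain, corrected = corrected))
```

The corrected value should land noticeably closer to the exact binomial sum than the plain one.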

PROBLEM 2

The “column” bet in roulette pays 2 to 1 and there are 12 chances in 38 to win. Suppose you bet \$1 on a column 100 times, independently. Find

a) the expected number of times you win

b) the SE of the number of times you win

c) the expected value of your net gain

d) the $SE$ of your net gain

e) the chance that you come out ahead

Solution

2a) $$E(\text{number of wins})=100\times\frac{12}{38}\doteq31.57895$$

2b) $$SE=\sqrt{n\cdot p\cdot(1-p)}=\sqrt{100\times\frac{12}{38}\times\frac{26}{38}}\doteq4.648295$$

2c) $$E(\text{net gain})=100\times(2\times\frac{12}{38}+(-1)\times\frac{26}{38})\doteq-5.263158$$ Alternatively, let $W$ be the number of wins and $X$ the net gain. Then $$X=2\cdot W-1\cdot(100-W)=3\cdot W-100$$ $$E(X)=3\cdot E(W)-100=3\times31.57895-100=-5.26315$$

2d) Because $SE=\sqrt{n}\sigma$ and $$n=100, \mu=2\times\frac{12}{38}+(-1)\times\frac{26}{38}=-\frac{1}{19}$$ $$\sigma=\sqrt{E((X-\mu)^2)}=\sqrt{(2+\frac{1}{19})^2\times\frac{12}{38}+(-1+\frac{1}{19})^2\times\frac{26}{38}}\doteq1.394489$$ Thus $$SE=\sqrt{n}\sigma\doteq13.94489$$ Alternatively, $$SE(X)=3\cdot SE(W)=3\times4.6483=13.945$$

2e) $X > 0 \Rightarrow W > \frac{100}{3}\Rightarrow W \geq 34$. Binomial distribution $n=100, k=34:100, p=12/38$: $$\sum_{k=34}^{100}C_{100}^{k}\cdot(\frac{12}{38})^k\cdot(\frac{26}{38})^{100-k}\doteq0.3357928$$ R code:

sum(dbinom(x = 34:100, size = 100, prob = 12/38))
[1] 0.3357928
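
The two routes in 2c) and 2d) can be checked against each other numerically, and 2e) can be compared with a normal approximation. This is only a sketch: the continuity correction at 33.5 for $W\geq34$ is an assumption carried over from the other problems, and the variable names are illustrative.

```r
n = 100; p = 12/38
value = c(2, -1); prob = c(p, 1 - p)     # net gain of a single $1 bet
# route 1: net gain as the sum of n draws from the box
mu = sum(value * prob)
sigma = sqrt(sum((value - mu)^2 * prob))
ev_box = n * mu; se_box = sqrt(n) * sigma
# route 2: X = 3W - 100, with W binomial(n, p)
ev_w = n * p; se_w = sqrt(n * p * (1 - p))
ev_lin = 3 * ev_w - 100; se_lin = 3 * se_w
print(c(ev_box, ev_lin, se_box, se_lin))
# 2e) normal approximation to P(W >= 34)
approx = 1 - pnorm((33.5 - ev_w) / se_w)
print(approx)
```

Both routes should agree to rounding error, and the normal approximation should land within about 0.01 of the exact binomial answer.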

PROBLEM 3

Find the normal approximation to the chance of getting 43 heads in 100 tosses of a coin.

Solution

Normal Approximation: $$\mu=100\times0.5=50, SE=\sqrt{n\cdot p\cdot(1-p)}=\sqrt{100\times0.5\times0.5}=5$$ $$Z_1=\frac{42.5-50}{5}, Z_2=\frac{43.5-50}{5}$$ $$P(\text{getting 43 heads in 100 tosses of a coin})\doteq0.02999328$$ R code:

n = 100; p = 1/2
mu = n * p; se = sqrt(n * p * (1 - p))
z1 = (42.5 - mu) / se; z2 = (43.5 - mu) / se
pnorm(z2) - pnorm(z1)
[1] 0.02999328

Binomial distribution (exact value): $$C_{100}^{43}\times(\frac{1}{2})^{100}\doteq0.03006864$$ R code:

dbinom(x = 43, size = 100, prob = 1/2)
[1] 0.03006864

Therefore the normal approximation is excellent.

EXERCISE 4

PROBLEM 1

A random variable $W$ has the probability distribution

value          1      2       3        4
probability    0.5    0.25    0.125    0.125

(For those of you who are interested, this is the geometric $p=0.5$ “killed” at 4. $W$ is the number of times I toss a coin if I follow this rule: I’ll toss the coin till I get the first head, but I’ll stop after 4 tosses even if I haven’t got a head by that time.)

1A Find $E(W)$

1B Find $SE(W)$

Solution

1A) $$E(W)=1\times0.5+2\times0.25+3\times0.125+4\times0.125=1.875$$

1B) $$SE(W)=\sqrt{E[(W-E(W))^2]}$$ $$=\sqrt{(1-1.875)^2\times0.5+(2-1.875)^2\times0.25+(3-1.875)^2\times0.125+(4-1.875)^2\times0.125}$$ $$\doteq1.053269$$ R code:

v = 1:4; p = c(.5, .25, .125, .125)
mu = sum(v * p)
sqrt(sum((v - mu) ^ 2 * p))
[1] 1.053269
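
The table of probabilities can be rebuilt from the stopping rule described in the problem. A minimal sketch, relying on the convention of R's `dgeom` and `pgeom` that the geometric variable counts the failures before the first success (so $W=k$ corresponds to $k-1$ tails); the variable names are illustrative.

```r
# P(W = k) for k = 1, 2, 3: first head on toss k, i.e. k - 1 tails first
p_first3 = dgeom(0:2, prob = 0.5)
# P(W = 4): the first three tosses are all tails, so tossing stops at 4
p_stop = 1 - pgeom(2, prob = 0.5)
p = c(p_first3, p_stop)
v = 1:4
mu = sum(v * p)
se = sqrt(sum((v - mu)^2 * p))
print(p); print(mu); print(se)
```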

PROBLEM 2

A true-false test consists of 20 questions, each of which has one correct answer: true, or false. One point is awarded for every correct answer, but one point is taken off for each wrong answer. Suppose a student answers every question by guessing at random, independently of other questions. Let $S$ be the student’s score on the test.

2A Find $E(S)$

2B Find $SE(S)$

2C Find $P(S=0)$ without using a large-sample approximation.

Solution

2A) This is very similar to the net gain, $$E(S)=20\times(1\times\frac{1}{2}+(-1)\times\frac{1}{2})=0$$

2B) $S$ is the sum of 20 independent question scores, each $+1$ or $-1$ with chance $\frac{1}{2}$. For a single question, $$\mu=1\times\frac{1}{2}+(-1)\times\frac{1}{2}=0$$ $$SE(S)=\sqrt{n}\sigma=\sqrt{20\times((1-0)^2\times\frac{1}{2}+(-1-0)^2\times\frac{1}{2})}\doteq4.472136$$

2C) $S=0$ means there are 10 correct answers and 10 incorrect answers, binomial distribution $n=20, k=10, p=\frac{1}{2}$, $$P(S=0)=C_{20}^{10}\times(\frac{1}{2})^{20}\doteq0.1761971$$ R code:

dbinom(x = 10, size = 20, prob = 1/2)
[1] 0.1761971
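
An equivalent route, in the same spirit as the alternative solution to PROBLEM 2 of the practice set, writes the score in terms of the number of correct answers. The symbol $C$ is introduced here for illustration; $C$ is binomial with $n=20$ and $p=\frac{1}{2}$. $$S=C-(20-C)=2C-20$$ $$E(S)=2\cdot E(C)-20=2\times10-20=0$$ $$SE(S)=2\cdot SE(C)=2\sqrt{20\times\frac{1}{2}\times\frac{1}{2}}=2\sqrt{5}\doteq4.472136$$ $$P(S=0)=P(C=10)$$ This agrees with 2A and 2B, and explains why 2C reduces to a single binomial term.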

PROBLEM 3

A die is rolled 60 times.

3A Find the expected number of times the face with 6 spots appears.

3B Find the $SE$ of the number of times the face with 6 spots appears.

3C Find the normal approximation to the chance that the face with six spots appears 10 times.

3D Find the exact chance that the face with six spots appears 10 times.

3E Find the normal approximation to the chance that the face with six spots appears 9, 10, or 11 times.

3F Find the exact chance that the face with six spots appears 9, 10, or 11 times.

Solution

3A) $$E(\text{6 spots appears})=60\times\frac{1}{6}=10$$

3B) $$SE(\text{6 spots appears})=\sqrt{60\times\frac{1}{6}\times(1-\frac{1}{6})}\doteq2.886751$$

3C) $$Z_1=\frac{9.5-10}{SE}, Z_2=\frac{10.5-10}{SE}$$ Computing in R:

mu = 10; se = sqrt(60 * 1/6 * 5/6)
z1 = (9.5 - mu) / se; z2 = (10.5 - mu) / se
pnorm(z2) - pnorm(z1)
[1] 0.1375098

3D) Binomial distribution $n=60, k=10, p=\frac{1}{6}$: $$C_{60}^{10}\times(\frac{1}{6})^{10}\times(\frac{5}{6})^{50}\doteq0.1370131$$ R code:

dbinom(x = 10, size = 60, prob = 1/6)
[1] 0.1370131

3E) $$Z_1=\frac{8.5-10}{SE}, Z_2=\frac{11.5-10}{SE}$$ Computing in R:

mu = 10; se = sqrt(60 * 1/6 * 5/6)
z1 = (8.5 - mu) / se; z2 = (11.5 - mu) / se
pnorm(z2) - pnorm(z1)
[1] 0.3966682

3F) Binomial distribution $n=60, k=9:11, p=\frac{1}{6}$: $$\sum_{k=9}^{11}C_{60}^{k}\cdot(\frac{1}{6})^{k}\cdot(\frac{5}{6})^{60-k}\doteq0.3958971$$ R code:

sum(dbinom(x = 9:11, size = 60, prob = 1/6))
[1] 0.3958971
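
Parts 3C to 3F repeat the same two computations with different endpoints. A small helper makes the pattern explicit. `binom_normal_approx` is a hypothetical name introduced here, not a base R function.

```r
# Normal approximation to P(lo <= X <= hi) for X ~ binomial(n, p),
# with a continuity correction of 0.5 at each endpoint
binom_normal_approx = function(lo, hi, n, p){
  mu = n * p
  se = sqrt(n * p * (1 - p))
  pnorm((hi + 0.5 - mu) / se) - pnorm((lo - 0.5 - mu) / se)
}
# approximation next to the exact binomial chance, for 3C/3D and 3E/3F
for (r in list(c(10, 10), c(9, 11))){
  approx = binom_normal_approx(r[1], r[2], n = 60, p = 1/6)
  exact = sum(dbinom(x = r[1]:r[2], size = 60, prob = 1/6))
  print(c(lo = r[1], hi = r[2], approx = approx, exact = exact))
}
```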

PROBLEM 4

According to genetic theory, plants of a particular species have a 25% chance of being red-flowering, independently of other plants. Find the normal approximation to the chance that among 10,000 plants of this species, more than 2400 are red-flowering.

Solution

Normal approximation: $$p=0.25, n=10000$$ $$\mu=np, SE=\sqrt{np(1-p)}, Z=\frac{2400.5-\mu}{SE}$$ Computing in R:

n = 10000; p = 0.25
mu = n * p; se = sqrt(n * p * (1 - p))
z = (2400.5 - mu) / se
1 - pnorm(z)
[1] 0.989215

Binomial distribution $$\sum_{k=2401}^{10000}C_{10000}^{k}\cdot(0.25)^k\cdot(0.75)^{10000-k}$$ R code:

sum(dbinom(x = 2401:10000, size = 10000, prob = 0.25))
[1] 0.9894525
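
Summing 7600 terms with `dbinom` works, but the same upper tail can be read directly from the binomial CDF. A sketch using base R's `pbinom` with `lower.tail = FALSE`, which returns $P(X > q)$; the variable names are illustrative.

```r
n = 10000; p = 0.25
tail_sum = sum(dbinom(x = 2401:n, size = n, prob = p))
tail_cdf = pbinom(q = 2400, size = n, prob = p, lower.tail = FALSE)
print(c(tail_sum, tail_cdf))
```

The two values should agree up to floating-point rounding.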

PROBLEM 5

A random number generator draws at random with replacement from the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. In 5000 draws, the chance that the digit 0 appears fewer than 495 times is closest to

Solution

Normal approximation: $$n=5000, p=0.1$$ $$\mu=np, SE=\sqrt{np(1-p)}, Z=\frac{494.5-\mu}{SE}$$ Computing in R:

n = 5000; p = 0.1
mu = n * p; se = sqrt(n * p * (1 - p))
z = (494.5 - mu) / se
pnorm(z)
[1] 0.3977125

Binomial distribution $$\sum_{k=0}^{494}C_{5000}^{k}\cdot(0.1)^k\cdot(0.9)^{5000-k}$$ R code:

sum(dbinom(x = 0:494, size = 5000, prob = 0.1))
[1] 0.3999814

EXERCISE 5

PROBLEM 1

The durations of phone calls taken by the receptionist at an office are like draws made at random with replacement from a list that has an average of 8.5 minutes (that's 8 minutes and 30 seconds) and an $SD$ of 3 minutes. Approximately what is the chance that the total duration of the next 100 calls is more than 15 hours?

Solution

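Central Limit Theorem: 15 hours is 900 minutes, and the total duration of 100 calls has expected value $100\times8.5=850$ minutes. $$\mu=8.5, SD=3, SE=\sqrt{n}\cdot SD=\sqrt{100}\times3=30$$ $$Z=\frac{900-850}{30}$$ Computing in R: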

z = (900 - 850) / 30
1 - pnorm(z)
[1] 0.04779035
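
The same computation can be written with the unit conversion and the intermediate quantities spelled out. This is a sketch with illustrative variable names; it passes the mean and SE of the total to `pnorm` directly instead of standardizing by hand.

```r
n = 100                      # number of calls
avg_call = 8.5; sd_call = 3  # minutes, per call
limit = 15 * 60              # 15 hours expressed in minutes
ev = n * avg_call            # expected total duration: 850 minutes
se = sqrt(n) * sd_call       # SE of the total duration: 30 minutes
chance = pnorm(limit, mean = ev, sd = se, lower.tail = FALSE)
print(chance)
```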

PROBLEM 2

A multiple choice test consists of 100 questions. Each question has 5 possible answers, only one of which is correct. Four points are awarded for each correct answer, and 1 point is taken off for each wrong answer. Suppose you answer all the questions by guessing at random, independently of all other questions.

2A In order to score more than 30 points, you have to get more than ________ answers right. Fill in the blank with the smallest correct whole number.

2B What is the chance that you get more than 30 points?

Solution

2A) Let $x$ be the number of correct answers. Then $$4x+(-1)\cdot(100-x) > 30\Rightarrow 5x > 130\Rightarrow x > 26$$ Therefore you have to get more than 26 answers right.

2B) Binomial distribution $n=100, k=27:100, p=\frac{1}{5}$: $$P(\text{more than 30 points})=\sum_{k=27}^{100}C_{100}^{k}\cdot(\frac{1}{5})^k\cdot(\frac{4}{5})^{100-k}\doteq0.05583272$$ R code:

sum(dbinom(x = 27:100, size = 100, prob = 1/5))
[1] 0.05583272

Normal approximation: $$n=100, p=\frac{1}{5}, \mu=np=20, SE=\sqrt{np(1-p)}=4$$ $$Z=\frac{26.5-20}{SE}$$ Computing in R:

z = (26.5 - 20) / 4
1 - pnorm(z)
[1] 0.05208128

The normal approximation (about 0.052) falls noticeably below the exact value (about 0.056), so it is not very accurate here.
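
The threshold in 2A can be double-checked by listing the score for every possible number of correct answers. A minimal sketch with illustrative variable names:

```r
x = 0:100                        # possible numbers of correct answers
score = 4 * x - 1 * (100 - x)    # 4 points per right answer, -1 per wrong one
smallest = min(x[score > 30])    # fewest right answers that score above 30
print(smallest)
```

This prints 27, which is the "more than 26" of 2A and the lower limit of the sum in 2B.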

PROBLEM 3

Assume that each person in a population has chance 2/1000 of carrying a particular disease, independently of all other people. Among 1000 people in this population, the number of people that carry the disease [pick all that are correct]

Solution

First, the number of carriers follows a binomial distribution with $n=1000$ and $p=0.002$. Second, because $p$ is very small, the distribution is right-skewed.
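
The skewness can be seen by tabulating the first few binomial chances. The sketch below also measures how much area a normal curve with the same mean and $SE$ would place below $-0.5$, where the count cannot fall. The choice of $0,\ldots,8$ as the displayed range and the variable names are assumptions for illustration.

```r
n = 1000; p = 2/1000
mu = n * p                        # expected number of carriers: 2
se = sqrt(n * p * (1 - p))
k = 0:8
print(round(dbinom(k, size = n, prob = p), 4))   # chances of 0, 1, ..., 8 carriers
# area a normal curve with the same mean and SE puts below -0.5
below_zero = pnorm(-0.5, mean = mu, sd = se)
print(below_zero)
```

The histogram is cut off at 0 on the left but trails off slowly on the right, which is why a symmetric normal curve fits it poorly.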

PROBLEM 4

Jack and Jill gamble on a roll of a die (yes, a fair die), as follows. If the die shows 1 or 2 spots, Jack gives Jill $\$1$. If the die shows 5 or 6 spots, Jill gives Jack $\$1$. If the die shows 3 or 4 spots, no money changes hands. Suppose Jack and Jill play this game 400 times. The chance that Jill’s net gain is more than $\$20$ is closest to?

Solution

$$P(\text{Jill wins 1})=P(\text{Jill loses 1})=P(\text{no money changes hands})=\frac{1}{3}$$ $$\mu=1\times\frac{1}{3}+(-1)\times\frac{1}{3}+0\times\frac{1}{3}=0$$ $$SD=\sqrt{(1-0)^2\times\frac{1}{3}+(-1-0)^2\times\frac{1}{3}+(0-0)^2\times\frac{1}{3}}=\sqrt{\frac{2}{3}}$$ $$SE=\sqrt{n}\cdot SD=\sqrt{\frac{800}{3}}, Z=\frac{20-0}{SE}$$ Computing in R:

se = sqrt(800 / 3)
z = (20 - 0) / se
1 - pnorm(z)
[1] 0.1103357
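
Because the net gain only takes whole-dollar values, "more than $\$20$" can also be read as "$\$21$ or more", which suggests cutting the normal curve at 20.5. The sketch below assumes that continuity correction and rebuilds $SD$ and $SE$ from the box; the variable names are illustrative.

```r
n = 400
value = c(1, -1, 0); prob = c(1/3, 1/3, 1/3)   # Jill's gain on one roll
mu = sum(value * prob)
sigma = sqrt(sum((value - mu)^2 * prob))
ev = n * mu; se = sqrt(n) * sigma
plain = 1 - pnorm((20 - ev) / se)
corrected = 1 - pnorm((20.5 - ev) / se)        # "more than 20" read as "21 or more"
print(c(plain = plain, corrected = corrected))
```

The corrected chance is slightly smaller than the plain one; both are close to 0.1.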

PROBLEM 5

In roulette, the bet on a “split” pays 17 to 1 and there are 2 chances in 38 to win. The bet on “red” pays 1 to 1 and there are 18 chances in 38 to win. Compare the following two strategies:

  • A: bet $\$1$ on a split, 200 times independently
  • B: bet $\$1$ on red, 200 times independently

In what follows, “making more than $\$x$” means having a net gain of more than $\$x$; “losing more than $\$x$” means having a net gain of less than $-\$x$. For each of the following events, compare its chance under A with its chance under B: coming out ahead, winning more than $\$20$, losing more than $\$20$.

Solution

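Use the Central Limit Theorem.

Let $P_{X0}$ be the chance of coming out ahead when following strategy $X$. Similarly, $P_{X20^{+}}$ and $P_{X20^{-}}$ denote the chances of winning and losing more than $\$20$ respectively. Strategy $A$: the average of a single bet is $$\mu=17\times\frac{2}{38}+(-1)\times\frac{36}{38}=-\frac{1}{19}$$ so with $n=200$ bets $$E(\text{net gain})=n\mu=-\frac{200}{19}$$ $$SE=\sqrt{n}\cdot SD=\sqrt{200\times[(17-\mu)^2\times\frac{2}{38}+(-1-\mu)^2\times\frac{36}{38}]}$$ Note that the $SD$ is taken around the average $\mu$ of a single bet, not around the expected net gain. Strategy $B$ is calculated in the same way. Computing in R (the values in the comments are rounded to three decimal places):

netgain = function(n, prob, value, gain){
  # average and SD of a single bet (the box)
  mu = sum(prob * value)
  sd_box = sqrt(sum((value - mu) ^ 2 * prob))
  # expected value and SE of the net gain over n bets
  ev = n * mu
  se = sqrt(n) * sd_box
  if (gain >= 0){
    # chance that the net gain is more than `gain`
    z = (gain + 0.5 - ev) / se
    print(1 - pnorm(z))
  } else {
    # chance that the net gain is less than `gain`
    z = (gain - 0.5 - ev) / se
    print(pnorm(z))
  }
}
netgain(n = 200, prob = c(2/38, 36/38), value = c(17, -1), gain = 0)    # A: about 0.423
netgain(n = 200, prob = c(18/38, 20/38), value = c(1, -1), gain = 0)    # B: about 0.217
netgain(n = 200, prob = c(2/38, 36/38), value = c(17, -1), gain = 20)   # A: about 0.293
netgain(n = 200, prob = c(18/38, 20/38), value = c(1, -1), gain = 20)   # B: about 0.014
netgain(n = 200, prob = c(2/38, 36/38), value = c(17, -1), gain = -20)  # A: about 0.430
netgain(n = 200, prob = c(18/38, 20/38), value = c(1, -1), gain = -20)  # B: about 0.240

According to the results above, $$P_{A0} > P_{B0}$$ $$P_{A20^{+}} > P_{B20^{+}}$$ $$P_{A20^{-}} > P_{B20^{-}}$$ Both strategies have the same expected net gain, $-\frac{200}{19}$, but the $SE$ for $A$ (about 56.8) is much larger than for $B$ (about 14.1), so $A$ is more likely to end up far from the expected value in either direction. That is, $P_A > P_B$ for all three events: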

  • Coming out ahead
  • Winning more than $\$20$
  • Losing more than $\$20$
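
As a cross-check that does not rely on the normal curve, the chance of coming out ahead can be computed exactly from the number of winning bets $W$. The thresholds below follow from net gain $=18W-200$ for A and $2W-200$ for B; the variable names are illustrative.

```r
n = 200
# A: 18W - 200 > 0 means W >= 12;  B: 2W - 200 > 0 means W >= 101
ahead_A = pbinom(q = 11, size = n, prob = 2/38, lower.tail = FALSE)
ahead_B = pbinom(q = 100, size = n, prob = 18/38, lower.tail = FALSE)
print(c(A = ahead_A, B = ahead_B))
```

The exact chances need not match the normal-approximation figures closely, particularly for A, where the expected number of wins is small and the distribution of $W$ is skewed. The ordering between A and B is what matters for the comparison.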
