Study notes for Discrete Probability Distribution
The Basics of Probability
- Probability measures the degree of uncertainty of an event: a fact whose occurrence is uncertain.
- Sample space refers to the set of all possible outcomes, denoted as Ω; an event is a subset of Ω.
- Some properties:
- Sum rule: p(A ∪ B) = p(A) + p(B) - p(A ∩ B)
- Union bound: p(A_1 ∪ ... ∪ A_n) ≤ p(A_1) + ... + p(A_n)
- Conditional probability: p(B|A) = p(A, B) / p(A), defined for p(A) > 0. To emphasize that p(A) is unconditional, it is called the "marginal probability", and p(A, B) is called the "joint probability". Rearranging gives p(A, B) = p(B|A) p(A), which is called the "multiplication rule" or "factorization rule".
- Total probability theorem: p(B) = p(B|A)p(A) + p(B|~A)p(~A)
- Bayes' Theorem: p(A|B) = p(B|A) p(A) / p(B), for p(B) > 0.
Bayes' Theorem can be regarded as a rule for updating a prior probability p(A) into a posterior probability p(A|B) once the evidence (event) B has been observed.
- Conditional independence: Two events A and B, with p(A)>0 and p(B)>0, are conditionally independent given C if p(A, B|C)=p(A|C) p(B|C).
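The total probability theorem and Bayes' Theorem above can be checked numerically. The sketch below is only an illustration: the three input probabilities are hypothetical numbers, and the function names are made up for this example.

```python
# Hypothetical inputs, chosen only for illustration.
p_a = 0.01              # prior p(A)
p_b_given_a = 0.95      # p(B|A)
p_b_given_not_a = 0.05  # p(B|~A)


def total_probability(p_a, p_b_given_a, p_b_given_not_a):
    """p(B) = p(B|A) p(A) + p(B|~A) p(~A)."""
    return p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)


def bayes_posterior(p_a, p_b_given_a, p_b_given_not_a):
    """p(A|B) = p(B|A) p(A) / p(B)."""
    p_b = total_probability(p_a, p_b_given_a, p_b_given_not_a)
    return p_b_given_a * p_a / p_b


p_b = total_probability(p_a, p_b_given_a, p_b_given_not_a)
posterior = bayes_posterior(p_a, p_b_given_a, p_b_given_not_a)
print(f"p(B) = {p_b:.4f}, p(A|B) = {posterior:.4f}")
```

With a small prior, the posterior stays well below p(B|A) even though the evidence is strong, which is the usual point of working such an example by hand.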
- Probability mass function (p.m.f) of a random variable X is the function p(x) = Pr[X = x].
- Joint probability mass function of X and Y is the function p(x, y) = Pr[X = x, Y = y].
- Cumulative distribution function (c.d.f) of a random variable X is the function F(x) = Pr[X ≤ x].
- The c.d.f gives the probability that X falls in the interval (-∞, x], whereas the p.m.f gives the probability that X takes one specific value.
- Expectation: the expectation of a random variable X is E[X] = Σ_x x Pr[X = x], where the sum runs over all values x that X can take.
- linearity: E[aX+bY]=aE[X]+bE[Y]
- if X and Y are independent: E[XY]=E[X]*E[Y]
- Markov's inequality: let X be a nonnegative random variable with E[X] < ∞, then for all t > 0, Pr[X ≥ t] ≤ E[X] / t.
- Variance: the variance of a random variable X is Var[X] = σ^2 = E[(X - E[X])^2], where σ is called the standard deviation of the random variable X.
- Var[aX] = a^2 Var[X]
- if X and Y are independent, Var[X+Y]=Var[X]+Var[Y]
- Chebyshev's inequality: let X be a random variable with Var[X] < ∞, then for all t > 0, Pr[|X - E[X]| ≥ t] ≤ Var[X] / t^2.
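The definitions of expectation, variance, and the c.d.f., together with Markov's and Chebyshev's inequalities, can be verified exactly on a small p.m.f. The sketch below assumes a fair six-sided die as the example distribution; the helper names and the thresholds t are illustrative choices, not part of the notes.

```python
from fractions import Fraction

# p.m.f. of a fair six-sided die (an assumed example distribution).
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def expectation(pmf):
    """E[X] = sum over x of x * Pr[X = x]."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Var[X] = E[(X - E[X])^2]."""
    mu = expectation(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


def cdf(pmf, x):
    """F(x) = Pr[X <= x]."""
    return sum(p for v, p in pmf.items() if v <= x)


def prob(pmf, predicate):
    """Probability of the event {X satisfies predicate}."""
    return sum(p for v, p in pmf.items() if predicate(v))


mu = expectation(pmf)
var = variance(pmf)

# Markov: Pr[X >= t] <= E[X] / t, here with t = 5.
markov_lhs = prob(pmf, lambda v: v >= 5)
markov_bound = mu / 5

# Chebyshev: Pr[|X - E[X]| >= t] <= Var[X] / t^2, here with t = 2.
chebyshev_lhs = prob(pmf, lambda v: abs(v - mu) >= 2)
chebyshev_bound = var / 2 ** 2

print(mu, var, markov_lhs, markov_bound, chebyshev_lhs, chebyshev_bound)
```

Using `Fraction` keeps every quantity exact, so the inequalities are checked without floating-point noise. Both bounds are loose here, which is typical: they hold for any distribution with the stated moments.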
Bernoulli Distribution
- A (single) Bernoulli trial is an experiment whose outcome is random and can be either of two possible outcomes, "success" and "failure", or "yes" and "no". Examples of Bernoulli trials include: flipping a coin, a single response in a political opinion poll, etc.
- The Bernoulli distribution is a discrete probability distribution of a single discrete random variable X, which takes value 1 with success probability p: Pr(X=1)=p, and value 0 with failure probability Pr(X=0)=q=1-p. More formally, the Bernoulli distribution is summarized as follows:
- notation: Bern(p), where 0<p<1 is the probability of success.
- support: X={0, 1}
- p.m.f: Pr[X=0]=q=1-p, Pr[X=1]=p
- mean: E[X]=p
- variance: Var[X]=p(1-p)
- It is a special case of Binomial distribution B(n, p). Bernoulli distribution is B(1, p).
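As a quick sanity check of the Bernoulli summary above, the sketch below draws samples and compares the empirical mean and variance with p and p(1-p). The value p = 0.3, the sample size, and the seed are arbitrary assumptions made for this illustration.

```python
import random


def bernoulli_pmf(x, p):
    """Pr[X = x] for X ~ Bern(p); zero outside the support {0, 1}."""
    if x == 1:
        return p
    if x == 0:
        return 1.0 - p
    return 0.0


def bernoulli_sample(p, rng):
    """Return 1 with probability p and 0 otherwise."""
    return 1 if rng.random() < p else 0


p = 0.3                  # assumed success probability
rng = random.Random(0)   # fixed seed so the run is repeatable
samples = [bernoulli_sample(p, rng) for _ in range(100_000)]

sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
print(f"mean ~ {sample_mean:.3f} (p = {p}), var ~ {sample_var:.3f} (p(1-p) = {p * (1 - p):.3f})")
```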
Binomial Distribution
- The Binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent Bernoulli trials with success probability p, denoted as X~B(n, p).
- The Binomial distribution is often used to model the number of successes in a sample of size n drawn with replacement from a population of size N. If the sampling is carried out without replacement, the draws are not independent and so the resulting distribution is a hypergeometric distribution, not a binomial one.
- The Binomial distribution is summarized as follows:
- notation: B(n, p), where n is the number of trials and p is the success probability in each trial
- support: k ∈ {0, 1, ..., n}, the number of successes
- p.m.f: Pr[X=k] = C(n, k) p^k (1-p)^(n-k), where C(n, k) = n! / (k! (n-k)!) is the binomial coefficient
- mean: np
- variance: np(1-p)
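The Binomial summary can be checked directly from the p.m.f. Pr[X=k] = C(n, k) p^k (1-p)^(n-k). In this minimal sketch the parameters n = 10 and p = 0.3 are assumed for illustration only.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Pr[X = k] for X ~ B(n, p); zero outside the support {0, ..., n}."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 10, 0.3  # assumed example parameters
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(probs)                                            # should be 1
mean = sum(k * q for k, q in enumerate(probs))                # should be n*p
var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))   # should be n*p*(1-p)
print(total, mean, var)
```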
- If n is large enough, then the skew of the distribution is not too great. In this case, a reasonable approximation to B(n, p) is given by the normal distribution N(np, np(1-p)), which is useful because the p.m.f of the Binomial distribution becomes expensive to compute for large n.
- One rule for deciding whether n is large enough for this approximation is that both np and n(1-p) must be greater than 5. If both are greater than 15, the approximation should be good.
- A second rule is that for n>5, the normal approximation is adequate if |(1/√n) (√((1-p)/p) - √(p/(1-p)))| < 0.3.
- Another commonly used rule holds that the normal approximation is appropriate only if everything within 3 standard deviations of its mean is within the range of possible values, that is, if np ± 3√(np(1-p)) lies within [0, n].
- To improve the accuracy of the approximation, we usually apply a continuity correction, which accounts for the binomial random variable being discrete while the normal random variable is continuous. The basic idea is to treat the discrete value k as the continuous interval from k-0.5 to k+0.5, so that, for example, Pr[X ≤ k] is approximated by the normal probability of the interval up to k+0.5.
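The effect of treating k as the interval from k-0.5 to k+0.5 can be seen by comparing the exact Binomial c.d.f. with the normal approximation, with and without the half-unit shift. This is a sketch under assumed parameters (n = 50, p = 0.4, k = 18); the normal c.d.f. is computed from `math.erf`.

```python
from math import comb, erf, sqrt


def binomial_cdf(k, n, p):
    """Exact Pr[X <= k] for X ~ B(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(0, k + 1))


def normal_cdf(x, mu, sigma):
    """c.d.f. of the normal distribution with mean mu and std. dev. sigma."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def normal_approx_cdf(k, n, p, continuity=True):
    """Approximate Pr[X <= k] by N(np, np(1-p)), optionally shifting k by 0.5."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    x = k + 0.5 if continuity else k
    return normal_cdf(x, mu, sigma)


n, p, k = 50, 0.4, 18  # assumed example values
exact = binomial_cdf(k, n, p)
approx_plain = normal_approx_cdf(k, n, p, continuity=False)
approx_corrected = normal_approx_cdf(k, n, p, continuity=True)
print(f"exact={exact:.4f} plain={approx_plain:.4f} corrected={approx_corrected:.4f}")
```

For these values the corrected approximation lands much closer to the exact probability than the uncorrected one.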
- In addition, the Poisson distribution can be used to approximate the Binomial distribution when n is very large and p is small. A rule of thumb states that the Poisson distribution is a good approximation of the binomial distribution if n is at least 20 and p is smaller than or equal to 0.05, and an excellent approximation if n>=100 and np<=10: B(n, p) ≈ Poisson(λ), with λ = np.
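The Poisson approximation can be checked by comparing the two p.m.f.s term by term, using Pr[X=k] = λ^k e^(-λ) / k! with λ = np. The parameters n = 100 and p = 0.05 below are assumed values that satisfy the "excellent approximation" rule of thumb.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """Pr[X = k] for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Pr[X = k] for a Poisson random variable with rate lam."""
    return lam ** k * exp(-lam) / factorial(k)


n, p = 100, 0.05   # assumed example: n >= 100 and np = 5 <= 10
lam = n * p

# Largest pointwise gap between the two p.m.f.s over k = 0..n.
max_abs_error = max(
    abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(n + 1)
)
print(f"largest p.m.f. difference: {max_abs_error:.5f}")
```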
Poisson Distribution
- Poisson distribution: Let X be a discrete random variable taking values in the set of non-negative integers {0, 1, 2, ...} with probability Pr[X = k] = λ^k e^(-λ) / k!, where λ > 0.
My understanding. The Poisson distribution assigns non-uniform probabilities to the non-negative integers: counts close to the average rate are the most likely, and the probability falls off quickly for much larger counts. Note that this is a statement about event counts; it is unrelated to the well-known bias where people asked to pick a random integer from 1 to 10 favor some numbers over others.
- The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event.
- The Poisson distribution is summarized as follows.
- notation: Poisson(λ), where λ > 0 is a real number indicating the expected number of events observed in the given time interval.
- support: k ∈ {0, 1, 2, 3, ...}
- mean: λ
- variance: λ
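That the mean and variance of the Poisson distribution both equal its rate can be confirmed numerically from the p.m.f. Pr[X=k] = λ^k e^(-λ) / k!. The sketch below assumes λ = 4 and truncates the infinite support at k = 100, where the remaining tail is negligible; it builds the table with the recurrence p(k) = p(k-1) λ / k to avoid large factorials.

```python
from math import exp


def poisson_pmf_table(lam, k_max):
    """Return [Pr[X=0], ..., Pr[X=k_max]] via p(k) = p(k-1) * lam / k."""
    probs = [exp(-lam)]
    for k in range(1, k_max + 1):
        probs.append(probs[-1] * lam / k)
    return probs


lam = 4.0                             # assumed example rate
probs = poisson_pmf_table(lam, 100)   # truncation point is an assumption

total = sum(probs)
mean = sum(k * q for k, q in enumerate(probs))
var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
print(total, mean, var)
```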
- Applications of Poisson distribution
- Telecommunication: telephone calls arriving in a system
- Management: customers arriving at a counter or call center
- Civil engineering: cars arriving at a traffic light
- Generating Poisson random variables
algorithm poisson_random_number (Knuth):
init:
    Let L = e^(-λ), k = 0, and p = 1.
do:
    k = k + 1.
    Generate uniform random number u in [0, 1], and let p = p × u.
while p > L.
return k-1.
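The algorithm above (multiply uniform random numbers until the product drops to e^(-λ) or below, then return the number of draws minus one) can be sketched in Python as follows. The rate λ = 3, the sample size, and the seed are assumptions made only to exercise the function.

```python
import math
import random


def poisson_random_number(lam, rng=random):
    """Draw one Poisson(lam) variate by multiplying uniforms until the product <= e^(-lam)."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    threshold = math.exp(-lam)  # L in the pseudocode
    k = 0
    p = 1.0
    while True:                 # do ... while p > L
        k += 1
        p *= rng.random()       # u is uniform on [0, 1)
        if p <= threshold:
            return k - 1


rng = random.Random(42)         # fixed seed so the run is repeatable
lam = 3.0                       # assumed example rate
samples = [poisson_random_number(lam, rng) for _ in range(50_000)]

sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
print(f"sample mean ~ {sample_mean:.3f}, sample variance ~ {sample_var:.3f}")
```

The loop runs about λ + 1 times per sample, and e^(-λ) underflows to zero for very large λ, so this approach is only practical for moderate rates.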