One of the most fundamental concepts of modern statistics is that of likelihood. For each of the discrete random variables we have considered thus far, the distribution depends on one or more parameters that are, in most statistical applications, unknown. In the Poisson distribution, the parameter is λ. In the binomial, the parameter of interest is p (since n is typically fixed and known).

Likelihood is a tool for summarizing the data’s evidence about unknown parameters. Let us denote the unknown parameter(s) of a distribution generically by θ. Since the probability distribution depends on θ, we can make this dependence explicit by writing f(x) as f(x ; θ). For example, in the Bernoulli distribution the parameter is θ = π, and the distribution is

$$f(x;\pi) = \pi^x (1-\pi)^{1-x}, \qquad x = 0, 1 \qquad\qquad (2)$$

Once a value of X has been observed, we can plug this observed value x into f(x ; π) and obtain a function of π only. For example, if we observe X = 1, then plugging x = 1 into (2) gives the function π. If we observe X = 0, the function becomes 1 − π.

Whatever function of the parameter we get when we plug the observed data x into f(x ; θ), we call that function the likelihood function.

We write the likelihood function as $L(\theta ; x) = \prod_{i=1}^{n} f(X_i ; \theta)$, or sometimes just L(θ). Algebraically, the likelihood L(θ ; x) is just the same as the distribution f(x ; θ), but its meaning is quite different because it is regarded as a function of θ rather than a function of x. Consequently, a graph of the likelihood usually looks very different from a graph of the probability distribution.

For example, suppose that X has a Bernoulli distribution with unknown parameter π. We can graph the probability distribution for any fixed value of π; for instance, if π = .5 we get this:

Now suppose that we observe a value of X, say X = 1. Plugging x = 1 into the distribution $\pi^x (1-\pi)^{1-x}$ gives the likelihood function L(π ; x) = π, which looks like this:
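A minimal R sketch of the two figures described above (added here for illustration; it is not the course's plotting code):

    # Bernoulli(0.5) probability distribution: spikes at x = 0 and x = 1
    pi0 <- 0.5
    plot(c(0, 1), c(1 - pi0, pi0), type = "h", lwd = 3, ylim = c(0, 1),
         xlab = "x", ylab = "f(x; 0.5)", main = "Bernoulli(0.5) pmf")

    # Likelihood after observing x = 1: L(pi; x = 1) = pi, a straight line on [0, 1]
    pi.grid <- seq(0, 1, length.out = 200)
    plot(pi.grid, pi.grid, type = "l",
         xlab = expression(pi), ylab = "likelihood", main = "L(pi; x = 1)")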

For discrete random variables, a graph of the probability distribution f(x ; θ) has spikes at specific values of x, whereas a graph of the likelihood L(θ ; x) is a continuous curve (e.g. a line) over the parameter space, the domain of possible values for θ.

L(θ ; x) summarizes the evidence about θ contained in the event X = x. L(θ ; x) is high for values of θ that make X = x more likely, and small for values of θ that make X = x unlikely. In the Bernoulli example, observing X = 1 gives some (albeit weak) evidence that π is nearer to 1 than to 0, so the likelihood for x = 1 rises as π moves from 0 to 1.

For example, if we observe x from Bin(n, π), the likelihood function is

$$L(\pi \mid x) = \frac{n!}{x!\,(n-x)!}\,\pi^x (1-\pi)^{n-x}.$$

Any multiplicative constant which does not depend on θ is irrelevant and may be discarded; thus

$$L(\pi \mid x) \propto \pi^x (1-\pi)^{n-x}.$$
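As an illustration (the values n = 10 and x = 3 below are made up, not part of the notes), a short R sketch shows that dropping the constant only rescales the likelihood vertically and does not move its peak:

    # Binomial likelihood for x successes in n trials
    n <- 10; x <- 3                      # illustrative values, not from the notes
    pi.grid <- seq(0.001, 0.999, length.out = 500)

    L.full   <- choose(n, x) * pi.grid^x * (1 - pi.grid)^(n - x)  # with the constant
    L.kernel <- pi.grid^x * (1 - pi.grid)^(n - x)                 # constant discarded

    # Rescaled to a maximum of 1, the two curves coincide; both peak at pi = x/n
    plot(pi.grid, L.full / max(L.full), type = "l",
         xlab = expression(pi), ylab = "scaled likelihood")
    lines(pi.grid, L.kernel / max(L.kernel), lty = 2)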

Loglikelihood

In most cases, for various reasons but most often for computational convenience, we work with the loglikelihood

$$l(\theta \mid x) = \log L(\theta \mid x)$$

which is defined up to an arbitrary additive constant.

For example, the binomial loglikelihood is

$$l(\pi \mid x) = x \log \pi + (n - x) \log(1 - \pi).$$
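A quick numerical check in R (again with the illustrative n = 10 and x = 3) that this kernel differs from the full loglikelihood only by the additive constant $\log \binom{n}{x}$:

    # The full binomial loglikelihood and the kernel above differ only by
    # log(choose(n, x)), an additive constant that does not involve pi
    n <- 10; x <- 3                       # same illustrative values as before
    pi.grid <- seq(0.1, 0.9, by = 0.1)

    full   <- dbinom(x, size = n, prob = pi.grid, log = TRUE)
    kernel <- x * log(pi.grid) + (n - x) * log(1 - pi.grid)

    all.equal(full - kernel, rep(log(choose(n, x)), length(pi.grid)))  # TRUE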

In many problems of interest, we will derive our loglikelihood from a sample rather than from a single observation. If we observe an independent sample $x_1, x_2, \ldots, x_n$ from a distribution $f(x \mid \theta)$, then the overall likelihood is the product of the individual likelihoods:

$$L(\theta \mid x) = \prod_{i=1}^{n} f(x_i \mid \theta) = \prod_{i=1}^{n} L(\theta \mid x_i)$$

and the loglikelihood is:

$$l(\theta \mid x) = \log \prod_{i=1}^{n} f(x_i \mid \theta) = \sum_{i=1}^{n} \log f(x_i \mid \theta) = \sum_{i=1}^{n} l(\theta \mid x_i).$$
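For example, for an independent Bernoulli sample the summed loglikelihood is easy to compute and plot in R; the data vector below is made up for illustration:

    # Loglikelihood of pi for an independent Bernoulli sample
    x <- c(1, 0, 1, 1, 0, 1)              # illustrative data, not from the notes

    loglik <- function(pi, x) {
      # sum of the individual Bernoulli loglikelihoods l(pi | x_i)
      sum(x * log(pi) + (1 - x) * log(1 - pi))
    }

    pi.grid <- seq(0.01, 0.99, length.out = 200)
    plot(pi.grid, sapply(pi.grid, loglik, x = x), type = "l",
         xlab = expression(pi), ylab = "loglikelihood")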

Binomial loglikelihood examples:  
Plots of the binomial loglikelihood function when n = 5 and we observe x = 0, x = 1, or x = 2 (see the lec1fig.R code on ANGEL for how to produce these figures):
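A minimal R sketch of these three plots (an approximation added here, not the original lec1fig.R):

    # Binomial loglikelihood l(pi | x) = x*log(pi) + (n - x)*log(1 - pi) with n = 5
    n <- 5
    pi.grid <- seq(0.01, 0.99, length.out = 200)

    op <- par(mfrow = c(1, 3))            # one panel for each observed x
    for (x in 0:2) {
      ll <- x * log(pi.grid) + (n - x) * log(1 - pi.grid)
      plot(pi.grid, ll, type = "l",
           xlab = expression(pi), ylab = "loglikelihood",
           main = paste("n = 5, x =", x))
    }
    par(op)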

In regular problems, as the total sample size n grows, the loglikelihood function does two things:

  • it becomes more sharply peaked around its maximum, and
  • its shape becomes nearly quadratic (i.e. a parabola, if there is a single parameter).

This is important because tests such as the Wald test, which is based on

$$z = \frac{\text{statistic}}{\text{SE of statistic}},$$

work well only if the loglikelihood is approximately quadratic. For example, the loglikelihood for a normal-mean problem is exactly quadratic, and as the sample size grows, the inference comes to resemble the normal-mean problem. This is true even for discrete data: the extent to which normal-theory approximations work for discrete data does not depend on how closely the distribution of the responses resembles a normal curve, but on how closely the loglikelihood resembles a quadratic function.
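A small R sketch of this effect (added for illustration; the sample sizes and the value $\hat{\pi} = 0.3$ are made up), comparing the binomial loglikelihood with its quadratic approximation at the maximum:

    # Compare the binomial loglikelihood with its quadratic (parabolic) approximation
    # at the MLE phat; the curvature there is -n / (phat * (1 - phat))
    compare_quadratic <- function(n, phat = 0.3) {
      x <- n * phat                                    # observed count with MLE = phat
      pi.grid <- seq(0.05, 0.7, length.out = 300)
      ll   <- x * log(pi.grid) + (n - x) * log(1 - pi.grid)
      ll0  <- x * log(phat) + (n - x) * log(1 - phat)  # value at the maximum
      quad <- -0.5 * n / (phat * (1 - phat)) * (pi.grid - phat)^2
      plot(pi.grid, ll - ll0, type = "l", ylim = c(-10, 0),
           xlab = expression(pi), ylab = "centered loglikelihood",
           main = paste("n =", n))
      lines(pi.grid, quad, lty = 2)                    # dashed: quadratic approximation
    }

    op <- par(mfrow = c(1, 2))
    compare_quadratic(n = 10)     # visibly asymmetric
    compare_quadratic(n = 100)    # nearly parabolic near the maximum
    par(op)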

Transformations may help us to improve the shape of the loglikelihood. More on this in Section 1.6 on Alternative Parametrizations. Next we will see how we use the likelihood, or rather the corresponding loglikelihood, to estimate the most likely value of the unknown parameter of interest.

from: https://onlinecourses.science.psu.edu/stat504/node/27
