Reference: MLE vs MAP: the connection between Maximum Likelihood and Maximum A Posteriori Estimation

Maximum Likelihood Estimation (MLE) and Maximum A Posteriori (MAP) estimation are both methods for estimating some variable in the setting of probability distributions or graphical models. They are similar in that they both compute a single point estimate, instead of a full distribution.

Those of us who have already indulged in Machine Learning will be familiar with MLE; sometimes we even use it without knowing it. Take, for example, fitting a Gaussian to a dataset: we immediately take the sample mean and sample variance and use them as the parameters of our Gaussian. This is MLE, because if we take the derivative of the Gaussian likelihood with respect to the mean and variance and maximize it (i.e. set the derivative to zero), what we get are exactly the formulas for the sample mean and sample variance (see the sketch below). As another example, most of the optimization in Machine Learning and Deep Learning (neural nets, etc.) can be interpreted as MLE.
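A minimal sketch of this, assuming NumPy and a synthetic toy dataset: the closed-form MLE of a Gaussian's parameters is exactly the sample mean and the (biased) sample variance.

```python
# A minimal sketch, assuming NumPy; the dataset here is synthetic.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=10_000)  # toy dataset

mu_mle = data.mean()                     # argmax over mu of the Gaussian log-likelihood
var_mle = ((data - mu_mle) ** 2).mean()  # note: divides by N, not N - 1

print(mu_mle, var_mle)  # close to the true values 2.0 and 1.5**2 = 2.25
```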

Speaking in more abstract terms, let's say we have a likelihood function P(X|θ). Then the MLE for θ, the parameter we want to infer, is:

θ_MLE = argmax_θ P(X|θ)
      = argmax_θ ∏_i P(x_i|θ)

where the product follows from assuming the data points x_i in X are i.i.d.

A product of many numbers less than 1 approaches 0 as the number of factors grows, so this product is not practical to compute directly because of numerical underflow. Hence, we instead work in log space: the logarithm is monotonically increasing, so maximizing a function is equivalent to maximizing the log of that function:

θ_MLE = argmax_θ log P(X|θ)
      = argmax_θ ∑_i log P(x_i|θ)

To use this framework, we just need to derive the log likelihood of our model, then maximize it with respect to θ using our favorite optimization algorithm, like Gradient Descent (a sketch follows).
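As a concrete sketch, assuming NumPy and again a toy Gaussian model with known variance, gradient ascent on the log-likelihood recovers the closed-form MLE (the sample mean):

```python
# A minimal sketch, assuming NumPy; the variance is held fixed so only mu is learned.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=1_000)

mu = 0.0      # initial guess
lr = 1e-4     # learning rate
for _ in range(1_000):
    grad = np.sum(data - mu)  # d/dmu of sum_i log N(x_i | mu, 1)
    mu += lr * grad           # gradient *ascent*, since we are maximizing

print(mu, data.mean())  # converges to the closed-form MLE, the sample mean
```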

Up to this point, we understand what MLE does. From here, we can draw a parallel with MAP estimation.

MAP usually comes up in a Bayesian setting because, as the name suggests, it works on the posterior distribution, not only the likelihood.

Recall that with Bayes' rule we can get the posterior as a product of likelihood and prior:

P(θ|X) = P(X|θ) P(θ) / P(X)
       ∝ P(X|θ) P(θ)

We ignore the normalizing constant P(X) because we are strictly speaking about optimization here, so proportionality is sufficient.

If we replace the likelihood in the MLE formula above with the posterior, we get:

θ_MAP = argmax_θ P(X|θ) P(θ)
      = argmax_θ log P(X|θ) + log P(θ)
      = argmax_θ ∑_i log P(x_i|θ) + log P(θ)

Comparing the MLE and MAP equations, the only difference is the inclusion of the prior P(θ) in MAP; otherwise they are identical. What this means is that the likelihood is now weighted by the prior (see the sketch below).
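In code, the MAP objective is just the MLE objective plus one extra log-prior term. A minimal sketch, assuming SciPy, with a Gaussian likelihood and prior chosen only for illustration:

```python
# A minimal sketch, assuming NumPy/SciPy; the Gaussian choices are illustrative.
import numpy as np
from scipy.stats import norm

def log_likelihood(theta, x):
    return norm.logpdf(x, loc=theta, scale=1.0).sum()   # sum_i log P(x_i|theta)

def log_posterior_unnorm(theta, x):
    log_prior = norm.logpdf(theta, loc=0.0, scale=1.0)  # log P(theta)
    return log_likelihood(theta, x) + log_prior         # MLE objective + log-prior

data = np.array([1.8, 2.2, 1.9])  # tiny hypothetical dataset
print(log_posterior_unnorm(2.0, data))
```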

Let's consider what happens if we use the simplest prior in our MAP estimation, i.e. a uniform prior. This means we assign equal weights everywhere, on all possible values of θ. The implication is that the likelihood is weighted by a constant, and being constant, it can be ignored in our MAP equation, as it will not contribute to the maximization.

Let's be more concrete: say we can assign six possible values to θ. Our prior P(θ) is then 1/6 everywhere in the distribution, and consequently we can ignore that constant in our MAP estimation:

θ_MAP = argmax_θ ∑_i log P(x_i|θ) + const
      = argmax_θ ∑_i log P(x_i|θ)
      = θ_MLE

We are back at the MLE equation again!
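To see this numerically, here is a minimal sketch with six candidate values of θ and made-up log-likelihoods (the numbers are hypothetical): adding the constant log(1/6) cannot change the argmax.

```python
# A minimal sketch, assuming NumPy; the log-likelihood values are made up.
import numpy as np

log_lik = np.array([-4.2, -3.1, -2.7, -3.9, -5.0, -3.3])  # log P(X|theta) per candidate
log_prior_uniform = np.log(np.full(6, 1 / 6))              # log P(theta) = log(1/6)

mle_idx = np.argmax(log_lik)
map_idx = np.argmax(log_lik + log_prior_uniform)
print(mle_idx, map_idx)  # same index: a constant prior cannot change the argmax
```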

If we use a different prior, say a Gaussian, then our prior is no longer constant: depending on the region of the distribution, the probability is high or low, never the same everywhere.
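For example, here is a minimal sketch, assuming NumPy, of the MAP estimate of a Gaussian mean under a Gaussian prior N(0, τ²). The closed form below comes from setting the derivative of the log-posterior to zero; the variance values are illustrative assumptions.

```python
# A minimal sketch, assuming NumPy; sigma2 and tau2 are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=20)  # small dataset, so the prior matters

sigma2, tau2 = 1.0, 0.5   # likelihood variance, prior variance
n = len(data)

mu_mle = data.mean()
# argmax over mu of sum_i log N(x_i | mu, sigma2) + log N(mu | 0, tau2):
mu_map = data.sum() / (n + sigma2 / tau2)

print(mu_mle, mu_map)  # mu_map is shrunk toward the prior mean, 0
```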

What we can conclude, then, is that MLE is a special case of MAP, where the prior is uniform!
