Part 1: Theory

Contents:

  • What's GMM?
  • How to solve GMM?
  • What's EM?
  • Explanation of the result

What's GMM?

GMM is short for Gaussian Mixture Model, whose density can be written as follows:
\[
p(\mathbf{x}) = \sum_{k=1}^{K}\pi_kp(\mathbf{x}|\theta_k)
\]

where,
\[
p(\mathbf{x}|\theta_k) = \frac{1}{(2\pi)^{\frac{d}{2}}|\Sigma_k|^{\frac{1}{2}}}\exp\left[-\frac{1}{2}\left( \mathbf{x} - \mathbf{\mu_k} \right)^T\Sigma_k^{-1}\left( \mathbf{x} - \mathbf{\mu_k} \right)\right]
\]

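is the $k$th Gaussian component of the GMM, with parameters $\theta_k = (\mathbf{\mu_k}, \Sigma_k)$ and $d$ the dimension of $\mathbf{x}$, and $\pi_k$ is the mixing weight of the $k$th Gaussian component, with $\pi_k \ge 0$ and $\sum_{k=1}^{K}\pi_k = 1$.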

A GMM can be used to estimate the probability density function (PDF) of given data. That is, we suppose that the data follow a GMM distribution. (We could also suppose that the data follow a single Gaussian distribution, but a GMM can describe more complex distributions.)
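
The mixture density above can be evaluated directly. The following is a minimal Python sketch, assuming the one-dimensional case ($d=1$), where $\Sigma_k$ reduces to a scalar variance. The function names and the two-component parameters are hypothetical, chosen only for illustration.

```python
import math


def gaussian_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_pdf(x, weights, means, variances):
    """Mixture density p(x) = sum_k pi_k * p(x | theta_k)."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))


# Hypothetical two-component mixture (not taken from the text).
weights = [0.3, 0.7]       # pi_k, must sum to 1
means = [-2.0, 1.0]        # mu_k
variances = [0.5, 1.5]     # Sigma_k in 1-D

density_at_zero = gmm_pdf(0.0, weights, means, variances)
```

Because the weights sum to 1 and each component integrates to 1, the mixture is itself a valid density.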

Here is the problem: if the given data look like Figure 1, how can we estimate their distribution?

Figure 1

If we use MLE (Maximum Likelihood Estimation) to solve this problem, namely:
\[
\begin{split}
&\max L = \max \log \prod_{n=1}^{N}p(\mathbf{x_n}) = \max \sum_{n=1}^{N}\log\sum_{k=1}^{K}\pi_kp(\mathbf{x_n}|\theta_k)\\
&\nabla_{\pi_k}L = 0 \quad \nabla_{\mu_k}L = 0 \quad \nabla_{\Sigma_k}L = 0
\end{split}
\]

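We can't get an analytic solution, because the logarithm acts on a sum over the components and the resulting equations couple all the parameters. Thus we need another algorithm to solve the GMM.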
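
Although the maximizer has no closed form, the objective $L$ itself is easy to evaluate for any candidate parameters. Here is a small Python sketch under the assumption of one-dimensional data. It uses the log-sum-exp trick for numerical stability, which is an addition not discussed in this text, and the data and parameter values are hypothetical.

```python
import math


def log_gaussian(x, mu, var):
    """Log density of a 1-D Gaussian with mean mu and variance var."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def gmm_log_likelihood(data, weights, means, variances):
    """L = sum_n log sum_k pi_k * p(x_n | theta_k), assuming every pi_k > 0."""
    total = 0.0
    for x in data:
        # log(pi_k) + log p(x_n | theta_k) for every component k
        terms = [math.log(w) + log_gaussian(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        top = max(terms)
        # log-sum-exp: subtract the largest term before exponentiating
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total


# Hypothetical data and parameters, for illustration only.
data = [-2.1, -1.7, 0.4, 1.2, 2.5]
log_lik = gmm_log_likelihood(data, [0.3, 0.7], [-2.0, 1.0], [0.5, 1.5])
```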

How to solve GMM?

To begin with, let's analyze the GMM problem. If we can obtain the parameters of the GMM, which are $\pi_k$, $\Sigma_k$ and $\mu_k$, we have solved the GMM. So our algorithm should estimate $\pi_k$, $\Sigma_k$ and $\mu_k$.

To simplify the problem, suppose we know which Gaussian distribution each data point comes from. In other words, each data point belongs to exactly one Gaussian distribution, and we know which one. Then we can apply MLE to each Gaussian distribution separately.

For example, in Figure 2, if we know that data points of the same color come from the same Gaussian distribution, we can apply MLE to each color group separately to estimate $\Sigma_k$ and $\mu_k$. If the five color groups contain the same number of data points, then $\pi_k = 0.2$, $k=1,2,3,4,5$. In this situation, the GMM is easily solved.

Figure 2
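
The labelled case can be sketched in a few lines of Python. This assumes one-dimensional data, integer labels `0..K-1` standing in for the colors, and no empty group. The function name and the toy data are hypothetical.

```python
def fit_labelled(data, labels, num_components):
    """Per-group MLE when the component of every point is known.

    Returns a list of (pi_k, mu_k, var_k) tuples, one per component.
    """
    n_total = len(data)
    params = []
    for k in range(num_components):
        group = [x for x, label in zip(data, labels) if label == k]
        n_k = len(group)                                  # assumed non-zero
        mu = sum(group) / n_k                             # MLE of the mean
        var = sum((x - mu) ** 2 for x in group) / n_k     # MLE of the variance
        params.append((n_k / n_total, mu, var))           # pi_k is the group share
    return params


# Toy data: three points labelled 0 and two points labelled 1.
data = [1.0, 2.0, 3.0, 10.0, 12.0]
labels = [0, 0, 0, 1, 1]
params = fit_labelled(data, labels, 2)
```

Each group is fitted independently, which is exactly what becomes impossible once the labels are unknown.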

But the problem is that we don't know which Gaussian distribution each data point belongs to! Thus there should be a hidden parameter that controls which Gaussian distribution the $n$th data point belongs to.

Now let's define
$$z_{nk}\in\{0,1\}$$

$z_{nk}=1$ if the $n$th point belongs to the $k$th Gaussian distribution

$z_{nk}=0$ if the $n$th point doesn't belong to the $k$th Gaussian distribution

Using $z_{nk}$, we can rewrite the likelihood function as follows:
\[
L = \log \prod_{n=1}^{N}\prod_{k=1}^{K}\pi_k^{z_{nk}}p(\mathbf{x_n}|\theta_k)^{z_{nk}}
\]

Notice that, once $z_{nk}$ is defined, each data point is described by only one Gaussian distribution. Thus $\prod_{k=1}^{K}\pi_k^{z_{nk}}p(\mathbf{x_n}|\theta_k)^{z_{nk}}$ describes the probability density of each data point. Although this expression has the form of a product '$\prod$', for a given $n$ exactly one $z_{nk}$ equals 1 over all $k$, so every other factor equals 1.

Let's continue to expand the likelihood function:
\[
\begin{split}
L &= \log \prod_{n=1}^{N}\prod_{k=1}^{K}\pi_k^{z_{nk}}p(\mathbf{x_n}|\theta_k)^{z_{nk}}\\
& = \sum_{n=1}^{N}\sum_{k=1}^{K}\log\left[\pi_k^{z_{nk}} p(\mathbf{x_n}|\theta_k)^{z_{nk}}\right]\\
& = \sum_{n=1}^{N}\sum_{k=1}^{K}\left[z_{nk}\log\pi_k + z_{nk}\log p(\mathbf{x_n}|\theta_k)\right]\\
& = \sum_{n=1}^{N}\sum_{k=1}^{K}\left[z_{nk}\log\pi_k + z_{nk}\log p(\mathbf{x_n}|\Sigma_k,\mathbf{\mu_k})\right]
\end{split}
\]
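
To make the role of $z_{nk}$ concrete, here is a hedged Python sketch of this likelihood for one-dimensional data, assuming the indicators are supplied as one-hot rows. The function names and the two-point example are hypothetical.

```python
import math


def log_gaussian(x, mu, var):
    """Log density of a 1-D Gaussian with mean mu and variance var."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def complete_log_likelihood(data, z, weights, means, variances):
    """L = sum_n sum_k z_nk * [log pi_k + log p(x_n | Sigma_k, mu_k)]."""
    total = 0.0
    for x, z_n in zip(data, z):
        for k, z_nk in enumerate(z_n):
            if z_nk == 0:
                continue  # a zero indicator removes the term entirely
            total += z_nk * (math.log(weights[k])
                             + log_gaussian(x, means[k], variances[k]))
    return total


# Two points, each assigned to a different component (one-hot rows).
data = [0.0, 3.0]
z = [[1, 0], [0, 1]]
value = complete_log_likelihood(data, z, [0.5, 0.5], [0.0, 3.0], [1.0, 1.0])
```

Only the term picked out by the indicator contributes for each point, so the logarithm now acts on a single Gaussian instead of a sum.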

In this likelihood function, there are three exposed parameters, $\pi_k$, $\Sigma_k$ and $\mathbf{\mu_k}$, which we want to solve for. There is one hidden parameter, $z_{nk}$, which does not appear in the final result.

Now, how do we solve for the exposed parameters $\pi_k$, $\Sigma_k$ and $\mathbf{\mu_k}$ when the hidden parameter $z_{nk}$ is unknown?

What's EM?

To answer the above question, we use the EM algorithm, which has two parts: the E (Expectation) part and the M (Maximization) part.

E part: calculate the expectation of the likelihood function with respect to the hidden parameter.

M part: find the exposed parameters that maximize the expectation, then go back to the E part and iterate. (Notice that the hidden parameter and the exposed parameters influence each other! Thus, when we return to the E part, the expectation will change.)

As for the above GMM problem, the hidden parameter is $z_{nk}$.

So, in the E part, we calculate the expectation of the likelihood function with respect to $z_{nk}$, which is:

\[
\begin{split}
Q &= E_{z_{nk}}\{L\}\\
& = E_{z_{nk}}\left\{ \sum_{n=1}^{N}\sum_{k=1}^{K}\left[z_{nk}\log\pi_k + z_{nk}\log p(\mathbf{x_n}|\Sigma_k,\mathbf{\mu_k})\right] \right\}\\
& = \sum_{n=1}^{N}\sum_{k=1}^{K}p(z_{nk}=1)\left[\log\pi_k + \log p(\mathbf{x_n}|\Sigma_k,\mathbf{\mu_k})\right] + \sum_{n=1}^{N}\sum_{k=1}^{K}p(z_{nk}=0)\left[0\log\pi_k + 0\log p(\mathbf{x_n}|\Sigma_k,\mathbf{\mu_k})\right]\\
& = \sum_{n=1}^{N}\sum_{k=1}^{K}p(z_{nk}=1)\left[\log\pi_k + \log p(\mathbf{x_n}|\Sigma_k,\mathbf{\mu_k})\right]
\end{split}
\]

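Notice that, within one iteration (supposing the current $\pi_k$, $\Sigma_k$ and $\mathbf{\mu_k}$ are known), $p(z_{nk}=1)$ is the probability that the $n$th point belongs to the $k$th Gaussian distribution given $\mathbf{x_n}$, which follows from Bayes' rule: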
\[
p(z_{nk}=1) = \frac{\pi_k p(\mathbf{x_n}|\Sigma_k,\mathbf{\mu_k})}{\sum_{j=1}^{K}\pi_j p(\mathbf{x_n}|\Sigma_j,\mathbf{\mu_j})}
\]
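
As an illustration, the formula for $p(z_{nk}=1)$ can be computed as follows. This is a minimal Python sketch assuming one-dimensional data. The names `e_step` and `resp`, and the example values, are hypothetical.

```python
import math


def gaussian_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def e_step(data, weights, means, variances):
    """Return resp[n][k] = p(z_nk = 1) under the current parameters."""
    resp = []
    for x in data:
        # numerator pi_k * p(x_n | Sigma_k, mu_k) for each component k
        joint = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        norm = sum(joint)                 # denominator: sum over all j
        resp.append([j / norm for j in joint])
    return resp


# Two equally weighted unit-variance components centred at -2 and 5.
data = [-2.0, 5.0, 1.5]
resp = e_step(data, [0.5, 0.5], [-2.0, 5.0], [1.0, 1.0])
```

Each row of `resp` sums to 1. A point sitting on a component mean is assigned almost entirely to that component, while the point at 1.5, equidistant from both means, is split evenly.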

In the M part, with $p(z_{nk}=1)$ held fixed and subject to $\sum_{k=1}^{K}\pi_k = 1$:
\[
\nabla_{\pi_k}Q = 0 \quad \nabla_{\mu_k}Q = 0 \quad \nabla_{\Sigma_k}Q = 0
\]

We can get:
\[
\begin{split}
&\mathbf{\mu_k}^{new} = \frac{1}{N_k}\sum_{n=1}^{N}p(z_{nk}=1)\mathbf{x_n}\\
&\Sigma_k^{new} = \frac{1}{N_k}\sum_{n=1}^{N}p(z_{nk}=1)(\mathbf{x_n} - \mathbf{\mu_k^{new}})(\mathbf{x_n} - \mathbf{\mu_k^{new}})^T\\
&\pi_k^{new} = \frac{N_k}{N}\\
&N_k = \sum_{n=1}^{N}p(z_{nk}=1)
\end{split}
\]
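
As a sketch of where the $\mathbf{\mu_k}$ and $\pi_k$ updates come from, assume $p(z_{nk}=1)$ is held fixed at its E part value, write $N_k$ for the sum of $p(z_{nk}=1)$ over all $N$ data points, and introduce a Lagrange multiplier $\lambda$ (a symbol not used elsewhere in this text) to enforce $\sum_{k=1}^{K}\pi_k = 1$:
\[
\begin{split}
\nabla_{\mu_k}Q &= \sum_{n=1}^{N}p(z_{nk}=1)\Sigma_k^{-1}(\mathbf{x_n} - \mathbf{\mu_k}) = 0 \;\Rightarrow\; \mathbf{\mu_k} = \frac{1}{N_k}\sum_{n=1}^{N}p(z_{nk}=1)\mathbf{x_n}\\
\frac{\partial}{\partial \pi_k}\left[Q + \lambda\left(\sum_{j=1}^{K}\pi_j - 1\right)\right] &= \frac{N_k}{\pi_k} + \lambda = 0 \;\Rightarrow\; \pi_k = -\frac{N_k}{\lambda}\\
\sum_{k=1}^{K}\pi_k = 1 &\;\Rightarrow\; \lambda = -\sum_{k=1}^{K}N_k = -N \;\Rightarrow\; \pi_k = \frac{N_k}{N}
\end{split}
\]

The last line uses $\sum_{k=1}^{K}N_k = N$, which holds because $\sum_{k=1}^{K}p(z_{nk}=1) = 1$ for every $n$.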

Thus we first initialize $\pi_k$, $\Sigma_k$ and $\mathbf{\mu_k}$, then calculate $p(z_{nk}=1)$, then calculate the new $\pi_k$, $\Sigma_k$ and $\mathbf{\mu_k}$, then calculate $p(z_{nk}=1)$ again, and so on until the solution converges.
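
The whole iteration can be sketched in Python as follows. This is an illustrative one-dimensional version, not the MATLAB code referenced at the end of this post. It assumes a fixed number of iterations instead of a convergence test, and the synthetic data, initial values and names are all hypothetical.

```python
import math
import random


def gaussian_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, weights, means, variances, num_iters=50):
    """Alternate the E part and the M part for a fixed number of iterations."""
    weights, means, variances = list(weights), list(means), list(variances)
    n, num_k = len(data), len(weights)
    for _ in range(num_iters):
        # E part: p(z_nk = 1) for every point n and component k.
        resp = []
        for x in data:
            joint = [weights[k] * gaussian_pdf(x, means[k], variances[k])
                     for k in range(num_k)]
            norm = sum(joint)
            resp.append([j / norm for j in joint])
        # M part: closed-form updates of mu_k, Sigma_k and pi_k.
        for k in range(num_k):
            n_k = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / n_k
            variances[k] = sum(r[k] * (x - means[k]) ** 2
                               for r, x in zip(resp, data)) / n_k
            weights[k] = n_k / n
    return weights, means, variances


# Synthetic data: two well-separated Gaussians, 300 points each.
rng = random.Random(0)
data = [rng.gauss(-3.0, 1.0) for _ in range(300)]
data += [rng.gauss(4.0, 1.0) for _ in range(300)]

weights, means, variances = em_gmm_1d(
    data, weights=[0.5, 0.5], means=[-1.0, 1.0], variances=[1.0, 1.0])
```

A practical implementation would usually stop when the likelihood stops improving and guard against a variance collapsing toward zero. Both are left out here to keep the sketch short.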

Explanation of the result

The result has an intuitive explanation.

It can be treated as clustering, which assigns $N$ people to $K$ groups:

1. The number of people in the $k$th group ($N_k$) is the sum of the membership weights ($p(z_{nk}=1)$), each of which represents how much the $n$th person belongs to the $k$th group.

2. Each person has a weight ($\mathbf{x_n}$), so when we cluster people into groups, we want to know the average weight ($\mathbf{\mu_k}$) and the weight variance ($\Sigma_k$) of each group.

3. To calculate the average weight of one group, we calculate the total weight of the group ($\sum_{n=1}^{N}p(z_{nk}=1)\mathbf{x_n}$), and then divide by the number of people in the group ($N_k$).

4. To calculate the weight variance of one group, we calculate the total weight variance of the group ($\sum_{n=1}^{N}p(z_{nk}=1)(\mathbf{x_n} - \mathbf{\mu_k^{new}})(\mathbf{x_n} - \mathbf{\mu_k^{new}})^T$), and then divide by the number of people in the group ($N_k$).

5. $\pi_k$ can be treated as the proportion of the population that the $k$th group takes up.
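
The five points above can be checked with a small worked example in Python. The three people, their weights and their membership values are made-up numbers, used only to show the arithmetic.

```python
# Hypothetical numbers: 3 people, 2 groups.
body_weights = [50.0, 60.0, 80.0]                    # x_n, one weight per person
membership = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]    # p(z_nk = 1), rows sum to 1

num_people = len(body_weights)
num_groups = 2

# 1. "Head count" of each group: N_k, which may be fractional.
group_sizes = [sum(row[k] for row in membership) for k in range(num_groups)]

# 3. Average weight: total weight of the group divided by N_k.
group_means = [
    sum(row[k] * x for row, x in zip(membership, body_weights)) / group_sizes[k]
    for k in range(num_groups)
]

# 4. Weight variance: total weighted squared deviation divided by N_k.
group_vars = [
    sum(row[k] * (x - group_means[k]) ** 2
        for row, x in zip(membership, body_weights)) / group_sizes[k]
    for k in range(num_groups)
]

# 5. Population proportion of each group: pi_k = N_k / N.
proportions = [n_k / num_people for n_k in group_sizes]
```

Here the group sizes come out to about 1.8 and 1.2 people, so the proportions are about 0.6 and 0.4.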

MATLAB code for the EM algorithm can be found in "EM and GMM(Code)".
