Notes on Probabilistic Latent Semantic Analysis (PLSA)
Reposted from: http://www.hongliangjie.com/2010/01/04/notes-on-probabilistic-latent-semantic-analysis-plsa/
I highly recommend reading the more detailed version at http://arxiv.org/abs/1212.3900
Formulation of PLSA
There are two ways to formulate PLSA. They are equivalent but may lead to different inference processes:

$$P(d, w) = P(d)P(w|d) = P(d)\sum_{z} P(w|z)P(z|d) \qquad (1)$$

$$P(d, w) = \sum_{z} P(z)P(d|z)P(w|z) \qquad (2)$$

Let's see why these two equations are equivalent by using Bayes rule, $P(z|d)P(d) = P(d|z)P(z)$:

$$P(d)\sum_{z}P(w|z)P(z|d) = \sum_{z}P(w|z)P(z|d)P(d) = \sum_{z}P(z)P(d|z)P(w|z)$$

The whole data set is generated as (we assume that all words are generated independently):

$$\mathcal{L} = \prod_{i}\prod_{j} P(d_i, w_j)^{n(d_i, w_j)}$$

where $n(d_i, w_j)$ is the number of times word $w_j$ occurs in document $d_i$. The log-likelihoods of the whole data set for (1) and (2) are:

$$\mathcal{L}_1 = \sum_{i}\sum_{j} n(d_i, w_j)\log\Big[P(d_i)\sum_{k}P(w_j|z_k)P(z_k|d_i)\Big]$$

$$\mathcal{L}_2 = \sum_{i}\sum_{j} n(d_i, w_j)\log\Big[\sum_{k}P(z_k)P(d_i|z_k)P(w_j|z_k)\Big]$$
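To make this concrete, here is a minimal numerical sketch of evaluating $\mathcal{L}_1$ on a toy term-document count matrix. This is my own illustration, not code from the post; all names (n_dw, p_w_given_z, and so on) are mine:

```python
# Evaluate the formulation-(1) log-likelihood on a random toy corpus.
import numpy as np

rng = np.random.default_rng(0)
n_docs, n_words, n_topics = 4, 6, 2
n_dw = rng.integers(0, 5, size=(n_docs, n_words))        # n(d_i, w_j): word counts

# Randomly initialized parameters, normalized into valid distributions.
p_d = np.full(n_docs, 1.0 / n_docs)                      # P(d_i)
p_w_given_z = rng.random((n_topics, n_words))
p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True)    # P(w_j | z_k)
p_z_given_d = rng.random((n_docs, n_topics))
p_z_given_d /= p_z_given_d.sum(axis=1, keepdims=True)    # P(z_k | d_i)

# L1 = sum_i sum_j n(d_i, w_j) * log[ P(d_i) * sum_k P(w_j|z_k) P(z_k|d_i) ]
p_w_given_d = p_z_given_d @ p_w_given_z                  # marginalize over topics
log_lik = np.sum(n_dw * np.log(p_d[:, None] * p_w_given_d))
print(log_lik)
```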
EM
For $\mathcal{L}_1$ or $\mathcal{L}_2$, the optimization is hard due to the log of a sum. Therefore, an algorithm called Expectation-Maximization (EM) is usually employed. Before we introduce anything about EM, please note that EM is only guaranteed to find a local optimum (although it may be a global one).
First, let's see how EM works in general. As we showed for PLSA, we usually want to estimate the likelihood of the data, namely $P(X|\theta)$, given the parameter $\theta$. The easiest way is to obtain a maximum likelihood estimator by maximizing $P(X|\theta)$ directly. However, sometimes we also want to include hidden variables $Z$ that are useful for our task. Therefore, what we really want to maximize is $P(X, Z|\theta)$, the complete likelihood, and our attention now turns to it. Again, directly maximizing this likelihood is usually difficult. What we show here instead is how to obtain a lower bound on the likelihood and maximize that lower bound.
We need Jensen's inequality to help us obtain this lower bound. For any convex function $f(x)$, Jensen's inequality states that:

$$f(E[x]) \le E[f(x)]$$

Thus, it is not difficult to show that for weights $\lambda_i \ge 0$ with $\sum_i \lambda_i = 1$:

$$f\Big(\sum_i \lambda_i x_i\Big) \le \sum_i \lambda_i f(x_i)$$

and for concave functions (like the logarithm), the inequality flips:

$$f\Big(\sum_i \lambda_i x_i\Big) \ge \sum_i \lambda_i f(x_i)$$
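As a quick sanity check of the concave case (an illustration I am adding, not part of the original notes), the following snippet verifies the inequality numerically for the logarithm:

```python
# Numerical check: log(sum_i lambda_i x_i) >= sum_i lambda_i log(x_i).
import numpy as np

rng = np.random.default_rng(1)
lam = rng.random(5); lam /= lam.sum()   # convex weights: lambda_i >= 0, sum to 1
x = rng.random(5) + 0.1                 # positive points

lhs = np.log(np.dot(lam, x))            # log of the weighted average
rhs = np.dot(lam, np.log(x))            # weighted average of the logs
assert lhs >= rhs                       # Jensen for the concave logarithm
print(lhs, rhs)
```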
Back to our complete likelihood, we can obtain the following lower bound by using the concave version of Jensen's inequality. For any distribution $q(z)$ over the hidden variable:

$$\log P(X|\theta) = \log \sum_{z} P(X, z|\theta) = \log \sum_{z} q(z)\frac{P(X, z|\theta)}{q(z)} \ge \sum_{z} q(z) \log \frac{P(X, z|\theta)}{q(z)}$$
Therefore, we have obtained a lower bound on the likelihood, and we want to make it as tight as possible. EM is an algorithm that maximizes this lower bound in an iterative fashion: it first fixes the current value of $\theta$ and maximizes the bound over $q(z)$, and then uses the new $q(z)$ to obtain a new guess of $\theta$, which is essentially a two-stage maximization process. The first step can be shown as follows:

$$\sum_{z} q(z)\log\frac{P(X, z|\theta)}{q(z)} = \sum_{z} q(z)\log\frac{P(z|X,\theta)P(X|\theta)}{q(z)} = \log P(X|\theta) - \sum_{z} q(z)\log\frac{q(z)}{P(z|X,\theta)}$$
The first term is the same for all $q(z)$. Therefore, in order to maximize the whole expression, we need to minimize the KL divergence between $q(z)$ and $P(z|X,\theta)$, which eventually leads to the optimal solution $q^{*}(z) = P(z|X,\theta)$. So, for the E-step, we use the current guess of $\theta$ to calculate the posterior distribution of the hidden variable as the new $q(z)$. The M-step is problem-dependent; we will see how to do it in the discussion below.
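To make this E-step recipe concrete, here is a small illustration on a model simpler than PLSA, a two-component Gaussian mixture. The model choice and all names are mine, added only to show that the optimal $q(z)$ is just a Bayes-rule posterior:

```python
# Illustrative E-step for a toy 2-component Gaussian mixture (not PLSA itself):
# given current parameters theta = (pi, mu, sigma), the optimal q(z) is the
# posterior P(z | x, theta), obtained by Bayes rule plus normalization.
import numpy as np

x = np.array([-2.0, -1.5, 0.1, 1.8, 2.2])   # observed data points
pi = np.array([0.5, 0.5])                   # current mixing weights P(z)
mu = np.array([-1.0, 1.0])                  # current component means
sigma = np.array([1.0, 1.0])                # current component std devs

# Gaussian density of each point under each component, shape (points, components)
pdf = np.exp(-(x[:, None] - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
joint = pi * pdf                            # P(x, z | theta)
posterior = joint / joint.sum(axis=1, keepdims=True)   # q*(z) = P(z | x, theta)
print(posterior)
```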
Another explanation of EM is in terms of optimizing a so-called Q-function. We decompose the data generation process as $P(X, z|\theta) = P(z|X,\theta)P(X|\theta)$. Therefore, the complete log-likelihood can be rewritten as:

$$\log P(X|\theta) = \log P(X, z|\theta) - \log P(z|X,\theta)$$

Think about how to maximize $\log P(X|\theta)$. Instead of maximizing it directly, we can iteratively improve it from a current estimate $\theta^{(t)}$. Taking the expectation of this equation with respect to $P(z|X,\theta^{(t)})$, we have:

$$\log P(X|\theta) = \sum_{z} P(z|X,\theta^{(t)})\log P(X, z|\theta) - \sum_{z} P(z|X,\theta^{(t)})\log P(z|X,\theta^{(t)}) + \sum_{z} P(z|X,\theta^{(t)})\log\frac{P(z|X,\theta^{(t)})}{P(z|X,\theta)}$$

The last term is always non-negative since it can be recognized as the KL divergence between $P(z|X,\theta^{(t)})$ and $P(z|X,\theta)$. Therefore, we obtain a lower bound on the likelihood:

$$\log P(X|\theta) \ge \sum_{z} P(z|X,\theta^{(t)})\log P(X, z|\theta) - \sum_{z} P(z|X,\theta^{(t)})\log P(z|X,\theta^{(t)})$$

The second term is an entropy that does not contain the variable $\theta$ and can be treated as a constant, so the lower bound is essentially the first term, which is sometimes called the "Q-function":

$$Q(\theta; \theta^{(t)}) = \sum_{z} P(z|X,\theta^{(t)})\log P(X, z|\theta)$$
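One consequence worth spelling out (a step I am adding here; it is the standard argument rather than text from the original post): if $\theta^{(t+1)}$ maximizes $Q(\cdot\,;\theta^{(t)})$, the likelihood cannot decrease, because the KL term vanishes at $\theta = \theta^{(t)}$ and the bound is tight there:

```latex
% At theta = theta^{(t)} the KL term is zero, so
%   log P(X | theta^{(t)}) = Q(theta^{(t)}; theta^{(t)}) + H(P(z | X, theta^{(t)}))
\begin{align}
\log P(X\mid\theta^{(t+1)})
  &\ge Q(\theta^{(t+1)};\theta^{(t)}) + H\big(P(z\mid X,\theta^{(t)})\big) \\
  &\ge Q(\theta^{(t)};\theta^{(t)}) + H\big(P(z\mid X,\theta^{(t)})\big)
   = \log P(X\mid\theta^{(t)})
\end{align}
```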
EM of Formulation 1
In the case of Formulation 1, let us introduce hidden indicator variables $R(z_k; d_i, w_j)$ to indicate which hidden topic $z_k$ is selected to generate $w_j$ in $d_i$ (so $\sum_k R(z_k; d_i, w_j) = 1$). Therefore, the complete log-likelihood can be formulated as:

$$\mathcal{L}_c = \sum_i \sum_j n(d_i, w_j) \sum_k R(z_k; d_i, w_j) \log\big[P(d_i)P(w_j|z_k)P(z_k|d_i)\big]$$
From the equation above, replacing the indicator $R(z_k; d_i, w_j)$ with its posterior expectation $P(z_k|d_i, w_j)$, we can write the Q-function for the complete likelihood:

$$Q(\theta) = \sum_i \sum_j n(d_i, w_j) \sum_k P(z_k|d_i, w_j) \log\big[P(d_i)P(w_j|z_k)P(z_k|d_i)\big]$$
For the E-step, simply using Bayes rule, we obtain:

$$P(z_k|d_i, w_j) = \frac{P(w_j|z_k)P(z_k|d_i)}{\sum_{l} P(w_j|z_l)P(z_l|d_i)}$$
For the M-step, we need to maximize the Q-function subject to the normalization constraints $\sum_j P(w_j|z_k) = 1$ and $\sum_k P(z_k|d_i) = 1$, incorporated with Lagrange multipliers:

$$H = Q(\theta) + \sum_k \lambda_k\Big(1 - \sum_j P(w_j|z_k)\Big) + \sum_i \mu_i\Big(1 - \sum_k P(z_k|d_i)\Big)$$

Taking all derivatives and setting them to zero:

$$\frac{\partial H}{\partial P(w_j|z_k)} = 0, \qquad \frac{\partial H}{\partial P(z_k|d_i)} = 0$$

we easily obtain:

$$P(w_j|z_k) = \frac{\sum_i n(d_i, w_j)P(z_k|d_i, w_j)}{\sum_m \sum_i n(d_i, w_m)P(z_k|d_i, w_m)}$$

$$P(z_k|d_i) = \frac{\sum_j n(d_i, w_j)P(z_k|d_i, w_j)}{n(d_i)}$$

where $n(d_i) = \sum_j n(d_i, w_j)$ is the length of document $d_i$.
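Putting the E-step and M-step together, here is a minimal runnable sketch of one EM iteration for Formulation 1. This is my own code, not the author's, and it assumes every document contains at least one word so the normalizations are well defined:

```python
# One EM iteration for PLSA, Formulation 1.
# Shapes: n_dw is (docs, words) with counts n(d_i, w_j);
# p_w_z is (topics, words) for P(w|z); p_z_d is (docs, topics) for P(z|d).
import numpy as np

def em_step_formulation1(n_dw, p_w_z, p_z_d):
    # E-step: P(z_k | d_i, w_j) proportional to P(w_j | z_k) P(z_k | d_i)
    post = p_z_d[:, :, None] * p_w_z[None, :, :]      # (docs, topics, words)
    post /= post.sum(axis=1, keepdims=True)           # normalize over topics

    # M-step: reweight posteriors by the observed counts n(d_i, w_j)
    weighted = n_dw[:, None, :] * post                # (docs, topics, words)

    # P(w_j | z_k) proportional to sum_i n(d_i, w_j) P(z_k | d_i, w_j)
    new_p_w_z = weighted.sum(axis=0)
    new_p_w_z /= new_p_w_z.sum(axis=1, keepdims=True)

    # P(z_k | d_i) = sum_j n(d_i, w_j) P(z_k | d_i, w_j) / n(d_i)
    new_p_z_d = weighted.sum(axis=2)
    new_p_z_d /= n_dw.sum(axis=1, keepdims=True)

    return new_p_w_z, new_p_z_d
```

Iterating this update until $\mathcal{L}_1$ stops improving gives the usual PLSA training loop; by the Q-function argument above, each iteration can only increase the likelihood.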
EM of Formulation 2
Using a similar method, introduce hidden indicator variables to indicate which $z_k$ is selected to generate both $d_i$ and $w_j$, and we have the following complete log-likelihood:

$$\mathcal{L}_c = \sum_i \sum_j n(d_i, w_j) \sum_k R(z_k; d_i, w_j) \log\big[P(z_k)P(d_i|z_k)P(w_j|z_k)\big]$$

Therefore, the Q-function is:

$$Q(\theta) = \sum_i \sum_j n(d_i, w_j) \sum_k P(z_k|d_i, w_j) \log\big[P(z_k)P(d_i|z_k)P(w_j|z_k)\big]$$
For the E-step, again simply using Bayes rule, we obtain:

$$P(z_k|d_i, w_j) = \frac{P(z_k)P(d_i|z_k)P(w_j|z_k)}{\sum_l P(z_l)P(d_i|z_l)P(w_j|z_l)}$$
For the M-step, we maximize the constrained version of the Q-function, with Lagrange multipliers for $\sum_j P(w_j|z_k) = 1$, $\sum_i P(d_i|z_k) = 1$, and $\sum_k P(z_k) = 1$:

$$H = Q(\theta) + \sum_k \lambda_k\Big(1 - \sum_j P(w_j|z_k)\Big) + \sum_k \mu_k\Big(1 - \sum_i P(d_i|z_k)\Big) + \xi\Big(1 - \sum_k P(z_k)\Big)$$

Taking all derivatives and setting them to zero, we easily obtain:

$$P(w_j|z_k) = \frac{\sum_i n(d_i, w_j)P(z_k|d_i, w_j)}{\sum_m \sum_i n(d_i, w_m)P(z_k|d_i, w_m)}$$

$$P(d_i|z_k) = \frac{\sum_j n(d_i, w_j)P(z_k|d_i, w_j)}{\sum_m \sum_j n(d_m, w_j)P(z_k|d_m, w_j)}$$

$$P(z_k) = \frac{\sum_i \sum_j n(d_i, w_j)P(z_k|d_i, w_j)}{\sum_i \sum_j n(d_i, w_j)}$$
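And analogously for Formulation 2 (again an illustrative sketch with my own variable names, under the same assumptions as before):

```python
# One EM iteration for PLSA, Formulation 2.
# Shapes: p_z is (topics,) for P(z); p_d_z is (topics, docs) for P(d|z);
# p_w_z is (topics, words) for P(w|z); n_dw is (docs, words).
import numpy as np

def em_step_formulation2(n_dw, p_z, p_d_z, p_w_z):
    # E-step: P(z_k | d_i, w_j) proportional to P(z_k) P(d_i | z_k) P(w_j | z_k)
    post = p_z[None, :, None] * p_d_z.T[:, :, None] * p_w_z[None, :, :]
    post /= post.sum(axis=1, keepdims=True)           # (docs, topics, words)

    weighted = n_dw[:, None, :] * post                # counts times posterior

    new_p_w_z = weighted.sum(axis=0)                  # sum over documents
    new_p_w_z /= new_p_w_z.sum(axis=1, keepdims=True) # P(w_j | z_k)

    new_p_d_z = weighted.sum(axis=2).T                # (topics, docs)
    new_p_d_z /= new_p_d_z.sum(axis=1, keepdims=True) # P(d_i | z_k)

    new_p_z = weighted.sum(axis=(0, 2))               # sum over both d and w
    new_p_z /= n_dw.sum()                             # P(z_k)

    return new_p_z, new_p_d_z, new_p_w_z
```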