Reposted from: http://www.hongliangjie.com/2010/01/04/notes-on-probabilistic-latent-semantic-analysis-plsa/

I highly recommend reading the more detailed version at http://arxiv.org/abs/1212.3900

Formulation of PLSA

There are two ways to formulate PLSA. They are equivalent but may lead to different inference processes:

P(d, w) = P(d) \sum_z P(w|z) P(z|d)    (1)

P(d, w) = \sum_z P(z) P(w|z) P(d|z)    (2)

Let's see why these two equations are equivalent by using Bayes' rule. Since P(z) P(d|z) = P(d) P(z|d), we have

\sum_z P(z) P(w|z) P(d|z) = \sum_z P(d) P(z|d) P(w|z) = P(d) \sum_z P(w|z) P(z|d),

so (2) is simply a reparameterization of (1).

The whole data set is generated as (we assume that all words are generated independently):

P(D) = \prod_d \prod_w P(d, w)^{n(d, w)}

where n(d, w) is the number of times word w occurs in document d.

The log-likelihoods of the whole data set for (1) and (2) are:

L_1 = \sum_d \sum_w n(d, w) \log [ P(d) \sum_z P(w|z) P(z|d) ]    (3)

L_2 = \sum_d \sum_w n(d, w) \log [ \sum_z P(z) P(w|z) P(d|z) ]    (4)
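As a concrete illustration (not part of the original note), the following NumPy sketch builds the joint P(d, w) from both parameterizations, checks numerically that they agree, and evaluates the log-likelihood (3)/(4) on a toy count matrix. The sizes, seed, and variable names are all invented for the example.

import numpy as np

rng = np.random.default_rng(0)
n_docs, n_words, n_topics = 4, 6, 2

# Toy term-document counts n(d, w).
counts = rng.integers(0, 5, size=(n_docs, n_words))

# Formulation 2 parameters: P(z), P(d|z), P(w|z).
p_z = np.full(n_topics, 1.0 / n_topics)
p_d_given_z = rng.dirichlet(np.ones(n_docs), size=n_topics)   # shape (z, d)
p_w_given_z = rng.dirichlet(np.ones(n_words), size=n_topics)  # shape (z, w)

# Joint P(d, w) under Formulation 2.
p_dw_2 = np.einsum('z,zd,zw->dw', p_z, p_d_given_z, p_w_given_z)

# Convert to Formulation 1 via Bayes' rule: P(d) = sum_z P(z)P(d|z),
# P(z|d) = P(z)P(d|z)/P(d); the joint must be identical.
p_d = np.einsum('z,zd->d', p_z, p_d_given_z)
p_z_given_d = (p_z[:, None] * p_d_given_z) / p_d[None, :]     # shape (z, d)
p_dw_1 = p_d[:, None] * np.einsum('zd,zw->dw', p_z_given_d, p_w_given_z)

assert np.allclose(p_dw_1, p_dw_2)

# Log-likelihood of the whole data set, equations (3)/(4).
print(np.sum(counts * np.log(p_dw_2)))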

EM

For (3) or (4), the optimization is hard due to the log of a sum. Therefore, an algorithm called Expectation-Maximization (EM) is usually employed. Before we introduce anything about EM, please note that EM is only guaranteed to find a local optimum (although it may be a global one).

First, let us see how EM works in general. As shown above for PLSA, we usually want to estimate the likelihood of the data, namely P(X|\theta), given the parameters \theta. The easiest way is to obtain a maximum likelihood estimator by maximizing P(X|\theta) directly. However, sometimes we also want to include hidden variables Z that are useful for our task. Therefore, what we really want to maximize is P(X|\theta) = \sum_Z P(X, Z|\theta), the marginal of the complete likelihood P(X, Z|\theta). Our attention now turns to this complete likelihood. Again, directly maximizing this likelihood is usually difficult. What we would like to show here is how to obtain a lower bound of the likelihood and maximize this lower bound instead.

We need Jensen's inequality to help us obtain this lower bound. For any convex function f, weights \lambda_i \ge 0 with \sum_i \lambda_i = 1, and points x_i, Jensen's inequality states that:

f( \sum_i \lambda_i x_i ) \le \sum_i \lambda_i f(x_i)

Thus, it is not difficult to show that:

f( E[x] ) \le E[ f(x) ]

and for concave functions (like the logarithm), the inequality is reversed:

f( E[x] ) \ge E[ f(x) ],   in particular   \log( \sum_i \lambda_i x_i ) \ge \sum_i \lambda_i \log x_i
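A quick numerical sanity check of the concave case (a throwaway illustration with arbitrary numbers, not from the original note):

import numpy as np

x = np.array([0.5, 2.0, 4.0])      # arbitrary positive points
lam = np.array([0.2, 0.3, 0.5])    # weights summing to 1

lhs = np.log(np.sum(lam * x))      # log of the weighted average
rhs = np.sum(lam * np.log(x))      # weighted average of the logs

print(lhs, rhs, lhs >= rhs)        # lhs >= rhs for the concave log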

Back to our complete likelihood, we can obtain the following conclusion by using the concave version of Jensen's inequality. For any distribution q(Z) over the hidden variables:

\log P(X|\theta) = \log \sum_Z q(Z) [ P(X, Z|\theta) / q(Z) ] \ge \sum_Z q(Z) \log [ P(X, Z|\theta) / q(Z) ]

Therefore, we have obtained a lower bound of the likelihood and we want to make it as tight as possible. EM is an algorithm that maximizes this lower bound in an iterative fashion. Usually, EM first fixes the current parameters \theta and maximizes the bound with respect to q(Z), and then uses this new q(Z) to obtain a new guess of \theta, which is essentially a two-stage maximization process. The first step can be shown as follows:

\sum_Z q(Z) \log [ P(X, Z|\theta) / q(Z) ] = \sum_Z q(Z) \log [ P(Z|X, \theta) P(X|\theta) / q(Z) ] = \log P(X|\theta) - KL( q(Z) || P(Z|X, \theta) )

The first term is the same for all q(Z). Therefore, in order to maximize the whole expression, we need to minimize the KL divergence between q(Z) and P(Z|X, \theta), which leads to the optimal solution q(Z) = P(Z|X, \theta). So, in the E-step, we use the current guess of \theta to calculate the posterior distribution of the hidden variables as the new q(Z). The M-step is problem-dependent; we will see how to do it in the later discussions.
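To make this tangible, here is a tiny numerical check (an illustration with made-up numbers, not part of the original note): for an arbitrary joint table P(X, Z|\theta) over one observed X and three hidden states Z, the bound holds for a random q(Z) and becomes an equality when q(Z) is the posterior P(Z|X, \theta).

import numpy as np

# Joint probabilities P(X = x_obs, Z = z | theta) for a single observed x_obs;
# the three numbers are arbitrary and need not sum to 1.
p_xz = np.array([0.10, 0.05, 0.15])
log_px = np.log(p_xz.sum())                 # log P(X | theta)

def lower_bound(q):
    # sum_Z q(Z) log [ P(X, Z|theta) / q(Z) ]
    return np.sum(q * np.log(p_xz / q))

q_random = np.array([0.5, 0.3, 0.2])        # an arbitrary distribution over Z
q_posterior = p_xz / p_xz.sum()             # P(Z | X, theta)

print(lower_bound(q_random) <= log_px)                 # True: it is a lower bound
print(np.isclose(lower_bound(q_posterior), log_px))    # True: tight at the posterior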

Another explanation of EM is in terms of optimizing a so-called Q-function. We describe the data generation process as P(X, H|\theta), where H denotes the hidden variables. Therefore, the complete log-likelihood can be written as:

L_c(\theta) = \log P(X, H|\theta) = \log P(H|X, \theta) + \log P(X|\theta)

Think about how to maximize \log P(X|\theta). Instead of maximizing it directly, we can maximize it iteratively by rearranging the decomposition above as:

\log P(X|\theta) = \log P(X, H|\theta) - \log P(H|X, \theta)

Now, take the expectation of this equation with respect to P(H|X, \theta_n), the posterior under the current parameter estimate \theta_n (the left-hand side does not depend on H, so it is unchanged):

\log P(X|\theta) = \sum_H P(H|X, \theta_n) \log P(X, H|\theta) - \sum_H P(H|X, \theta_n) \log P(H|X, \theta)

Rewriting the second term by adding and subtracting \sum_H P(H|X, \theta_n) \log P(H|X, \theta_n) gives

\log P(X|\theta) = \sum_H P(H|X, \theta_n) \log P(X, H|\theta) - \sum_H P(H|X, \theta_n) \log P(H|X, \theta_n) + KL( P(H|X, \theta_n) || P(H|X, \theta) )

The last term is always non-negative since it can be recognized as the KL-divergence between P(H|X, \theta_n) and P(H|X, \theta). Therefore, we obtain a lower bound of the likelihood:

\log P(X|\theta) \ge \sum_H P(H|X, \theta_n) \log P(X, H|\theta) - \sum_H P(H|X, \theta_n) \log P(H|X, \theta_n)

The second term can be treated as a constant since it does not contain the variable \theta, so the lower bound is essentially the first term, which is also sometimes called the "Q-function":

Q(\theta; \theta_n) = \sum_H P(H|X, \theta_n) \log P(X, H|\theta)

EM of Formulation 1

In the case of Formulation 1, let us introduce hidden variables R(z, w, d) to indicate which hidden topic z is selected to generate the word w in document d (with \sum_z R(z, w, d) = 1). Therefore, the complete log-likelihood can be formulated as:

L_c = \sum_d \sum_w n(d, w) \sum_z R(z, w, d) [ \log P(d) + \log P(w|z) + \log P(z|d) ]

From the equation above, we can write the Q-function for the complete likelihood by replacing R(z, w, d) with its posterior expectation P(z|d, w) under the current parameters:

Q = \sum_d \sum_w n(d, w) \sum_z P(z|d, w) [ \log P(d) + \log P(w|z) + \log P(z|d) ]

For the E-step, simply using Bayes' rule, we obtain:

P(z|d, w) = P(w|z) P(z|d) / \sum_{z'} P(w|z') P(z'|d)

For the M-step, we need to maximize the Q-function subject to the normalization constraints \sum_w P(w|z) = 1, \sum_z P(z|d) = 1 and \sum_d P(d) = 1, which gives the Lagrangian:

H = Q + \sum_z \tau_z ( 1 - \sum_w P(w|z) ) + \sum_d \rho_d ( 1 - \sum_z P(z|d) ) + \xi ( 1 - \sum_d P(d) )

and take all derivatives, setting each to zero:

\partial H / \partial P(w|z) = \sum_d n(d, w) P(z|d, w) / P(w|z) - \tau_z = 0
\partial H / \partial P(z|d) = \sum_w n(d, w) P(z|d, w) / P(z|d) - \rho_d = 0
\partial H / \partial P(d) = \sum_w n(d, w) / P(d) - \xi = 0

Therefore, solving for the multipliers, we easily obtain:

P(w|z) = \sum_d n(d, w) P(z|d, w) / \sum_{w'} \sum_d n(d, w') P(z|d, w')
P(z|d) = \sum_w n(d, w) P(z|d, w) / \sum_w n(d, w)
P(d) = \sum_w n(d, w) / \sum_{d'} \sum_w n(d', w)
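Putting the E-step and M-step together, here is a compact NumPy sketch of this EM procedure for Formulation 1. It is an illustrative implementation under invented toy data, variable names, and iteration count, not the original author's code; note that P(d) has the closed form n(d) / \sum n and never changes across iterations.

import numpy as np

rng = np.random.default_rng(0)
n_docs, n_words, n_topics, n_iters = 8, 12, 3, 50

# Toy term-document counts n(d, w); all entries >= 1 so no document is empty.
counts = rng.integers(1, 5, size=(n_docs, n_words)).astype(float)

# Random normalized initialization of P(w|z) and P(z|d); P(d) is closed-form.
p_w_given_z = rng.random((n_topics, n_words))
p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True)
p_z_given_d = rng.random((n_docs, n_topics))
p_z_given_d /= p_z_given_d.sum(axis=1, keepdims=True)
p_d = counts.sum(axis=1) / counts.sum()

for _ in range(n_iters):
    # E-step: P(z|d,w) proportional to P(w|z) P(z|d), normalized over z.
    posterior = p_z_given_d[:, :, None] * p_w_given_z[None, :, :]   # (d, z, w)
    posterior /= posterior.sum(axis=1, keepdims=True)

    # M-step: reweight the counts by the posterior and renormalize.
    weighted = counts[:, None, :] * posterior                       # n(d,w) P(z|d,w)
    p_w_given_z = weighted.sum(axis=0)                               # (z, w)
    p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True)
    p_z_given_d = weighted.sum(axis=2)                               # (d, z)
    p_z_given_d /= p_z_given_d.sum(axis=1, keepdims=True)

    # Log-likelihood (3); it should increase monotonically.
    p_dw = p_d[:, None] * np.einsum('dz,zw->dw', p_z_given_d, p_w_given_z)
    print(np.sum(counts * np.log(p_dw)))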

EM of Formulation 2

Using a similar method, we introduce hidden variables R(z, w, d) to indicate which topic z is selected to generate the pair (d, w), and we have the following complete log-likelihood:

L_c = \sum_d \sum_w n(d, w) \sum_z R(z, w, d) [ \log P(z) + \log P(w|z) + \log P(d|z) ]

Therefore, the Q-function would be:

Q = \sum_d \sum_w n(d, w) \sum_z P(z|d, w) [ \log P(z) + \log P(w|z) + \log P(d|z) ]

For the E-step, again simply using Bayes' rule, we obtain:

P(z|d, w) = P(z) P(w|z) P(d|z) / \sum_{z'} P(z') P(w|z') P(d|z')

For the M-step, we maximize the constrained version of the Q-function, with \sum_w P(w|z) = 1, \sum_d P(d|z) = 1 and \sum_z P(z) = 1:

H = Q + \sum_z \tau_z ( 1 - \sum_w P(w|z) ) + \sum_z \rho_z ( 1 - \sum_d P(d|z) ) + \xi ( 1 - \sum_z P(z) )

and take all derivatives, setting each to zero:

\partial H / \partial P(w|z) = \sum_d n(d, w) P(z|d, w) / P(w|z) - \tau_z = 0
\partial H / \partial P(d|z) = \sum_w n(d, w) P(z|d, w) / P(d|z) - \rho_z = 0
\partial H / \partial P(z) = \sum_d \sum_w n(d, w) P(z|d, w) / P(z) - \xi = 0

Therefore, we easily obtain:

P(w|z) = \sum_d n(d, w) P(z|d, w) / \sum_d \sum_{w'} n(d, w') P(z|d, w')
P(d|z) = \sum_w n(d, w) P(z|d, w) / \sum_{d'} \sum_w n(d', w) P(z|d', w)
P(z) = \sum_d \sum_w n(d, w) P(z|d, w) / \sum_d \sum_w n(d, w)
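The EM loop for Formulation 2 differs from the previous sketch only in which distributions are maintained. A minimal, self-contained NumPy version of its E-step and M-step (again an illustration with invented toy data and variable names, not the original author's code) could look like this:

import numpy as np

rng = np.random.default_rng(1)
n_docs, n_words, n_topics, n_iters = 8, 12, 3, 50
counts = rng.integers(1, 5, size=(n_docs, n_words)).astype(float)

# Random normalized initialization of P(z), P(d|z), P(w|z).
p_z = np.full(n_topics, 1.0 / n_topics)
p_d_given_z = rng.random((n_topics, n_docs))
p_d_given_z /= p_d_given_z.sum(axis=1, keepdims=True)
p_w_given_z = rng.random((n_topics, n_words))
p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True)

for _ in range(n_iters):
    # E-step: P(z|d,w) proportional to P(z) P(d|z) P(w|z), normalized over z.
    posterior = p_z[None, :, None] * p_d_given_z.T[:, :, None] * p_w_given_z[None, :, :]
    posterior /= posterior.sum(axis=1, keepdims=True)              # (d, z, w)

    # M-step: reweight the counts by the posterior and renormalize.
    weighted = counts[:, None, :] * posterior                       # n(d,w) P(z|d,w)
    p_w_given_z = weighted.sum(axis=0)                               # (z, w)
    p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True)
    p_d_given_z = weighted.sum(axis=2).T                             # (z, d)
    p_d_given_z /= p_d_given_z.sum(axis=1, keepdims=True)
    p_z = weighted.sum(axis=(0, 2)) / counts.sum()

# Joint P(d, w) under Formulation 2 and the final log-likelihood (4).
p_dw = np.einsum('z,zd,zw->dw', p_z, p_d_given_z, p_w_given_z)
print(np.sum(counts * np.log(p_dw)))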
