Comparison

  LSA pLSA
1. Theoretical background Linear Algebra Probabilities and Statistics
2. Objective function Frobenius norm Likelihood function
3. Polysemy No Yes
4. Folding-in Straightforward Complicated

1. LSA stems from Linear Algebra as it is nothing more than a Singular Value Decomposition. On the other hand, pLSA has a strong probabilistic grounding (latent variable models).

2. SVD is a least squares method (it finds a low-rank matrix approximation that minimizes the Frobenius norm of the difference with the original matrix). Moreover, as it is well known in Machine Learning, the least squares solution corresponds to the Maximum Likelihood solution when experimental errors are gaussian. Therefore, LSA makes an implicit assumption of gaussian noise on the term counts. On the other hand, the objective function maximized in pLSA is the likelihood function of multinomial sampling.

The values in the concept-term matrix found by LSA are not normalized and may even contain negative values. On the other hand, values found by pLSA are probabilities which means they are interpretable and can be combined with other models.

Note: SVD is equivalent to PCA (Principal Component Analysis) when the data is centered (has zero-mean).

3. Both LSA and pLSA can handle synonymy but LSA cannot handle polysemy, as words are defined by a unique point in a space.

4. LSA and pLSA analyze a corpus of documents in order to find a new low-dimensional representation of it. In order to be comparable, new documents that were not originally in the corpus must be projected in the lower-dimensional space too. This is called “folding-in”. Clearly, new documents folded-in don’t contribute to learning the factored representation so it is necessary to rebuild the model using all the documents from time to time.

In LSA, folding-in is as easy as a matrix-vector product. In pLSA, this requires several iterations of the EM algorithm.

LSA和pLSA的比较的更多相关文章

  1. LSA,pLSA原理及其代码实现

    一. LSA 1. LSA原理 LSA(latent semantic analysis)潜在语义分析,也被称为 LSI(latent semantic index),是 Scott Deerwest ...

  2. 文本情感分析(一):基于词袋模型(VSM、LSA、n-gram)的文本表示

    现在自然语言处理用深度学习做的比较多,我还没试过用传统的监督学习方法做分类器,比如SVM.Xgboost.随机森林,来训练模型.因此,用Kaggle上经典的电影评论情感分析题,来学习如何用传统机器学习 ...

  3. LDA

    2 Latent Dirichlet Allocation Introduction LDA是给文本建模的一种方法,它属于生成模型.生成模型是指该模型可以随机生成可观测的数据,LDA可以随机生成一篇由 ...

  4. bow lsa plsa

    Bag-of-Words (BoW) 模型是NLP和IR领域中的一个基本假设.在这个模型中,一个文档(document)被表示为一组单词(word/term)的无序组合,而忽略了语法或者词序的部分.B ...

  5. 一口气讲完 LSA — PlSA —LDA在自然语言处理中的使用

    自然语言处理之LSA LSA(Latent Semantic Analysis), 潜在语义分析.试图利用文档中隐藏的潜在的概念来进行文档分析与检索,能够达到比直接的关键词匹配获得更好的效果. LSA ...

  6. Latent semantic analysis note(LSA)

    1 LSA Introduction LSA(latent semantic analysis)潜在语义分析,也被称为LSI(latent semantic index),是Scott Deerwes ...

  7. NLP —— 图模型(三)pLSA(Probabilistic latent semantic analysis,概率隐性语义分析)模型

    LSA(Latent semantic analysis,隐性语义分析).pLSA(Probabilistic latent semantic analysis,概率隐性语义分析)和 LDA(Late ...

  8. DL4NLP——词表示模型(一)表示学习;syntagmatic与paradigmatic两类模型;基于矩阵的LSA和GloVe

    本文简述了以下内容: 什么是词表示,什么是表示学习,什么是分布式表示 one-hot representation与distributed representation(分布式表示) 基于distri ...

  9. [IR] Concept Search and PLSA

    [Topic Model]主题模型之概率潜在语义分析(Probabilistic Latent Semantic Analysis) 感觉LDA在实践中的优势其实不大,学好pLSA才是重点 阅读笔记 ...

随机推荐

  1. jQuery和AngularJS的区别

    这篇文章主要介绍了jQuery和AngularJS的区别浅析,本文着重讲解一个熟悉jQuery开的程序员如何应对AngularJS中的一些编程思想的转变,需要的朋友可以参考下   最近一直在研究ang ...

  2. win10 删除设备和驱动器中你不要的图标

    设备和驱动器可能有很多你不想要的东西,360云盘,百度网盘,微云-- 删除设备和驱动器中的百度云图标,360网盘图标,要去注册表 运行 regedit 点开 HKEY_CURRENT_USER\SOF ...

  3. Windows 10新功能

    Windows 10 中面向开发人员的新增功能 Windows 10 及新增的开发人员工具将提供新通用 Windows 平台支持的工具.功能和体验.在 Windows 10 上安装完工具和 SDK后, ...

  4. LeetCode 64. Minimum Path Sum(最小和的路径)

    Given a m x n grid filled with non-negative numbers, find a path from top left to bottom right which ...

  5. python 带小数点时间格式化

    #获取带小数点的时间>>> import datetime #当前时间加3天 >>> t1 = datetime.datetime.now() + datetime ...

  6. 浅析Java源码之ArrayList

    面试题经常会问到LinkedList与ArrayList的区别,与其背网上的废话,不如直接撸源码! 文章源码来源于JRE1.8,java.util.ArrayList 既然是浅析,就主要针对该数据结构 ...

  7. VS2008 生成静态链接库并使用

    1.启动VS2008创建一个Win32控制台程序 2.选择静态库 3.创建两个文件lib.h和lib.cpp //lib.h #ifndef LIB_H #define LIB_H int add(i ...

  8. Red and Black

    Problem Description There is a rectangular room, covered with square tiles. Each tile is colored eit ...

  9. js中的浅复制和深复制

    浅复制:浅复制是复制引用,复制后的引用都是指向同一个对象的实例,彼此之间的操作会互相影响 深复制:深复制不是简单的复制引用,而是在堆中重新分配内存,并且把源对象实例的所有属性都进行新建复制,以保证深复 ...

  10. Spring框架学习之注解配置与AOP思想

         上篇我们介绍了Spring中有关高级依赖关系配置的内容,也可以调用任意方法的返回值作为属性注入的值,它解决了Spring配置文件的动态性不足的缺点.而本篇,我们将介绍Spring的又一大核心 ...