If the user has marked some documents as relevant and some as irrelevant, we can build a probabilistic classifier, such as a Naive Bayes model.

Can we use probabilities to quantify our uncertainties?


Ranking method: 

Rank documents by their probability of relevance with respect to the information need.

P(relevant | document i, query)

Bayes’ Optimal Decision Rule: x is relevant iff p(R|x) > p(NR|x)

C - cost of retrieving a relevant document

C′ - cost of retrieving a non-relevant document

If

C ⋅ p(R | d) + C′ ⋅ (1 − p(R | d))  ≤  C ⋅ p(R | d′) + C′ ⋅ (1 − p(R | d′))

for all d′ not yet retrieved, then d is the next document to be retrieved.
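The cost-based rule above can be sketched in a few lines of Python. This is only an illustration: the function names, the document ids, the relevance probabilities, and the default costs (0 for a relevant document, 1 for a non-relevant one) are assumptions, not part of the original notes.

```python
def expected_cost(p_rel, cost_rel, cost_nonrel):
    """Expected cost C * p(R|d) + C' * (1 - p(R|d)) of retrieving one document."""
    return cost_rel * p_rel + cost_nonrel * (1.0 - p_rel)


def rank_by_expected_cost(p_rel_by_doc, cost_rel=0.0, cost_nonrel=1.0):
    """Return document ids ordered from lowest to highest expected cost."""
    return sorted(
        p_rel_by_doc,
        key=lambda doc_id: expected_cost(p_rel_by_doc[doc_id], cost_rel, cost_nonrel),
    )


# Hypothetical relevance probabilities, for illustration only.
probs = {"d1": 0.2, "d2": 0.9, "d3": 0.5}
ranking = rank_by_expected_cost(probs)
```

Whenever C < C′, ordering by increasing expected cost gives the same ranking as ordering by decreasing p(R | d).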

 

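  • How do we compute all those probabilities?

  • Binary Independence Model (BIM)

(The position of q is unchanged: it stays in the conditioning part. "Odds" means the ratio of a probability to its complement.)

 The denominator cancels.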
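The equation these two remarks refer to did not survive in the notes. A sketch of the standard BIM first step, assuming the usual notation (x is the binary term-incidence vector of the document, O denotes odds), would be:

```latex
\begin{align*}
O(R \mid q, \vec{x})
  &= \frac{p(R \mid q, \vec{x})}{p(NR \mid q, \vec{x})}
   = \frac{p(R \mid q)\, p(\vec{x} \mid R, q) \,/\, p(\vec{x} \mid q)}
          {p(NR \mid q)\, p(\vec{x} \mid NR, q) \,/\, p(\vec{x} \mid q)} \\
  &= O(R \mid q) \cdot \frac{p(\vec{x} \mid R, q)}{p(\vec{x} \mid NR, q)}
\end{align*}
```

Bayes' rule is applied to numerator and denominator separately; q stays on the conditioning side throughout, and the shared factor p(x | q) cancels.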

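Given that a document is relevant to the query, what is the probability of observing this particular document vector? This has to be estimated.

Think: for a given query, should a particular word appear in the document?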


Assumptions (important):

pi = p ( xi = 1 | R , q );

ri = p ( xi = 1 | NR , q );

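(Once the xi = 0 restriction is dropped, the second product covers more factors: it now also includes the xi = 1, qi = 1 terms. Multiplying the first product by the reciprocal of those extra factors restores the balance.)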
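The rebalancing described above can be written out as follows. This is a reconstruction of the standard BIM derivation rather than the original figure; it assumes term independence and, for terms that do not occur in the query (q_i = 0), that p_i = r_i.

```latex
\begin{align*}
O(R \mid q, \vec{x})
 &= O(R \mid q) \prod_{i} \frac{p(x_i \mid R, q)}{p(x_i \mid NR, q)} \\
 &= O(R \mid q) \prod_{x_i = 1} \frac{p_i}{r_i}
                \prod_{x_i = 0} \frac{1 - p_i}{1 - r_i} \\
 &= O(R \mid q) \prod_{x_i = q_i = 1} \frac{p_i}{r_i}
                \prod_{x_i = 0,\; q_i = 1} \frac{1 - p_i}{1 - r_i} \\
 &= O(R \mid q) \prod_{x_i = q_i = 1} \frac{p_i (1 - r_i)}{r_i (1 - p_i)}
                \prod_{q_i = 1} \frac{1 - p_i}{1 - r_i}
\end{align*}
```

In the last step the second product is extended from the query terms absent from the document to all query terms, and each matching term in the first product is multiplied by (1 - r_i) / (1 - p_i) to compensate.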

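Two constants (they depend only on the query, so they do not affect the ranking):

  the prior odds that a document is relevant to the query;

  a factor taken over every query term in the vocabulary, independent of the document.

One variable (the only part that depends on the document):

  Retrieval Status Value (RSV)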
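In the usual BIM notation (assumed here, since the original formula image is missing), the RSV is the logarithm of the document-dependent product, which turns into a sum of per-term weights c_i:

```latex
\begin{align*}
RSV &= \log \prod_{x_i = q_i = 1} \frac{p_i (1 - r_i)}{r_i (1 - p_i)}
     = \sum_{x_i = q_i = 1} c_i, \\
c_i &= \log \frac{p_i (1 - r_i)}{r_i (1 - p_i)}
\end{align*}
```

Each c_i is a log odds ratio: the odds of the term occurring in a relevant document divided by the odds of it occurring in a non-relevant one.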

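So, how do we compute the ci's from our data?

For each term i, look at this table of document counts:

  Documents   Relevant   Non-relevant     Total
  xi = 1      s          n - s            n
  xi = 0      S - s      N - n - S + s    N - n
  Total       S          N - S            N

(Relation between a term and a document: a term can occur in a document that is not relevant, and a document can be relevant without containing the term, e.g. "computer" and "IBM".)

pi ≈ s / S

ri ≈ (n - s) / (N - S)

ci ≈ log [ (s / (S - s)) / ((n - s) / (N - n - S + s)) ]

Add 1⁄2 smoothing: add 0.5 to each of the four cell counts so that no estimate becomes zero or infinite.

ci ≈ log [ ((s + 1⁄2) / (S - s + 1⁄2)) / ((n - s + 1⁄2) / (N - n - S + s + 1⁄2)) ]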
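As a minimal sketch, the smoothed weight can be computed directly from the four counts N (all documents), n (documents containing the term), S (relevant documents), and s (relevant documents containing the term). The function name `term_weight`, the parameter `k`, and the example counts are assumptions made for illustration.

```python
import math


def term_weight(N, n, S, s, k=0.5):
    """Smoothed log odds ratio c_i for one term.

    k is added to each of the four cells of the count table
    (k = 0.5 gives add-1/2 smoothing).
    """
    odds_rel = (s + k) / (S - s + k)                 # estimates p_i / (1 - p_i)
    odds_nonrel = (n - s + k) / (N - n - S + s + k)  # estimates r_i / (1 - r_i)
    return math.log(odds_rel / odds_nonrel)


# Hypothetical counts: 1000 documents, 50 contain the term,
# 10 are relevant, and 8 of the relevant ones contain the term.
c_i = term_weight(N=1000, n=50, S=10, s=8)

# Smoothing keeps the weight finite even when s = 0.
c_unseen = term_weight(N=1000, n=50, S=10, s=0)
```

A term concentrated in the relevant documents gets a positive weight; a term that never occurs in a relevant document gets a finite negative weight rather than minus infinity.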

  

Conclusion: when a new document arrives, check every term against that document and combine the ci of the query terms it contains.
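Scoring a new document can then be sketched as summing the weights of the query terms it contains. The weights, document ids, and term lists below are made up for illustration; in practice the weights would come from the count table.

```python
def rsv(query_terms, doc_terms, weights):
    """Sum c_i over the terms that occur in both the query and the document."""
    shared = set(query_terms) & set(doc_terms)
    return sum(weights.get(term, 0.0) for term in shared)


# Hypothetical term weights c_i.
weights = {"ibm": 2.0, "computer": 1.5, "the": 0.1}

# Hypothetical documents, each reduced to its list of terms.
docs = {
    "d1": ["ibm", "computer", "sales"],
    "d2": ["the", "computer"],
    "d3": ["weather"],
}

query = ["ibm", "computer"]
scores = {doc_id: rsv(query, terms, weights) for doc_id, terms in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Because the model is binary, repeated occurrences of a term in a document do not change its score; only presence or absence counts.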


  • Okapi BM25: a non-binary model (omitted)

   
