ES忽略TF-IDF评分——使用constant

Ignoring TF/IDF

Sometimes we just don’t care about TF/IDF. All we want to know is that a certain word appears in a field. Perhaps we are searching for a vacation home and we want to find houses that have as many of these features as possible:

WiFi
Garden
Pool

The vacation home documents look something like this:

{ "description": "A delightful four-bedroomed house with ... " }

We could use a simple match query:

GET /_search

{

  "query": {

    "match": {

      "description": "wifi garden pool"

    }

  }

}

However, this isn’t really full-text search. In this case, TF/IDF just gets in the way. We don’t care whether wifi is a common term, or how often it appears in the document. All we care about is that it does appear. In fact, we just want to rank houses by the number of features they have—the more, the better. If a feature is present, it should score 1, and if it isn’t, 0.

constant_score Query

Enter the constant_score query. This query can wrap either a query or a filter, and assigns a score of1 to any documents that match, regardless of TF/IDF:

GET /_search

{

  "query": {

    "bool": {

      "should": [

        { "constant_score": {

          "query": { "match": { "description": "wifi" }}

        }},

        { "constant_score": {

          "query": { "match": { "description": "garden" }}

        }},

        { "constant_score": {

          "query": { "match": { "description": "pool" }}

        }}

      ]

    }

  }

}

Perhaps not all features are equally important—some have more value to the user than others. If the most important feature is the pool, we could boost that clause to make it count for more:

GET /_search

{

  "query": {

    "bool": {

      "should": [

        { "constant_score": {

          "query": { "match": { "description": "wifi" }}

        }},

        { "constant_score": {

          "query": { "match": { "description": "garden" }}

        }},

        { "constant_score": {

          "boost":   2 

          "query": { "match": { "description": "pool" }}

        }}

      ]

    }

  }

}

A matching pool clause would add a score of 2, while the other clauses would add a score of only 1 each.

The final score for each result is not simply the sum of the scores of all matching clauses. The coordination factor and query normalization factor are still taken into account.

We could improve our vacation home documents by adding a not_analyzed features field to our vacation homes:

{ "features": [ "wifi", "pool", "garden" ] } 这样改写有什么好处？省索引空间吗？

参考：https://www.elastic.co/guide/en/elasticsearch/guide/current/ignoring-tfidf.html#ignoring-tfidf

ES忽略TF-IDF评分——使用constant_score的更多相关文章

Elasticsearch由浅入深（十）搜索引擎：相关度评分 TF&IDF算法、doc value正排索引、解密query、fetch phrase原理、Bouncing Results问题、基于scoll技术滚动搜索大量数据
相关度评分 TF&IDF算法 Elasticsearch的相关度评分(relevance score)算法采用的是term frequency/inverse document frequen ...
Elasticsearch学习之相关度评分TF&IDF
relevance score算法,简单来说,就是计算出,一个索引中的文本,与搜索文本,他们之间的关联匹配程度 Elasticsearch使用的是 term frequency/inverse doc ...
25.TF&IDF算法以及向量空间模型算法
主要知识点: boolean model IF/IDF vector space model 一.boolean model 在es做各种搜索进行打分排序时,会先用boolean mo ...
TF/IDF计算方法
FROM:http://blog.csdn.net/pennyliang/article/details/1231028 我们已经谈过了如何自动下载网页.如何建立索引.如何衡量网页的质量(Page R ...
信息检索中的TF/IDF概念与算法的解释
https://blog.csdn.net/class_brick/article/details/79135909 概念 TF-IDF(term frequency–inverse document ...
55.TF/IDF算法
主要知识点: TF/IDF算法介绍查看es计算_source的过程及各词条的分数查看一个document是如何被匹配到的一.算法介绍 relevance score算法,简单来说 ...
TF/IDF（term frequency/inverse document frequency)
TF/IDF(term frequency/inverse document frequency) 的概念被公认为信息检索中最重要的发明. 一. TF/IDF描述单个term与特定document的相 ...
基于TF/IDF的聚类算法原理
一.TF/IDF描述单个term与特定document的相关性TF(Term Frequency): 表示一个term与某个document的相关性. 公式为这个term在document中出 ...
使用solr的函数查询,并获取tf*idf值
1. 使用函数df(field,keyword) 和idf(field,keyword). http://118.85.207.11:11100/solr/mobile/select?q={!func ...

随机推荐

VueJS样式绑定v-bind:class
class 与 style 是 HTML 元素的属性,用于设置元素的样式,我们可以用 v-bind 来设置样式属性. Vue.js v-bind 在处理 class 和 style 时, 专门增强了它 ...
vscode 编译调试c/c++的环境配置
首先看了一下别人写的文章 http://blog.csdn.net/c_duoduo/article/details/51615381 在按照上文链接博主的安装步骤进行到MINGW的安装时出现一个问题 ...
aar格式
aar包是Android Library Project的二进制公布包. 文件的扩展名是aar,并且maven包类型也应该是aar. 只是这文件本身就是一个简单的zip文件.里面有例如以下的内容: / ...
jquery垂直滚动插件一个参数用于设置速度,兼容ie6
利用外层的块级元素负外边距来滚动 1.使用 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://ww ...
ui-router $transitions 用法
1. //route redirection $transitions.onStart({to: 'manage'}, function (trans) { var params = trans.pa ...
(翻译) Container Components
react.js javascript 这篇文章翻译自Medium的一篇文章:Container Components 选择这篇文章翻译的原因是,在刚接触React的时候,这篇文章很好的指引我了解 ...
TP实例化模型的两种方式 M() D()
TP框架中实例化模型的两种方式 #如果使用自己自定义的函数,那么就用D $mode=D('model'); #如果使用是系统自带的函数,那么就是用M $model=M('model');
mfc 小程序---在系统菜单中添加菜单项
1建立一个对话框工程:在dlg类里定义一个菜单指针m_pMenu,在对话框OnInitDialog函数里添加代码: m_pMenu=GetSystemMenu(FALSE);//获取系统菜单的指针 m ...
【BZOJ4276】[ONTAK2015]Bajtman i Okrągły Robin 线段树优化建图+费用流
[BZOJ4276][ONTAK2015]Bajtman i Okrągły Robin Description 有n个强盗,其中第i个强盗会在[a[i],a[i]+1],[a[i]+1,a[i]+2 ...
cmake是什么
1 cmake是什么 cmake是一个管理软件build过程的工具.它并不会直接build处软件可执行文件本身,而是build出可以build出软件本身的全部工程文件,比如makefiles.xcod ...

ES忽略TF-IDF评分——使用constant_score

Ignoring TF/IDF

constant_score Query

ES忽略TF-IDF评分——使用constant_score的更多相关文章

随机推荐

热门专题