ES搜索排序,文档相关度评分介绍——Field-length norm
Field-length norm
How long is the field? The shorter the field, the higher the weight. If a term appears in a short field, such as a title
field, it is more likely that the content of that field is about the term than if the same term appears in a much bigger body
field. The field length norm is calculated as follows:
norm(d) = 1 / √numTerms
|
The field-length norm ( |
While the field-length norm is important for full-text search, many other fields don’t need norms. Norms consume approximately 1 byte per string
field per document in the index, whether or not a document contains the field. Exact-value not_analyzed
string fields have norms disabled by default, but you can use the field mapping to disable norms on analyzed
fields as well:
PUT /my_index
{
"mappings": {
"doc": {
"properties": {
"text": {
"type": "string",
"norms": { "enabled": false }
}
}
}
}
}
|
This field will not take the field-length norm into account. A long field and a short field will be scored as if they were the same length. |
For use cases such as logging, norms are not useful. All you care about is whether a field contains a particular error code or a particular browser identifier. The length of the field does not affect the outcome. Disabling norms can save a significant amount of memory.
Putting it together
These three factors—term frequency, inverse document frequency, and field-length norm—are calculated and stored at index time. Together, they are used to calculate the weight of a single term in a particular document.

When we refer to documents in the preceding formulae, we are actually talking about a field within a document. Each field has its own inverted index and thus, for TF/IDF purposes, the value of the field is the value of the document.
When we run a simple term
query with explain
set to true
(see Understanding the Score), you will see that the only factors involved in calculating the score are the ones explained in the preceding sections:
PUT /my_index/doc/1
{ "text" : "quick brown fox" } GET /my_index/doc/_search?explain
{
"query": {
"term": {
"text": "fox"
}
}
}
The (abbreviated) explanation
from the preceding request is as follows:
weight(text:fox in 0) [PerFieldSimilarity]: 0.15342641
result of:
fieldWeight in 0 0.15342641
product of:
tf(freq=1.0), with freq of 1: 1.0
idf(docFreq=1, maxDocs=1): 0.30685282
fieldNorm(doc=0): 0.5
|
The final |
|
The term |
|
The inverse document frequency of |
|
The field-length normalization factor for this field. |
Of course, queries usually consist of more than one term, so we need a way of combining the weights of multiple terms. For this, we turn to the vector space model.
ES搜索排序,文档相关度评分介绍——Field-length norm的更多相关文章
- ES搜索排序,文档相关度评分介绍——Vector Space Model
Vector Space Model The vector space model provides a way of comparing a multiterm query against a do ...
- ES搜索排序,文档相关度评分介绍——TF-IDF—term frequency, inverse document frequency, and field-length norm—are calculated and stored at index time.
Theory Behind Relevance Scoring Lucene (and thus Elasticsearch) uses the Boolean model to find match ...
- ES 文档与索引介绍
在之前的文章中,介绍了 ES 整体的架构和内容,这篇主要针对 ES 最小的存储单位 - 文档以及由文档组成的索引进行详细介绍. 会涉及到如下的内容: 文档的 CURD 操作. Dynamic Mapp ...
- ES-PHP向ES批量添加文档报No alive nodes found in your cluster
ES-PHP向ES批量添加文档报No alive nodes found in your cluster 2016年12月14日 12:31:40 阅读数:2668 参考文章phpcurl 请求Chu ...
- atitit.vod search doc.doc 点播系统搜索功能设计文档
atitit.vod search doc.doc 点播系统搜索功能设计文档 按键的enter事件1 Left rig事件1 Up down事件2 key_events.key_search = fu ...
- 【ElasticSearch】:索引Index、文档Document、字段Field
因为从ElasticSearch6.X开始,官方准备废弃Type了.对应数据库,对ElasticSearch的理解如下: ElasticSearch 索引Index 文档Document 字段Fiel ...
- es之对文档进行更新操作
5.7.1:更新整个文档 ES中并不存在所谓的更新操作,而是用新文档替换旧文档: 在内部,Elasticsearch已经标记旧文档为删除并添加了一个完整的新文档并建立索引.旧版本文档不会立即消失 ,但 ...
- es搜索排序不正确
沿用该文章里的数据https://www.cnblogs.com/MRLL/p/12691763.html 查询时发现,一模一样的name,但是相关度不一样 GET /z_test/doc/_sear ...
- MongoDB中的映射,限制记录和记录拼排序 文档的插入查询更新删除操作
映射 在 MongoDB 中,映射(Projection)指的是只选择文档中的必要数据,而非全部数据.如果文档有 5 个字段,而你只需要显示 3 个,则只需选择 3 个字段即可. find() 方法 ...
随机推荐
- HDU 1006 Tick and Tick 解不等式解法
一開始思考的时候认为好难的题目,由于感觉非常多情况.不知道从何入手. 想通了就不难了. 能够转化为一个利用速度建立不等式.然后解不等式的问题. 建立速度,路程,时间的模型例如以下: /******** ...
- 【Excle数据透视表】如何移动数据透视表的位置
数据透视表创建完成了,现在需要将它移动到D5位置,如何移动呢? 解决办法 通过"移动数据透视表"功能实现数据透视表的位置移动 步骤1 单击数据透视表任意单元格→数据透视表工具→分析 ...
- android Gallery2 onPause时候,其背景界面显示黑色
改动: Src/com/android/gallery3d/app/AbstracGalleryActivity.java OnResume()函数约290行 去掉 mGLRootView.setVi ...
- 3DES
3DES是继DESeasy被破解后的DES加密升级版,它属于对称加密. 可指定24位长度的密钥.在java API中也有事实上现,代码例如以下: /** * 3DES 的Java SDK API 实现 ...
- Ubuntu安装vncserver实现图形化远程桌面
安装 apt-get update apt-get install vnc4server 开启vnc服务 vncserver 首次启动会要求设置密码,后面可以使用vncpasswd修改: 看到 New ...
- VueJS锚定
锚定函数 指令定义函数提供了几个钩子函数(可选): bind: 只调用一次,指令第一次绑定到元素时调用,用这个钩子函数可以定义一个在绑定时执行一次的初始化动作. inserted: 被绑定元素插入父节 ...
- 基于Repository模式设计项目架构—你可以参考的项目架构设计
关于Repository模式,直接百度查就可以了,其来源是<企业应用架构模式>.我们新建一个Infrastructure文件夹,这里就是基础设施部分,EF Core的上下文类以及Repos ...
- mysql 配置 安装和 root password 更改
第一步: 修改my.ini文件,替换为以下内容 (skip_grant_tables***重点) # For advice on how to change settings please see # ...
- python DOM解析XML
#conding:utf-8 # -*- coding:utf-8 -*- __author__ = 'hdfs' """ XML 解析 :DOM解析珍整个文档作为一个可 ...
- spring + jodd 实现文件上传
String tempDir = SystemUtil.getTempDir(); // 获得系统临时文件夹 String prefix = UUID.randomUUID().toString(). ...