Field-length norm

How long is the field? The shorter the field, the higher the weight. If a term appears in a short field, such as a title field, it is more likely that the content of that field is about the term than if the same term appears in a much bigger body field. The field length norm is calculated as follows:

norm(d) = 1 / √numTerms 

The field-length norm (norm) is the inverse square root of the number of terms in the field.

While the field-length norm is important for full-text search, many other fields don’t need norms. Norms consume approximately 1 byte per string field per document in the index, whether or not a document contains the field. Exact-value not_analyzed string fields have norms disabled by default, but you can use the field mapping to disable norms on analyzed fields as well:

PUT /my_index
{
"mappings": {
"doc": {
"properties": {
"text": {
"type": "string",
"norms": { "enabled": false }
}
}
}
}
}

This field will not take the field-length norm into account. A long field and a short field will be scored as if they were the same length.

For use cases such as logging, norms are not useful. All you care about is whether a field contains a particular error code or a particular browser identifier. The length of the field does not affect the outcome. Disabling norms can save a significant amount of memory.

Putting it together

These three factors—term frequency, inverse document frequency, and field-length norm—are calculated and stored at index time. Together, they are used to calculate the weight of a single term in a particular document.

When we refer to documents in the preceding formulae, we are actually talking about a field within a document. Each field has its own inverted index and thus, for TF/IDF purposes, the value of the field is the value of the document.

When we run a simple term query with explain set to true (see Understanding the Score), you will see that the only factors involved in calculating the score are the ones explained in the preceding sections:

PUT /my_index/doc/1
{ "text" : "quick brown fox" } GET /my_index/doc/_search?explain
{
"query": {
"term": {
"text": "fox"
}
}
}

The (abbreviated) explanation from the preceding request is as follows:

weight(text:fox in 0) [PerFieldSimilarity]:  0.15342641 

result of:
fieldWeight in 0 0.15342641
product of:
tf(freq=1.0), with freq of 1: 1.0

        idf(docFreq=1, maxDocs=1):           0.30685282 

        fieldNorm(doc=0):                    0.5 

The final score for term fox in field text in the document with internal Lucene doc ID 0.

The term fox appears once in the text field in this document.

The inverse document frequency of fox in the text field in all documents in this index.

The field-length normalization factor for this field.

Of course, queries usually consist of more than one term, so we need a way of combining the weights of multiple terms. For this, we turn to the vector space model.

 

 

ES搜索排序,文档相关度评分介绍——Field-length norm的更多相关文章

  1. ES搜索排序,文档相关度评分介绍——Vector Space Model

    Vector Space Model The vector space model provides a way of comparing a multiterm query against a do ...

  2. ES搜索排序,文档相关度评分介绍——TF-IDF—term frequency, inverse document frequency, and field-length norm—are calculated and stored at index time.

    Theory Behind Relevance Scoring Lucene (and thus Elasticsearch) uses the Boolean model to find match ...

  3. ES 文档与索引介绍

    在之前的文章中,介绍了 ES 整体的架构和内容,这篇主要针对 ES 最小的存储单位 - 文档以及由文档组成的索引进行详细介绍. 会涉及到如下的内容: 文档的 CURD 操作. Dynamic Mapp ...

  4. ES-PHP向ES批量添加文档报No alive nodes found in your cluster

    ES-PHP向ES批量添加文档报No alive nodes found in your cluster 2016年12月14日 12:31:40 阅读数:2668 参考文章phpcurl 请求Chu ...

  5. atitit.vod search doc.doc 点播系统搜索功能设计文档

    atitit.vod search doc.doc 点播系统搜索功能设计文档 按键的enter事件1 Left rig事件1 Up down事件2 key_events.key_search = fu ...

  6. 【ElasticSearch】:索引Index、文档Document、字段Field

    因为从ElasticSearch6.X开始,官方准备废弃Type了.对应数据库,对ElasticSearch的理解如下: ElasticSearch 索引Index 文档Document 字段Fiel ...

  7. es之对文档进行更新操作

    5.7.1:更新整个文档 ES中并不存在所谓的更新操作,而是用新文档替换旧文档: 在内部,Elasticsearch已经标记旧文档为删除并添加了一个完整的新文档并建立索引.旧版本文档不会立即消失 ,但 ...

  8. es搜索排序不正确

    沿用该文章里的数据https://www.cnblogs.com/MRLL/p/12691763.html 查询时发现,一模一样的name,但是相关度不一样 GET /z_test/doc/_sear ...

  9. MongoDB中的映射,限制记录和记录拼排序 文档的插入查询更新删除操作

    映射 在 MongoDB 中,映射(Projection)指的是只选择文档中的必要数据,而非全部数据.如果文档有 5 个字段,而你只需要显示 3 个,则只需选择 3 个字段即可. find() 方法 ...

随机推荐

  1. vue v-model使用说明

    1.概述 v-model 会忽略所有表单元素的 value.checked.selected 特性的初始值而总是将 Vue 实例的数据作为数据来源.你应该通过 JavaScript 在组件的 data ...

  2. 学写jQuery插件开发方法

    jQuery如此流行,各式各样的jQuery插件也是满天飞.你有没有想过把自己的一些常用的JS功能也写成jQuery插件呢?如果你的答案是肯定的,那么来吧!和我一起学写jQuery插件吧!   很多公 ...

  3. SAS学习经验总结分享:篇一—数据的读取

    第一篇:BASE SAS分为数据步的作用及生成数据集的方式 我是学经济相关专业毕业的,从事数据分析工作近一年,之前一直在用EXCEL,自认为EXCEL掌握的还不错. 今年5月份听说了SAS,便开始学习 ...

  4. 命令行编译sass

    一.安装ruby1.需要的软件设备: 2.安装过程:点击上图“应用程序”安装即可,注意安装过程中其中三项都需要打上勾.如若没有三项都打上勾则需要修改环境变量中的path路径后添加一个分号. 3.打开c ...

  5. Android 开源项目精选

    0x00  leakcanary [内存泄漏检测] Leakcanary : A memory leak detection library for Android and Java. 良心企业Squ ...

  6. android开发系列之ContentObserver

    在这篇博客里面我想要分享一下自己最近在项目里面遇到一个比较好的数据同步解决方案,首先让我们先来看看该方案的应用场景:我们在客户端本地利用数据库缓存了一些数据,当我们检测到数据库里面的数据发生变化的时候 ...

  7. 下一代Apache Hadoop MapReduce框架的架构

    背景 随着集群规模和负载增加,MapReduce JobTracker在内存消耗,线程模型和扩展性/可靠性/性能方面暴露出了缺点,为此需要对它进行大整修. 需求 当我们对Hadoop MapReduc ...

  8. Linux内核源码分析方法_转

    Linux内核源码分析方法 转自:http://www.cnblogs.com/fanzhidongyzby/archive/2013/03/20/2970624.html 一.内核源码之我见 Lin ...

  9. 高特权级代码段转向低特权级代码段(利用 ret(retf) 指令实现 jmp from ring0 to ring3)

    [0]写在前面 0.1)本代码旨在演示 从 ring0 转移到 ring3(即,从高特权级 转移到 低特权级) 0.2)本文 只对 与 门相关的 代码进行简要注释,言简意赅: 0.3)文末的个人总结是 ...

  10. Java知识点梳理——常用方法总结

    1.查找字符串最后一次出现的位置 String str = "my name is zzw"; int lastIndex = str.lastIndexOf("zzw& ...