Elasticsearch Search: most_fields Analysis

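As the name suggests, most_fields gives a document a higher score the more fields match the query terms. Each field can also be weighted with a boost.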

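Here is a simplified formula (for the detailed scoring algorithm, see http://m.blog.csdn.net/article/details?id=50623948):

score = match_field1_score * boost + match_field2_score * boost + ... + match_fieldN_score * boost, where each boost is the weight of the corresponding field.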

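This kind of search works well in many cases, but it has a weakness: documents that repeat the same information across many fields can outscore documents that are more concise yet more complete in meaning.

In addition, operator and minimum_should_match are applied per field rather than per term, so they cannot be used to cut the long tail of low-relevance docs. Put simply, the document with the most term matches wins.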
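The simplified formula and its weakness can be sketched in a few lines of Python. This is only an illustration of the sum-of-fields idea, not Elasticsearch's actual scoring code: the function name most_fields_score and all per-field scores are hypothetical, invented for the example.

```python
def most_fields_score(field_scores, boosts=None):
    """Sum the per-field match scores, each multiplied by its boost (default 1.0)."""
    boosts = boosts or {}
    return sum(score * boosts.get(field, 1.0) for field, score in field_scores.items())


# Hypothetical per-field scores, made up for illustration only.
# Document A matches a single query term, but in three redundant fields.
doc_a = {"title": 0.28, "address": 0.07, "businessDistrict": 0.32}
# Document B matches both query terms, but only in one field.
doc_b = {"address": 0.45}

score_a = most_fields_score(doc_a)
score_b = most_fields_score(doc_b)

# The redundant partial match outranks the complete match.
print(score_a > score_b)
```

Raising the boost of a field only rescales that field's contribution; the scores are still added together, so redundancy across fields keeps paying off.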

For example:

Search for the keyword "北京东路". First look at the analysis result below, which shows that it is split into the terms "北京" and "东路":

curl   'localhost:9200/fullbiz_index/_analyze?analyzer=ik_smart&pretty=true' -d '{"text":"北京东路"}'

{

   "tokens" : [

      {

         "token" : "text",

         "start_offset" : 2,

         "end_offset" : 6,

         "type" : "ENGLISH",

         "position" : 1

      },

      {

         "token" : "北京",

         "start_offset" : 9,

         "end_offset" : 11,

         "type" : "CN_WORD",

         "position" : 2

      },

      {

         "token" : "东路",

         "start_offset" : 11,

         "end_offset" : 13,

         "type" : "CN_WORD",

         "position" : 3

      }

   ]

}
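The response also contains a token "text" of type ENGLISH. The offsets suggest that this Elasticsearch version analyzed the whole request body as raw text, JSON key included, so that token is most likely noise rather than part of the keyword. A quick way to check this reading, assuming the body was sent exactly as in the command above:

```python
# The raw request body passed with -d in the _analyze command above.
body = '{"text":"北京东路"}'

# (token, start_offset, end_offset) copied from the _analyze response.
tokens = [("text", 2, 6), ("北京", 9, 11), ("东路", 11, 13)]

for token, start, end in tokens:
    # Each offset pair slices the token straight out of the raw body,
    # which is consistent with the body being analyzed as plain text.
    print(token, body[start:end] == token)
```

Only the two CN_WORD tokens, "北京" and "东路", come from the keyword itself.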

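In the request below, minimum_should_match is set to 70%, the minimum share of query terms that must match. The computed count is rounded down: with three terms, 3 x 70% = 2.1, so two must match; with two terms, 2 x 70% = 1.4, so one is enough. "北京东路" therefore scores as soon as either "北京" or "东路" matches.

The analyzer is ik_smart. ik has two modes, ik_max_word (finest-grained segmentation) and ik_smart (coarsest-grained); the second is used here because it is closer to what the business expects.

curl  'localhost:9200/fullbiz1/fullbizinfo/_search?pretty' -d '

{

  "from" : 0,

  "size" : 20,

  "query" : {

    "multi_match" : {

      "query" : "北京东路",

      "fields" : [ "title", "highlight", "tags", "address", "businessDistrict", "cuisineStyle" ],

      "type" : "most_fields",

      "minimum_should_match" : "70%",

      "analyzer" : "ik_smart"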

    }

  },

  "post_filter" : {

    "bool" : {

      "must" : [ {

        "term" : {

          "status" : 0

        }

      }, {

        "term" : {

          "hostDisplay" : 1

        }

      }, {

        "term" : {

          "cityId" : 2

        }

      }, {

        "term" : {

          "productType" : 3

        }

      } ]

    }

  }

}'
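To see how many terms the 70% setting in this query actually requires, here is a minimal sketch of the percentage rule. It assumes the documented behaviour for positive percentages (the computed count is rounded down); the function name required_matches is hypothetical, and negative percentages and combined specifications are not covered.

```python
def required_matches(optional_clauses: int, percent: int) -> int:
    """Optional clauses that must match for a positive percentage, rounded down."""
    return optional_clauses * percent // 100


# "北京东路" is analyzed into two terms, so one matching term is enough per field.
print(required_matches(2, 70))
# With three terms, two of them would have to match.
print(required_matches(3, 70))
```

Because most_fields applies the setting to each field separately, a document qualifies as soon as any single field meets the threshold.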

 

    "hits" : [ {

      "_index" : "fullbiz1",

      "_type" : "fullbizinfo",

      "_id" : "324239",

      "_score" : 0.33371,

      "_source":{"boost":1,"productId":24239,"productType":3,"subType":2,"title":"城市公牛(南京东路店)","viceTitle":"城市公牛(南京东路店)","personMax":"-1","personMin":"-1","picUrl":"meal/2016/08/11/1470892987880.jpg","recommand":-1,"needReserveTime":-1,"priceStr":"-1","price":"-1","originalPrice":"-1","leadingMinutes":-1,"tags":null,"status":0,"isFree":-1,"duration":"10:00:00-22:30:00","onlineTime":1470280723,"updateTime":1486951326,"applyExpiredTime":0,"beginTime":0,"endTime":0,"isCourse":-1,"isTour":-1,"supportParty":0,"interestedNum":0,"cityId":2,"cityName":"上海","categoryId":"0","categoryName":"","categoryIconUrl":"","businessDistrict":"南京东路","businessDistrictId":73,"hostId":24239,"contactNumber":"13764741956","hostName":"城市公牛(南京东路店)","address":"南京东路300号L221-222室(河南中路口)","hostDisplay":1,"hostPicUrl":"meal/2016/08/11/1470892987880.jpg","hostSharePicUrl":"meal/2016/08/11/1470892987880.jpg","hostLatitude":"31.243455970586","hostLongitude":"121.49099099941","location":{"lat":"31.243455970586","lon":"121.49099099941"},"hostLatitudeGD":"31.237701","hostLongitudeGD":"121.484409","locationGD":{"lat":"31.237701","lon":"121.484409"},"headPics":"","catalogIds":null,"cuisineStyleId":41,"cuisineStyle":"西餐","hideMask":0,"referenceAgeMin":0,"referenceAgeMax":0,"userLimit":-1,"todayReservable":1,"orderNums":3,"pvConversionRate":"-1","interestNums":0,"hotPoints":0,"hostAvgPrice":16000,"hostProductLabelIds":",1,2,4,5,7,8,9,12,13,14,15,","shopPay":0,"hostVipEquities":"0","isHostSale":0,"highlight":"[\"2010年世博会加拿大馆特约餐厅\",\"加拿大简约西部乡村风格小酒馆餐厅\",\"家庭式的用餐氛围 80%均是外国食客\"]","isSeatBook":1,"lastUTCTimestamp":"2017-02-13T10:02:06.000+08:00"}

    }, {

      "_index" : "fullbiz1",

      "_type" : "fullbizinfo",

      "_id" : "392659",

      "_score" : 0.31962717,

      "_source":{"boost":1,"productId":92659,"productType":3,"subType":4,"title":"THAIBEAUTY美容连锁机构(南京东路店)","viceTitle":"THAIBEAUTY美容连锁机构(南京东路店)","personMax":"-1","personMin":"-1","picUrl":"hostInfo/2017/01/11/1484121279773528.jpg","recommand":-1,"needReserveTime":-1,"priceStr":"-1","price":"-1","originalPrice":"-1","leadingMinutes":-1,"tags":"","status":0,"isFree":-1,"duration":null,"onlineTime":1484121281,"updateTime":1484202471,"applyExpiredTime":0,"beginTime":0,"endTime":0,"isCourse":-1,"isTour":-1,"supportParty":0,"interestedNum":0,"cityId":2,"cityName":"上海","categoryId":"0","categoryName":"","categoryIconUrl":"","businessDistrict":"南京东路","businessDistrictId":73,"hostId":92659,"contactNumber":"021-63511876","hostName":"THAIBEAUTY美容连锁机构(南京东路店)","address":"南京东路580号6楼","hostDisplay":1,"hostPicUrl":"hostInfo/2017/01/11/1484121279773528.jpg","hostSharePicUrl":"hostInfo/2017/01/11/1484121279773528.jpg","hostLatitude":"31.241721400027","hostLongitude":"121.48585125776","location":{"lat":"31.241721400027","lon":"121.48585125776"},"hostLatitudeGD":"31.235887","hostLongitudeGD":"121.479289","locationGD":{"lat":"31.235887","lon":"121.479289"},"headPics":"","catalogIds":null,"cuisineStyleId":0,"cuisineStyle":"美容/SPA","hideMask":-1,"referenceAgeMin":0,"referenceAgeMax":0,"userLimit":-1,"todayReservable":0,"orderNums":0,"pvConversionRate":"-1","interestNums":0,"hotPoints":0,"hostAvgPrice":284500,"hostProductLabelIds":",60,","shopPay":0,"hostVipEquities":"0","isHostSale":0,"highlight":"[\"高端局部瘦身\",\"环境舒适 按摩师手法专业\",\"使用高品质产品\"]","isSeatBook":1,"lastUTCTimestamp":"2017-01-12T14:27:51.000+08:00"}

    }, {

      "_index" : "fullbiz1",

      "_type" : "fullbizinfo",

      "_id" : "364804",

      "_score" : 0.31002828,

      "_source":{"boost":1,"productId":64804,"productType":3,"subType":2,"title":"斗牛士(南京东路店)","viceTitle":"斗牛士(南京东路店)","personMax":"-1","personMin":"-1","picUrl":"hostInfo/2016/12/26/1482718008927949.png","recommand":-1,"needReserveTime":-1,"priceStr":"-1","price":"-1","originalPrice":"-1","leadingMinutes":-1,"tags":"","status":0,"isFree":-1,"duration":null,"onlineTime":1482718014,"updateTime":1486569730,"applyExpiredTime":0,"beginTime":0,"endTime":0,"isCourse":-1,"isTour":-1,"supportParty":0,"interestedNum":0,"cityId":2,"cityName":"上海","categoryId":"0","categoryName":"","categoryIconUrl":"","businessDistrict":"南京东路","businessDistrictId":73,"hostId":64804,"contactNumber":"021-33317136","hostName":"斗牛士(南京东路店)","address":"南京东路353号悦荟广场（原353店）7F","hostDisplay":1,"hostPicUrl":"hostInfo/2016/12/26/1482718008927949.png","hostSharePicUrl":"hostInfo/2016/12/26/1482718008927949.png","hostLatitude":"31.24210523683","hostLongitude":"121.49020262932","location":{"lat":"31.24210523683","lon":"121.49020262932"},"hostLatitudeGD":"31.236339","hostLongitudeGD":"121.483623","locationGD":{"lat":"31.236339","lon":"121.483623"},"headPics":"","catalogIds":null,"cuisineStyleId":41,"cuisineStyle":"西餐","hideMask":-1,"referenceAgeMin":0,"referenceAgeMax":0,"userLimit":-1,"todayReservable":0,"orderNums":0,"pvConversionRate":"-1","interestNums":0,"hotPoints":0,"hostAvgPrice":12200,"hostProductLabelIds":",1,","shopPay":0,"hostVipEquities":"0","isHostSale":0,"highlight":"[\"精选进口澳洲安格斯牛排\",\"严控0度低温 保证牛肉鲜嫩\",\"进口原切牛排保证牛肉口感与外观\"]","isSeatBook":1,"lastUTCTimestamp":"2017-02-09T00:02:10.000+08:00"}

.....

      "_index" : "fullbiz1",

      "_type" : "fullbizinfo",

      "_id" : "353771",

      "_score" : 0.7784657,

      "_source":{"boost":1,"productId":53771,"productType":3,"subType":2,"title":"九储堂创意中国菜(外滩店)","viceTitle":"九储堂创意中国菜(外滩店)","personMax":"-1","personMin":"-1","picUrl":"hostInfo/2016/12/26/1482744127546461.jpg","recommand":-1,"needReserveTime":-1,"priceStr":"-1","price":"-1","originalPrice":"-1","leadingMinutes":-1,"tags":"","status":0,"isFree":-1,"duration":null,"onlineTime":1482744132,"updateTime":1486738928,"applyExpiredTime":0,"beginTime":0,"endTime":0,"isCourse":-1,"isTour":-1,"supportParty":0,"interestedNum":0,"cityId":2,"cityName":"上海","categoryId":"0","categoryName":"","categoryIconUrl":"","businessDistrict":"外滩","businessDistrictId":71,"hostId":53771,"contactNumber":"021-63308900","hostName":"九储堂创意中国菜(外滩店)","address":"北京东路398号新协通国际大酒店18楼","hostDisplay":1,"hostPicUrl":"hostInfo/2016/12/26/1482744127546461.jpg","hostSharePicUrl":"hostInfo/2016/12/26/1482744127546461.jpg","hostLatitude":"31.246247363994","hostLongitude":"121.48894308136","location":{"lat":"31.246247363994","lon":"121.48894308136"},"hostLatitudeGD":"31.240463","hostLongitudeGD":"121.48237","locationGD":{"lat":"31.240463","lon":"121.48237"},"headPics":"","catalogIds":null,"cuisineStyleId":25,"cuisineStyle":"创意菜","hideMask":-1,"referenceAgeMin":0,"referenceAgeMax":0,"userLimit":-1,"todayReservable":0,"orderNums":0,"pvConversionRate":"-1","interestNums":0,"hotPoints":0,"hostAvgPrice":19100,"hostProductLabelIds":",1,","shopPay":0,"hostVipEquities":"0","isHostSale":0,"highlight":"[\"新加坡同乐餐饮总厨胡于保先生主理\",\"大厅可容纳150人的宴会 包房5间\",\"靠窗座位亦可欣赏浦江两岸美景\"]","isSeatBook":1,"lastUTCTimestamp":"2017-02-10T23:02:08.000+08:00"}

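Yet the document that contains the complete phrase "北京东路" is ranked further down, which does not look right. To find out why, let's use explain to inspect the score calculation:

curl 'localhost:9200/fullbiz1/fullbizinfo/_search?pretty&explain' .... (the rest is omitted: the request is the same as above, with explain added and size limited to the first hit, because the output is very verbose and one document is enough to analyze). Here is the scoring part: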

      "_explanation" : {

        "value" : 0.33371,

        "description" : "product of:",

        "details" : [ {

          "value" : 0.66742,

          "description" : "sum of:",

          "details" : [ {

            "value" : 0.28481156,

            "description" : "product of:",

            "details" : [ {

              "value" : 0.5696231,

              "description" : "sum of:",

              "details" : [ {

                "value" : 0.5696231,

                "description" : "weight(title:东路 in 7321) [PerFieldSimilarity], result of:",

                "details" : [ {

                  "value" : 0.5696231,

                  "description" : "score(doc=7321,freq=1.0), product of:",

                  "details" : [ {

                    "value" : 0.25448462,

                    "description" : "queryWeight, product of:",

                    "details" : [ {

                      "value" : 7.1626873,

                      "description" : "idf(docFreq=244, maxDocs=116302)"

                    }, {

                      "value" : 0.03552921,

                      "description" : "queryNorm"

                    } ]

                  }, {

                    "value" : 2.23834,

                    "description" : "fieldWeight in 7321, product of:",

                    "details" : [ {

                      "value" : 1.0,

                      "description" : "tf(freq=1.0), with freq of:",

                      "details" : [ {

                        "value" : 1.0,

                        "description" : "termFreq=1.0"

                      } ]

                    }, {

                      "value" : 7.1626873,

                      "description" : "idf(docFreq=244, maxDocs=116302)"

                    }, {

                      "value" : 0.3125,

                      "description" : "fieldNorm(doc=7321)"

                    } ]

                  } ]

                } ]

              } ]

            }, {

              "value" : 0.5,

              "description" : "coord(1/2)"

            } ]

          }, {

            "value" : 0.067192085,

            "description" : "product of:",

            "details" : [ {

              "value" : 0.13438417,

              "description" : "sum of:",

              "details" : [ {

                "value" : 0.13438417,

                "description" : "weight(address:东路 in 7321) [PerFieldSimilarity], result of:",

                "details" : [ {

                  "value" : 0.13438417,

                  "description" : "score(doc=7321,freq=1.0), product of:",

                  "details" : [ {

                    "value" : 0.1477382,

                    "description" : "queryWeight, product of:",

                    "details" : [ {

                      "value" : 4.158218,

                      "description" : "idf(docFreq=4942, maxDocs=116302)"

                    }, {

                      "value" : 0.03552921,

                      "description" : "queryNorm"

                    } ]

                  }, {

                    "value" : 0.90961015,

                    "description" : "fieldWeight in 7321, product of:",

                    "details" : [ {

                      "value" : 1.0,

                      "description" : "tf(freq=1.0), with freq of:",

                      "details" : [ {

                        "value" : 1.0,

                        "description" : "termFreq=1.0"

                      } ]

                    }, {

                      "value" : 4.158218,

                      "description" : "idf(docFreq=4942, maxDocs=116302)"

                    }, {

                      "value" : 0.21875,

                      "description" : "fieldNorm(doc=7321)"

                    } ]

                  } ]

                } ]

              } ]

            }, {

              "value" : 0.5,

              "description" : "coord(1/2)"

            } ]

          }, {

            "value" : 0.3154164,

            "description" : "product of:",

            "details" : [ {

              "value" : 0.6308328,

              "description" : "sum of:",

              "details" : [ {

                "value" : 0.6308328,

                "description" : "weight(businessDistrict:东路 in 7321) [PerFieldSimilarity], result of:",

                "details" : [ {

                  "value" : 0.6308328,

                  "description" : "score(doc=7321,freq=1.0), product of:",

                  "details" : [ {

                    "value" : 0.22633977,

                    "description" : "queryWeight, product of:",

                    "details" : [ {

                      "value" : 6.3705263,

                      "description" : "idf(docFreq=540, maxDocs=116302)"

                    }, {

                      "value" : 0.03552921,

                      "description" : "queryNorm"

                    } ]

                  }, {

                    "value" : 2.7871053,

                    "description" : "fieldWeight in 7321, product of:",

                    "details" : [ {

                      "value" : 1.0,

                      "description" : "tf(freq=1.0), with freq of:",

                      "details" : [ {

                        "value" : 1.0,

                        "description" : "termFreq=1.0"

                      } ]

                    }, {

                      "value" : 6.3705263,

                      "description" : "idf(docFreq=540, maxDocs=116302)"

                    }, {

                      "value" : 0.4375,

                      "description" : "fieldNorm(doc=7321)"

                    } ]

                  } ]

                } ]

              } ]

            }, {

              "value" : 0.5,

              "description" : "coord(1/2)"

            } ]

          } ]

        }, {

          "value" : 0.5,

          "description" : "coord(3/6)"

        } ]

      }

    } ]

  }

}
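The numbers in this explanation can be recombined by hand. The sketch below only multiplies and adds the values printed in the explain output, following the tree structure shown there (queryWeight x fieldWeight per field, a per-field coord, then a sum and an outer coord). The names field_score, MATCHES and QUERY_NORM are hypothetical helpers, not Elasticsearch APIs.

```python
QUERY_NORM = 0.03552921  # queryNorm from the explain output

# (field, idf, fieldNorm) for the term 东路, copied from the explain output.
# tf is 1.0 in all three fields.
MATCHES = [
    ("title", 7.1626873, 0.3125),
    ("address", 4.158218, 0.21875),
    ("businessDistrict", 6.3705263, 0.4375),
]


def field_score(idf, field_norm, tf=1.0, terms_matched=1, terms_total=2):
    """Score of one field: queryWeight * fieldWeight * coord(matched/total)."""
    query_weight = idf * QUERY_NORM
    field_weight = tf * idf * field_norm
    coord = terms_matched / terms_total  # only 东路 matched, 北京 did not
    return query_weight * field_weight * coord


per_field = {name: field_score(idf, norm) for name, idf, norm in MATCHES}
outer_coord = 3 / 6  # 3 of the 6 queried fields matched
total = sum(per_field.values()) * outer_coord

# Should land very close to the reported _score of 0.33371.
print(round(total, 5))
```

Every field contributes with coord(1/2), meaning each one matched only one of the two query terms; the final score is simply three partial matches added together.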

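The explanation shows that the top-ranked documents containing "南京东路" are not there because they match the query better. Each of them matches only "东路" (coord 1/2), but does so in three of the six fields (title, address and businessDistrict), and the three partial scores add up to more than the score of the document that contains "北京东路" in a single field.

Summary: most_fields suits searches in which the fields carry clearly different information. It is a poor fit for data like the above, where "东路" appears in the title, the business district and the address alike, so the fields are highly redundant.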
