Elasticsearch - sort by the match score of each array across multiple arrays


Indexed documents

{
  "book_id":"book01",
  "pages":[
    { "page_id":1, "words":["1", "2", "xx"] },
    { "page_id":2, "words":["4", "5", "xx"] },
    { "page_id":3, "words":["7", "8", "xx"] }
  ]
}
{
  "book_id":"book02",
  "pages":[
    { "page_id":1, "words":["1", "xx", "xx"] },
    { "page_id":2, "words":["4", "xx", "xx"] },
    { "page_id":3, "words":["7", "xx", "xx"] }
  ]
}

Input data

{
  "book_id":"book_new",
  "pages":[
    { "page_id":1, "words":["1", "2", "3"] },
    { "page_id":2, "words":["4", "5", "6"] },
    { "page_id":3, "words":["xx", "xx", "xx"] }
  ]
}

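I have many books, each with multiple pages, and every page has a list of words. I want to find books whose number of similar pages reaches a threshold.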

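Thresholds

  1. min_word_match_score: 2 (the minimum word-match score between two pages)
  2. min_page_match_score: 2 (the minimum number of similar pages between two books)

Key terms

  1. Similar pages: two pages have at least min_word_match_score words in common
  2. Similar books: two books have at least min_page_match_score similar pages

Expected result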

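Based on these thresholds, the correct result should be only book01, because

  1. book01-1 and book_new-1 have a score of 2 (>= min_word_match_score, so totalScore++)
  2. book01-2 and book_new-2 have a score of 2 (>= min_word_match_score, so totalScore++)
  3. book01 and book_new have a total score of 2 (totalScore >= min_page_match_score)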
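To make the two thresholds concrete, here is a minimal plain-Python sketch (no Elasticsearch involved) that reproduces the worked example. It assumes pages are paired by page_id, as in the example, and that words are compared as sets, so a repeated word such as "xx" counts only once; all function and constant names are hypothetical. Something like this could also serve as a client-side post-filter for candidate books returned by a coarser Elasticsearch query.

```python
# Hypothetical helpers re-implementing the thresholds described in the question.
# Assumptions: pages are paired by page_id, and words are compared as sets.

MIN_WORD_MATCH_SCORE = 2  # min shared words for two pages to be "similar"
MIN_PAGE_MATCH_SCORE = 2  # min similar pages for two books to be "similar"


def word_match_score(words_a, words_b):
    """Number of distinct words the two pages have in common."""
    return len(set(words_a) & set(words_b))


def page_match_score(book_a, book_b, min_word=MIN_WORD_MATCH_SCORE):
    """Count same-page_id page pairs whose word-match score reaches min_word."""
    pages_b = {p["page_id"]: p["words"] for p in book_b["pages"]}
    total = 0
    for page in book_a["pages"]:
        other = pages_b.get(page["page_id"])
        if other is not None and word_match_score(page["words"], other) >= min_word:
            total += 1  # the "totalScore++" step from the worked example
    return total


def similar_books(query_book, books, min_word=MIN_WORD_MATCH_SCORE,
                  min_page=MIN_PAGE_MATCH_SCORE):
    """Return (book_id, page score) pairs passing both thresholds, best first."""
    scored = [(b["book_id"], page_match_score(query_book, b, min_word)) for b in books]
    hits = [(book_id, score) for book_id, score in scored if score >= min_page]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


books = [
    {"book_id": "book01", "pages": [
        {"page_id": 1, "words": ["1", "2", "xx"]},
        {"page_id": 2, "words": ["4", "5", "xx"]},
        {"page_id": 3, "words": ["7", "8", "xx"]},
    ]},
    {"book_id": "book02", "pages": [
        {"page_id": 1, "words": ["1", "xx", "xx"]},
        {"page_id": 2, "words": ["4", "xx", "xx"]},
        {"page_id": 3, "words": ["7", "xx", "xx"]},
    ]},
]
book_new = {"book_id": "book_new", "pages": [
    {"page_id": 1, "words": ["1", "2", "3"]},
    {"page_id": 2, "words": ["4", "5", "6"]},
    {"page_id": 3, "words": ["xx", "xx", "xx"]},
]}

print(similar_books(book_new, books))  # [('book01', 2)]
```

With set semantics every page of book02 shares at most one distinct word with the corresponding page of book_new, so book02 gets a page score of 0 and only book01 is returned.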
Search query (incorrect, does not work)

{
  "query": {
    "bool": {
      "should": [
        { "match": { "book_pages.visual_words": { "query": "1", "operator": "OR" } } },
        { "match": { "book_pages.visual_words": { "query": "2", "operator": "OR" } } },
        { "match": { "book_pages.visual_words": { "query": "3", "operator": "OR" } } }
      ],
      "minimum_should_match": 2,
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  }
}

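My first attempt was to build just one part: checking whether a page matches the query. But instead of searching array by array (page by page), it searches the words of all pages together. I'm also not sure how to handle two different scores: the word-match score and the page-match score.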

Should I dig into inner_hits? Please help!
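One possible direction, sketched under assumptions the question does not confirm: if `pages` were mapped as `nested`, each page would be indexed as its own hidden document, so a `nested` query could apply min_word_match_score inside a single page instead of across all pages combined. An outer `bool` with one `nested` clause per input page and `minimum_should_match` set to min_page_match_score would then enforce the second threshold, and a named `inner_hits` block per clause would show which pages matched. `build_nested_query` is a hypothetical helper that only builds the `_search` body; it also assumes `pages.words` holds exact keyword values.

```python
import json

# Hypothetical helper: builds a _search body for per-page matching.
# Assumptions (not from the question): "pages" is mapped as a nested field,
# and "pages.words" / "pages.page_id" hold exact (keyword/integer) values.


def build_nested_query(input_book, min_word=2, min_page=2):
    page_clauses = []
    for page in input_book["pages"]:
        words = sorted(set(page["words"]))  # a repeated word counts once
        if len(words) < min_word:
            continue  # this page can never share min_word distinct words
        page_clauses.append({
            "nested": {
                "path": "pages",
                "query": {
                    "bool": {
                        # pair pages by page_id, as in the worked example
                        "filter": [{"term": {"pages.page_id": page["page_id"]}}],
                        # at least min_word of this page's words must match
                        "should": [{"term": {"pages.words": w}} for w in words],
                        "minimum_should_match": min_word,
                    }
                },
                # a named inner_hits block per input page shows which page matched
                "inner_hits": {"name": "page_%s" % page["page_id"]},
            }
        })
    if len(page_clauses) < min_page:
        return None  # no book could reach min_page similar pages
    # each matching nested clause counts as one similar page; require min_page
    return {
        "query": {
            "bool": {
                "should": page_clauses,
                "minimum_should_match": min_page,
            }
        }
    }


book_new = {"book_id": "book_new", "pages": [
    {"page_id": 1, "words": ["1", "2", "3"]},
    {"page_id": 2, "words": ["4", "5", "6"]},
    {"page_id": 3, "words": ["xx", "xx", "xx"]},
]}

print(json.dumps(build_nested_query(book_new), indent=2))
```

Dropping the `filter` clause would let any page of a candidate book count as a match, instead of only the page with the same page_id.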


amazon-web-services elasticsearch elasticsearch-5 elasticsearch-aggregation amazon-elasticsearch
1 Answer

Not the best answer, but here are my two cents!

I don't think Elasticsearch offers an exact solution for this use case. The closest way to do what you want is to use the More Like This query.
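A minimal sketch of that suggestion, building the request body in Python. It assumes the words are indexed in a `pages.words` text or keyword field and that `pages` is a plain object field; if `pages` is mapped as `nested`, the query would likely need to be wrapped in a `nested` query. `build_mlt_query` is a hypothetical helper. Because More Like This scores the book as a whole, it does not enforce the per-page thresholds, so the returned candidates would still need to be checked against min_word_match_score and min_page_match_score in application code.

```python
import json

# Hypothetical helper: builds a More Like This request for the whole input book.
# Assumption: the words are indexed in "pages.words" (text or keyword field).


def build_mlt_query(input_book, size=20):
    # Each distinct word becomes one "like" entry; entries are analyzed separately.
    words = sorted({w for page in input_book["pages"] for w in page["words"]})
    return {
        "size": size,
        "query": {
            "more_like_this": {
                "fields": ["pages.words"],
                "like": words,
                # The defaults (min_term_freq=2, min_doc_freq=5) would discard most
                # terms of short word lists like these, so both are lowered to 1.
                "min_term_freq": 1,
                "min_doc_freq": 1,
            }
        },
    }


book_new = {"book_id": "book_new", "pages": [
    {"page_id": 1, "words": ["1", "2", "3"]},
    {"page_id": 2, "words": ["4", "5", "6"]},
    {"page_id": 3, "words": ["xx", "xx", "xx"]},
]}

print(json.dumps(build_mlt_query(book_new), indent=2))
```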

© www.soinside.com 2019 - 2024. All rights reserved.