How to select top buckets based on a rescore function in Elasticsearch

Question

Consider the following query for Elasticsearch 5.6:

{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "rescore": [
    {
      "window_size": 10000,
      "query": {
        "rescore_query": {
          "function_score": {
            "boost_mode": "replace",
            "script_score": {
              "script": {
                "source": "doc['topic_score'].value"
              }
            }
          }
        },
        "query_weight": 0,
        "rescore_query_weight": 1
      }
    }
  ],
  "aggs": {
    "distinct": {
      "terms": {
        "field": "identical_id",
        "order": {
          "top_score": "desc"
        }
      },
      "aggs": {
        "best_unique_result": {
          "top_hits": {
            "size": 1
          }
        },
        "top_score": {
          "max": {
            "script": {
              "inline": "_score"
            }
          }
        }
      }
    }
  }
}

This is a simplified version: the real query has a more complex main query, and the rescore function is much more expensive.

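Let me explain the purpose first, in case I'm spending hours developing a pen that writes in space when a pencil would solve my problem. I run a fast initial query, then rescore the top results with a more expensive function. From those results I want to show the top distinct values, i.e. no two results should share the same identical_id. If there is a better way to do this, I would also accept that as an answer.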

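I expected a query like this to order the results by the rescore query, group all results with the same identical_id, and show the top hit for each distinct group. I also assumed that, because I order the terms aggregation buckets by the max of the parent _score, the buckets would be ordered by the best result each one contains, as determined by the rescore query.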
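To pin down the expected result, here is a minimal Python sketch of the intended semantics, run over an in-memory list of hits rather than inside Elasticsearch. The hit dictionaries, the `rescored` key, and the helper `top_distinct` are hypothetical names used only for illustration; they are not part of any Elasticsearch API.

```python
def top_distinct(hits, limit=10):
    """Keep the best-scoring hit per identical_id, ordered by that score (descending)."""
    best = {}
    for hit in hits:
        key = hit["identical_id"]
        # Keep only the highest rescored hit seen so far for this group.
        if key not in best or hit["rescored"] > best[key]["rescored"]:
            best[key] = hit
    # Order the groups by the score of their best member.
    return sorted(best.values(), key=lambda h: h["rescored"], reverse=True)[:limit]


# Hypothetical hits: "rescored" stands in for the score produced by the rescore query.
hits = [
    {"id": 1, "identical_id": "a", "rescored": 0.2},
    {"id": 2, "identical_id": "a", "rescored": 0.9},
    {"id": 3, "identical_id": "b", "rescored": 0.5},
    {"id": 4, "identical_id": "c", "rescored": 0.7},
]
print([h["id"] for h in top_distinct(hits)])  # prints [2, 4, 3]
```

Each identical_id appears once, represented by its best hit, and the groups are ordered by that best rescored value.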

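What actually happens is that the terms buckets are ordered by the max original query score, not the rescore query score. Strangely, the top hits within each bucket do appear to use the rescore.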

Is there a better way to achieve the end result I want, or some way to fix this query so that it works the way I expect?

Answer 1

From the documentation:

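The query rescorer executes a second query only on the Top-K results returned by the query and post_filter phases. The number of docs which will be examined on each shard can be controlled by the window_size parameter, which defaults to 10.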

Since the rescore query only kicks in after the post_filter phase, I assume the terms aggregation buckets are already fixed by then.

I don't know of a way to combine rescore and aggregations. Sorry :(

Answer 2

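I think I have a pretty good solution to this problem, but I'll let the bounty run until it expires in case someone comes up with a better approach.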

{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "sample": {
      "sampler": {
        "shard_size": 10000
      },
      "aggs": {
        "distinct": {
          "terms": {
            "field": "identical_id",
            "order": {
              "top_score": "desc"
            }
          },
          "aggs": {
            "best_unique_result": {
              "top_hits": {
                "size": 1,
                "sort": [
                  {
                    "_script": {
                      "type": "number",
                      "script": {
                        "source": "doc['topic_score'].value"
                      },
                      "order": "desc"
                    }
                  }
                ]
              }
            },
            "top_score": {
              "max": {
                "script": {
                  "source": "doc['topic_score'].value"
                }
              }
            }
          }
        }
      }
    }
  }
}

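The sampler aggregation takes the top N hits per shard from the core query and runs the sub-aggregations over only those hits. In the max aggregation that defines the bucket order, I use exactly the same script that I use to pick the top hit within each bucket. The buckets and the top hits now operate on the same top-N set of documents, and the buckets are ordered by the max of the same score, produced by the same script.

Unfortunately the script still has to run twice: once to order the buckets and once to pick the top hit inside each bucket. You could use a rescore instead of a sort for the top hits, but either way it runs twice, and I found it faster as a sort script than as a rescore.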
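Because the approach depends on the bucket order and the top-hit sort using the identical script, one way to keep them in sync is to build the request body from a single script string. The following is a minimal Python sketch under that assumption; `build_distinct_query` and its parameters are hypothetical names, and the code only assembles the JSON body shown above without sending it to a cluster.

```python
import json


def build_distinct_query(score_script, shard_size=10000, field="identical_id"):
    """Build the sampler + terms + top_hits body with one shared scoring script."""
    script = {"source": score_script}
    return {
        "size": 0,
        "query": {"match_all": {}},
        "aggs": {
            "sample": {
                # Restrict the sub-aggregations to the top hits per shard.
                "sampler": {"shard_size": shard_size},
                "aggs": {
                    "distinct": {
                        # Order buckets by the max script value they contain.
                        "terms": {"field": field, "order": {"top_score": "desc"}},
                        "aggs": {
                            "best_unique_result": {
                                "top_hits": {
                                    "size": 1,
                                    # Same script picks the best hit per bucket.
                                    "sort": [{"_script": {
                                        "type": "number",
                                        "script": dict(script),
                                        "order": "desc",
                                    }}],
                                }
                            },
                            "top_score": {"max": {"script": dict(script)}},
                        },
                    }
                },
            }
        },
    }


body = build_distinct_query("doc['topic_score'].value")
print(json.dumps(body, indent=2))
```

Changing the scoring expression then means editing one argument, so the two script locations cannot drift apart.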
