Elasticsearch: How to get a term's position in a sort script


In Elasticsearch 8, I have the following index:

{
    "settings": {
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "whitespace"
                },
                "default_search": {
                    "type": "whitespace"
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "term_vector": "with_positions"
            }
        }
    }
}

And this query:

{
    "query": {
        "simple_query_string": {
            "query": "banana"
        }
    },
    "track_scores": true,
    "sort": {
        "_script": {
            "type": "number",
            "order": "desc",
            "script": {
                "source": " >>> HOW TO GET THE POSITION OF banana IN THE title FIELD? <<< ",
                "lang": "painless"
            }
        }
    }
}

I found very old answers suggesting something like:

_index['title'].get('banana',_POSITIONS);

But that fails with the error:
cannot resolve symbol [_index]

I need this because I want documents where the query term appears earlier in the title field to get a higher score.

elasticsearch elasticsearch-8
1 Answer

As far as I could find, Elasticsearch has no facility for reading term positions in a script (apart from the analysis predicate context).

So my solution is to implement a parallel tokenizer inside the script.

Sample documents

PUT /term_position_score/_bulk
{"create":{"_id":1}}
{"text": "apple jackfruit banana"}
{"create":{"_id":2}}
{"text": "apple banana apple"}
{"create":{"_id":3}}
{"text": "banana apple apple"}
{"create":{"_id":4}}
{"text": "nobananas apple apple"}

Query with function_score and a script

GET /term_position_score/_search?filter_path=hits.hits
{
    "query": {
        "function_score": {
            "query": {
                "match_all": {}
            },
            "script_score": {
                "script": {
                    "source": """
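                        // Return the index of the first element equal to item, or -1 if absent.
                        int getItemPosition(def array, def item) {
                            int arrayLength = array.length;
                            for (int i = 0; i < arrayLength; i++) {
                                if (item == array[i]) {
                                    return i;
                                }
                            }
                            return -1;
                        }

                        // The term to look for is passed in through the script params.
                        String term = params['query'];
                        // Re-tokenize the stored source on whitespace.
                        String[] terms = /\s/.split(params['_source']['text']);
                        int termPosition = getItemPosition(terms, term);
                        // Position 0 keeps _score as is; a missing term (-1) yields 0.
                        double updatedScore = _score * (termPosition + 1);
                        return updatedScore;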
                    """,
                    "params": {
                        "query": "banana"
                    }
                }
            }
        }
    }
}

Response

{
    "hits" : {
        "hits" : [
            {
                "_index" : "term_position_score",
                "_type" : "_doc",
                "_id" : "1",
                "_score" : 3.0,
                "_source" : {
                    "text" : "apple jackfruit banana"
                }
            },
            {
                "_index" : "term_position_score",
                "_type" : "_doc",
                "_id" : "2",
                "_score" : 2.0,
                "_source" : {
                    "text" : "apple banana apple"
                }
            },
            {
                "_index" : "term_position_score",
                "_type" : "_doc",
                "_id" : "3",
                "_score" : 1.0,
                "_source" : {
                    "text" : "banana apple apple"
                }
            },
            {
                "_index" : "term_position_score",
                "_type" : "_doc",
                "_id" : "4",
                "_score" : 0.0,
                "_source" : {
                    "text" : "nobananas apple apple"
                }
            }
        ]
    }
}
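
Note that in the response above the score grows with the term position: document 1, where banana is the third token, gets 3.0. The question asks for the opposite. Below is a minimal sketch of a variant that rewards earlier positions, assuming the same index and the same params['_source'] access as the query above. Dividing by the position is only one possible weighting and is not part of the original answer.

```console
GET /term_position_score/_search?filter_path=hits.hits
{
    "query": {
        "function_score": {
            "query": {
                "match_all": {}
            },
            "script_score": {
                "script": {
                    "source": """
                        String term = params['query'];
                        String[] terms = /\s/.split(params['_source']['text']);

                        // Find the first token equal to the term (-1 if absent).
                        int termPosition = -1;
                        for (int i = 0; i < terms.length; i++) {
                            if (terms[i].equals(term)) {
                                termPosition = i;
                                break;
                            }
                        }

                        // Assumption: 1 / (position + 1) weighting, so position 0 keeps the full score.
                        if (termPosition < 0) {
                            return 0.0;
                        }
                        return _score / (termPosition + 1);
                    """,
                    "params": {
                        "query": "banana"
                    }
                }
            }
        }
    }
}
```

With the sample documents this should rank document 3 first, then 2, then 1, and leave document 4 with a score of 0.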

Limitations

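  • The scripted tokenizer is the /\s/ regular-expression splitter. If you need a different tokenizer, you have to change the regex.
  • The query term is defined in the script's params.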

For this use case, you can use a search template.
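
A search template lets the caller pass the term once instead of editing the script params by hand. The following is a rough sketch, not a tested setup: the template id term_position_search is a hypothetical name, and the Painless body is a condensed form of the script from the answer. The term is injected into the script's params rather than into its source, so the script text stays constant across searches.

```console
PUT _scripts/term_position_search
{
    "script": {
        "lang": "mustache",
        "source": {
            "query": {
                "function_score": {
                    "query": {
                        "match_all": {}
                    },
                    "script_score": {
                        "script": {
                            "source": """
                                String[] terms = /\s/.split(params['_source']['text']);
                                int pos = -1;
                                for (int i = 0; i < terms.length; i++) {
                                    if (terms[i].equals(params['query'])) {
                                        pos = i;
                                        break;
                                    }
                                }
                                return _score * (pos + 1);
                            """,
                            "params": {
                                "query": "{{query}}"
                            }
                        }
                    }
                }
            }
        }
    }
}

GET /term_position_score/_search/template
{
    "id": "term_position_search",
    "params": {
        "query": "banana"
    }
}
```

The Painless body avoids any doubled braces, because the mustache engine treats {{ and }} as its own delimiters.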
