In Elasticsearch 8, I have the following index:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "whitespace"
        },
        "default_search": {
          "type": "whitespace"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "term_vector": "with_positions"
      }
    }
  }
}
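With the whitespace analyzer and "term_vector": "with_positions", each term of a title is stored together with the 0-based positions where it occurs. The sketch below is only a mental model of that data, not an Elasticsearch API: the class and method names are hypothetical, and the `\s+` split merely approximates the whitespace tokenizer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TermVectorModel {
    // Hypothetical model: maps each whitespace-separated token to the
    // 0-based positions where it occurs, similar in spirit to what a
    // "with_positions" term vector records for one field value.
    static Map<String, List<Integer>> positions(String text) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return result; // no tokens, no positions
        }
        String[] tokens = trimmed.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            result.computeIfAbsent(tokens[i], k -> new ArrayList<>()).add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {apple=[0, 2], banana=[1]}
        System.out.println(positions("apple banana apple"));
    }
}
```

The first entry in the list for banana is the value the sort script in the query would need to read.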
and this query:
{
  "query": {
    "simple_query_string": {
      "query": "banana"
    }
  },
  "track_scores": true,
  "sort": {
    "_script": {
      "type": "number",
      "order": "desc",
      "script": {
        "source": " >>> HOW TO GET THE POSITION OF banana IN THE title FIELD? <<< ",
        "lang": "painless"
      }
    }
  }
}
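I found some very old answers, such as:
_index['title'].get('banana',_POSITIONS);
but that fails with:
cannot resolve symbol [_index]
I need this because I want documents in which the query term appears earlier in the title field to get a higher score.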
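As far as I could find, Elasticsearch has no tool for getting term positions (other than the analysis predicate context).
So my solution is to build a parallel tokenizer inside the script, re-splitting the stored text.
Sample documents: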
PUT /term_position_score/_bulk
{"create":{"_id":1}}
{"text": "apple jackfruit banana"}
{"create":{"_id":2}}
{"text": "apple banana apple"}
{"create":{"_id":3}}
{"text": "banana apple apple"}
{"create":{"_id":4}}
{"text": "nobananas apple apple"}
Query with function_score and a script:
GET /term_position_score/_search?filter_path=hits.hits
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "script_score": {
        "script": {
          "source": """
            // Return the index of the first element equal to item, or -1
            int getItemPosition(def array, def item) {
              int arrayLength = array.length;
              for (int i = 0; i < arrayLength; i++) {
                if (item == array[i]) {
                  return i;
                }
              }
              return -1;
            }
            String term = params['query'];
            // Re-tokenize the stored text on single whitespace characters
            String[] terms = /\s/.split(params['_source']['text']);
            int termPosition = getItemPosition(terms, term);
            // Position 0 -> x1, position 1 -> x2, ...; term not found -> 0
            double updatedScore = _score * (termPosition + 1);
            return updatedScore;
          """,
          "params": {
            "query": "banana"
          }
        }
      }
    }
  }
}
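To check the scoring logic outside Elasticsearch, here is a plain-Java port of the Painless script run against the four sample documents. It is a sketch, not part of the original answer: the class name is hypothetical, and the base score is fixed at 1.0 because match_all gives every document a score of 1.0. One difference from Painless: Painless `==` compares Strings by value, while Java needs `equals()`.

```java
import java.util.regex.Pattern;

public class PositionScore {
    // Same separator as the Painless regex literal /\s/
    private static final Pattern WHITESPACE = Pattern.compile("\\s");

    // Port of getItemPosition: index of the first exact match, or -1
    static int getItemPosition(String[] array, String item) {
        for (int i = 0; i < array.length; i++) {
            if (item.equals(array[i])) {
                return i;
            }
        }
        return -1;
    }

    // score = _score * (position + 1), as in the script
    static double score(double baseScore, String text, String term) {
        String[] terms = WHITESPACE.split(text);
        int termPosition = getItemPosition(terms, term);
        return baseScore * (termPosition + 1);
    }

    public static void main(String[] args) {
        String[] docs = {
            "apple jackfruit banana",
            "apple banana apple",
            "banana apple apple",
            "nobananas apple apple"
        };
        for (String doc : docs) {
            System.out.println(doc + " -> " + score(1.0, doc, "banana"));
        }
    }
}
```

It prints 3.0, 2.0, 1.0 and 0.0 for the four documents, the same scores Elasticsearch returns. "nobananas" scores 0 because the comparison is an exact token match, not a substring match.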
Response:
{
  "hits" : {
    "hits" : [
      {
        "_index" : "term_position_score",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 3.0,
        "_source" : {
          "text" : "apple jackfruit banana"
        }
      },
      {
        "_index" : "term_position_score",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 2.0,
        "_source" : {
          "text" : "apple banana apple"
        }
      },
      {
        "_index" : "term_position_score",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "text" : "banana apple apple"
        }
      },
      {
        "_index" : "term_position_score",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.0,
        "_source" : {
          "text" : "nobananas apple apple"
        }
      }
    ]
  }
}
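The multiplier (termPosition + 1) grows with the position, so in this response document 1 (banana is the last word) ranks first and document 3 (banana is the first word) ranks third. If the goal is the one stated in the question, that earlier occurrences score higher, one option is to divide by the position instead. The Java sketch below models that hypothetical variant; the class name and the 1 / (position + 1) formula are illustrative choices, not part of the original answer.

```java
import java.util.regex.Pattern;

public class EarlyPositionScore {
    // Same separator as the Painless regex literal /\s/
    private static final Pattern WHITESPACE = Pattern.compile("\\s");

    // Hypothetical variant: position 0 -> baseScore, 1 -> baseScore / 2,
    // 2 -> baseScore / 3, ...; documents without the term score 0.
    // The result is never negative.
    static double score(double baseScore, String text, String term) {
        String[] terms = WHITESPACE.split(text);
        for (int i = 0; i < terms.length; i++) {
            if (term.equals(terms[i])) {
                return baseScore / (i + 1);
            }
        }
        return 0.0;
    }

    public static void main(String[] args) {
        String[] docs = {
            "apple jackfruit banana",
            "apple banana apple",
            "banana apple apple",
            "nobananas apple apple"
        };
        for (String doc : docs) {
            System.out.println(doc + " -> " + score(1.0, doc, "banana"));
        }
    }
}
```

In the Painless script, the corresponding (untested) change would be to return something like `termPosition < 0 ? 0.0 : _score / (termPosition + 1)` instead of `_score * (termPosition + 1)`, which would put document 3 first and document 1 third.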
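Limitations
The splitter is the /\s/ regex. If you need a different tokenizer, you have to change the regex.
The query term is defined in params; for this use case you can use a search template.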
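One concrete example of the regex limitation: /\s/ treats every single whitespace character as a separator, so two consecutive spaces produce an empty token and push later words one position further than the whitespace analyzer would. Painless regex literals are backed by java.util.regex.Pattern, so plain Java shows the effect. This is a sketch with hypothetical names; splitting the trimmed text on `\s+` is one possible adjustment, not the original answer's code.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitComparison {
    static final Pattern SINGLE = Pattern.compile("\\s");  // as in the script
    static final Pattern RUN = Pattern.compile("\\s+");    // runs of whitespace

    // Splits on each single whitespace character (can yield empty tokens)
    static String[] splitSingle(String text) {
        return SINGLE.split(text);
    }

    // Trims first so a leading separator does not yield an empty token,
    // then treats any run of whitespace as one separator
    static String[] splitRun(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : RUN.split(trimmed);
    }

    public static void main(String[] args) {
        String text = "apple  banana"; // two spaces
        System.out.println(Arrays.toString(splitSingle(text))); // [apple, , banana]
        System.out.println(Arrays.toString(splitRun(text)));    // [apple, banana]
    }
}
```

With /\s/, banana lands at position 2 in "apple  banana"; with the trimmed `\s+` split it lands at position 1, matching the whitespace analyzer more closely.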