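I'm having trouble understanding a result I get from the suggest API.
My goal is to not have this result returned.
How to reproduce: here is my mapping: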
PUT /movies
{
  "settings": {
    "analysis": {
      "filter": {
        "true_false_filter": {
          "type": "keep",
          "keep_words": [
            "true",
            "false"
          ]
        },
        "french_elision": {
          "type": "elision",
          "articles_case": false,
          "articles": [
            "puisqu"
          ]
        },
        "french_stemmer": {
          "type": "stemmer",
          "language": "light_french"
        },
        "organic-dictionnary": {
          "type": "synonym",
          "expand": true,
          "lenient": true,
          "synonyms": [
            "non bio"
          ]
        },
        "french_stop_filter": {
          "type": "stop",
          "ignore_case": true,
          "stopwords": "_french_"
        }
      },
      "analyzer": {
        "lowercase_stop_analyzer": {
          "tokenizer": "lowercase",
          "filter": [
            "french_stop_filter"
          ]
        },
        "lowercase_asciifolding": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "asciifolding",
            "lowercase"
          ]
        },
        "french_analyzer_custom": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "asciifolding",
            "lowercase",
            "french_elision",
            "french_stemmer"
          ]
        },
        "custom_organic_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "asciifolding",
            "lowercase",
            "french_elision",
            "organic-dictionnary",
            "true_false_filter",
            "unique"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "attr": {
        "type": "text",
        "analyzer": "french_analyzer_custom"
      },
      "brand_name": {
        "type": "keyword"
      },
      "brand_name_suggest": {
        "type": "completion",
        "analyzer": "lowercase_stop_analyzer",
        "search_analyzer": "lowercase_asciifolding",
        "preserve_separators": false,
        "preserve_position_increments": false,
        "max_input_length": 50
      }
    }
  }
}
Then I index a document:
POST /movies/_doc/1001
{
  "brand_name": "A LE MOUTON HUILE D'OLIVE",
  "brand_name_suggest": [
    "A LE MOUTON HUILE D'OLIVE"
  ]
}
And here is my search:
GET movies/_search
{
  "explain": true,
  "suggest": {
    "completer": {
      "text": "amo",
      "completion": {
        "field": "brand_name_suggest",
        "size": 20,
        "skip_duplicates": true
      }
    }
  }
}
My question: why is this document found when searching for "amo"?
How can I prevent it from being returned?
Thanks in advance.
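Since brand_name_suggest uses lowercase_stop_analyzer, which removes French stop words, A LE MOUTON HUILE D'OLIVE is analyzed into the tokens a, mouton, huile, olive; that is, LE is removed. The field also sets preserve_separators and preserve_position_increments to false, so neither the spaces between tokens nor the gap left by the removed stop word are taken into account, and the suggester effectively sees amoutonhuileolive.
So when you type amo at search time, it matches as a prefix spanning the first two tokens (a and mouton), which is why you get this document. If you want to prevent this, you need to remove french_stop_filter from the index-time analyzer.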
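To make the reasoning above concrete, here is a rough Python sketch of what happens to the input. It is only a toy model, not Elasticsearch's actual implementation: the FRENCH_STOPWORDS set is a small illustrative subset of the real _french_ list, and the helper names (lowercase_tokenize, index_form, matches) are made up for this example.

```python
import re

# Illustrative subset only; the real "_french_" stop word list is much longer.
FRENCH_STOPWORDS = {"le", "la", "les", "de", "du", "des", "d", "l"}


def lowercase_tokenize(text):
    """Rough stand-in for the `lowercase` tokenizer: split on non-letters, then lowercase."""
    return [token.lower() for token in re.findall(r"[^\W\d_]+", text)]


def index_form(text, remove_stopwords):
    """Approximate what the completion field stores for `text`."""
    tokens = lowercase_tokenize(text)
    if remove_stopwords:
        tokens = [t for t in tokens if t not in FRENCH_STOPWORDS]
    # preserve_separators=false: tokens are joined with nothing in between.
    return "".join(tokens)


def matches(query, text, remove_stopwords):
    """Prefix match of a (lowercased) query against the stored form."""
    return index_form(text, remove_stopwords).startswith(query.lower())


brand = "A LE MOUTON HUILE D'OLIVE"
print(index_form(brand, remove_stopwords=True))      # amoutonhuileolive
print(matches("amo", brand, remove_stopwords=True))   # True
print(index_form(brand, remove_stopwords=False))     # alemoutonhuiledolive
print(matches("amo", brand, remove_stopwords=False))  # False
```

With the stop filter in place the stored form starts with "amo"; without it, "le" stays between "a" and "mouton" and the prefix no longer matches.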
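Another issue that may bite you later: your search analyzer, lowercase_asciifolding, applies asciifolding, but your index-time analyzer does not. So if you index words with accents, you may not be able to find them at search time either.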
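The accent mismatch can be illustrated the same way. This is again a simplified sketch under stated assumptions: ascii_fold only approximates the asciifolding filter by stripping combining marks, and the brand "Crème" is a hypothetical value, not taken from the question.

```python
import unicodedata


def ascii_fold(text):
    """Approximate asciifolding: decompose, then drop combining accent marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def index_side(text):
    """Index-time analysis in the question: lowercased, accents kept."""
    return text.lower()


def search_side(text):
    """Search-time analysis in the question: lowercased and accent-folded."""
    return ascii_fold(text.lower())


# Hypothetical brand; \u00e8 is "è" written as an escape so the literal
# stays a single precomposed character regardless of file normalization.
BRAND = "Cr\u00e8me"

stored = index_side(BRAND)          # accents preserved
query = search_side("Cr\u00e8")     # folded to "cre"

print(stored.startswith(query))              # False: accented "è" vs. plain "e"
print(ascii_fold(stored).startswith(query))  # True once both sides fold
```

Applying the same folding on both sides (for example, by adding asciifolding to the index-time analyzer) keeps the two forms comparable.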