I have an Elasticsearch index in which I set "max_ngram_diff": 50, but somehow the setting only seems to take effect for the edge_ngram tokenizer, not for the ngram tokenizer.
I have issued the following two requests against the same URL, http://localhost:9201/index-name/_analyze:
Request 1
{
  "tokenizer": {
    "type": "edge_ngram",
    "min_gram": 3,
    "max_gram": 20,
    "token_chars": [
      "letter",
      "digit"
    ]
  },
  "text": "1234567890;abcdefghijklmn;"
}
Request 2
{
  "tokenizer": {
    "type": "ngram",
    "min_gram": 3,
    "max_gram": 20,
    "token_chars": [
      "letter",
      "digit"
    ]
  },
  "text": "1234567890;abcdefghijklmn;"
}
The first request returns the expected result:
{
  "tokens": [
    {
      "token": "123",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    },
    {
      "token": "1234",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 1
    },
    {
      "token": "12345",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 2
    },
    {
      "token": "123456",
      "start_offset": 0,
      "end_offset": 6,
      "type": "word",
      "position": 3
    },
    // more tokens
  ]
}
But the second request only returns this:
{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[ffe18f1a89e6][172.18.0.3:9300][indices:admin/analyze[s]]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "The difference between max_gram and min_gram in NGram Tokenizer must be less than or equal to: [1] but was [17]. This limit can be set by changing the [index.max_ngram_diff] index level setting."
  },
  "status": 400
}
Why is it that the first request, using the edge_ngram tokenizer, is allowed a difference between max_gram and min_gram greater than 1, while the second request, using the ngram tokenizer, is not?
These are my index settings:
{
  "settings": {
    "index": {
      "max_ngram_diff": 50,
      // further settings
    }
  }
}
The Elasticsearch version used is 7.2.0.
Thanks for your help!
This behavior is specific to ES version 7.2.0. With ES version 7.4.0, everything works as expected.
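If upgrading is not an option, a workaround worth trying on 7.2 is to define the ngram tokenizer in the index settings at index creation time and then reference it by name in the _analyze call, so the index-level max_ngram_diff setting is actually applied instead of the default of 1. This is a sketch, not a verified fix for 7.2; the tokenizer name my_ngram_tokenizer is hypothetical. First, create the index (PUT /index-name):

```json
{
  "settings": {
    "index": {
      "max_ngram_diff": 50
    },
    "analysis": {
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 20,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  }
}
```

Then call _analyze referencing the named tokenizer instead of defining it inline (GET /index-name/_analyze):

```json
{
  "tokenizer": "my_ngram_tokenizer",
  "text": "1234567890;abcdefghijklmn;"
}
```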