Elasticsearch "max_ngram_diff" works for "edge_ngram" but not for the "ngram" tokenizer


I have an Elasticsearch index on which I have set "max_ngram_diff": 50, but somehow the setting only seems to apply to the edge_ngram tokenizer, not to the ngram tokenizer.

I have issued these two requests against the same URL http://localhost:9201/index-name/_analyze:

Request 1

{
    "tokenizer":
    {
        "type": "edge_ngram",
        "min_gram": 3,
        "max_gram": 20,
        "token_chars": [
            "letter",
            "digit"
        ]
    },
    "text": "1234567890;abcdefghijklmn;"
}

Request 2

{
    "tokenizer": {
        "type": "ngram",
        "min_gram": 3,
        "max_gram": 20,
        "token_chars": [
            "letter",
            "digit"
        ]
    },
    "text": "1234567890;abcdefghijklmn;"
}
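
For reference, both bodies are sent as POST requests to the _analyze endpoint. A minimal curl sketch (host, port, and index name are taken from the question; request1.json and request2.json are placeholder files holding the two bodies above):

curl -X POST "http://localhost:9201/index-name/_analyze" -H "Content-Type: application/json" -d @request1.json
curl -X POST "http://localhost:9201/index-name/_analyze" -H "Content-Type: application/json" -d @request2.json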

The first request returns the expected result:

{
    "tokens": [
        {
            "token": "123",
            "start_offset": 0,
            "end_offset": 3,
            "type": "word",
            "position": 0
        },
        {
            "token": "1234",
            "start_offset": 0,
            "end_offset": 4,
            "type": "word",
            "position": 1
        },
        {
            "token": "12345",
            "start_offset": 0,
            "end_offset": 5,
            "type": "word",
            "position": 2
        },
        {
            "token": "123456",
            "start_offset": 0,
            "end_offset": 6,
            "type": "word",
            "position": 3
        }, 
        // more tokens
    ]
}

But the second request only returns this:

{
    "error": {
        "root_cause": [
            {
                "type": "remote_transport_exception",
                "reason": "[ffe18f1a89e6][172.18.0.3:9300][indices:admin/analyze[s]]"
            }
        ],
        "type": "illegal_argument_exception",
        "reason": "The difference between max_gram and min_gram in NGram Tokenizer must be less than or equal to: [1] but was [17]. This limit can be set by changing the [index.max_ngram_diff] index level setting."
    },
    "status": 400
}

What is going on here, such that the first request, using the edge_ngram tokenizer, allows a difference between max_gram and min_gram greater than 1, but the second request, using the ngram tokenizer, does not?

These are my index settings:

{
    "settings": {
        "index": {
            "max_ngram_diff": 50,
            // further settings
         }
     }
}
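
Note that the two requests above define the tokenizer inline in the _analyze call rather than referencing one stored in the index settings, so it may be worth checking whether a named tokenizer behaves differently. A sketch of that variant, assuming the index is created with PUT http://localhost:9201/index-name (the names my_ngram_tokenizer and my_ngram_analyzer are invented for illustration):

{
    "settings": {
        "index": {
            "max_ngram_diff": 50
        },
        "analysis": {
            "tokenizer": {
                "my_ngram_tokenizer": {
                    "type": "ngram",
                    "min_gram": 3,
                    "max_gram": 20,
                    "token_chars": ["letter", "digit"]
                }
            },
            "analyzer": {
                "my_ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "my_ngram_tokenizer"
                }
            }
        }
    }
}

The stored analyzer can then be exercised by name via the same _analyze URL:

{
    "analyzer": "my_ngram_analyzer",
    "text": "1234567890;abcdefghijklmn;"
}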

The Elasticsearch version used is 7.2.0.

Thanks for your help!

elasticsearch tokenize n-gram elasticsearch-analyzers
1 Answer

This behavior is specific to ES version 7.2.0. With ES version 7.4.0, everything works as expected.
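
To verify this on your own cluster, the running version and the effective index setting can both be checked with standard endpoints (same host and index name as in the question):

GET http://localhost:9201/                       (reports the version under "version.number")
GET http://localhost:9201/index-name/_settings   (shows whether "max_ngram_diff" is applied)

Notably, the error message above reports a limit of [1], which is the default, even though the index setting is 50. This suggests that on the affected version the inline tokenizer passed to _analyze did not pick up the index-level max_ngram_diff setting.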
