Elasticsearch - How do I make ES ignore conjunctions (and, of, or, etc.) in the query string?


I have the following Elasticsearch search configuration:

  query_string: {
    query: `${sanitizedQueryString}~`,
    fields: ['fieldOne^5', 'fieldTwo^5', 'fieldThree'],
  }, 

The query above splits the search string I give it into individual words and searches for each of them. Suppose I have the following records:

[{
    id: 1,
    name: 'Harris'
}, {
    id: 2,
    name: 'Smith'
}, {
    id: 3,
    name: 'Dallas'
}, {
    id: 4,
    name: 'Farmers And Workers'
}];

If my search query is "harris smith", I get back the first 2 records in the array above.

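If I pass in the search query "harris and smith", I get back 3 records: the first 2 and the last one. The last record is returned because its name contains the word "And" and my search query also contains the word "and".

Words such as "and", "of" and "or" are called conjunctions in English. How can I exclude conjunctions in the search query from analysis and search?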

elasticsearch elasticsearch-5
1 Answer

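The solution to this problem is the stop token filter, which removes stop words from the token stream. Its default list is the English one, which includes "and", "of" and "or".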

Mapping

PUT /conjunctions
{
    "settings": {
        "analysis": {
            "analyzer": {
                "stop_lowercase_whitespace_analyzer": {
                    "tokenizer": "whitespace",
                    "filter": [
                        "stop",
                        "lowercase"
                    ]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "id": {
                "type": "integer"
            },
            "name": {
                "type": "text",
                "analyzer": "stop_lowercase_whitespace_analyzer"
            }
        }
    }
}
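
To see what this analyzer does to the sample text, here is a minimal JavaScript sketch that imitates the chain outside Elasticsearch. The helper functions and the shortened stop word set are simplified stand-ins (assumptions for illustration), not the real implementation. The sketch assumes the stop filter's default case-sensitive matching, so with "stop" listed before "lowercase" a capitalized "And" in an indexed name survives, while the lowercase "and" in the query is dropped.

```javascript
// Simplified stand-ins for Elasticsearch analysis components (illustration only).
// Subset of the default English stop word list, enough for this example.
const ENGLISH_STOP_WORDS = new Set([
  'a', 'an', 'and', 'as', 'at', 'by', 'for', 'in', 'of', 'on', 'or', 'the', 'to', 'with',
]);

// "whitespace" tokenizer: split on runs of whitespace, keep case as-is.
const whitespaceTokenizer = (text) => text.split(/\s+/).filter(Boolean);

// "stop" filter with default settings: case-sensitive comparison.
const stopFilter = (tokens) => tokens.filter((t) => !ENGLISH_STOP_WORDS.has(t));

// "lowercase" filter.
const lowercaseFilter = (tokens) => tokens.map((t) => t.toLowerCase());

// Run the tokenizer, then apply the filters in the order they are listed.
function analyze(text, filters) {
  return filters.reduce((tokens, f) => f(tokens), whitespaceTokenizer(text));
}

const asMapped = [stopFilter, lowercaseFilter]; // order used in the mapping above
const swapped = [lowercaseFilter, stopFilter];  // alternative order

console.log(analyze('harris and smith', asMapped));    // [ 'harris', 'smith' ]
console.log(analyze('Farmers And Workers', asMapped)); // [ 'farmers', 'and', 'workers' ]
console.log(analyze('Farmers And Workers', swapped));  // [ 'farmers', 'workers' ]
```

If queries may contain capitalized stop words (for example "Harris And Smith"), listing "lowercase" before "stop" is one way to make sure they are removed as well.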

Documents

PUT /conjunctions/_bulk
{"create":{"_id":1}}
{"id":1,"name":"Harris"}
{"create":{"_id":2}}
{"id":2,"name":"Smith"}
{"create":{"_id":3}}
{"id":3,"name":"Dallas"}
{"create":{"_id":4}}
{"id":4,"name":"Farmers And Workers"}

Query

GET /conjunctions/_search?filter_path=hits.hits
{
    "query": {
        "match": {
            "name": "harris and smith"
        }
    }
}

Response

{
    "hits" : {
        "hits" : [
            {
                "_index" : "conjunctions",
                "_type" : "_doc",
                "_id" : "1",
                "_score" : 1.3940738,
                "_source" : {
                    "id" : 1,
                    "name" : "Harris"
                }
            },
            {
                "_index" : "conjunctions",
                "_type" : "_doc",
                "_id" : "2",
                "_score" : 1.3940738,
                "_source" : {
                    "id" : 2,
                    "name" : "Smith"
                }
            }
        ]
    }
}

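You can change the list of stop words through the filter's stopwords parameter. See the documentation for the stop token filter.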
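
As a rough sketch of what a custom list could look like, the JavaScript object below builds an index-creation body with its own stop filter. The names conjunction_stop and custom_stop_analyzer are hypothetical and chosen only for this example. The stopwords and ignore_case settings are parameters of the stop token filter, and the three words listed are simply the ones from the question.

```javascript
// Hypothetical names: "conjunction_stop" and "custom_stop_analyzer" are illustrative only.
const indexSettings = {
  settings: {
    analysis: {
      filter: {
        conjunction_stop: {
          type: 'stop',
          stopwords: ['and', 'or', 'of'], // custom list replacing the default English one
          ignore_case: true,              // also drop "And", "OR", ...
        },
      },
      analyzer: {
        custom_stop_analyzer: {
          tokenizer: 'whitespace',
          filter: ['conjunction_stop', 'lowercase'],
        },
      },
    },
  },
};

// Print the body that would be sent when creating the index.
console.log(JSON.stringify(indexSettings, null, 2));
```

The name field in the mapping would then reference custom_stop_analyzer in place of the analyzer shown earlier.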
