Is it possible to boost an Elasticsearch match using a regular expression?


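I have been working with Elasticsearch and ran into a problem: I am trying to use a regular expression to boost my matches. For example, if I query "document 524106", I want it to match the field, and then, if the matched number also matches my regex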

[0-9]{5,}
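the score should be boosted. I tried combining my match query with a regexp query, but it does not work: whatever I do, the regexp boosts every document that contains a number matching the regex, not just the documents whose matched number satisfies it. I tried a must clause containing both the match and the regexp, I tried using the regexp as a filter, and I tried keeping the two separate. Can anyone help me figure out whether this is possible at all?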

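I have tried several queries, including must clauses, filters, and combinations of the two. This is my latest attempt, and it still does not give me the result I am after: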

content_query.append(
    {
        "bool": {
            "must": [
                {
                    "match": {
                        "document.number".format(lang): {
                            "query": token
                        }
                    }
                },
                {
                    "regexp": {
                        "document.number".format(lang): {
                            "value": "[0-9]{5,}",
                            "boost": 5
                        }
                    }
                }
            ],
            "filter": {
                "bool": {
                    "must": [
                        {
                            "regexp": {
                                "document.number".format(lang): {
                                    "value": "[0-9]{5,}"
                                }
                            }
                        },
                        {
                            "match": {
                                "document.number".format(lang): {
                                    "query": token
                                }
                            }
                        }
                    ]
                }
            }
        }
    }
)

To give a fuller example, suppose I have 3 documents:

Document 1
Title : this is document 1
document: I have 56000 dollars in my bank account
Document 2
Title : this is document 2
document: I have 123695 dollars in my bank account
Document 3
Title : this is document 3
document: I have 52.85 dollars in my bank account

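When indexing, I separate the numbers from the rest of the text, which is why I query the document.number field. The regex I use to extract the numbers for indexing is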

"\d+\s*\-?\.?\s*\d+"
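If I search for "56000 dollars" with match and regexp, 56000 should satisfy both the match query and the regexp, so that document gets boosted. For some reason, though, the regexp also boosts documents containing any other number that matches it, which is not what I want.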
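To make the behaviour described above concrete, here is a minimal Python sketch that applies the indexing regex to the three sample documents and then checks which extracted numbers satisfy `[0-9]{5,}`. It assumes each extracted number ends up as a single term in `document.number` and uses `re.fullmatch` to imitate a whole-term match; the helper name `regexp_hits` is made up for illustration and nothing here talks to Elasticsearch.

```python
import re

# Regex the question uses to pull numbers out of the text at index time.
INDEX_NUMBER = re.compile(r"\d+\s*\-?\.?\s*\d+")
# Regex the question uses in the regexp clause.
BOOST_PATTERN = re.compile(r"[0-9]{5,}")

docs = {
    1: "I have 56000 dollars in my bank account",
    2: "I have 123695 dollars in my bank account",
    3: "I have 52.85 dollars in my bank account",
}


def regexp_hits(texts):
    """Map each document id to the extracted numbers that satisfy BOOST_PATTERN."""
    hits = {}
    for doc_id, text in texts.items():
        numbers = INDEX_NUMBER.findall(text)
        # fullmatch stands in for a regexp clause that must match the whole term.
        hits[doc_id] = [n for n in numbers if BOOST_PATTERN.fullmatch(n)]
    return hits


print(regexp_hits(docs))  # {1: ['56000'], 2: ['123695'], 3: []}
```

Documents 1 and 2 both contain a qualifying number whatever the search text is, which is consistent with the regexp clause rewarding both of them even when only 56000 was searched for.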

python regex elasticsearch
1 Answer

If I understand your question correctly, the following query is one possible solution:

GET /regexp_fields/_search?filter_path=hits.hits
{
    "query": {
        "dis_max": {
            "queries": [
                {
                    "regexp": {
                        "text": "[0-9]{5,}"
                    }
                },
                {
                    "match": {
                        "text": "dollars"
                    }
                }
            ],
            "tie_breaker": 0.8
        }
    }
}
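Note that the regexp clause in the query above is evaluated independently of the match clause, so it rewards every document that contains a number of five or more digits. If the intent is to boost only when the searched token is itself such a number, one possible approach is to test the token on the client and attach the boost to the match clause. This is a sketch under that assumption, not part of the original answer; `build_number_clause` is a hypothetical helper, while the field name and the boost of 5 are taken from the question.

```python
import re

# Same pattern the question uses for the boost condition.
LONG_NUMBER = re.compile(r"[0-9]{5,}")


def build_number_clause(token, field="document.number", boost=5.0):
    """Return a match clause, boosted only when the token is a 5+ digit number."""
    clause = {"match": {field: {"query": token}}}
    if LONG_NUMBER.fullmatch(token):
        # The boost now applies only to documents that match this exact token.
        clause["match"][field]["boost"] = boost
    return clause


print(build_number_clause("56000"))
print(build_number_clause("52.85"))
```

The returned dictionary can be appended to a list such as `content_query` in the question. Because the boost sits on the match clause, a document containing 123695 gains nothing when the search token is 56000.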

Documents

PUT /regexp_fields/_bulk
{"create":{"_id":1}}
{"title" : "this is document 1", "text": "I have 56000 dollars in my bank account"}
{"create":{"_id":2}}
{"title" : "this is document 2", "text": "I have 123695 dollars in my bank account"}
{"create":{"_id":3}}
{"title" : "this is document 3", "text": "I have 52.85 dollars in my bank account"}

Response

{
    "hits" : {
        "hits" : [
            {
                "_index" : "regexp_fields",
                "_type" : "_doc",
                "_id" : "1",
                "_score" : 1.1068251,
                "_source" : {
                    "title" : "this is document 1",
                    "text" : "I have 56000 dollars in my bank account"
                }
            },
            {
                "_index" : "regexp_fields",
                "_type" : "_doc",
                "_id" : "2",
                "_score" : 1.1068251,
                "_source" : {
                    "title" : "this is document 2",
                    "text" : "I have 123695 dollars in my bank account"
                }
            },
            {
                "_index" : "regexp_fields",
                "_type" : "_doc",
                "_id" : "3",
                "_score" : 0.13353139,
                "_source" : {
                    "title" : "this is document 3",
                    "text" : "I have 52.85 dollars in my bank account"
                }
            }
        ]
    }
}
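The scores in the response can be reproduced by hand. A `dis_max` query scores a document as the best clause score plus `tie_breaker` times the sum of the remaining matching clause scores. The sketch below assumes the regexp clause contributes a constant 1.0 (its default boost) and takes the match score 0.13353139 from document 3, which matches only the match clause; `dis_max_score` is an illustrative helper, not an Elasticsearch API.

```python
def dis_max_score(clause_scores, tie_breaker):
    """Best clause score plus tie_breaker times the sum of the other scores."""
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    return best + tie_breaker * (sum(clause_scores) - best)


REGEXP_SCORE = 1.0        # assumption: constant score of the regexp clause
MATCH_SCORE = 0.13353139  # score of document 3 in the response above

# Documents 1 and 2 match both clauses; document 3 matches only "dollars".
print(dis_max_score([REGEXP_SCORE, MATCH_SCORE], tie_breaker=0.8))
print(dis_max_score([MATCH_SCORE], tie_breaker=0.8))
```

The first value comes out at roughly 1.1068251, the score reported for documents 1 and 2, and the second is the 0.13353139 reported for document 3.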