Elasticsearch does not return the expected document in search results


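I have a collection of customers, each with a first name, last name, email, description, and owner ID. I want to take a string from the application and search across all of these fields in priority order. I am using boost to achieve this.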

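Currently, I have a lot of test customers with "Sean" in various fields of their documents. Two of the documents have the email [email protected]. One of those two also contains the same email in its description.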

When I run the search below, the document that does not contain the email in its description is missing from the search results.

Here is my query:

{
  "query" : {
    "bool" : {
      "filter" : {
        "match" : {
          "ownerId" : "acct_123"
        }
      },
      "must" : [
        {
          "bool" : {
            "should" : [
              {
                "prefix" : {
                  "firstName" : {
                    "value" : "sean",
                    "boost" : 3
                  }
                }
              },
              {
                "prefix" : {
                  "lastName" : {
                    "value" : "sean",
                    "boost" : 3
                  }
                }
              },
              {
                "terms" : {
                  "boost" : 2,
                  "description" : [
                    "sean"
                  ]
                }
              },
              {
                "prefix" : {
                  "email" : {
                    "value" : "sean",
                    "boost" : 1
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}

Here is the document that is missing:

{
  "_index" : "xxx",
  "_id" : "cus_123",
  "_version" : 1,
  "_type" : "customers",
  "_seq_no" : 9096,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "firstName" : null,
    "id" : "cus_123",
    "lastName" : null,
    "email" : "[email protected]",
    "ownerId" : "acct_123",
    "description" : null
  }
}

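When I look at the current results, all of the returned documents have a score of 3.0. They also have "Sean" in their names, which is why they score higher. When I run _explain on the missing document with the query above, I get the following: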

{
    "_index": "xxx",
    "_type": "customers",
    "_id": "cus_123",
    "matched": true,
    "explanation": {
        "value": 1.0,
        "description": "sum of:",
        "details": [
            {
                "value": 1.0,
                "description": "sum of:",
                "details": [
                    {
                        "value": 1.0,
                        "description": "ConstantScore(email._index_prefix:sean)",
                        "details": []
                    }
                ]
            },
            {
                "value": 0.0,
                "description": "match on required clause, product of:",
                "details": [
                    {
                        "value": 0.0,
                        "description": "# clause",
                        "details": []
                    },
                    {
                        "value": 1.0,
                        "description": "ownerId:acct_123",
                        "details": []
                    }
                ]
            }
        ]
    }
}

Here is my mapping:

{
  "properties": {
    "firstName": {
      "type": "text",
      "index_prefixes": {
        "max_chars": 10,
        "min_chars": 1
      }
    },
    "email": {
      "analyzer": "my_email_analyzer",
      "type": "text",
      "index_prefixes": {
        "max_chars": 10,
        "min_chars": 1
      }
    },
    "lastName": {
      "type": "text",
      "index_prefixes": {
        "max_chars": 10,
        "min_chars": 1
      }
    },
    "description": {
      "type": "text"
    },
    "ownerId": {
      "type": "text"
    }
  }
}
        "my_email_analyzer": {
          "type": "custom",
          "tokenizer": "uax_url_email"
        }

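If I understand this correctly, the document only scores 1.0 and therefore falls below some threshold. I've tried adjusting min_score, but had no luck. Any ideas on how to get this document included in the search results?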

Thanks a lot

elasticsearch
1 Answer

It depends on what "missing" means:

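  1. Is the document not being counted in the number of hits (the "total")?
  2. Or is the document itself not showing up as a hit in the list of hits?

If it is #2, you may want to increase the number of documents Elasticsearch fetches and returns by adding a size clause to your search request (the default size is 10):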

Example:

"size": 50
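
For context, size is a top-level key of the search request body, a sibling of query rather than part of it. A minimal sketch of a full request body, assuming the field names and values from the question and abbreviating the should clauses to two for brevity (the value 50 is only illustrative):

```json
{
  "size": 50,
  "query": {
    "bool": {
      "filter": {
        "match": { "ownerId": "acct_123" }
      },
      "must": [
        {
          "bool": {
            "should": [
              { "prefix": { "firstName": { "value": "sean", "boost": 3 } } },
              { "prefix": { "email": { "value": "sean", "boost": 1 } } }
            ]
          }
        }
      ]
    }
  }
}
```

The _explain output in the question reports "matched": true with a score of 1.0, while the returned documents score 3.0. If ten or more documents score higher, the 1.0 document would sort after them and fall outside the default first 10 hits, which would be consistent with case #2 above.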