如何使用 DSL 查询丢弃 ElasticSearch 中的重复值?

问题描述 投票:0回答:2

这里我试图根据查询客户获取属性名称,这里的问题是属性名称中有很多重复值我想丢弃,有人可以帮助我吗

 {
  "_source": [
    "attribute_name"
  ],
  "size": 500, 
  "query": {
    "multi_match": {
      "query": "CUSTOMER",
      "fields": [
        "hierarchy_name",
        "attribute_name"
      ]
    }
  }
}

假设这是我的输出,这里我想丢弃重复的 attribute_name

 {
        "_index": "planlytx_records",
        "_type": "_doc",
        "_id": "tD6WDnkBQTXQIneq8Ypr",
        "_score": 2.5454113,
        "_source": {
          "attribute_name": "CUSTOMER"
        }
      },
      {
        "_index": "planlytx_records",
        "_type": "_doc",
        "_id": "3j6WDnkBQTXQIneq8Yps",
        "_score": 2.5454113,
        "_source": {
          "attribute_name": "CUSTOMER"
        }
      },
      {
        "_index": "planlytx_records",
        "_type": "_doc",
        "_id": "nT6WDnkBQTXQIneqyonu",
        "_score": 1.8101583,
        "_source": {
          "attribute_name": "REGION"
        }
      },
      {
        "_index": "planlytx_records",
        "_type": "_doc",
        "_id": "6D6WDnkBQTXQIneq8Yps",
        "_score": 1.8101583,
        "_source": {
          "attribute_name": "REGION"
        }
      },

我的输出应该是这样的..

{
        "_index": "planlytx_records",
        "_type": "_doc",
        "_id": "3j6WDnkBQTXQIneq8Yps",
        "_score": 2.5454113,
        "_source": {
          "attribute_name": "CUSTOMER"
        }
      },
      {
        "_index": "planlytx_records",
        "_type": "_doc",
        "_id": "nT6WDnkBQTXQIneqyonu",
        "_score": 1.8101583,
        "_source": {
          "attribute_name": "REGION"
        }
      },
elasticsearch logstash kibana elasticsearch-5 elasticsearch-dsl
2个回答
0
投票

您可以使用

collapse
参数,根据字段值从搜索结果中删除重复项

添加包含索引数据、映射、搜索查询和搜索结果的工作示例

索引映射:

{
  "mappings": {
    "properties": {
      "attribute_name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

索引数据:

{
  "attribute_name": "CUSTOMER-ALL"
}
{
  "attribute_name": "CUSTOMER-ALL"
}
{
  "attribute_name": "CUSTOMER"
}
{
  "attribute_name": "CUSTOMER"
}

搜索查询:

{
  "query": {
    "multi_match": {
      "query": "CUSTOMER",
      "fields": [
        "attribute_name"
      ]
    }
  },
  "collapse": {
    "field": "attribute_name.keyword"
  }
}

搜索结果:

 "hits": [
      {
        "_index": "67260491",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.12199639,
        "_source": {
          "attribute_name": "CUSTOMER"
        },
        "fields": {
          "attribute_name.keyword": [
            "CUSTOMER"
          ]
        }
      },
      {
        "_index": "67260491",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.09271726,
        "_source": {
          "attribute_name": "CUSTOMER-ALL"
        },
        "fields": {
          "attribute_name.keyword": [
            "CUSTOMER-ALL"
          ]
        }
      }
    ]

更新1:

如果您只想删除重复数据,可以运行以下查询

{
  "collapse": {
    "field": "attribute_name.keyword"
  }
}

搜索结果将是

"hits": [
      {
        "_index": "67276433",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "attribute_name": "CUSTOMER"
        },
        "fields": {
          "attribute_name.keyword": [
            "CUSTOMER"
          ]
        }
      },
      {
        "_index": "67276433",
        "_type": "_doc",
        "_id": "3",
        "_score": 1.0,
        "_source": {
          "attribute_name": "REGION"
        },
        "fields": {
          "attribute_name.keyword": [
            "REGION"
          ]
        }
      }
    ]

0
投票

您可以使用:

    `"collapse": {
        "field": "attribute_name.keyword"
     }`

例如在包含您可以使用的姓名的员工索引中

    `GET employee/_search
    {
      "query": {
        "multi_match": {
         "query": "Mohan",
          "type": "bool_prefix",
          "fields": [
              "first_name"
           ]
        }
      },
      "collapse": {
        "field": "first_name"
       }
    }`
© www.soinside.com 2019 - 2024. All rights reserved.