elasticsearch string aggregation on arrays

Question · votes: 0 · answers: 2

I need an aggregation query that gives me buckets of all root folders. Every document in my Elasticsearch index has a field called path in which I store an array of the paths the document lives in (e.g. path=[1.3., 1.2.4, 5., 11]).

If I use a plain terms aggregation

"terms": {
    "field": "path.keyword"
}

I unfortunately get all the distinct paths:

"buckets" : [
    {
      "key" : "1.3."
      "doc_count" : 6
    },
    {
      "key" : "11."
      "doc_count" : 3
    },
    {
      "key" : "5."
      "doc_count" : 3
    },
    {
      "key" : "1.2.4."
      "doc_count" : 1
    }
]

I tried to solve it with a Painless script

"terms": {
    "script": "doc['path.keyword'].value.substring(0, doc['path.keyword'].value.indexOf('.')  )"
}

but then I only get the last element of each path array:

"buckets" : [
    {
      "key" : "1",
      "doc_count" : 7
    },
    {
      "key" : "11",
      "doc_count" : 3
    }
]

How can I get only the root folders?

elasticsearch elasticsearch-aggregation elasticsearch-painless
2 Answers

3 votes

Using doc["field"].value only gives you a single value of the field. In the script you need to return an array of root values, i.e. iterate over all the elements of the field and return an array of substrings.

Sample data:

"hits" : [
      {
        "_index" : "index84",
        "_type" : "_doc",
        "_id" : "yihhWnEBHtQEPt4DqWLz",
        "_score" : 1.0,
        "_source" : {
          "path" : [
            "1.1.1",
            "1.2",
            "2.1.1",
            "12.11"
          ]
        }
      }
    ]

Query:

{
  "aggs": {
    "root_path": {
      "terms": {
        "script": {
          "source": "def firstIndex=0;def path=[]; for(int i=0;i<doc['path.keyword'].length;i++){firstIndex=doc['path.keyword'][i].indexOf('.'); path.add(doc['path.keyword'][i].substring(0,firstIndex))} return path;"
        }
      }
    }
  }
}

Result:

"aggregations" : {
    "root_path" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "1",
          "doc_count" : 1
        },
        {
          "key" : "12",
          "doc_count" : 1
        },
        {
          "key" : "2",
          "doc_count" : 1
        }
      ]
    }
  }

0 votes

There is a way to solve this with just a single line of script.

Mapping

PUT /root_paths
{
    "settings": {
        "analysis": {
            "analyzer": {
                "pattern_first_token_analyzer": {
                    "tokenizer": "dot_split_tokenizer",
                    "filter": [
                        "first_token_filter"
                    ]
                }
            },
            "tokenizer": {
                "dot_split_tokenizer": {
                    "type": "pattern",
                    "pattern": "\\."
                }
            }, 
            "filter": {
                "first_token_filter": {
                    "type": "predicate_token_filter",
                    "script": {
                        "source": """
                            token.position == 0;
                        """
                    }
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "path": {
                "type": "text",
                "fields": {
                    "root": {
                        "type": "text",
                        "analyzer": "pattern_first_token_analyzer",
                        "fielddata": true
                    }
                }
            }
        }
    }
}
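
Once the index is created, the _analyze API is a quick way to verify that the custom analyzer really keeps only the first token; for the settings above, analyzing "1.3.5" should return the single token "1":

GET /root_paths/_analyze
{
    "analyzer": "pattern_first_token_analyzer",
    "text": "1.3.5"
}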

Documents

PUT /root_paths/_bulk
{"create":{"_id":1}}
{"path":"1.3.5"}
{"create":{"_id":2}}
{"path":"1.2"}
{"create":{"_id":3}}
{"path":"2.6.9"}
{"create":{"_id":4}}
{"path":"10.11.12"}

Aggregation query

GET /root_paths/_search?filter_path=aggregations
{
    "aggs": {
        "by_root_path": {
            "terms": {
                "field": "path.root"
            }
        }
    }
}

Response

{
    "aggregations" : {
        "by_root_path" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
                {
                    "key" : "1",
                    "doc_count" : 2
                },
                {
                    "key" : "10",
                    "doc_count" : 1
                },
                {
                    "key" : "2",
                    "doc_count" : 1
                }
            ]
        }
    }
}

Another approach would be to use a runtime field that splits the path on the dots into an array of strings and extracts the first item of the array.
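
A minimal sketch of that runtime-field approach (requires Elasticsearch 7.11+; the field name path_root and the fallback for dotless values are assumptions for this example, and reading params._source is slower than using doc values):

GET /root_paths/_search?filter_path=aggregations
{
    "runtime_mappings": {
        "path_root": {
            "type": "keyword",
            "script": {
                "source": """
                    // path may be stored as a single string or as an array of strings
                    def p = params._source.path;
                    for (item in (p instanceof List ? p : [p])) {
                        int dot = item.indexOf('.');
                        // emit the root segment; emit the whole value if it has no dot
                        emit(dot < 0 ? item : item.substring(0, dot));
                    }
                """
            }
        }
    },
    "aggs": {
        "by_root_path": {
            "terms": {
                "field": "path_root"
            }
        }
    }
}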
