How do I determine which log file contributes the most to index size in Elasticsearch?

Question

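We have an Elasticsearch cluster in the cloud. Our application generates many different logs every day, from a dozen or so different log files. All logs are ingested into the same Elasticsearch index, which is rotated daily. We have now found that the logs in Elasticsearch are very large, and we need to find out which log files, from which servers, contribute the most.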

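What query can I use to identify the top 10 servers and the top 10 log files by size? In SQL this would be like grouping by a column and summing the size, but I am not very familiar with the Elastic equivalent.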

Please help me solve this the Elastic way (query API).

Thanks

Answer 1

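You can use an Elasticsearch terms aggregation to see the top 10 results by count. Ranking by size is difficult and expensive, so I suggest ranking by document count instead. The `doc_count` of each bucket in the response body shows the result.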

#top 10 test_field result
GET test-index/_search
{
  "size": 0,
  "aggs": {
    "top_10_test_field": {
      "terms": {
        "field": "test_field.keyword",
        "size": 10
      }
    }
  }
}
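If you run the request from a script instead of Kibana, the buckets are under `aggregations.<aggregation name>.buckets`, each with a `key` and a `doc_count`. Below is a minimal Python sketch that pulls them out of an already-parsed response. The sample response is invented for illustration; only the field names (`aggregations`, `buckets`, `key`, `doc_count`, `sum_other_doc_count`) follow the real response format, and the helper `top_buckets` is hypothetical.

```python
from __future__ import annotations

from typing import Any


def top_buckets(response: dict[str, Any], agg_name: str) -> list[tuple[str, int]]:
    """Return (key, doc_count) pairs from a terms aggregation, in response order.

    Terms buckets are already ordered by doc_count descending by default,
    so the order from the response is kept instead of re-sorting.
    """
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets]


# Hypothetical response body for the first request above (values invented).
sample_response = {
    "aggregations": {
        "top_10_test_field": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 120,  # docs whose key did not make the top 10
            "buckets": [
                {"key": "server-a", "doc_count": 5400},
                {"key": "server-b", "doc_count": 3100},
            ],
        }
    }
}

for key, count in top_buckets(sample_response, "top_10_test_field"):
    print(f"{key}\t{count}")
```

`sum_other_doc_count` is worth checking too: if it is large, a lot of volume sits outside the top 10 buckets.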

#top 10 test_field + top 10 status
GET test-index/_search
{
  "size": 0,
  "aggs": {
    "top_10_test_field": {
      "terms": {
        "field": "test_field.keyword",
        "size": 10
      },
      "aggs": {
        "top_10_status": {
          "terms": {
            "field": "status.keyword",
            "size": 10
          }
        }
      }
    }
  }
}

Again, read the `doc_count` of each bucket in the response body. To fit your example, replace `test_field` with `servers` and `status` with `log` (or whatever your server and log file fields are actually called).
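With a nested terms aggregation, each outer bucket carries the sub-aggregation's result under the sub-aggregation's name. Flattening that into rows gives something close to the SQL `GROUP BY server, log` result the question asks about. This is a sketch on an invented response; the aggregation names `top_servers` and `top_logs` and the helper `flatten` are hypothetical.

```python
from __future__ import annotations

from typing import Any


def flatten(response: dict[str, Any], outer: str, inner: str) -> list[tuple[str, str, int]]:
    """Turn an outer terms agg with one nested terms sub-agg into rows.

    Each row is (outer_key, inner_key, doc_count), roughly like
    SELECT outer, inner, COUNT(*) ... GROUP BY outer, inner in SQL.
    """
    rows = []
    for outer_bucket in response["aggregations"][outer]["buckets"]:
        # The sub-aggregation result is stored in the bucket under its own name.
        for inner_bucket in outer_bucket[inner]["buckets"]:
            rows.append((outer_bucket["key"], inner_bucket["key"], inner_bucket["doc_count"]))
    return rows


# Hypothetical response for servers -> log files (values invented).
sample_response = {
    "aggregations": {
        "top_servers": {
            "buckets": [
                {
                    "key": "server-a",
                    "doc_count": 5400,
                    "top_logs": {
                        "buckets": [
                            {"key": "app.log", "doc_count": 4000},
                            {"key": "gc.log", "doc_count": 1400},
                        ]
                    },
                },
                {
                    "key": "server-b",
                    "doc_count": 3100,
                    "top_logs": {"buckets": [{"key": "app.log", "doc_count": 3100}]},
                },
            ]
        }
    }
}

rows = flatten(sample_response, "top_servers", "top_logs")
for server, log, count in sorted(rows, key=lambda r: r[2], reverse=True):
    print(f"{server}\t{log}\t{count}")
```

Note that a top 10 nested inside a top 10 gives the top 10 log files per server, not the top 10 log files overall; for the global ranking, run a separate single-level terms aggregation on the log file field.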

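The examples above use the `.keyword` sub-fields because `text` fields cannot be used in a terms aggregation by default, while `keyword` fields can. Check your fields with the following `_mapping` API call and use an aggregatable field in your aggregation.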

GET your_index_name/_mapping
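To spot candidate fields in a large mapping, a small script can walk the `_mapping` response and list every `keyword` field, including multi-fields such as `log.keyword` and fields inside objects. This is a hedged sketch: the index name and fields in the sample are invented, the helper `keyword_fields` is hypothetical, and it deliberately ignores other aggregatable types (numeric, date, and so on).

```python
from __future__ import annotations

from typing import Any, Iterator


def keyword_fields(properties: dict[str, Any], prefix: str = "") -> Iterator[str]:
    """Yield dotted paths of keyword fields from a mapping's "properties"."""
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if spec.get("type") == "keyword":
            yield path
        # Multi-fields, e.g. the "keyword" sub-field added to dynamic text fields.
        for sub_name, sub_spec in spec.get("fields", {}).items():
            if sub_spec.get("type") == "keyword":
                yield f"{path}.{sub_name}"
        # Object fields nest their children under another "properties" key.
        if "properties" in spec:
            yield from keyword_fields(spec["properties"], prefix=f"{path}.")


# Hypothetical GET your_index_name/_mapping response (fields invented).
sample_mapping = {
    "index-2024.01.01": {
        "mappings": {
            "properties": {
                "message": {"type": "text"},
                "log": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                },
                "host": {"properties": {"name": {"type": "keyword"}}},
            }
        }
    }
}

for index_name, body in sample_mapping.items():
    print(index_name, sorted(keyword_fields(body["mappings"]["properties"])))
```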

If you have multiple indices, you can use a wildcard pattern:

GET index-*/_mapping
GET index-*/_search

If you want to compare size or document count per index, you can use the `_cat/indices` API:

GET _cat/indices/index1,index2?v
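For scripting, `_cat/indices` can return JSON with sizes in plain bytes, e.g. `GET _cat/indices/index-*?format=json&bytes=b`; the values still come back as strings. The sketch below ranks those rows by `store.size` and then makes a rough byte estimate for one terms bucket from its share of the documents. That estimate ASSUMES all documents in the index are about the same size, which is exactly what is not guaranteed for different log files, so treat it only as an indication. The sample rows and both helper names are invented.

```python
from __future__ import annotations


def indices_by_size(cat_rows: list[dict[str, str]]) -> list[tuple[str, int, int]]:
    """Sort _cat/indices rows (format=json, bytes=b) by store.size, largest first."""
    parsed = [
        (row["index"], int(row["docs.count"]), int(row["store.size"]))
        for row in cat_rows
    ]
    return sorted(parsed, key=lambda r: r[2], reverse=True)


def estimate_bytes(bucket_docs: int, index_docs: int, index_bytes: int) -> int:
    """Rough bytes for a bucket, ASSUMING roughly equal document sizes."""
    if index_docs == 0:
        return 0
    return index_bytes * bucket_docs // index_docs


# Hypothetical _cat/indices?format=json&bytes=b output (values invented).
sample_cat = [
    {"index": "index-2024.01.01", "docs.count": "10000", "store.size": "52428800"},
    {"index": "index-2024.01.02", "docs.count": "20000", "store.size": "104857600"},
]

for name, docs, size in indices_by_size(sample_cat):
    print(f"{name}\t{docs} docs\t{size} bytes")

# A log file with 4,000 of the 10,000 docs in the first index.
print(estimate_bytes(4000, 10000, 52428800))  # 20971520 bytes (20 MiB)
```

Keep in mind that `store.size` includes replica copies (`pri.store.size` is primaries only), and that `docs.count` here comes from Lucene, so it can differ from the `_count` API when nested documents are involved.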
