A resolved alert resets repeat_interval in Alertmanager


My Alertmanager configuration is:

 config:
  global: 
    resolve_timeout: 5m
  route:
    group_by: ['alert_manager_group_by']
    group_wait: 30s
    group_interval: 15m
    repeat_interval: 30m
    receiver: 'alerting-metrics'
    routes:
    - receiver: 'alerting-metrics'
      matchers:
        - group =~ "TestGroup"
      continue: true
    - receiver: 'teams'
      matchers:
        - group =~ "TestGroup"
      continue: true
    - receiver: 'null'
      matchers:
        - alertname =~ "InfoInhibitor|Watchdog"
  receivers:
    - name: 'null'
    - name: 'alerting-metrics'
      webhook_configs:
      - send_resolved: false
        url: 'my webhook'
    - name: 'teams'
      webhook_configs:
      - send_resolved: false
        url: http://prometheus-msteams:2000/high_priority_channel

I have a receiver that the alerts are delivered to, and I use Prometheus counters to collect statistics on the number of alerts that have fired.

"alertmanager-monitoring-emea-dev-kube-p-alertmanager-0","3/18/2024, 3:45:13.104 PM","ts=2024-03-18T15:45:13.104Z caller=notify.go:743 level=debug component=dispatcher receiver=alerting-metrics integration=webhook[0] msg=""Notify success"" attempts=1"
(45 minutes after the last notification, as no new alert was added to the group: group_wait + repeat_interval)

About 30 minutes after the last notification, the alert comes in with status resolved:

"alertmanager-monitoring-emea-dev-kube-p-alertmanager-0","3/18/2024, 4:15:12.988 PM","ts=2024-03-18T16:15:12.988Z caller=dispatch.go:163 level=debug component=dispatcher msg=""Received alert"" alert=Prometheus_P01_Alert[6fdc8ff][resolved]"
"alertmanager-monitoring-emea-dev-kube-p-alertmanager-0","3/18/2024, 4:15:13.063 PM","ts=2024-03-18T16:15:13.063Z caller=dispatch.go:515 level=debug component=dispatcher aggrGroup=""{}/{group=~\""TestGroup""}:{alert_manager_group_by=\""Front_Door\""}"" msg=flushing alerts=[Prometheus_P01_Alert[6fdc8ff][resolved]]"
"alertmanager-monitoring-emea-dev-kube-p-alertmanager-0","3/18/2024, 4:15:13.063 PM","ts=2024-03-18T16:15:13.063Z caller=dispatch.go:515 level=debug component=dispatcher aggrGroup=""{}/{group=~\""TestGroup""}:{alert_manager_group_by=\""Front_Door\""}"" msg=flushing alerts=[Prometheus_P01_Alert[6fdc8ff][resolved]]"

A new alert becomes active:

"alertmanager-monitoring-emea-dev-kube-p-alertmanager-0","3/18/2024, 4:16:42.989 PM","ts=2024-03-18T16:16:42.989Z caller=dispatch.go:163 level=debug component=dispatcher msg=""Received alert"" alert=Prometheus_P01_Alert[6fdc8ff][active]"

The notification is sent 30 seconds later:

"alertmanager-monitoring-emea-dev-kube-p-alertmanager-0","3/18/2024, 4:17:12.990 PM","ts=2024-03-18T16:17:12.990Z caller=dispatch.go:515 level=debug component=dispatcher aggrGroup=""{}/{group=~\""TestGroup"}:{alert_manager_group_by=\""Front_Door\""}"" msg=flushing alerts=[Prometheus_P01_Alert[6fdc8ff][active]]"
"alertmanager-monitoring-emea-dev-kube-p-alertmanager-0","3/18/2024, 4:17:12.990 PM","ts=2024-03-18T16:17:12.990Z caller=dispatch.go:515 level=debug component=dispatcher aggrGroup=""{}/{group=~\""TestGroup""}:{alert_manager_group_by=\""Front_Door\""}"" msg=flushing alerts=[Prometheus_P01_Alert[6fdc8ff][active]]"
"alertmanager-monitoring-emea-dev-kube-p-alertmanager-0","3/18/2024, 4:17:13.016 PM","ts=2024-03-18T16:17:13.016Z caller=notify.go:743 level=debug component=dispatcher receiver=alerting-metrics integration=webhook[0] msg=""Notify success"" attempts=1"
(the notification was sent regardless of when the last notification was sent)

Question: is this normal behavior? Does Alertmanager reset repeat_interval?

prometheus prometheus-alertmanager
1 Answer

Alertmanager's documentation says of repeat_interval:

How long to wait before sending a notification again if it has already been sent successfully for an alert.

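This means that the same alert triggers a notification again once repeat_interval has elapsed since the last notification for that alert.

Because your initial alert was resolved, it no longer exists, and repeat_interval no longer matters.

A new alert, even one with the same labels, is treated as a separate object with its own lifecycle.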
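
To make this reasoning concrete, here is a minimal Python sketch of the decision as I understand it. It is a simplified model under stated assumptions, not Alertmanager's actual code: the names NotificationLog, needs_update, flush and replay are made up for illustration, and the 16:00 flush is an assumed group_interval tick that does not appear in the logs above. The point it tries to show is that a resolved flush clears what was remembered for the group, so the next firing alert is not held back by repeat_interval.

```python
from datetime import datetime, timedelta

REPEAT_INTERVAL = timedelta(minutes=30)  # repeat_interval from the question


class NotificationLog:
    """Hypothetical stand-in for what is remembered per group and receiver."""

    def __init__(self):
        self.firing = None   # fingerprints recorded at the last flush; None = never
        self.sent_at = None  # time of the last recorded flush


def needs_update(log, firing, now):
    """Decide whether a flush should be acted on (simplified model)."""
    if log.firing is None:
        return bool(firing)              # nothing recorded yet
    if not firing <= log.firing:
        return True                      # a firing alert the log does not know
    if not firing:
        return bool(log.firing)          # everything resolved: clear the log once
    return now - log.sent_at >= REPEAT_INTERVAL  # unchanged group: wait


def flush(log, firing, now):
    """Run one flush and record it when it is acted on."""
    due = needs_update(log, firing, now)
    if due:
        log.firing, log.sent_at = set(firing), now
    return due


def replay():
    """Replay the timeline from the question, plus one assumed 16:00 flush."""
    log = NotificationLog()
    alert = {"6fdc8ff"}
    day = datetime(2024, 3, 18)
    # 15:45:13: the "Notify success" from the first log line.
    log.firing, log.sent_at = set(alert), day.replace(hour=15, minute=45, second=13)
    return [
        # Assumed tick, still firing: held back by repeat_interval.
        flush(log, alert, day.replace(hour=16, minute=0, second=13)),
        # Resolved: the record is cleared. With send_resolved: false
        # nothing reaches the webhook, but the log no longer lists the alert.
        flush(log, set(), day.replace(hour=16, minute=15, second=13)),
        # Fires again: unknown to the cleared record, so it is sent at once.
        flush(log, alert, day.replace(hour=16, minute=17, second=13)),
    ]
```

Calling replay() returns [False, True, True]: the repeat is suppressed while the alert keeps firing, the resolved flush resets the record, and the re-fired alert is notified two minutes later without waiting for repeat_interval.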


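If you are interested in how to block new notifications from the same alert rule for a period of time, even after the initial alert has resolved, you can take a look at this answer.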
