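I am trying to create an alert policy based on the ratio of failed messages in a Pub/Sub subscription. I'd like to use pubsub.googleapis.com/subscription/dead_letter_message_count as the numerator and pubsub.googleapis.com/subscription/pull_ack_request_count as the denominator. The alignment periods match, and I use a cross-series reducer that drops all labels, so the extra labels on the denominator are eliminated. The alert policy I intend to create looks like this: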
monitoring/alertPolicy:AlertPolicy:
    combiner : "AND"
    conditions : [
        [0]: {
            conditionThreshold: {
                aggregations : [
                    [0]: {
                        alignmentPeriod : "600s"
                        crossSeriesReducer: "REDUCE_SUM"
                        perSeriesAligner : "ALIGN_SUM"
                    }
                ]
                comparison : "COMPARISON_GT"
                denominatorAggregations: [
                    [0]: {
                        alignmentPeriod : "600s"
                        crossSeriesReducer: "REDUCE_SUM"
                        perSeriesAligner : "ALIGN_SUM"
                    }
                ]
                denominatorFilter : "resource.type = \"pubsub_subscription\" AND resource.labels.subscription_id = \"subscription\" AND metric.type = \"pubsub.googleapis.com/subscription/pull_ack_request_count\""
                duration : "1800s"
                filter : "resource.type = \"pubsub_subscription\" AND resource.labels.subscription_id = \"subscription\" AND metric.type = \"pubsub.googleapis.com/subscription/dead_letter_message_count\""
                thresholdValue : 0.5
            }
        }
    ]
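But I get this error:
Error creating AlertPolicy: googleapi: Error 400: The numerator is a delta metric but the denominator is not a delta metric.
This is confusing, because both metrics are DELTA. I used the API Explorer to retrieve the time series. For the numerator, I get: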
{
  "timeSeries": [
    {
      "metric": {
        "type": "pubsub.googleapis.com/subscription/dead_letter_message_count"
      },
      "resource": {
        "type": "pubsub_subscription",
        "labels": {
          "project_id": "redacted"
        }
      },
      "metricKind": "DELTA",
      "valueType": "INT64",
      "points": [
        {
          "interval": {
            "startTime": "2023-03-13T10:10:00Z",
            "endTime": "2023-03-13T10:20:00Z"
          },
          "value": {
            "int64Value": "0"
          }
        },
        ....,
        {
          "interval": {
            "startTime": "2023-03-13T09:10:00Z",
            "endTime": "2023-03-13T09:20:00Z"
          },
          "value": {
            "int64Value": "93"
          }
        },
        {
          "interval": {
            "startTime": "2023-03-13T09:00:00Z",
            "endTime": "2023-03-13T09:10:00Z"
          },
          "value": {
            "int64Value": "9"
          }
        },
        {
          "interval": {
            "startTime": "2023-03-13T08:50:00Z",
            "endTime": "2023-03-13T09:00:00Z"
          },
          "value": {
            "int64Value": "34"
          }
        }
      ]
    }
  ],
  "unit": "1"
}
For the denominator:
{
  "timeSeries": [
    {
      "metric": {
        "type": "pubsub.googleapis.com/subscription/pull_ack_request_count"
      },
      "resource": {
        "type": "pubsub_subscription",
        "labels": {
          "project_id": "redacted"
        }
      },
      "metricKind": "DELTA",
      "valueType": "INT64",
      "points": [
        {
          "interval": {
            "startTime": "2023-03-13T09:50:00Z",
            "endTime": "2023-03-13T10:00:00Z"
          },
          "value": {
            "int64Value": "6"
          }
        },
        ....,
        {
          "interval": {
            "startTime": "2023-03-13T08:20:00Z",
            "endTime": "2023-03-13T08:30:00Z"
          },
          "value": {
            "int64Value": "104"
          }
        },
        {
          "interval": {
            "startTime": "2023-03-13T08:10:00Z",
            "endTime": "2023-03-13T08:20:00Z"
          },
          "value": {
            "int64Value": "93"
          }
        },
        {
          "interval": {
            "startTime": "2023-03-13T08:00:00Z",
            "endTime": "2023-03-13T08:10:00Z"
          },
          "value": {
            "int64Value": "111"
          }
        }
      ]
    }
  ],
  "unit": "1"
}
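For anyone who wants to repeat the comparison without eyeballing the JSON, here is a minimal Python sketch that pulls `metricKind` out of two response bodies and checks that they agree. The helper name `metric_kinds` and the trimmed response strings are made up for illustration (points omitted); only the field names come from the responses above. Note that this only confirms what the API reports back, it says nothing about how the alerting backend classifies the metrics.

```python
import json


def metric_kinds(response_text: str) -> set:
    """Return the set of metricKind values in a timeSeries.list response body."""
    body = json.loads(response_text)
    return {series["metricKind"] for series in body.get("timeSeries", [])}


# Trimmed stand-ins for the two responses shown above (hypothetical, points omitted).
numerator = json.dumps({
    "timeSeries": [{
        "metric": {"type": "pubsub.googleapis.com/subscription/dead_letter_message_count"},
        "metricKind": "DELTA",
        "valueType": "INT64",
    }]
})
denominator = json.dumps({
    "timeSeries": [{
        "metric": {"type": "pubsub.googleapis.com/subscription/pull_ack_request_count"},
        "metricKind": "DELTA",
        "valueType": "INT64",
    }]
})

# Both sides report the same kind, which is why the 400 error looks surprising.
kinds_match = metric_kinds(numerator) == metric_kinds(denominator)
print("numerator:", metric_kinds(numerator))
print("denominator:", metric_kinds(denominator))
print("kinds match:", kinds_match)
```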
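Due to some implementation details, this ratio-based alert cannot be defined with a JSON-based alert policy. From Google:
We got an update from the product team stating that the issue is due to an inconsistency in the delta field. Apparently the reason for this is that pull_ack_request_count has a delta window operation with an explicit window in its definition. This explicit window prevents the precomputation from being marked as delta.
Ratios are a querying capability internal to Google. We lack confidence in the implementation and cannot guarantee that there won't be bugs. The general recommendation is to use MQL instead of the denominator filter.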
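To make the MQL recommendation concrete, here is a rough Python sketch that builds an alert policy body using `conditionMonitoringQueryLanguage` instead of `conditionThreshold` with a `denominatorFilter`. The subscription ID, the 10 minute window, the 1800s duration and the 0.5 threshold are carried over from the question; the display names and the helper `fetch_sum` are made up for illustration. The MQL query itself is an untested sketch of a two-table `ratio`, so treat it as a starting point and check it in Metrics Explorer before relying on it.

```python
import json

SUBSCRIPTION_ID = "subscription"  # placeholder value from the filters in the question
WINDOW = "10m"                    # matches the 600s alignment period
THRESHOLD = 0.5                   # matches thresholdValue


def fetch_sum(metric: str) -> str:
    """Build one MQL branch: fetch a Pub/Sub metric and sum it over all labels."""
    return (
        "fetch pubsub_subscription\n"
        f"  | metric 'pubsub.googleapis.com/subscription/{metric}'\n"
        f"  | filter resource.subscription_id == '{SUBSCRIPTION_ID}'\n"
        f"  | align delta({WINDOW})\n"
        "  | group_by [], [total: sum(val())]"
    )


# Assumed query shape: two aligned tables fed into the `ratio` table operation.
query = (
    "{ " + fetch_sum("dead_letter_message_count") + "\n"
    "; " + fetch_sum("pull_ack_request_count") + "\n"
    "}\n"
    "| ratio\n"
    f"| condition val() > {THRESHOLD}"
)

policy = {
    "displayName": "Dead letter ratio (hypothetical name)",
    "combiner": "AND",
    "conditions": [
        {
            "displayName": "dead_letter / pull_ack ratio (hypothetical name)",
            "conditionMonitoringQueryLanguage": {
                "query": query,
                "duration": "1800s",
                "trigger": {"count": 1},
            },
        }
    ],
}

print(json.dumps(policy, indent=2))
```

The resulting JSON has the shape of a request body for creating an alert policy through the Monitoring API; in Pulumi or Terraform the same query string would go into the MQL condition block rather than the threshold condition.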