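We are investigating an issue with Azure App Service autoscale. Sometimes an instance is removed even though CPU usage is above the configured threshold.
Autoscale is configured according to the following guidelines: https://docs.microsoft.com/en-us/azure/azure-monitor/platform/autoscale-best-practices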
The rules are:
Cool-down period = 10 minutes for all rules
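We have determined that the issue is caused by autoscale observing an incorrect average CPU utilization value for the App Service Plan:
As you can see, the difference is sometimes large (55% vs. 24%).
We have enabled diagnostic logging of autoscale evaluations to a storage account and found the following logs: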
{ "time": "2019-11-18T07:27:02.1437280Z", "resourceId": "---removed---", "category": "AutoscaleEvaluations", "operationName": "MetricEvaluation", "correlationId": "b8265c6f-47b8-4cd4-9c59-c710b407d043", "properties": {"targetResourceId":"---removed---","metricName":"CpuPercentage","metricNamespace":"","timeGrain":"00:01:00","timeGrainStatistic":"Average","startTime":"11/18/2019 7:21:00 AM","endTime":"11/18/2019 7:26:00 AM","data":"[47.666666666666664,45.428571428571431,48.666666666666664,44.0,0.0,0.0]"}}
{ "time": "2019-11-18T07:27:02.1437280Z", "resourceId": "---removed---", "category": "AutoscaleEvaluations", "operationName": "ScaleRuleEvaluation", "correlationId": "b8265c6f-47b8-4cd4-9c59-c710b407d043", "properties": {"targetResourceId":"---removed---","metricName":"CpuPercentage","metricNamespace":"microsoft.web/serverfarms","timeGrain":"00:01:00","timeGrainStatistic":"Average","timeWindow":"00:05:00","timeAggregationType":"Average","operator":"GreaterThanOrEqual","threshold":"70","observedValue":"30.9603174603175","estimateScaleResult":"NotTriggered"}}
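Note that the last two values in the data array of the first log are 0.0, and the observedValue in the second log is 30.9603174603175. We have verified that CPU usage never dropped below 40% during that time period.
It seems that the autoscale logic simply uses 0.0 in place of missing (null) values, which is wrong for an average calculation.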
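To check this hypothesis against the logged numbers, here is a minimal Python sketch that averages the data array from the MetricEvaluation log twice: once counting the trailing 0.0 entries as real samples, and once skipping them. Treating 0.0 as "missing" is our assumption for illustration; we do not know how autoscale computes the value internally.

```python
from statistics import mean

# Per-minute CpuPercentage values copied from the MetricEvaluation log.
data = [47.666666666666664, 45.428571428571431,
        48.666666666666664, 44.0, 0.0, 0.0]

# Average over all six slots, counting the trailing 0.0 entries as samples.
with_zeros = mean(data)

# Average over the non-zero samples only.
# Assumption: a 0.0 entry means "no data yet", not a real 0% CPU reading.
reported = [v for v in data if v != 0.0]
without_zeros = mean(reported)

print(f"with zeros:    {with_zeros:.4f}")     # 30.9603
print(f"without zeros: {without_zeros:.4f}")  # 46.4405
```

The first result matches the observedValue in the ScaleRuleEvaluation log (30.9603174603175), which is consistent with the zeros being included in the average. The second is in line with the CPU usage we actually measured (never below 40%).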
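Is this expected behavior? For now, we have set the autoscale time window to 10 minutes for scale-out and 15 minutes for scale-in to minimize the error. Is there a better workaround?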
Recently, we observed the same behavior in a service plan in West Europe. We have opened a support request and are waiting for a response.