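We have an Azure App Service that calls a third-party API. Requests to that API sometimes fail, seemingly at random, and then immediately start working again. About 20% of requests fail. The third party says there is nothing wrong on their end.
The code that makes the request looks like this: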
try
{
    // simplified
    var httpRequest = new HttpRequestMessage
    {
        RequestUri = new Uri(requestUri),
        Method = HttpMethod.Get
    };
    httpRequest.Headers.Add("username", username);
    httpRequest.Headers.Add("token", accessToken);
    response = await _httpClient.SendAsync(httpRequest);
    response.EnsureSuccessStatusCode();
}
catch (Exception ex)
{
    _logger.Log(ex);
}
Also note that the HttpClient uses pooling, configured in Startup.cs:
var jitter = new Random();
var retryPolicy = HttpPolicyExtensions.HandleTransientHttpError()
    .OrResult(msg => msg.StatusCode == HttpStatusCode.NotFound)
    // exponential backoff + jitter to prevent grouping
    .WaitAndRetryAsync(3, retryAttempt =>
        TimeSpan.FromSeconds(Math.Pow(2, retryAttempt))
        + TimeSpan.FromMilliseconds(jitter.Next(0, 500)));
var sp = ServicePointManager.FindServicePoint(new Uri(_config.BaseUri));
sp.ConnectionLeaseTimeout = 60 * 1000;
services.AddHttpClient<IFooService, FooService>(client =>
{
    client.Timeout = TimeSpan.FromSeconds(10);
}).SetHandlerLifetime(TimeSpan.FromMinutes(10)).AddPolicyHandler(retryPolicy);
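One thing worth checking in the registration above: the stack trace in this question goes through System.Net.Http.HttpConnectionPool, which is part of SocketsHttpHandler on .NET Core. As far as I know, ServicePointManager settings such as ConnectionLeaseTimeout only affect HttpWebRequest there and not HttpClient. A minimal sketch of setting the equivalent values on the primary handler instead, assuming .NET Core 2.1 or later. The `AddFooClient` helper, the empty `IFooService`/`FooService` stand-ins, and the 5-second connect timeout are assumptions for illustration, not something from the original setup:

```csharp
using System;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;

// Stand-ins for the types named in the question.
public interface IFooService { }

public class FooService : IFooService
{
    private readonly HttpClient _httpClient;
    public FooService(HttpClient httpClient) => _httpClient = httpClient;
}

public static class FooClientRegistration
{
    // Hypothetical helper; call it from ConfigureServices in Startup.cs.
    public static IHttpClientBuilder AddFooClient(this IServiceCollection services)
    {
        return services
            .AddHttpClient<IFooService, FooService>(client =>
            {
                client.Timeout = TimeSpan.FromSeconds(10);
            })
            .ConfigurePrimaryHttpMessageHandler(() => new SocketsHttpHandler
            {
                // Recycle pooled connections after one minute, the value the
                // question tried to set through ConnectionLeaseTimeout.
                PooledConnectionLifetime = TimeSpan.FromMinutes(1),
                // Assumed value: give up on a stuck TCP connect after 5 seconds.
                ConnectTimeout = TimeSpan.FromSeconds(5)
            })
            .SetHandlerLifetime(TimeSpan.FromMinutes(10));
    }
}
```

The retry policy from the question can still be chained onto the returned builder with AddPolicyHandler.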
When the exception occurs, it looks like this:
System.Threading.Tasks.TaskCanceledException: The operation was canceled.
at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean allowHttp2, CancellationToken cancellationToken)
at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at System.Net.Http.HttpConnectionPool.GetHttpConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at System.Net.Http.DiagnosticsHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at Microsoft.Extensions.Http.Logging.LoggingHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at Polly.Retry.AsyncRetryEngine.ImplementationAsync[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Func`5 onRetryAsync, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider, Boolean continueOnCapturedContext)
at Polly.AsyncPolicy`1.ExecuteAsync(Func`3 action, Context context, CancellationToken cancellationToken, Boolean continueOnCapturedContext)
at Microsoft.Extensions.Http.PolicyHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at Microsoft.Extensions.Http.Logging.LoggingScopeHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
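Two observations on this trace, offered as possibilities and not as a diagnosis. First, the cancellation happens inside ConnectHelper.ConnectAsync, so the request was cancelled while the TCP connection was still being opened. Second, with IHttpClientFactory the Polly handler runs inside HttpClient, so client.Timeout (10 seconds here) covers all attempts and the waits between them; the configured backoff waits alone add up to 2 + 4 + 8 = 14 seconds plus jitter, so the later retries can never run. A sketch of a per-attempt timeout with shorter waits follows. The `AddPerTryTimeout` helper name, the 5-second per-attempt limit, and the backoff values are all assumptions:

```csharp
using System;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;
using Polly;
using Polly.Extensions.Http;
using Polly.Timeout;

public static class ResilientClientSketch
{
    // Hypothetical helper; chain it onto the builder returned by AddHttpClient.
    public static IHttpClientBuilder AddPerTryTimeout(this IHttpClientBuilder builder)
    {
        var jitter = new Random();

        var retryPolicy = HttpPolicyExtensions
            .HandleTransientHttpError()
            // Also retry when a single attempt hits the per-try timeout below.
            .Or<TimeoutRejectedException>()
            .WaitAndRetryAsync(3, attempt =>
            {
                int extraMs;
                // Random is not thread-safe, so guard the shared instance.
                lock (jitter) { extraMs = jitter.Next(0, 100); }
                // Waits of roughly 0.4 s, 0.8 s and 1.6 s plus jitter.
                return TimeSpan.FromMilliseconds(200 * Math.Pow(2, attempt) + extraMs);
            });

        // Assumed limit for one attempt, including the connect phase.
        var perTryTimeout = Policy.TimeoutAsync<HttpResponseMessage>(TimeSpan.FromSeconds(5));

        // The first handler added is the outermost one, so the timeout
        // wraps each individual attempt and the retry wraps the timeout.
        return builder
            .AddPolicyHandler(retryPolicy)
            .AddPolicyHandler(perTryTimeout);
    }
}
```

With these numbers the worst case is roughly 23 seconds (four attempts of 5 seconds plus the waits), so client.Timeout would need to be raised above that for every retry to get a chance; left at 10 seconds, it still acts as the overall cap.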
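Note that while the App Service is failing, any request sent through Postman works fine, which suggests a network-level problem on the App Service side.
I have seen similar questions about Azure web apps and HttpClient that point to SNAT port exhaustion, but the Azure diagnostics tool reports that SNAT is not the problem and is well within limits (see the image below):
Can anyone tell me why the App Service would randomly block HttpClient requests, and how to determine whether this is the problem and fix it?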
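We have a similar problem. We make many calls to different APIs, and we get a lot of "canceled" messages after 21 seconds and sometimes "timeout" messages after 100 seconds (the HttpClient default timeout). The Azure SNAT report shows nothing unusual. We do not have a NAT gateway (yet). A team that runs the network for one of the APIs found no errors on their side, but they see many "client_rst" (client reset) entries in their firewall logs. We are not setting ServicePointManager.
Have you found a solution to the problem?
Regards.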