GKE upgrades kill Pods that are actively serving requests


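We are running into a problem with Google Kubernetes Engine (GKE): its periodic upgrades to new versions disrupt the Pods and containers in our cluster. We understand that upgrades are necessary and expected, but the problem is that the cluster interrupts our services even while requests are actively in flight.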

Our setup consists of several services, specifically an Express Gateway and multiple interconnected Rails services, arranged as follows: Ingress -> Express -> Rails1 -> Rails2.

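During a GKE upgrade, if a request is in transit from Express to Rails1 and Rails1 is terminated by the upgrade process, we observe that the gateway receives only a generic timeout message, with no detailed error or indication of the underlying problem: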

RequestError: Timeout awaiting 'request' for 3000ms
    at ClientRequest.<anonymous> (/app/node_modules/got/dist/source/core/index.js:970:65)
    at /app/node_modules/@opentelemetry/context-async-hooks/build/src/AbstractAsyncHooksContextManager.js:50:55
    at AsyncLocalStorage.run (node:async_hooks:319:14)
    at AsyncLocalStorageContextManager.with (/app/node_modules/@opentelemetry/context-async-hooks/build/src/AsyncLocalStorageContextManager.js:33:40)
    at ClientRequest.contextWrapper (/app/node_modules/@opentelemetry/context-async-hooks/build/src/AbstractAsyncHooksContextManager.js:50:32)
    at Object.onceWrapper (node:events:628:26)
    at ClientRequest.emit (node:events:525:35)
    at ClientRequest.origin.emit (/app/node_modules/@szmarczak/http-timer/dist/source/index.js:43:20)
    at TLSSocket.socketErrorListener (node:_http_client:494:9)
    at TLSSocket.emit (node:events:513:28)
    at emitErrorNT (node:internal/streams/destroy:157:8)
    at emitErrorCloseNT (node:internal/streams/destroy:122:3)
    at processTicksAndRejections (node:internal/process/task_queues:83:21)
    at Timeout.timeoutHandler [as _onTimeout] (/app/node_modules/got/dist/source/core/utils/timed-out.js:36:25)
    at listOnTimeout (node:internal/timers:561:11)
    at processTimers (node:internal/timers:502:7) {

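We have tried to keep these upgrade windows outside our business hours, but that does not solve the underlying problem. I have also looked through the logs but could not find much information. If you need any other logs, I will try to find them and post them here.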

express kubernetes google-kubernetes-engine kubernetes-ingress
1 Answer

You may want to look into lifecycle hooks (to drain connections during termination) and PodDisruptionBudgets (to help ensure service resilience) to help mitigate these issues.
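As a rough illustration of those two suggestions, here is a minimal sketch of what the manifests for the Rails1 service might look like. Everything specific in it is an assumption and not taken from the question: the `rails1` name and label, the image, port 3000, the `/healthz` path, the replica count, and the timing values. It also assumes the container image ships a `sleep` binary.

```yaml
# Hypothetical manifests: names, image, port, probe path, and timings are placeholders.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: rails1-pdb
spec:
  minAvailable: 1            # keep at least one rails1 Pod running during voluntary disruptions such as node drains
  selector:
    matchLabels:
      app: rails1
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rails1
spec:
  replicas: 2                # with minAvailable: 1, more than one replica is needed for a drain to proceed
  selector:
    matchLabels:
      app: rails1
  template:
    metadata:
      labels:
        app: rails1
    spec:
      # Total time budget for shutdown: covers the preStop hook plus the app's own handling of SIGTERM.
      terminationGracePeriodSeconds: 60
      containers:
        - name: rails1
          image: example.com/rails1:latest   # placeholder image
          ports:
            - containerPort: 3000
          readinessProbe:
            httpGet:
              path: /healthz                 # assumed health endpoint
              port: 3000
          lifecycle:
            preStop:
              exec:
                # Runs before SIGTERM is sent to the container; the short delay gives
                # endpoint removal time to propagate so new requests stop arriving first.
                command: ["sleep", "15"]
```

The preStop delay only helps if the application also finishes in-flight requests after it receives SIGTERM, and the grace period should be longer than the hook delay plus the longest expected request. The same pattern would likely need to be applied to each service in the chain (Express, Rails1, Rails2).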
