Encountered too many errors talking to a worker node in Trino


We hit this error while running a single large query. Is there a way to kill such a query before the error occurs?

io.trino.operator.PageTransportTimeoutException: Encountered too many errors talking to a worker node. The node may have crashed or be under too much load. This is probably a transient issue, so please retry your query in a few minutes. (http://172.22.66.206:8889/v1/task/20230727_083615_00032_edi7s.0.0.0/results/0/0 - 30 failures, failure duration 302.86s, total failed request time 312.86s)
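One way to cap a runaway query before it destabilizes the cluster (a suggestion on my part, not something from the original post) is Trino's execution-time limit, available both as a coordinator property and as a per-session override:

```properties
# etc/config.properties on the coordinator:
# abort any query that executes longer than 30 minutes
query.max-execution-time=30m
```

The same limit can be set for a single client session with `SET SESSION query_max_execution_time = '30m';`, which avoids changing cluster-wide defaults.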

3-node cluster of m6g.16xlarge instances (1 coordinator and 2 workers), with the following configuration:

node-scheduler.include-coordinator=false
discovery.uri=http://ip-172-22-69-150.ec2.internal:8889
http-server.threads.max=500
sink.max-buffer-size=1GB
query.max-memory=3000GB
query.max-memory-per-node=60GB
query.max-history=40
query.min-expire-age=30m
query.client.timeout=30m
query.stage-count-warning-threshold=100
query.max-stage-count=150
http-server.http.port=8889
http-server.log.path=/var/log/trino/http-request.log
http-server.log.max-size=67108864B
http-server.log.max-history=5
log.max-size=268435456B
jmx.rmiregistry.port=9080
jmx.rmiserver.port=9081
node-scheduler.max-splits-per-node=200
experimental.query-max-spill-per-node=50GB
graceful-shutdown-timeout=3600s
task.concurrency=16
query.execution-policy=phased
experimental.max-spill-per-node=100GB
query.max-concurrent-queries=20
query.max-total-memory=5000GB
Tags: java, jvm, presto, trino
1 Answer

I had the following flags in my JVM config: -XX:+HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError=kill -9 %p

So whenever an OOM occurred, the process dumped its heap to disk and was then killed. The heap dumps drove disk utilization on the worker nodes above 95%, which in turn prevented the Trino process from starting.

After some digging, I found the OOMs were caused by this JDK bug: https://bugs.openjdk.org/browse/JDK-8293861

To work around it, I added the following JVM properties: -XX:+UnlockDiagnosticVMOptions -XX:-G1UsePreventiveGC

This stops G1's preventive GC behavior from driving the process into OOM.
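Putting the answer's flags together, the relevant portion of Trino's etc/jvm.config might look like the sketch below. The heap size and dump path are illustrative assumptions (not from the post); writing dumps to a separate volume addresses the disk-fill problem described above.

```
# etc/jvm.config (sketch; -Xmx and HeapDumpPath are assumptions)
-Xmx100G
-XX:+HeapDumpOnOutOfMemoryError
# send heap dumps to a dedicated volume so they cannot fill the Trino data disk
-XX:HeapDumpPath=/mnt/heapdumps
-XX:OnOutOfMemoryError=kill -9 %p
# workaround for JDK-8293861: disable G1 preventive GC
# (diagnostic flag, so it must be unlocked first)
-XX:+UnlockDiagnosticVMOptions
-XX:-G1UsePreventiveGC
```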
