Connection issues with the dask scheduler


I have set up a Kubernetes cluster on GKE and installed dask-kubernetes-operator. When I try to start a cluster like this

from dask.distributed import Client
from dask_kubernetes.operator import KubeCluster

cluster: KubeCluster = KubeCluster(custom_cluster_spec="cluster.yaml")
client = Client(cluster)
client

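where the .yaml is basically the cluster-spec.yaml example from this site, but with my own image (based on ghcr.io/dask/dask:2023.10.0-py3.10), I get the following error message, usually several times in a row: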

Task exception was never retrieved
future: <Task finished name='Task-822' coro=<PortForward._sync_sockets() done, defined at /opt/conda/lib/python3.10/site-packages/kr8s/_portforward.py:167> exception=ExceptionGroup('unhandled errors in a TaskGroup', [ConnectionClosedError('TCP socket closed')])>
  + Exception Group Traceback (most recent call last):
  |   File "/opt/conda/lib/python3.10/site-packages/kr8s/_portforward.py", line 171, in _sync_sockets
  |     async with anyio.create_task_group() as tg:
  |   File "/opt/conda/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 664, in __aexit__
  |     raise BaseExceptionGroup(
  | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/opt/conda/lib/python3.10/site-packages/kr8s/_portforward.py", line 183, in _tcp_to_ws
    |     raise ConnectionClosedError("TCP socket closed")
    | kr8s._exceptions.ConnectionClosedError: TCP socket closed
    +------------------------------------

The scheduler pod logs also say

distributed.comm.tcp - INFO - Connection from tcp://127.0.0.1:51484 closed before handshake completed

+ '[' '' ']'
+ '[' '' == true ']'
+ CONDA_BIN=/opt/conda/bin/conda
+ '[' -e /opt/app/environment.yml ']'
+ echo 'no environment.yml'
+ '[' '' ']'
+ '[' '' ']'
+ exec dask-scheduler
no environment.yml
/opt/conda/lib/python3.10/site-packages/distributed/cli/dask_scheduler.py:142: FutureWarning: dask-scheduler is deprecated and will be removed in a future release; use `dask scheduler` instead
  warnings.warn(
2023-11-23 12:35:29,548 - distributed.scheduler - INFO - -----------------------------------------------
2023-11-23 12:35:30,422 - distributed.http.proxy - INFO - To route to workers diagnostics web server please install jupyter-server-proxy: python -m pip install jupyter-server-proxy
2023-11-23 12:35:30,473 - distributed.scheduler - INFO - State start
2023-11-23 12:35:30,478 - distributed.scheduler - INFO - -----------------------------------------------
2023-11-23 12:35:30,479 - distributed.scheduler - INFO -   Scheduler at:     tcp://10.12.0.35:8786
2023-11-23 12:35:30,480 - distributed.scheduler - INFO -   dashboard at:  http://10.12.0.35:8787/status
2023-11-23 12:35:30,480 - distributed.scheduler - INFO - Registering Worker plugin shuffle
2023-11-23 12:41:27,053 - distributed.comm.tcp - INFO - Connection from tcp://127.0.0.1:51484 closed before handshake completed
2023-11-23 12:41:54,805 - distributed.scheduler - INFO - Receive client connection: Client-adc9e3c2-89fd-11ee-8284-0242ac120003
2023-11-23 12:41:54,806 - distributed.core - INFO - Starting established connection to tcp://127.0.0.1:38964
(base) root@1675432b9888:/workspaces/trading_bot# kubectl delete daskclusters example

I tried increasing InitialDelaySeconds and checked that the versions match, but neither helped. I could not find any more information about this error online.

kubernetes google-kubernetes-engine dask dask-distributed dask-kubernetes
1 Answer

This is just a noisy warning and should not prevent your code from working.
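If the repeated tracebacks are distracting before you can upgrade, one possible workaround is to filter them out of the log. The sketch below is not part of dask-kubernetes or kr8s. It assumes the message arrives through Python's standard "asyncio" logger, which is where unretrieved task exceptions are reported, and the class name PortForwardNoiseFilter is hypothetical. It matches only the port-forward task named in the traceback above, so other unretrieved exceptions remain visible.

```python
import logging


class PortForwardNoiseFilter(logging.Filter):
    """Hypothetical helper: drop only asyncio's report about the kr8s port-forward task."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        is_noise = (
            "Task exception was never retrieved" in message
            and "PortForward._sync_sockets" in message
        )
        # Returning False tells the logging module to discard the record.
        return not is_noise


# Unretrieved task exceptions are logged through the "asyncio" logger.
logging.getLogger("asyncio").addFilter(PortForwardNoiseFilter())
```

Run this once in the client process (for example at the top of the notebook) before creating the KubeCluster. It only hides the log output and does not change the behaviour of the port forward.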

The bug that caused the warning has since been fixed, so upgrading to a newer version should resolve it.
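The answer does not say which package or release contains the fix. Since the traceback points into kr8s (the library dask-kubernetes uses for port forwarding), a reasonable first step is to see what is currently installed on the client side. The following is a minimal sketch using only the standard library. The package list is an assumption about which distributions are relevant, and installed_versions is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Assumed list of relevant distributions; adjust it to your environment.
for name, ver in installed_versions(["kr8s", "dask-kubernetes", "distributed", "dask"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

After upgrading (for example with pip install --upgrade dask-kubernetes kr8s), running the snippet again confirms that the newer versions are the ones actually being imported.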
