我正在尝试通过配置在Power PC上运行以下代码:
Operating System: Red Hat Enterprise Linux Server 7.6 (Maipo)
CPE OS Name: cpe:/o:redhat:enterprise_linux:7.6:GA:server
Kernel: Linux 3.10.0-957.21.3.el7.ppc64le
Architecture: ppc64-le
具有20个核心的单节点本地集群。
import os, subprocess
from timeit import default_timer as timer
from dask.distributed import Client, LocalCluster, fire_and_forget, as_completed
def run_client(n_workers):
files = []
for dirpaths, dirnames, filenames in os.walk('cap_logs/'):
if not dirnames:
files.extend([os.path.join(dirpaths, file) for file in filenames])
def parser(file):
val = subprocess.run(['./test.sh', file], stdout=subprocess.PIPE)
return val.stdout.decode()
cluster = LocalCluster(n_workers=n_workers, dashboard_address=None)
with Client(cluster) as client:
futures = []
files = client.scatter(files)
futures = client.map(parser, files)
results = [future.result() for future in as_completed(futures)]
del futures
cluster.close()
workers = [20, 18, 16, 14, 12, 10, 8, 7, 6, 5, 4, 3, 2, 1]
times = {}
for n_workers in workers:
tic = timer()
run_client(n_workers)
toc = timer()
time = toc - tic
times[n_workers] = round(time, 2)
如果n_workers相对于内核总数(即20)相对较小(<15),效果很好,但是当我将n_workers设置为> 15时,会出现以下错误:
OSError: Timed out trying to connect to 'tcp://127.0.0.1:34487' after 10 s: connect() didn't finish in time
我很惊讶您看到如此之少的工人超时。但是,即使这样,您可能仍想尝试为dask config的connect
部分提供更长的distributed.timeouts
超时:
distributed:
comm:
timeouts:
connect: 10s # time before connecting fails
tcp: 30s # time before calling an unresponsive connection dead
full default config可以在源代码中找到。