An attempt has been made to start a new process before the current process has finished its bootstrapping phase


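I am new to dask, and I like that it offers a module that makes parallelization easy. I am working on a project where I was able to parallelize a loop on a single machine, as you can see here. However, I would like to move to dask.distributed. I applied the following changes to the class above: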

diff --git a/mlchem/fingerprints/gaussian.py b/mlchem/fingerprints/gaussian.py
index ce6a72b..89f8638 100644
--- a/mlchem/fingerprints/gaussian.py
+++ b/mlchem/fingerprints/gaussian.py
@@ -6,7 +6,7 @@ from sklearn.externals import joblib
 from .cutoff import Cosine
 from collections import OrderedDict
 import dask
-import dask.multiprocessing
+from dask.distributed import Client
 import time


@@ -141,13 +141,14 @@ class Gaussian(object):
         for image in images.items():
             computations.append(self.fingerprints_per_image(image))

+        client = Client()
         if self.scaler is None:
-            feature_space = dask.compute(*computations, scheduler='processes',
+            feature_space = dask.compute(*computations, scheduler='distributed',
                                          num_workers=self.cores)
             feature_space = OrderedDict(feature_space)
         else:
             stacked_features = dask.compute(*computations,
-                                            scheduler='processes',
+                                            scheduler='distributed',
                                             num_workers=self.cores)

             stacked_features = numpy.array(stacked_features)

Doing so produces this error:

 File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

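I have tried different ways of adding if __name__ == '__main__': without any success. The error can be reproduced by running this example. I would appreciate it if someone could help me figure this out. I don't know how I should change my code to make it work.

Thanks.

Edit: the example is cu_training.py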

python dask dask-distributed
1 Answer
2 votes

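The Client command starts new processes, so it has to be inside the if __name__ == '__main__': block, as described in this SO question or this GitHub issue.

This is the same as with the multiprocessing module.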
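Since the same rule applies to the multiprocessing module, the idiom can be sketched with the standard library alone. This is a minimal illustration, not the code from the question: the names square and main are made up for the example, and the "spawn" start method is selected explicitly as an assumption, so that child processes re-import the file the way they do when this error appears.

```python
import multiprocessing as mp


def square(x):
    """Toy task (hypothetical); defined at module level so children can import it."""
    return x * x


def main():
    # Assumption: force "spawn", under which each child starts a fresh
    # interpreter and re-imports this file.
    ctx = mp.get_context("spawn")
    # Processes are created only inside main(). With dask.distributed,
    # `client = Client()` would go here in place of the Pool.
    with ctx.Pool(processes=2) as pool:
        results = pool.map(square, range(5))
    print(results)


if __name__ == "__main__":
    # True only when the file is run directly; a child that re-imports
    # the file skips this block, so it never tries to start processes.
    main()
```

Moving the process-creating call out of the guard and up to module level should reproduce the RuntimeError under "spawn", because every child would try to start its own children while it is still being imported.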
