The following code:
import numpy as np, pandas as pd
import multiprocessing, itertools, timeit
from functools import partial

processes = 5 * multiprocessing.cpu_count()
print(f'processes: {processes}')
pool = multiprocessing.Pool(processes=processes)

def calc(x, y):
    return x+y

def calc_all():
    pairs = [[1,1], [2,2], [3,3]]
    results = pool.map(calc, pairs)
    print(results)

if __name__ == '__main__':
    calc_all()
fails with:
  File "/usr/local/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.12/multiprocessing/pool.py", line 114, in worker
    task = get()
           ^^^^^
  File "/usr/local/lib/python3.12/multiprocessing/queues.py", line 389, in get
    return _ForkingPickler.loads(res)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: Can't get attribute 'calc' on <module '__main__' from '/workspaces/calc.py'>
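If I move only the __main__ block into a separate file and import calc_all from there, I still get the same error (with a different module name, of course).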
If I instead move calc_all and the __main__ block into another module and import calc there, it works fine.
I'd like to understand why this happens when both are top-level functions. Is there a better fix than moving parts of the module into separate files?
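I can't reproduce this; instead I get stuck in a strange loop. When asking a question like this, you need to include your environment: Python version, operating system, and so on.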
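For reference, a small helper along these lines can collect those details. It is only a sketch, and the name environment_report is hypothetical. It also reports the multiprocessing start method, since that differs between operating systems and Python versions and changes how worker processes see the main module.

```python
import multiprocessing
import platform
import sys


def environment_report() -> dict:
    """Collect details worth including in a multiprocessing question (hypothetical helper)."""
    return {
        "python": sys.version.split()[0],  # e.g. "3.12.3"
        "os": platform.platform(),
        # "fork", "spawn" or "forkserver"; this decides how workers are created
        "start_method": multiprocessing.get_start_method(),
        "cpu_count": multiprocessing.cpu_count(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```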
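Also, starting 5 * cpu_count processes makes no sense: for CPU-bound work, only about cpu_count processes can actually run at the same time, so the extra ones just add overhead.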
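A minimal sketch of the sizing point: Pool() with no processes argument already starts os.cpu_count() workers, and using the pool as a context manager shuts it down when the block ends. This sketch also uses Pool.starmap so the question's original two-argument calc(x, y) can stay as it is; with pool.map, each [x, y] pair is passed as a single argument. Passing pairs into calc_all is an illustrative choice, not part of the original code.

```python
import multiprocessing


def calc(x, y):
    return x + y


def calc_all(pairs):
    # Pool() without a processes argument uses os.cpu_count() workers.
    # Leaving the with-block terminates the pool's worker processes.
    with multiprocessing.Pool() as pool:
        # starmap unpacks each [x, y] pair into calc(x, y).
        return pool.starmap(calc, pairs)


if __name__ == "__main__":
    print(calc_all([[1, 1], [2, 2], [3, 3]]))  # [2, 4, 6]
```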
Cleaned up, it works fine:
from functools import partial
import multiprocessing, itertools, timeit
import numpy as np
import pandas as pd

def setup():
    processes = multiprocessing.cpu_count()
    print(f'processes: {processes}')
    pool = multiprocessing.Pool(processes=processes)
    return pool

def calc(args):
    return sum(args)

def calc_all():
    pool = setup()
    pairs = [[1,1], [2,2], [3,3]]
    results = pool.map(calc, pairs)
    print(results)

if __name__ == '__main__':
    calc_all()
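As for why the original version fails, a plausible explanation, assuming the fork start method (the traceback paths point to Linux, where Python 3.12 uses fork by default), is the order of execution: the original creates the Pool at import time, before def calc has run. Each forked worker gets a copy of __main__ as it was at that moment, so when a task refers to __main__.calc, the worker cannot find it. That also fits the observations in the question: importing calc_all from another file still creates the pool before calc is defined, while importing calc first makes it available before any worker is forked. The cleaned-up version avoids the problem by creating the pool inside calc_all, after every function exists. (Under spawn, as on Windows and macOS, a module-level Pool is instead re-created when each child re-imports the module, which may be the strange loop mentioned above.) The sketch below shows the snapshot behaviour directly with a single forked Process rather than a Pool; the function names are hypothetical, and the fork context is not available on Windows.

```python
import multiprocessing as mp
import sys


def _report_name(queue, name):
    # Runs in the forked child. Unpickling a function such as calc looks it
    # up as an attribute of its module, so hasattr() mirrors that lookup.
    module = sys.modules[__name__]
    queue.put(hasattr(module, name))


def child_sees_name_defined_after_fork(name="late_calc"):
    """Return (seen_in_child, seen_in_parent) for a global defined after fork."""
    ctx = mp.get_context("fork")  # POSIX only; raises ValueError on Windows
    queue = ctx.Queue()
    proc = ctx.Process(target=_report_name, args=(queue, name))
    proc.start()  # the child's copy of this module is taken here
    globals()[name] = lambda x, y: x + y  # like a def placed after the Pool
    seen_in_child = queue.get()
    proc.join()
    seen_in_parent = hasattr(sys.modules[__name__], name)
    globals().pop(name, None)  # tidy up so the function can be called again
    return seen_in_child, seen_in_parent


if __name__ == "__main__":
    print(child_sees_name_defined_after_fork())  # (False, True)
```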