Multiprocessing with a Pool in Python (Windows)

Question

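I have to run my study in parallel so that it finishes much faster. I am new to the multiprocessing library in Python and have not yet managed to make it run successfully. Here, I am investigating whether each (origin, target) pair remains at certain locations between the various frames of my study. A few points:

It is one function that I want to run faster (it is not several processes).
The process runs sequentially; that is, each frame is compared with the previous one.
This code is a much simpler form of the original code. The code outputs a residence_list.
I am using Windows.

Can someone please check the code (the multiprocessing part) and help me improve it so that it works? Thanks.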

import numpy as np
from multiprocessing import Pool, freeze_support


def Main_Residence(total_frames, origin_list, target_list):
    Previous_List = {}
    residence_list = []

    for frame in range(total_frames):     #Each frame

        Current_List = {}               #Dict of pair and their residence for frames
        for origin in range(origin_list):

            for target in range(target_list):
                Pair = (origin, target)         #Each pair

                if Pair in Current_List.keys():     #If already considered, continue
                    continue
                else:
                    if origin == target:
                        if (Pair in Previous_List.keys()):            #If remained from the previous frame, add residence
                            print "Origin_Target remained: ", Pair
                            Current_List[Pair] = (Previous_List[Pair] + 1)
                        else:                                           #If new, add it to the current
                            Current_List[Pair] = 1

        for pair in Previous_List.keys():                        #Add those that exited from residence to the list
            if pair not in Current_List.keys():
                residence_list.append(Previous_List[pair])

        Previous_List = Current_List
    return residence_list

if __name__ == '__main__':
    pool = Pool(processes=5)
    Residence_List = pool.apply_async(Main_Residence, args=(20, 50, 50))
    print Residence_List.get(timeout=1)
    pool.close()
    pool.join()
    freeze_support()

Residence_List = np.array(Residence_List) * 5

Answer 1

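Multiprocessing does not make sense in the context you are presenting here. You are creating five subprocesses (plus three threads belonging to the pool, which manage workers, tasks and results) to execute one function once. All of this comes at a cost, both in system resources and in execution time, while four of your worker processes do nothing at all. Multiprocessing does not speed up the execution of a single function call. The code in your specific example will always be slower than simply executing Main_Residence(20, 50, 50) in the main process.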
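A minimal sketch of that comparison, assuming Python 3. The function count_diagonal_pairs is a hypothetical stand-in workload, not the asker's Main_Residence. It runs the same single call once directly and once through a five-worker pool so that the two timings can be compared. On Windows, everything that touches the pool or its result should live inside the `if __name__ == '__main__':` block, because child processes re-import the main module.

```python
from datetime import datetime
from multiprocessing import Pool, freeze_support


def count_diagonal_pairs(total_frames, origin_count, target_count):
    """Hypothetical stand-in workload: count pairs where origin == target."""
    hits = 0
    for _ in range(total_frames):
        for origin in range(origin_count):
            for target in range(target_count):
                if origin == target:
                    hits += 1
    return hits


if __name__ == '__main__':
    freeze_support()  # only has an effect in frozen Windows executables

    # Plain call in the main process.
    start = datetime.now()
    direct = count_diagonal_pairs(20, 50, 50)
    print("direct elapsed", datetime.now() - start)

    # The same single call delegated to a pool of five workers.
    start = datetime.now()
    with Pool(processes=5) as pool:
        # One task keeps one worker busy; the other four stay idle.
        pooled = pool.apply_async(count_diagonal_pairs, args=(20, 50, 50)).get()
    print("pool elapsed", datetime.now() - start)

    # Both paths compute the same value; only the elapsed time differs.
    print("same result:", direct == pooled)
```

The pooled timing includes starting the worker processes and shipping the arguments and the result between processes, which is the cost described above.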

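For multiprocessing to make sense in such a context, your work needs to break down into a set of homogeneous tasks that can be processed in parallel and whose results can be merged later.

As an example (not necessarily a good one): if you want to calculate the largest prime factor for a range of numbers, you can delegate the task of calculating the factor for any specific number to a worker in a pool. Several workers then perform these individual calculations in parallel: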

from datetime import datetime
from multiprocessing import Pool


def largest_prime_factor(n):
    p = n
    i = 2
    while i * i <= n:
        if n % i:
            i += 1
        else:
            n //= i
    return p, n


if __name__ == '__main__':
    pool = Pool(processes=3)
    start = datetime.now()
    # this delegates half a million individual tasks to the pool, i.e.
    # largest_prime_factor(0), largest_prime_factor(1), ..., largest_prime_factor(499999)
    pool.map(largest_prime_factor, range(500000))
    pool.close()
    pool.join()
    print("pool elapsed", datetime.now() - start)
    start = datetime.now()
    # same work just in the main process
    [largest_prime_factor(i) for i in range(500000)]
    print("single elapsed", datetime.now() - start)

Output:

pool elapsed 0:00:04.664000
single elapsed 0:00:08.939000

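(The largest_prime_factor function is taken from @Stefan in this answer.)

As you can see, the pool is only roughly twice as fast as a single process doing the same amount of work, even though it runs in three processes in parallel. That is due to the overhead introduced by multiprocessing and the pool.

So, you said that the code in your example has been simplified. You will have to analyze your original code to see whether it can be broken down into homogeneous tasks that can be passed to a pool for processing. If it can, using multiprocessing might help you speed up your program. If not, multiprocessing will likely cost you time rather than save it.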
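As a sketch of what such a decomposition might look like, assuming Python 3: the names is_prime, count_primes and split_range, and the sizes used, are illustrative and not taken from the question. A range of numbers is cut into independent sub-ranges, each worker counts the primes in its own sub-range, and the partial counts are merged with sum.

```python
from multiprocessing import Pool, freeze_support


def is_prime(n):
    """Trial division; slow on purpose so each task has real work to do."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def count_primes(bounds):
    """One homogeneous task: count the primes in the half-open range [start, stop)."""
    start, stop = bounds
    return sum(1 for n in range(start, stop) if is_prime(n))


def split_range(stop, parts):
    """Cut [0, stop) into `parts` contiguous sub-ranges of near-equal size."""
    step, rest = divmod(stop, parts)
    bounds, start = [], 0
    for i in range(parts):
        end = start + step + (1 if i < rest else 0)
        bounds.append((start, end))
        start = end
    return bounds


if __name__ == '__main__':
    freeze_support()
    chunks = split_range(100000, 12)  # 12 independent tasks
    with Pool(processes=3) as pool:
        partial_counts = pool.map(count_primes, chunks)  # processed in parallel
    # Merge step: the partial results are simply added up.
    print("primes below 100000:", sum(partial_counts))
```

The tasks here share no state and need no result from a neighbouring task, which is what makes them safe to hand to separate workers.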

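Edit: Since you asked for suggestions regarding your code: I can hardly say anything about your function. You said yourself that it is just a simplified example to provide an MCVE (much appreciated, by the way! Most people don't take the time to strip their code down to the bare minimum). In any case, a request for a code review is better suited to Codereview.

Play around a bit with the available methods of task delegation. In my prime factor example, using apply_async came with a massive penalty: execution time increased ninefold compared to using map. But my example uses a simple iterable, whereas yours needs three arguments per task. This could be a case for starmap, but that is only available as of Python 3.3. Either way, the structure and nature of your task data basically determine the right method to use. I did some quick-and-dirty testing with multiprocessing your example function. The input was defined like this: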

inp = [(20, 50, 50)] * 5000  # that makes 5000 tasks against your Main_Residence

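I ran that in Python 3.6 with three subprocesses and with your function unaltered, except for the removal of the print statement (I/O is costly). I used starmap, apply, starmap_async and apply_async, and each time I also iterated through the results to account for the blocking get() on the async results. Here is the output: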

starmap elapsed 0:01:14.506600
apply elapsed 0:02:11.290600
starmap async elapsed 0:01:27.718800
apply async elapsed 0:01:12.571200
# btw: 5k calls to Main_Residence in the main process looks as bad 
# as using apply for delegation
single elapsed 0:02:12.476800

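As you can see, the execution times differ, although all four methods do the same amount of work; the apply_async you picked appears to be the fastest method.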
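A comparison like this could be set up roughly as follows. This is a sketch, assuming Python 3.3 or later. main_residence is a snake_case port of the question's function with the print removed. The helper names run_starmap and run_apply_async and the task count of 20 are illustrative choices, not the code that produced the timings above.

```python
from datetime import datetime
from multiprocessing import Pool, freeze_support


def main_residence(total_frames, origin_count, target_count):
    """Port of the question's function, without the print call."""
    previous = {}          # pair -> residence count in the previous frame
    residence_list = []
    for _ in range(total_frames):
        current = {}       # pair -> residence count in this frame
        for origin in range(origin_count):
            for target in range(target_count):
                pair = (origin, target)
                if pair in current:
                    continue
                if origin == target:
                    # Extend the residence if the pair was already present.
                    current[pair] = previous.get(pair, 0) + 1
        # Pairs that dropped out contribute their final residence count.
        for pair, residence in previous.items():
            if pair not in current:
                residence_list.append(residence)
        previous = current
    return residence_list


def run_starmap(pool, tasks):
    """Blocking call; each argument tuple is unpacked into one call."""
    return pool.starmap(main_residence, tasks)


def run_apply_async(pool, tasks):
    """Submit every task individually, then collect the results in order."""
    pending = [pool.apply_async(main_residence, args=task) for task in tasks]
    return [result.get() for result in pending]


if __name__ == '__main__':
    freeze_support()
    tasks = [(20, 50, 50)] * 20  # small task count so the sketch runs quickly
    for label, runner in (("starmap", run_starmap),
                          ("apply async", run_apply_async)):
        with Pool(processes=3) as pool:
            start = datetime.now()
            results = runner(pool, tasks)
            print(label, "elapsed", datetime.now() - start, len(results), "results")
```

Timings from a run this small are dominated by pool start-up, so the task count would need to grow considerably before the methods can be compared meaningfully.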

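Coding style. Your code looks quite ... unconventional :) You use Capitalized_Words_With_Underscore for your names (both function and variable names), which is pretty much a no-no in Python. Also, assigning the name Previous_List to a dictionary is ... questionable. Have a look at PEP 8, especially the Naming Conventions section, to see the commonly accepted coding style for Python.

Judging by the way your print looks, you are still using Python 2. I know that in corporate or institutional environments that is sometimes all you have available. Still, keep in mind that the clock for Python 2 is ticking.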
