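I have the matching() function below, which contains a for loop, and I pass it a large generator (unique_combinations).
It takes days to run, so I would like to use multiprocessing over the elements of the loop to speed it up, but I cannot work out how to do it.
In general I find the logic behind concurrent.futures hard to follow.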
results = []
match_score = []

def matching():
    for pair in unique_combinations:
        if fuzz.ratio(pair[0], pair[1]) > 90:
            results.append(pair)
            match_score.append(fuzz.ratio(pair[0], pair[1]))

def main():
    executor = ProcessPoolExecutor(max_workers=3)
    task1 = executor.submit(matching)
    task2 = executor.submit(matching)
    task3 = executor.submit(matching)

if __name__ == '__main__':
    main()
    print(results)
    print(match_score)
I assumed this would speed up execution.
If you are already using concurrent.futures, the best approach IMO is executor.map:
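import concurrent.futures

# Assumption: fuzz.ratio comes from fuzzywuzzy (thefuzz and rapidfuzz expose the same name)
from fuzzywuzzy import fuzz

results = []
match_score = []

def matching(pair):
    fuzz_ratio = fuzz.ratio(pair[0], pair[1])  # only calculate this once
    if fuzz_ratio > 90:
        return pair, fuzz_ratio
    else:
        return None

def main():
    # placeholder data; pass your real iterable of string pairs here
    unique_combinations = [("apple", "appel"), ("maria", "mariah"), ("peter", "petra")]
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
        for result in executor.map(matching, unique_combinations, chunksize=100):
            if result:
                # handle the results somehow; these appends run in the parent process
                results.append(result[0])
                match_score.append(result[1])

if __name__ == '__main__':
    main()
    print(results)
    print(match_score)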
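One caveat for a very large generator: the documentation for Executor.map says the input iterables are collected immediately rather than lazily, so every pair may be pulled into memory before any result comes back. A possible workaround, sketched below, is to cut the generator into batches and hand the pool only a few batches at a time. Everything here is illustrative rather than part of the answer above: similarity() is a standard-library stand-in for fuzz.ratio (its scores may differ), and batched(), match_batch(), find_matches() and their parameters are hypothetical names.

```python
from concurrent.futures import ProcessPoolExecutor
from difflib import SequenceMatcher
from itertools import combinations, islice

THRESHOLD = 90  # same cut-off as in the question


def similarity(a, b):
    """Stand-in for fuzz.ratio (assumption): a 0-100 score from the stdlib."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def match_batch(batch):
    """Runs in a worker: score one batch of pairs and return only the matches."""
    matches = []
    for a, b in batch:
        score = similarity(a, b)
        if score > THRESHOLD:
            matches.append(((a, b), score))
    return matches


def batched(iterable, size):
    """Yield lists of up to `size` items without materialising the whole input."""
    iterator = iter(iterable)
    while batch := list(islice(iterator, size)):
        yield batch


def find_matches(pairs, batch_size=1000, window=8, max_workers=4):
    """Feed `pairs` to a process pool, at most `window` batches at a time."""
    results, match_score = [], []
    batches = batched(pairs, batch_size)
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        # Only `window * batch_size` pairs are taken from the generator per round.
        while window_batches := list(islice(batches, window)):
            for matches in executor.map(match_batch, window_batches):
                for pair, score in matches:
                    results.append(pair)  # collected in the parent process
                    match_score.append(score)
    return results, match_score


if __name__ == "__main__":
    names = ["jonathan", "jonathon", "maria", "mariah", "peter"]
    results, match_score = find_matches(
        combinations(names, 2), batch_size=2, window=2, max_workers=2
    )
    print(results)
    print(match_score)
```

Each round waits for its slowest batch before the next one starts, so some workers may sit idle briefly; a larger window or batch_size reduces that cost at the price of more memory.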