Multiprocessing Pool in Python

Question

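I want to hash all the files in a shared folder (every file in every subfolder). I'd like to do it with a multiprocessing pool, but I don't know how to call the function that does the computation. The code below prints nothing (and raises no error), and I can't use pool.map because I pass one file at a time to the function. Can you help me? Thanks a lot.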

from multiprocessing import Pool
import os
import hashlib


def hashfile(fp):
    with open(fp, 'rb') as file:
        bytes = file.read()
        hash_file = hashlib.sha1(bytes).hexdigest()
        print(fp, hash_file)


def testProcessPool(share_input):
    with Pool(processes=2) as pool:
        for root, dirs, files in os.walk(share_input):
            for f in files:
                pool.apply_async(hashfile, os.path.join(root, f))


if __name__ == '__main__':
    testProcessPool('/Users/gm/Desktop/test')
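Two things in the code above explain the silent result. First, `apply_async` expects its `args` parameter to be a tuple of arguments, so passing the bare path string unpacks it into one argument per character; `hashfile` then raises a `TypeError` inside the worker, and because `.get()` is never called on the returned `AsyncResult`, that exception is silently discarded. Second, leaving the `with Pool(...)` block calls `pool.terminate()`, which stops tasks that are still pending. The sketch below keeps the `apply_async` approach but passes `(path,)`, returns results instead of printing in the worker, and collects them with `.get()` before the pool exits. `hash_tree_apply_async` is a hypothetical helper name, not part of the original code.

```python
import hashlib
import os
from multiprocessing import Pool


def hashfile(fp):
    # Hash one file and return the result instead of printing it in the worker.
    with open(fp, 'rb') as f:
        return fp, hashlib.sha1(f.read()).hexdigest()


def hash_tree_apply_async(share_input, processes=2):
    """Hash every file under share_input with apply_async; returns {path: sha1}."""
    results = []
    with Pool(processes=processes) as pool:
        for root, dirs, files in os.walk(share_input):
            for name in files:
                # args must be a tuple, (path,), not the bare path string.
                results.append(pool.apply_async(hashfile, (os.path.join(root, name),)))
        # Collect inside the with block: leaving it calls pool.terminate(),
        # which would kill unfinished tasks. get() also re-raises any
        # exception that happened in the worker, so errors are no longer silent.
        return dict(r.get() for r in results)
```

Called from under the question's `if __name__ == '__main__':` guard, `hash_tree_apply_async('/Users/gm/Desktop/test')` would return a `{path: sha1}` dict that can then be printed.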

Answer 1

I would change the code slightly:

- create a generator that yields the file paths (built with `os.path.join`);
- use `Pool.imap_unordered` to call the hashing function.

import os
import hashlib
from multiprocessing import Pool


def hashfile(fp):
    with open(fp, 'rb') as f:
        hash_file = hashlib.sha1(f.read()).hexdigest()
    return fp, hash_file


def get_files(inp):
    for root, dirs, files in os.walk(inp):
        for f in files:
            yield os.path.join(root, f)

def testProcessPool(share_input):
    with Pool(processes=2) as pool:
        for r in pool.imap_unordered(hashfile, get_files(share_input)):
            print(r)

if __name__ == '__main__':
    testProcessPool('/usr/')

Prints:


...

('/usr/lib/x86_64-linux-gnu/libsvn_client-1.so.1.0.0', '86a1d993385993d0bf09346502295c1663e2e6e8')
('/usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.3', '50966c2c9147b1e4af587599de81551d7223e152')
('/usr/lib/x86_64-linux-gnu/libperl.so.5.32', 'd828cc5f429c1e2b124ebb50ee953ee9dae99c45')
('/usr/lib/x86_64-linux-gnu/libsvn_subr-1.so.1.0.0', '200dedc1dfd66d7cc13a787f023678fc055a3658')
('/usr/lib/x86_64-linux-gnu/libedit.so.2.0.63', '0a314040f5dab8f7de474ba5b265c7e0bc7c75de')

...
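Answer 1's `hashfile` reads each file completely into memory with `f.read()`. If some files in a share are very large, a variant that feeds the hash in fixed-size chunks keeps memory use bounded. This is a sketch, not the answerer's code; the 1 MiB default `chunk_size` is an arbitrary assumption. It still returns `(fp, digest)`, so it can be passed to `imap_unordered` in place of the original.

```python
import hashlib


def hashfile(fp, chunk_size=1 << 20):
    """Return (fp, sha1 hex digest), reading the file in fixed-size chunks."""
    h = hashlib.sha1()
    with open(fp, 'rb') as f:
        # iter() with a sentinel keeps calling f.read(chunk_size) until it returns b''.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return fp, h.hexdigest()
```

On Python 3.11 and later, `hashlib.file_digest(f, 'sha1')` performs an equivalent chunked read internally.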

Answer 2

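OK, thanks. I'm doing all this to find duplicate files (that's why I use a dictionary; see the code below), and the code has to be as fast as possible. I have many shares, some as large as 150 GB. With threads I didn't gain much performance, so I'm hoping processes will do better.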

def testProcessPool(share_input):
    diz = dict()
    with Pool(processes=4) as pool:
        for fp, digest in pool.imap_unordered(hashfile, get_files(share_input)):
            # Map each digest to a '|'-separated string of the paths that share it.
            if digest not in diz:
                diz[digest] = fp
            else:
                diz[digest] = diz[digest] + '|' + fp
    for k, v in diz.items():
        print(k, v)
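Since the goal is finding duplicates, a sketch along these lines may reduce the work further. It is a variant under stated assumptions, not the asker's code: it first groups files by `os.path.getsize` and hashes only files whose size occurs more than once (a file with a unique size cannot have a duplicate), hashes in chunks, collects paths per digest in a `collections.defaultdict(list)` instead of concatenating strings with `'|'`, and passes `chunksize=16` (an arbitrary value) to `imap_unordered` to batch several paths per task. `find_duplicates` is a hypothetical name.

```python
import hashlib
import os
from collections import defaultdict
from multiprocessing import Pool


def hashfile(fp, chunk_size=1 << 20):
    # Chunked SHA-1 so that large files are never loaded into memory at once.
    h = hashlib.sha1()
    with open(fp, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return fp, h.hexdigest()


def get_files(inp):
    for root, dirs, files in os.walk(inp):
        for f in files:
            yield os.path.join(root, f)


def find_duplicates(share_input, processes=4):
    """Return {sha1: [paths]} for every digest shared by two or more files."""
    # Pre-filter (an added optimization): skip files whose size is unique.
    by_size = defaultdict(list)
    for fp in get_files(share_input):
        try:
            by_size[os.path.getsize(fp)].append(fp)
        except OSError:
            pass  # e.g. a broken symlink or a file removed during the walk
    candidates = [fp for paths in by_size.values() if len(paths) > 1 for fp in paths]

    by_hash = defaultdict(list)
    with Pool(processes=processes) as pool:
        # chunksize batches several paths per task to reduce IPC overhead.
        for fp, digest in pool.imap_unordered(hashfile, candidates, chunksize=16):
            by_hash[digest].append(fp)
    return {k: v for k, v in by_hash.items() if len(v) > 1}
```

From under the `__main__` guard, `for k, v in find_duplicates('/Users/gm/Desktop/test').items(): print(k, '|'.join(v))` would print in the same `digest path|path` format as the code above. Note that `hashlib` releases the GIL while hashing data larger than 2047 bytes, so threads can already hash in parallel; if the shares are disk- or network-bound, processes may not help much, whereas the size pre-filter reduces I/O either way.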
