Why is threading much faster than asyncio when processing multiple files in my example?


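I am experimenting with asyncio and threading to find out which mechanism I should choose when processing a large number of files. The experiment is simple: I just want to read a file, add some characters to the end of each line, and write the lines back to the file. Here is my code: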

import asyncio
import threading
from multiprocessing import Process

async def process_files(filenames):
    for filename in filenames:
        await process_file(filename)

async def process_file(filename):
    i = 0
    #await asyncio.sleep(0.1)
    with open(filename, 'r+') as f:
        lines = f.readlines()
        for line in lines:
            line += f"{i}\n"
            f.write(line)
            i += 1

def regular_process_file(filename):
    i = 0
    with open(filename, 'r+') as f:
        lines = f.readlines()
        for line in lines:
            line += f"{i}\n"
            f.write(line)
            i += 1

#asyncio call  
asyncio.run(process_files(filenames))

#multi-threading
threads = []
for file in filenames:
    t = threading.Thread(target=regular_process_file, args=(file,))
    t.start()
    threads.append(t)
for t in threads:
    t.join()

#multi-processing
processes = []
for file in filenames:
    proc = Process(target=regular_process_file, args=(file,))
    processes.append(proc)
    proc.start()

for p in processes:
    p.join()

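When I timed the three mechanisms above (asyncio, threading, and multiprocessing), I found that asyncio took about as long as multiprocessing, while threading took about 2/3 of the time of the other two. I expected asyncio to be about as efficient as threading, because the GIL only allows one thread to execute at any point in time. Or does Python multithreading make use of all processors (for example, one thread per processor)? But if that were the case, shouldn't multithreading be about as efficient as multiprocessing?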

Thanks!

python-3.x python-asyncio python-multiprocessing python-multithreading
1 Answer

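Your async code does not actually make use of asyncio: process_files awaits each file one after another, and process_file never yields control back to the event loop. In addition, Python's built-in open is a blocking call. If you want to open files asynchronously, you can use aiofiles.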
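To see why a blocking call inside a coroutine defeats asyncio, here is a minimal sketch. It does not touch any files: time.sleep stands in for a blocking file operation and asyncio.sleep for a call that cooperates with the event loop. The names DELAY, blocking_task, cooperative_task and timed_gather are made up for this illustration.

```python
import asyncio
import time

DELAY = 0.05  # hypothetical stand-in for the time one file operation takes


async def blocking_task():
    # time.sleep never yields to the event loop, much like the built-in
    # open/read/write calls, so other coroutines cannot run meanwhile.
    time.sleep(DELAY)


async def cooperative_task():
    # asyncio.sleep suspends this coroutine so that other tasks can run.
    await asyncio.sleep(DELAY)


async def timed_gather(task_factory, n):
    # Run n tasks "concurrently" and return the elapsed wall-clock time.
    start = time.perf_counter()
    await asyncio.gather(*(task_factory() for _ in range(n)))
    return time.perf_counter() - start


if __name__ == "__main__":
    print("blocking:   ", asyncio.run(timed_gather(blocking_task, 5)))
    print("cooperative:", asyncio.run(timed_gather(cooperative_task, 5)))
```

Even with asyncio.gather, the blocking version should take roughly five times DELAY, because each task holds the event loop until it finishes. The cooperative version should finish in about one DELAY.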

To use aiofiles, first install it with pip install aiofiles.

Here is an asynchronous version of the process_file function:

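import aiofiles
import asyncio

async def process_files(filenames):
    # Schedule all files concurrently instead of awaiting them one by one.
    await asyncio.gather(*(process_file(filename) for filename in filenames))


async def process_file(filename):
    # Read everything first and close the file before reopening it for writing.
    async with aiofiles.open(filename, 'r') as f:
        lines = await f.readlines()
    # Mode 'w' truncates the file, so the lines are rewritten from the start.
    async with aiofiles.open(filename, 'w') as fw:
        for i, line in enumerate(lines):
            await fw.write(line + f"{i}\n")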
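If adding a third-party dependency is not an option, a similar effect can be sketched with the standard library alone. This is an alternative to the aiofiles approach, not part of it. It assumes Python 3.9 or later, where asyncio.to_thread is available, and it reuses the blocking regular_process_file logic from the question. The name process_files_in_threads is made up for this illustration.

```python
import asyncio


def regular_process_file(filename):
    # Same blocking logic as in the question: read all lines, then write
    # numbered copies. After readlines() the position is at the end of the
    # file, so the writes are appended.
    with open(filename, 'r+') as f:
        lines = f.readlines()
        for i, line in enumerate(lines):
            f.write(line + f"{i}\n")


async def process_files_in_threads(filenames):
    # asyncio.to_thread (Python 3.9+) runs each blocking call in a worker
    # thread, so the event loop stays free while the I/O is in progress.
    await asyncio.gather(
        *(asyncio.to_thread(regular_process_file, name) for name in filenames)
    )
```

It would be called as asyncio.run(process_files_in_threads(filenames)). Since the work still happens in threads, the timing would likely resemble the plain threading version rather than beat it.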