I'm experimenting with asyncio and threading to figure out which mechanism to pick when processing a large number of files. The experiment is simple: I just want to read a file, append some characters to the end of each line, and write the lines back. Here is my code:
import asyncio
import threading
from multiprocessing import Process

async def process_files(filenames):
    for filename in filenames:
        await process_file(filename)

async def process_file(filename):
    i = 0
    #await asyncio.sleep(0.1)
    with open(filename, 'r+') as f:
        lines = f.readlines()
        for line in lines:
            line += f"{i}\n"
            f.write(line)
            i += 1

def regular_process_file(filename):
    i = 0
    with open(filename, 'r+') as f:
        lines = f.readlines()
        for line in lines:
            line += f"{i}\n"
            f.write(line)
            i += 1
# asyncio call
asyncio.run(process_files(filenames))

# multi-threading
threads = []
for file in filenames:
    t = threading.Thread(target=regular_process_file, args=(file,))
    t.start()
    threads.append(t)
for t in threads:
    t.join()

# multi-processing
processes = []
for file in filenames:
    proc = Process(target=regular_process_file, args=(file,))
    processes.append(proc)
    proc.start()
for p in processes:
    p.join()
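For the threaded case, I timed it with a small harness along these lines (the throwaway test files and the `timed` helper are just for illustration):

```python
import os
import tempfile
import threading
import time

def regular_process_file(filename):
    # Read every line, append a counter, and write the result back.
    i = 0
    with open(filename, 'r+') as f:
        lines = f.readlines()
        for line in lines:
            line += f"{i}\n"
            f.write(line)
            i += 1

def timed(label, fn):
    # Measure wall-clock time of a callable.
    start = time.perf_counter()
    fn()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.4f}s")
    return elapsed

# Create a few throwaway files to process.
filenames = []
for _ in range(4):
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, 'w') as f:
        f.write("hello\n" * 100)
    filenames.append(path)

def run_threads():
    threads = [threading.Thread(target=regular_process_file, args=(f,))
               for f in filenames]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

t_threads = timed("threads", run_threads)

# Clean up the throwaway files.
for path in filenames:
    os.remove(path)
```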
When I time the three mechanisms above (asyncio, threading, and multiprocessing), I find that asyncio's execution time is close to multiprocessing's, while threading takes only about 2/3 as long as the other two. I thought asyncio should reach roughly the same efficiency as threading, since the GIL only allows one thread to execute at any point in time. Or does Python multithreading make use of all processors (e.g. one thread per processor)? But if that were the case, shouldn't multithreading be about as efficient as multiprocessing?

Thanks!
Your async code isn't really using asyncio: awaiting each file in a loop runs the coroutines one after another, and Python's built-in open is a blocking call in any case. If you want to do file I/O asynchronously, you can use aiofiles.

First install the library with pip install aiofiles. Here is the async version of the process_file function:
import aiofiles
import asyncio

async def process_files(filenames):
    await asyncio.gather(*[process_file(filename) for filename in filenames])

async def process_file(filename):
    i = 0
    async with aiofiles.open(filename, 'r+') as f:
        lines = await f.readlines()
    async with aiofiles.open(filename, 'w') as fw:
        for line in lines:
            line += f"{i}\n"
            await fw.write(line)
            i += 1
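The difference between a sequential await loop and asyncio.gather is easy to see with a simulated wait; here asyncio.sleep stands in for a real non-blocking file operation (a minimal sketch, not your actual workload):

```python
import asyncio
import time

async def fake_io(delay):
    # Stand-in for a non-blocking I/O operation.
    await asyncio.sleep(delay)

async def sequential(n, delay):
    # Awaiting in a loop: each wait finishes before the next starts.
    for _ in range(n):
        await fake_io(delay)

async def concurrent(n, delay):
    # gather schedules all waits at once, so they overlap.
    await asyncio.gather(*[fake_io(delay) for _ in range(n)])

start = time.perf_counter()
asyncio.run(sequential(5, 0.05))
t_seq = time.perf_counter() - start

start = time.perf_counter()
asyncio.run(concurrent(5, 0.05))
t_conc = time.perf_counter() - start

print(f"sequential: {t_seq:.2f}s, gather: {t_conc:.2f}s")
```

With five 50 ms waits, the sequential version needs roughly the sum of the delays, while the gather version needs roughly one delay.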