So, I'm working on a project where I'm given some images and some Excel sheets containing contextual information about those images. The images and Excel files are daily data readings, organized in this folder structure:

Now, the task at hand:
I've already solved that part. My main problem is that image processing, segmentation, and feature extraction for a single day's readings takes about 26-32 minutes, so in total it takes around 1.5-2 hours. The data keeps growing, and we take three readings per week. So I wrote a multithreaded script that starts processing all the days at once, which cut the total time for all 3 days down to 26-32 minutes.
The script works most of the time, but sometimes when I run it, the

`futures.append(executor.submit(processAndsegmentImages, day=day))`

call doesn't seem to start threads for all three days. I looked into it, and it happens randomly: sometimes threads start for only 2 of the days, sometimes for only one.

Here is my code:
```python
import os
import time
import concurrent.futures

from roboflow import Roboflow


def processAndsegmentImages(day):
    # doing image processing, segmentation, and other analysis
    return '{} images processing and segmentation completed'.format(day)


if __name__ == "__main__":
    start = time.time()

    rf = Roboflow(api_key="my_api_key")
    project = rf.workspace().project("my_project")
    model = project.version(1).model
    print('model loaded\n')

    days = ['day1', 'day2', 'day3']
    with concurrent.futures.ThreadPoolExecutor(max_workers=min(32, os.cpu_count() + 4)) as executor:
        futures = []
        for day in days:
            print('adding {} to executor'.format(day))
            futures.append(executor.submit(processAndsegmentImages, day=day))
        for future in concurrent.futures.as_completed(futures):
            print(future.result())

    end = time.time()
    tlapsed = end - start
    print('total time taken: {:.2f} minutes'.format(tlapsed / 60))
```
The ideal output should look like this:
```
loading Roboflow workspace...
loading Roboflow project...
model loaded
adding day1 to executor
adding day2 to executor
adding day3 to executor
created root directory: /home/arrafi/potato/segmentation_results/day2_segmentation_results
created root directory: /home/arrafi/potato/segmentation_results/day1_segmentation_results
created root directory: /home/arrafi/potato/segmentation_results/day3_segmentation_results
starting day1 images processing from /home/arrafi/potato/segmentation_results/day1_segmentation_results/day1_raw_images/
starting day3 images processing from /home/arrafi/potato/segmentation_results/day3_segmentation_results/day3_raw_images/
starting day2 images processing from /home/arrafi/potato/segmentation_results/day2_segmentation_results/day2_raw_images/
192 day1_images proccessed and saved at: /home/arrafi/potato/segmentation_results/day1_segmentation_results/day1_images
created pred_images and json directory for day1 images
starting potato segmentation ... of 192 day1_images
192 day2_images proccessed and saved at: /home/arrafi/potato/segmentation_results/day2_segmentation_results/day2_images
created pred_images and json directory for day2 images
starting potato segmentation ... of 192 day2_images
192 day3_images proccessed and saved at: /home/arrafi/potato/segmentation_results/day3_segmentation_results/day3_images
created pred_images and json directory for day3 images
starting potato segmentation ... of 192 day3_images
................and some more outputs......................
```
Most of the time the script runs successfully, but sometimes I've noticed that it only starts the process for one or two of the days. Below, it only started day1 and day3:
```
loading Roboflow workspace...
loading Roboflow project...
model loaded
adding day1 to executor
adding day2 to executor
adding day3 to executor
created root directory: /home/arrafi/potato/segmentation_results/day1_segmentation_results
created root directory: /home/arrafi/potato/segmentation_results/day3_segmentation_results
starting day1 images processing from /home/arrafi/potato/segmentation_results/day1_segmentation_results/day1_raw_images/
starting day3 images processing from /home/arrafi/potato/segmentation_results/day3_segmentation_results/day3_raw_images/
192 day1_images proccessed and saved at: /home/arrafi/potato/segmentation_results/day1_segmentation_results/day1_images
created pred_images and json directory for day1 images
starting potato segmentation ... of 192 day1_images
192 day3_images proccessed and saved at: /home/arrafi/potato/segmentation_results/day3_segmentation_results/day3_images
created pred_images and json directory for day3 images
starting potato segmentation ... of 192 day3_images
................and some more outputs......................
```
Can anyone point out why this happens? Am I doing something wrong when calling `ThreadPoolExecutor`? I've searched online for a solution but couldn't figure out why this occurs, since the code hasn't thrown any errors so far. And the behavior is so random.
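One thing worth ruling out when a day seems to "not start": an exception raised inside a submitted task is stored in its `Future` and only re-raised when `result()` is called, so a failing day can look like it never ran. The sketch below (my own debugging aid, not part of the original script; `run_all` and `fake_process` are hypothetical names) inspects `future.exception()` so failures are reported instead of raised:

```python
import concurrent.futures

def run_all(func, items):
    """Submit one task per item; collect results and failures separately."""
    results, errors = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        futures = {executor.submit(func, item): item for item in items}
        for future in concurrent.futures.as_completed(futures):
            item = futures[future]
            exc = future.exception()  # None if the task succeeded
            if exc is not None:
                errors[item] = exc    # a silently failing day shows up here
            else:
                results[item] = future.result()
    return results, errors

# Simulated workload: 'day2' fails, day1/day3 still complete,
# and the day2 error is visible instead of being swallowed.
def fake_process(day):
    if day == 'day2':
        raise RuntimeError('segmentation failed for ' + day)
    return day + ' done'

results, errors = run_all(fake_process, ['day1', 'day2', 'day3'])
```

If one of the days raised before its first `print`, this pattern makes that visible immediately.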
So, I found that sometimes another running Python program, or leftover threads from a previous run, can cause this problem. Therefore, I added a small snippet at the beginning of the multithreading script to restart the Python program. This is how I solved it:
```python
import os
import sys
import psutil
import logging


def restart_program():
    """
    Restarts the current program, with file objects and descriptors cleanup
    """
    try:
        p = psutil.Process(os.getpid())
        # note: modern psutil renamed get_open_files() to open_files()
        for handler in p.open_files() + p.connections():
            os.close(handler.fd)
    except Exception as e:
        logging.error(e)

    python = sys.executable
    os.execl(python, python, *sys.argv)
```
The restart_program function restarts the current Python program. It closes the file objects and descriptors associated with the current process, then re-executes the program. This is useful in situations such as needing to reload the program with new settings or configuration. I got the snippet from here: https://stackoverflow.com/a/33334183/13520498
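One caveat when calling this "at the beginning" of a script: `os.execl` re-runs the script with the same arguments, so an unconditional call would restart forever. A minimal sketch of a guard, assuming an environment-variable flag of my own invention (`ALREADY_RESTARTED` is not from the original answer):

```python
import os

RESTART_FLAG = "ALREADY_RESTARTED"

def needs_restart(environ):
    """True only on the first launch, before any re-exec has happened."""
    return environ.get(RESTART_FLAG) != "1"

if __name__ == "__main__":
    if needs_restart(os.environ):
        # Mark the environment so the re-exec'd child skips this branch,
        # then restart (restart_program() from above would go here).
        os.environ[RESTART_FLAG] = "1"
        print("restarting once")
    else:
        print("already restarted, continuing with the real work")
```

Since the child started by `os.execl` inherits the parent's environment, the flag survives the re-exec and the script proceeds to the actual processing on the second pass.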