`concurrent.futures.ThreadPoolExecutor` behaving erratically


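I am working on a project where I am given some images and some Excel sheets containing contextual information about those images. The images and Excel sheets are daily data readings. They are organized in this folder structure: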

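Now, the task at hand:

  • I need to go through these daily image records, run semantic segmentation on the images, extract the height, width, and area of each object, and then write them into their respective Excel sheets, as in the picture below. The output should be organized as follows: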

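I have solved this part. My main problem is that image processing, segmentation, and feature extraction for a single day's readings takes about 26-32 minutes, so three days take about 1.5-2 hours in total. The data keeps growing, and we take three readings per week. So I wrote a multithreaded script that starts processing all days at the same time. This brings the total time for all 3 days of data down to 26-32 minutes.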
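The idea of running all days at once can be sketched in isolation. This is a minimal illustration, not the actual pipeline: `process_day` and `run_all` are hypothetical names, and `time.sleep` stands in for the real per-day work. Threads only overlap like this in CPython when the work waits on I/O (for example network calls) or otherwise releases the GIL.

```python
import concurrent.futures
import time


def process_day(day, seconds=0.3):
    # Hypothetical stand-in for the real per-day processing: just sleeps.
    time.sleep(seconds)
    return '{} done'.format(day)


def run_all(days):
    # Submit one task per day and wait for all of them to finish.
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(days)) as executor:
        # executor.map returns results in the same order as `days`.
        results = list(executor.map(process_day, days))
    return results, time.perf_counter() - start


results, elapsed = run_all(['day1', 'day2', 'day3'])
print(results, 'elapsed: {:.2f}s'.format(elapsed))
```

With three overlapping tasks, the elapsed time should be close to the duration of one task rather than the sum of all three.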

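The script works most of the time, but sometimes when I run it, the `futures.append(executor.submit(processAndsegmentImages, day=day))` call does not seem to start a thread for all three days. I checked, and this really does happen: sometimes threads start for only two days, sometimes for only one.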

Here is my code:

import concurrent.futures
import os
import time

from roboflow import Roboflow

def processAndsegmentImages(day):
    # doing image processing, segmentation, and other analysis
    return '{} images processing and segmentation completed'.format(day)

if __name__ == "__main__":
    start = time.time()
    rf = Roboflow(api_key="my_api_key")
    project = rf.workspace().project("my_project")
    model = project.version(1).model
    print('model loaded\n')
    days = ['day1', 'day2', 'day3']
    with concurrent.futures.ThreadPoolExecutor(max_workers=min(32, os.cpu_count() + 4)) as executor:
        futures = []
        # submit one task per day; each runs in its own worker thread
        for day in days:
            print('adding {} to executor'.format(day))
            futures.append(executor.submit(processAndsegmentImages, day=day))
        # print each result as soon as its task finishes
        for future in concurrent.futures.as_completed(futures):
            print(future.result())
    end = time.time()
    tlapsed = end-start
    print('total time taken: {:.2f} minutes'.format(tlapsed/60))

The expected output should look like this:

loading Roboflow workspace...
loading Roboflow project...
model loaded

adding day1 to executor
adding day2 to executor
adding day3 to executor
created root directory: /home/arrafi/potato/segmentation_results/day2_segmentation_results
created root directory: /home/arrafi/potato/segmentation_results/day1_segmentation_results
created root directory: /home/arrafi/potato/segmentation_results/day3_segmentation_results
starting day1 images processing from /home/arrafi/potato/segmentation_results/day1_segmentation_results/day1_raw_images/
starting day3 images processing from /home/arrafi/potato/segmentation_results/day3_segmentation_results/day3_raw_images/
starting day2 images processing from /home/arrafi/potato/segmentation_results/day2_segmentation_results/day2_raw_images/

192 day1_images proccessed and saved at:  /home/arrafi/potato/segmentation_results/day1_segmentation_results/day1_images
created pred_images and json directory for day1 images

starting potato segmentation ... of 192 day1_images

192 day2_images proccessed and saved at:  /home/arrafi/potato/segmentation_results/day2_segmentation_results/day2_images
created pred_images and json directory for day2 images

starting potato segmentation ... of 192 day2_images

192 day3_images proccessed and saved at:  /home/arrafi/potato/segmentation_results/day3_segmentation_results/day3_images
created pred_images and json directory for day3 images

starting potato segmentation ... of 192 day3_images

................and some more outputs......................

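Most of the time the script runs successfully, but sometimes I notice that it only starts the processing for one or two days. In the run below, for example, only day 1 and day 3 started: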

loading Roboflow workspace...
loading Roboflow project...
model loaded

adding day1 to executor
adding day2 to executor
adding day3 to executor
created root directory: /home/arrafi/potato/segmentation_results/day1_segmentation_results
created root directory: /home/arrafi/potato/segmentation_results/day3_segmentation_results
starting day1 images processing from /home/arrafi/potato/segmentation_results/day1_segmentation_results/day1_raw_images/
starting day3 images processing from /home/arrafi/potato/segmentation_results/day3_segmentation_results/day3_raw_images/


192 day1_images proccessed and saved at:  /home/arrafi/potato/segmentation_results/day1_segmentation_results/day1_images
created pred_images and json directory for day1 images

starting potato segmentation ... of 192 day1_images

192 day3_images proccessed and saved at:  /home/arrafi/potato/segmentation_results/day3_segmentation_results/day3_images
created pred_images and json directory for day3 images

starting potato segmentation ... of 192 day3_images

................and some more outputs......................

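Can anyone point out why this happens? Am I doing something wrong when calling `ThreadPoolExecutor`? I have searched online for a solution but could not figure out why this occurs, because so far the code has not thrown any errors, and the behavior is so random.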
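One way to narrow down a symptom like this is to record which day each future belongs to and log when each worker actually starts. The following is a diagnostic sketch, not a fix: `process_day` and `run_days` are hypothetical names, and the failure for `day2` is simulated purely to show how a failing day would surface in the log and in the collected outcomes.

```python
import concurrent.futures
import logging

# threadName in the format shows which worker thread emitted each message.
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(threadName)s %(message)s')


def process_day(day):
    # Hypothetical stand-in for processAndsegmentImages.
    logging.info('started %s', day)
    if day == 'day2':
        # Simulated failure, only for illustration.
        raise RuntimeError('simulated failure for {}'.format(day))
    return '{} completed'.format(day)


def run_days(days):
    outcomes = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(days)) as executor:
        # Map each future back to its day so failures can be attributed.
        future_to_day = {executor.submit(process_day, day): day for day in days}
        for future in concurrent.futures.as_completed(future_to_day):
            day = future_to_day[future]
            try:
                outcomes[day] = future.result()
            except Exception as exc:
                # Log the traceback instead of letting it abort the loop.
                logging.exception('%s failed', day)
                outcomes[day] = 'failed: {}'.format(exc)
    return outcomes


outcomes = run_days(['day1', 'day2', 'day3'])
print(outcomes)
```

If a day never logs its "started" line, the worker function was not entered for it; if it logs "started" and then a traceback, the work began and failed partway.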

python multithreading threadpool python-multithreading threadpoolexecutor
1 Answer

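I found that the problem sometimes occurs when another Python program is running, or when threads are left over from a previous run. So I added a small function at the beginning of my multithreading script that restarts the Python program. This is how I solved it: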

import os
import sys
import psutil
import logging

def restart_program():
    """
    Restarts the current program, with file objects and descriptors cleanup
    """
    try:
        p = psutil.Process(os.getpid())
        # open_files() is the current name; get_open_files() was removed in psutil 2.0
        for handler in p.open_files() + p.connections():
            os.close(handler.fd)
    except Exception as e:
        logging.error(e)

    python = sys.executable
    os.execl(python, python, *sys.argv)

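The restart_program function restarts the current Python program. It closes the file objects and descriptors associated with the current process and then re-executes the program. This is useful in some situations, for example when you need to reload the program with new settings or configuration. I got the script from here: https://stackoverflow.com/a/33334183/13520498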
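Because `os.execl` replaces the process with a fresh run of the same script, calling a restart unconditionally at the top of the script would restart it again on every run. The answer does not show how the call is guarded, so the following is only a sketch of one possible guard. The environment variable name `ALREADY_RESTARTED` and the helpers `needs_restart` and `restart_once` are assumptions made for illustration.

```python
import os
import sys

# Hypothetical marker variable; any unused name would do.
RESTART_FLAG = "ALREADY_RESTARTED"


def needs_restart(environ=os.environ):
    # True only if this process has not already been restarted once.
    return environ.get(RESTART_FLAG) != "1"


def restart_once():
    # Re-execute the script a single time; the flag is inherited by the
    # new process image, so the second run skips the restart.
    if needs_restart():
        os.environ[RESTART_FLAG] = "1"
        os.execl(sys.executable, sys.executable, *sys.argv)

# Usage (at the very top of the main script):
#     restart_once()
```

The check is kept in a separate function so the decision can be verified without actually re-executing the interpreter.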
