Python tempfile.TemporaryDirectory() cleanup crashes with PermissionError and NotADirectoryError


Premise

I am trying to convert some PDFs to images via pdf2image and poppler, and then run some computer vision tasks on them.

The conversion itself works fine.

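However, the conversion creates an artifact file for each page of the PDF, and I want those deleted at the end of the function. To do that, I am using tempfile.TemporaryDirectory(). The function looks like this: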

    with tempfile.TemporaryDirectory() as path:
        images_from_path: [Image] = convert_from_path(
                os.path.join(path_superfolder, "calibration_target.pdf"),
                size=(2480, 3508),
                output_folder=path, poppler_path=r'E:\poppler-22.04.0\Library\bin')
        if len(images_from_path) >= page:
            images_from_path[page - 1].save(os.path.join(path_superfolder, "result.jpg"))

Problem

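The problem is that, after converting the PDF and writing the desired image to a file, the program always crashes with the following error: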

Traceback (most recent call last):
  File "C:\Python310\lib\shutil.py", line 617, in _rmtree_unsafe
    os.unlink(fullname)
PermissionError: [WinError 32] The process cannot access the file, because it is being used by another process: 'C:\\Users\\tobia\\AppData\\Local\\Temp\\tmp24c4bmzv\\bd76d834-672e-49fc-ac30-7751b7b660d0-01.ppm'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python310\lib\tempfile.py", line 843, in onerror
    _os.unlink(path)
PermissionError: [WinError 32] The process cannot access the file, because it is being used by another process: 'C:\\Users\\tobia\\AppData\\Local\\Temp\\tmp24c4bmzv\\bd76d834-672e-49fc-ac30-7751b7b660d0-01.ppm'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python310\lib\code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "E:\PyCharm 2022.2.3\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "E:\PyCharm 2022.2.3\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "D:\Dokumente\Uni\Informatik\BA_Thesis\tumexam-scheduling-codebase\generate_data.py", line 393, in <module>
    extract_calibration_page_as_image_from_pdf()
  File "D:\Dokumente\Uni\Informatik\BA_Thesis\tumexam-scheduling-codebase\generate_data.py", line 190, in extract_calibration_page_as_image_from_pdf
    tmp_dir.cleanup()
  File "C:\Python310\lib\tempfile.py", line 873, in cleanup
    self._rmtree(self.name, ignore_errors=self._ignore_cleanup_errors)
  File "C:\Python310\lib\tempfile.py", line 855, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "C:\Python310\lib\shutil.py", line 749, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "C:\Python310\lib\shutil.py", line 619, in _rmtree_unsafe
    onerror(os.unlink, fullname, sys.exc_info())
  File "C:\Python310\lib\tempfile.py", line 846, in onerror
    cls._rmtree(path, ignore_errors=ignore_errors)
  File "C:\Python310\lib\tempfile.py", line 855, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "C:\Python310\lib\shutil.py", line 749, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "C:\Python310\lib\shutil.py", line 600, in _rmtree_unsafe
    onerror(os.scandir, path, sys.exc_info())
  File "C:\Python310\lib\shutil.py", line 597, in _rmtree_unsafe
    with os.scandir(path) as scandir_it:
NotADirectoryError: [WinError 267] Directory name invalid: 'C:\\Users\\tobia\\AppData\\Local\\Temp\\tmp24c4bmzv\\bd76d834-672e-49fc-ac30-7751b7b660d0-01.ppm'

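When I step through the cleanup routine, everything looks fine at first: the path is correct and it starts deleting files. At some point, though, the internal path variable gets mixed up and the routine crashes because a file is passed where a directory is expected. To me, it looks like a race condition is causing the problem.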
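A note on the second error: judging from the traceback alone, it may not need a race condition to explain it. After the PermissionError, tempfile's onerror handler calls _rmtree on the path of the locked file, and that call ends in os.scandir on a regular file. The sketch below reproduces only that last step with the standard library; the file name page-01.ppm is a hypothetical stand-in for a conversion artifact, and no file lock is involved.

```python
import os
import tempfile


def scandir_error_for_regular_file() -> str:
    """Return the name of the exception os.scandir raises for a regular file."""
    with tempfile.TemporaryDirectory() as path:
        # Hypothetical stand-in for one of the converter's artifact files.
        file_name = os.path.join(path, "page-01.ppm")
        with open(file_name, "wb") as f:
            f.write(b"P6")
        try:
            # The recursive rmtree in the traceback ends up doing this
            # on a file path instead of a directory path.
            with os.scandir(file_name):
                pass
        except OSError as exc:
            return type(exc).__name__
    return "no error"


print(scandir_error_for_regular_file())
```

If that reading is right, the NotADirectoryError is a follow-up of the PermissionError, and the open file handle is the thing to chase.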

What I have already tried

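  • Rewriting the function to not use with and instead call tmp_dir.cleanup() explicitly.
  • Only creating the directory, without filling it with conversion artifacts. In this case the cleanup works.
  • The tempfile documentation mentions that permission errors occur when files are still open. However, these files are only used in this function, and if that is the cause, I am not sure where a file is still open or which function is responsible. I do suspect the conversion function.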

Tags: python, temporary-files, code-cleanup, pdf2image

Answer

While experimenting some more and writing up this question, I found a working solution:

    with tempfile.TemporaryDirectory() as path:
        images_from_path: [Image] = convert_from_path(
                os.path.join(path_superfolder, f"calibration_target_{exam_type}.pdf"),
                size=(2480, 3508),
                output_folder=path, poppler_path=r'E:\poppler-22.04.0\Library\bin')
        if len(images_from_path) >= page:
            images_from_path[page - 1].save(os.path.join(path_superfolder, "result.jpg"))
        images_from_path = []

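It seems the cleanup routine ran into trouble because the converted images are backed by the artifact files that pdf2image created, and my data structure was still holding on to them. Windows refuses to delete a file that is still in use (the WinError 32 above), so the files could not be removed while the images were referenced. Resetting the data structure before the cleanup is implicitly triggered solved the problem.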
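Dropping the list works because it releases the last references, but it relies on the objects being collected right away. A more explicit variant is to close every handle before the with block of the temporary directory ends. The sketch below shows the pattern with plain file objects and contextlib.ExitStack from the standard library; read_page and the page-NN.ppm names are hypothetical stand-ins, not part of pdf2image. PIL Image objects also have a close() method, so I assume the same pattern carries over to the converted images, but I have only tried the version above.

```python
import os
import tempfile
from contextlib import ExitStack


def read_page(page: int, n_pages: int = 3) -> tuple[bytes, str]:
    """Read one stand-in page file and close all handles before cleanup."""
    with tempfile.TemporaryDirectory() as path:
        # Create stand-in artifact files, one per page, as a converter might.
        names = []
        for i in range(1, n_pages + 1):
            name = os.path.join(path, f"page-{i:02d}.ppm")
            with open(name, "wb") as f:
                f.write(f"page {i}".encode())
            names.append(name)

        # ExitStack closes every handle registered with it when this inner
        # block ends, which is before TemporaryDirectory deletes the files.
        with ExitStack() as stack:
            handles = [stack.enter_context(open(n, "rb")) for n in names]
            data = handles[page - 1].read() if len(handles) >= page else b""
    # The temporary directory has been removed at this point.
    return data, path


data, tmp_path = read_page(2)
print(data, os.path.exists(tmp_path))
```

The point of the nesting is ordering: the inner block releases the files, and only then does the outer block try to delete them.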

If there is a better way to solve this, please feel free to tell me.
