Why is a file handle left open after opening and closing an xlsx workbook with openpyxl and read_only=True?

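I'd like to ask for help with a file handle problem in the Python package openpyxl, version 3.0.7. It does not occur when the load_workbook 'read_only' parameter is set to False; it only occurs when the parameter is set to True. It shows up eventually if you call load_workbook and close repeatedly on the same file. I believe I have narrowed down the source code that opens the file handle. The problem is that the handle is never released. After opening and closing the same workbook multiple times, calling

shutil.move(source_file, target_file)
raises an exception. I will try to avoid this by opening and closing the workbook only once, but then I need to build a data structure to hold everything, because the workbook has 23 worksheets. Even so, this looks like a real problem. If I set read_only=False, performance is terrible: a run takes an hour or more.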

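import shutil

import openpyxl  # openpyxl 3.0.7

# Repeat this open/close cycle multiple times on the same file.
# file_path, worksheet_name and file_path_archive are defined elsewhere.
wb_source = openpyxl.load_workbook(file_path, read_only=True)
ws_source = wb_source[worksheet_name]
for row in ws_source.rows:
    for cell in row:
        ...  # process cell.value
wb_source.close()

# This is the call that raises PermissionError (see the traceback below)
shutil.move(file_path, file_path_archive)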

Here is the exception:

Traceback (most recent call last):
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.7_3.7.2544.0_x64__qbz5n2kfra8p0\lib\shutil.py", line 566, in move
    os.rename(src, real_dst)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Python\\...file.xlsx' -> 'C:\\Python\\...file.xlsx'
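
A possible workaround for the PermissionError above, offered as a sketch rather than a confirmed fix: read the file's bytes yourself and pass openpyxl an in-memory buffer, so the library never opens the path on disk. `read_into_memory` and `read_sheet_then_move` are hypothetical helper names, and the approach assumes the workbook fits comfortably in memory.

```python
import io
import shutil


def read_into_memory(path):
    """Return the file's bytes wrapped in BytesIO; the OS handle is closed on return."""
    with open(path, "rb") as fh:
        return io.BytesIO(fh.read())


def read_sheet_then_move(file_path, file_path_archive, worksheet_name):
    """Hypothetical helper: read one sheet from memory, then archive the file."""
    import openpyxl  # third-party; imported here so read_into_memory stays stdlib-only

    buffer = read_into_memory(file_path)
    # load_workbook accepts a file-like object as well as a path
    wb = openpyxl.load_workbook(buffer, read_only=True)
    try:
        rows = list(wb[worksheet_name].iter_rows(values_only=True))
    finally:
        wb.close()
    shutil.move(file_path, file_path_archive)
    return rows
```

Only the `with open(...)` block touches the path, so no handle to it should remain by the time `shutil.move` runs. Whether this removes the leak described in the question has not been verified here.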

.\venv\Lib\site-packages\openpyxl\reader\excel.py

# Python stdlib imports
from zipfile import ZipFile, ZIP_DEFLATED, BadZipfile
from sys import exc_info
from io import BytesIO
import os.path
import warnings
# ...
    if self.read_only:
        ws = ReadOnlyWorksheet(self.wb, sheet.name, rel.target, self.shared_strings)
        ws.sheet_state = sheet.state
        self.wb._sheets.append(ws)
        continue
    else:
        fh = self.archive.open(rel.target)
        ws = self.wb.create_sheet(sheet.name)
        ws._rels = rels
        ws_parser = WorksheetReader(ws, fh, self.shared_strings, self.data_only)
        ws_parser.bind_all()

.\venv\Lib\site-packages\openpyxl\packaging\manifest.py

mimetypes = MimeTypes()

C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.7_3.7.2544.0_x64__qbz5n2kfra8p0\Lib\mimetypes.py

class MimeTypes:
    # ...

# Module-level function
def init(files=None):
    global suffix_map, types_map, encodings_map, common_types
    global inited, _db
    inited = True    # so that MimeTypes.__init__() doesn't call us again

    if files is None or _db is None:
        db = MimeTypes()
        if _winreg:
            db.read_windows_registry()

        if files is None:
            files = knownfiles
        else:
            files = knownfiles + list(files)
    else:
        db = _db

    for file in files:
        if os.path.isfile(file):
            db.read(file) # <-------------------------------------- read file
    encodings_map = db.encodings_map
    suffix_map = db.suffix_map
    types_map = db.types_map[True]
    common_types = db.types_map[False]
    # Make the DB a global variable now that it is fully initialized
    _db = db

# Methods of MimeTypes
def read(self, filename, strict=True):
    """
    Read a single mime.types-format file, specified by pathname.

    If strict is true, information will be added to
    list of standard types, else to the list of non-standard
    types.
    """
    with open(filename, encoding='utf-8') as fp:
        self.readfp(fp, strict)

def readfp(self, fp, strict=True):
    """
    Read a single mime.types-format file.

    If strict is true, information will be added to
    list of standard types, else to the list of non-standard
    types.
    """
    while 1:
        line = fp.readline()
        if not line:
            break
        words = line.split()
        for i in range(len(words)):
            if words[i][0] == '#':
                del words[i:]
                break
        if not words:
            continue
        type, suffixes = words[0], words[1:]
        for suff in suffixes:
            self.add_type(type, '.' + suff, strict)

1 Answer

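Both version 3.0.7 and version 3.1.2 have this problem. I now open the file only once, read all the data, and close it at the end. Doing this only once does not eliminate the file handle problem, but because I open the file just once, it is no longer a big issue. Performance also improved significantly when I made this change.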
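
The open-once approach could look roughly like this. It is a sketch, not the actual code used here: `collect_sheets` and `read_workbook_once` are hypothetical names, and it assumes every sheet fits in memory as plain cell values.

```python
def collect_sheets(workbook):
    """Map each sheet title to a list of row tuples (cell values only)."""
    return {
        ws.title: list(ws.iter_rows(values_only=True))
        for ws in workbook.worksheets
    }


def read_workbook_once(file_path):
    """Open the workbook a single time, copy out all sheets, then close it."""
    import openpyxl  # third-party; imported here so collect_sheets has no dependencies

    wb = openpyxl.load_workbook(file_path, read_only=True)
    try:
        return collect_sheets(wb)
    finally:
        wb.close()
```

With `data = read_workbook_once(file_path)`, each `data[worksheet_name]` is a list of row tuples, so the rest of the program never needs to reopen the file.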

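The reason the library has this problem is that an xlsx file is technically a zip file with multiple files inside it, so multiple files are opened from the archive.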
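
To see the zip structure for yourself, the standard-library `zipfile` module can list the parts inside an .xlsx package. `list_package_parts` is a hypothetical helper name used only for this illustration.

```python
import zipfile


def list_package_parts(path):
    """Return the names of the files stored inside an .xlsx (zip) package."""
    # The with-block closes the archive, and its file handle, on exit
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()
```

For a typical workbook the list includes `[Content_Types].xml`, `xl/workbook.xml`, and one entry under `xl/worksheets/` per sheet. In read-only mode openpyxl reads sheets lazily from this archive, which is why the workbook has to be closed explicitly with `close()`.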
