How do I extract linked Excel files from an Excel workbook using Python?

Question

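I have an Excel workbook in which some sheets contain insurance plan details as text/tables, some sheets contain images/screenshots of the details, and some sheets contain linked Excel files (inserted as objects).

Using Python, how can I iterate over each sheet and, if it contains linked Excel files, store them in another directory?

I tried this code: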

import os
import shutil
import pathlib
import zipfile
import openpyxl
import re

def extract_linked_excel_files(path, output_folder_name='extracted_excel_files'):
    """
    Extracts linked Excel files from an Excel file and stores them in a single folder.

    Args:
        path (pathlib.Path or str): Excel file path.
        output_folder_name (str): Name of the folder to store the extracted Excel files.
            Defaults to 'extracted_excel_files'.

    Returns:
        new_paths (list[pathlib.Path]): List of paths to the extracted Excel files.
    """
    # Convert path to pathlib.Path if it's a string
    if isinstance(path, str):
        path = pathlib.Path(path)

    # Check if the file has the '.xlsx' extension
    if path.suffix != '.xlsx':
        raise ValueError('Path must be an xlsx file')

    # Extract the filename (excluding the extension) using .stem
    name = path.stem

    # Create a new folder for the extracted Excel files
    output_folder = path.parent / output_folder_name
    output_folder.mkdir(exist_ok=True)  # Create folder if it doesn't exist

    # Open the workbook
    workbook = openpyxl.load_workbook(path, read_only=True)

    # List to store the paths of the extracted Excel files
    new_paths = []

    try:
        # Iterate through all sheets in the workbook
        for sheet_name in workbook.sheetnames:
            sheet = workbook[sheet_name]

            # Iterate through all rows in the sheet
            for row in sheet.iter_rows():
                # Iterate through all cells in the row
                for cell in row:
                    # Check if the cell contains a hyperlink
                    if cell.hyperlink:
                        # Check if the hyperlink is an Excel file
                        hyperlink_target = cell.hyperlink.target
                        if re.search(r'\.xlsx$', hyperlink_target, re.IGNORECASE):
                            linked_file_path = hyperlink_target.replace('/', os.path.sep)

                            # Construct paths for the linked Excel file and the new destination
                            linked_file_full_path = path.parent / linked_file_path
                            new_path = output_folder / linked_file_path

                            # Copy the linked Excel file to the output folder
                            shutil.copy(linked_file_full_path, new_path)

                            # Store the new path in the list
                            new_paths.append(new_path)

    finally:
        # Close the workbook
        workbook.close()

    # Return the list of paths to the extracted Excel files
    return new_paths

However, this just returns an empty workbook.

Answer 1

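Your code seems to be on the right track, but there are several possible reasons why it does not find the linked Excel files. Here are some suggestions for troubleshooting and improving the code: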

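Check the hyperlink targets: make sure the hyperlinks in the worksheet point to the correct Excel files, and verify that each one really links to a file with the .xlsx extension.

Use full paths for comparison: when comparing paths, prefer full paths to avoid problems with relative paths. Update the comparison to call resolve() on both paths: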

if linked_file_full_path.resolve() == path.resolve():
    # This is the original Excel file; skip it
    continue
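As a small illustration of why resolve() helps here, the sketch below compares two spellings of the same location. The helper name is_same_file and the file names are hypothetical, chosen only for the example.

```python
from pathlib import Path


def is_same_file(first, second):
    """Return True when both paths resolve to the same absolute location."""
    # resolve() makes the path absolute and collapses ".." components,
    # so differently written paths to one file compare equal.
    return Path(first).resolve() == Path(second).resolve()


# Hypothetical file names, used only for illustration.
print(is_same_file("data/../book.xlsx", "book.xlsx"))
print(is_same_file("book.xlsx", "other.xlsx"))
```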

Debug output: add some print statements to help debug the code. For example, print linked_file_full_path and new_path to check that they are built correctly:

print(f"Linked file: {linked_file_full_path}")
print(f"New path: {new_path}")

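This can help identify any problems with how the paths are constructed.

Handle different path separators: because the hyperlinks in your Excel file may use forward slashes (/) as separators, normalize them with os.path.sep: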

linked_file_path = hyperlink_target.replace('/', os.path.sep)

Make sure you use the correct path separator for your operating system.
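As an alternative to replacing separators by hand, a minimal sketch using pathlib is shown below. It assumes the target is a relative path; the function name normalize_target is hypothetical, and the percent-decoding step assumes targets may be stored with escapes such as %20, which may not apply to every workbook.

```python
from pathlib import Path, PureWindowsPath
from urllib.parse import unquote


def normalize_target(target):
    """Turn a relative hyperlink target into a Path for the current OS."""
    # Decode percent-escapes such as %20 (assumption: some targets use them).
    decoded = unquote(target)
    # PureWindowsPath accepts both "\" and "/" as separators, so it can
    # split either style into its individual components.
    parts = PureWindowsPath(decoded).parts
    # Rebuild the path with the separator of the running platform.
    return Path(*parts)


# Hypothetical target, used only for illustration.
print(normalize_target("plans\\2023/details.xlsx"))
```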

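Check the hyperlink type: make sure the hyperlinks in the worksheet are URL-type links. If they are document-type links (pointing to a location inside the workbook), cell.hyperlink.target may not point directly at a file.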
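To see which kind of link each cell carries, a sketch along the following lines may help. The function names are hypothetical. It relies on openpyxl's Hyperlink object having both a target attribute (external file or URL) and a location attribute (a place inside the workbook). The workbook is opened without read_only=True because, as far as I can tell, read-only cells do not expose hyperlinks, which could by itself explain an empty result.

```python
import openpyxl


def list_hyperlinks(workbook):
    """Collect (sheet title, cell, target, location) for every hyperlink."""
    found = []
    for sheet in workbook.worksheets:
        for row in sheet.iter_rows():
            for cell in row:
                link = cell.hyperlink
                if link is None:
                    continue
                # target is set for external links, location for internal ones
                found.append(
                    (sheet.title, cell.coordinate, link.target, link.location)
                )
    return found


def list_hyperlinks_in_file(path):
    """Load a workbook in normal (not read-only) mode and list its links."""
    workbook = openpyxl.load_workbook(path)
    try:
        return list_hyperlinks(workbook)
    finally:
        workbook.close()
```

If the returned list is empty for every sheet, the workbook probably contains no cell hyperlinks at all, and the files were attached some other way.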

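After making these adjustments, you should be able to identify and fix whatever is preventing the linked Excel files from being extracted. If the problem persists, print additional information while debugging; this will help pinpoint the issue.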
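Since the question describes files that were inserted as objects rather than as cell hyperlinks, one more thing that may be worth checking is the package itself. An .xlsx file is a ZIP archive, and embedded objects are normally stored under xl/embeddings/ inside it. The sketch below copies everything from that folder; the function name extract_embedded_files is hypothetical, and it does not work out which sheet each object belongs to. Embedded objects saved in older formats may appear as .bin OLE containers that need further unpacking.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_embedded_files(xlsx_path, output_dir):
    """Copy every entry under xl/embeddings/ into output_dir."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(xlsx_path) as archive:
        for member in archive.namelist():
            # Skip anything outside the embeddings folder and folder entries
            if not member.startswith("xl/embeddings/") or member.endswith("/"):
                continue
            # Keep only the file name so nothing is written outside output_dir
            destination = output_dir / PurePosixPath(member).name
            destination.write_bytes(archive.read(member))
            extracted.append(destination)
    return extracted
```

A call such as extract_embedded_files("plans.xlsx", "extracted_excel_files") (file name hypothetical) returns the list of written paths, which is empty when the workbook has no embedded objects.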
