Algorithm becomes very slow after a few Excel files


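Problem overview

I need to merge several .xlsx files into worksheets of a single workbook, where each worksheet name must be the name of the file it came from.


Current problem

The code below becomes very slow after a few files and uses a lot of memory.


Attempted solutions

Closing the Excel file, deleting the DataFrame, and running gc manually did not help.

Code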

import pandas as pd
import openpyxl
import os
import gc as gc

print("Copying sheets from multiple files to one file")

dir_input = 'D:/MeusProjetosJava/Importacao/'
dir_output = "Integrados/combined.xlsx"

cwd = os.path.abspath(dir_input)
files = os.listdir(cwd)

df_total = pd.DataFrame()
df_total.to_excel(dir_output) #create a new file
workbook=openpyxl.load_workbook(dir_output)
ss_sheet = workbook['Sheet1']
ss_sheet.title = 'TempExcelSheetForDeleting'
workbook.save(dir_output)


for file in files:                         # loop through Excel files
    if file.endswith('.xls') or file.endswith('.xlsx'):
        excel_file = pd.ExcelFile(cwd+"/"+file)
        sheets = excel_file.sheet_names
        for sheet in sheets:
            sheet_name = str(file.title())
            sheet_name = sheet_name.replace(".xlsx","").lower()
            sheet_name = sheet_name.removesuffix(".xlsx")

            print(file, sheet_name)

            df = excel_file.parse(sheet_name = sheet)
            with pd.ExcelWriter(dir_output,mode='a') as writer:
                df.to_excel(writer, sheet_name=f"{sheet_name}", index=False)
                del df

        excel_file.close()
        del excel_file
        sheets = None
        gc.collect()


workbook=openpyxl.load_workbook(dir_output)
std=workbook["TempExcelSheetForDeleting"]
workbook.remove(std)
workbook.save(dir_output)
print("all done")

** References **

Merging multiple sheets into one Excel file

python-3.x pandas
1 answer

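I think your code is somewhat complicated and creates some unnecessary temporary objects. I would first try a simpler approach that uses the pandas ExcelWriter, so the template code would look something like this. Are your files really large enough to cause memory problems?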

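import os
import pandas as pd

# I don't like the name dir_output, since it is actually the final output file name.
# Open the writer once: reopening it in append mode for every sheet reloads and rewrites the whole workbook.
# mode='a' requires dir_output to exist already, as it does in the question's setup code.
with pd.ExcelWriter(dir_output, mode='a') as writer:
    for file in files:
        if file.endswith(('.xls', '.xlsx')):
            # Use the file name without its extension as the sheet name
            cur_sheet = os.path.splitext(file)[0]
            df = pd.read_excel(os.path.join(dir_input, file))
            df.to_excel(writer, sheet_name=cur_sheet, index=False)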
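
One detail the template leaves open is how to derive the sheet name from the file name. Excel limits sheet names to 31 characters, rejects the characters `[ ] : * ? / \`, and compares names case-insensitively, so two files such as `report.xlsx` and `Report.xls` would collide. The following is only a minimal sketch of one way to handle this; `sheet_name_for` and the `_1`, `_2` suffix scheme are hypothetical choices of mine, not something pandas or openpyxl provides.

```python
import os
import re

MAX_SHEET_NAME_LEN = 31  # Excel's limit on worksheet name length
INVALID_CHARS = re.compile(r"[\[\]:*?/\\]")  # characters Excel rejects in sheet names


def sheet_name_for(file_name, used):
    """Derive a unique, Excel-safe sheet name from a file name (hypothetical helper).

    `used` is a set of lowercase names already handed out; it is updated in place.
    """
    stem = os.path.splitext(os.path.basename(file_name))[0]
    base = INVALID_CHARS.sub("_", stem)[:MAX_SHEET_NAME_LEN] or "sheet"
    name, n = base, 1
    # Excel compares sheet names case-insensitively, so track lowercase forms
    while name.lower() in used:
        suffix = f"_{n}"
        name = base[:MAX_SHEET_NAME_LEN - len(suffix)] + suffix
        n += 1
    used.add(name.lower())
    return name
```

To use it, one could create `used = set()` before the loop and replace the sheet-name line with `cur_sheet = sheet_name_for(file, used)`.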