Merging large Excel files with many worksheets using Python


I am trying to merge multiple Excel files (.xlsx) into a single .xlsx file. Each file has about 100 sheets. When merging, I want to keep these sheets separate.

For example, "excel_1.xlsx" has sheets named "1" to "100", and "excel_2.xlsx" has sheets named "101" to "200". When I merge the two files, the new file "excel_merged.xlsx" should have 200 sheets, named "1" to "200".

Below is the code I wrote, but the merge takes far too long to finish.

import pandas as pd
import os
import time

# read excel file
excel_files = []
file_name = 'output_240326_pivot_{i}.xlsx'
for i in range(64):
    file_path = './our_data/240403/' + file_name.format(i=i)
    excel_files.append(file_path)

# final merging path
final_excel_path = './our_data/240403/240403_final_for_tgt.xlsx'

# start time count for total merging
start_time = time.time()

with pd.ExcelWriter(final_excel_path) as writer:
    # for each excel file
    for file in excel_files:
        # start time count
        file_start_time = time.time()
    
        # Read every sheet in the Excel file
        xls = pd.ExcelFile(file)
        for sheet_name in xls.sheet_names:
            df = pd.read_excel(xls, sheet_name)
            # Save each sheet separately using ExcelWriter
            df.to_excel(writer, sheet_name=sheet_name, index=False)
    
        # Time spent for merging current file
        file_end_time = time.time()
        print(f"Completed merging {os.path.basename(file)} in {file_end_time - file_start_time:.2f} seconds.")

# Total time for merging
end_time = time.time()
print(f"All sheets combined into {final_excel_path} in {end_time - start_time:.2f} seconds.")

When I run this code, the time spent on each file seems to grow like the Fibonacci sequence. Why is that?

Completed merging output_240326_pivot_0.xlsx in 0.64 seconds.
Completed merging output_240326_pivot_1.xlsx in 1.15 seconds.
Completed merging output_240326_pivot_2.xlsx in 2.36 seconds.
Completed merging output_240326_pivot_3.xlsx in 3.98 seconds.
Completed merging output_240326_pivot_4.xlsx in 6.14 seconds.
Completed merging output_240326_pivot_5.xlsx in 8.80 seconds.
Completed merging output_240326_pivot_6.xlsx in 12.24 seconds.
Completed merging output_240326_pivot_7.xlsx in 16.37 seconds.
Completed merging output_240326_pivot_8.xlsx in 21.27 seconds.
Completed merging output_240326_pivot_9.xlsx in 27.38 seconds.
Completed merging output_240326_pivot_10.xlsx in 31.95 seconds.
Completed merging output_240326_pivot_11.xlsx in 38.43 seconds.
Completed merging output_240326_pivot_12.xlsx in 45.42 seconds.
Completed merging output_240326_pivot_13.xlsx in 53.47 seconds.
Completed merging output_240326_pivot_14.xlsx in 61.85 seconds.
Completed merging output_240326_pivot_15.xlsx in 71.27 seconds.
Completed merging output_240326_pivot_16.xlsx in 81.11 seconds.
Completed merging output_240326_pivot_17.xlsx in 91.49 seconds.
Completed merging output_240326_pivot_18.xlsx in 102.94 seconds.
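
The Fibonacci guess can be checked directly against the log above. The sketch below is only an illustration: the variable names are made up, and the numbers are copied from the log. It compares each time with the previous one. A Fibonacci-like series would settle at a ratio near 1.618, while steadily growing increments with a shrinking ratio point to polynomial growth instead.

```python
# Per-file merge times in seconds, copied from the log above (files 0 to 18)
times = [0.64, 1.15, 2.36, 3.98, 6.14, 8.80, 12.24, 16.37, 21.27, 27.38,
         31.95, 38.43, 45.42, 53.47, 61.85, 71.27, 81.11, 91.49, 102.94]

# Ratio of each time to the previous one; Fibonacci-like growth would settle near 1.618
ratios = [b / a for a, b in zip(times, times[1:])]

# Increase from one file to the next; increments that keep growing while the
# ratio shrinks suggest polynomial rather than exponential growth
increments = [round(b - a, 2) for a, b in zip(times, times[1:])]

print("last ratios:", [round(r, 2) for r in ratios[-3:]])
print("increments:", increments)
```

This only describes the shape of the slowdown. It does not identify which step (reading, writing, or memory pressure) causes it.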

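P.S. I also searched Stack Overflow for related questions but could not find an answer that fits my case. Would it be faster to split each sheet into a separate .csv file and then merge them as sheets into one Excel file?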

python pandas excel merge
1 Answer

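The cost seems to grow as the operation gets heavier. There is a nested for loop, and you open the same file inside the loop. You can optimize this by collecting all the sheets from all the files into a single DataFrame and converting it to Excel once at the end. See the sample code: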

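import pandas as pd

all_sheets = []
for file in excel_files:  # list of paths built in the question's code
    xls = pd.ExcelFile(file)
    for sheet_name in xls.sheet_names:
        df = pd.read_excel(xls, sheet_name)
        all_sheets.append(df)
# Stack all sheets into one DataFrame (this yields a single sheet, not one per source sheet)
merged_df = pd.concat(all_sheets)
with pd.ExcelWriter(final_excel_path) as writer:  # output path from the question
    merged_df.to_excel(writer, sheet_name="Merged_Sheet", index=False)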
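
The sample above stacks everything into one sheet, whereas the question asks for one sheet per source sheet. A minimal sketch of the same "collect first, write once at the end" idea that keeps the sheets separate might look like this. The function name `merge_workbooks` and its parameters are hypothetical, and the sketch assumes sheet names are unique across files, as in the question's example.

```python
import pandas as pd


def merge_workbooks(paths, out_path, engine=None):
    """Copy every sheet of every workbook in `paths` into `out_path`, one sheet each."""
    collected = {}
    for path in paths:
        # sheet_name=None returns a dict {sheet name: DataFrame} for the whole workbook
        for name, df in pd.read_excel(path, sheet_name=None).items():
            if name in collected:
                # Assumption: sheet names do not repeat across the input files
                raise ValueError(f"duplicate sheet name {name!r} in {path}")
            collected[name] = df

    # Write once at the end; engine=None lets pandas pick its default writer
    with pd.ExcelWriter(out_path, engine=engine) as writer:
        for name, df in collected.items():
            df.to_excel(writer, sheet_name=name, index=False)

    return list(collected)
```

With the question's variables this would be called as `merge_workbooks(excel_files, final_excel_path)`. It holds every sheet in memory until the final write, and whether it is actually faster than the original loop would need to be measured. Passing `engine="xlsxwriter"` is an option to compare if that package is installed.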
