Openpyxl - Removing duplicates when copying and pasting data between workbooks

Question (votes: 0, answers: 1)

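My script copies data from one workbook to another and sorts it according to its values. I'm trying to find a way to remove duplicates. I tried using an if statement to check whether the data already exists in the destination workbook, but it doesn't work properly. Where am I going wrong?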

from openpyxl import load_workbook
from openpyxl import Workbook

wb = load_workbook('testData.xlsx')
wb2 = load_workbook('testTemplate.xlsx')

ws = wb.worksheets[0]
mr = ws.max_row

ws2 = wb2.worksheets[0]
A = ws2.max_row
B = ws2.max_row
C = ws2.max_row

ws2values = set()

for row in ws.iter_rows(min_row = 2, min_col = 1, max_row = mr, max_col = 2):
    for cell in row:
        if cell.value == "A":
            if ws2.cell(row = A + 1, column = 1).value in ws2values:
                pass
            else:
                ws2.cell(row = A + 1, column = 1).value = (cell.offset(column = + 1).value)
                A += 1

        elif cell.value == "B":
            if ws2.cell(row = B + 1, column = 1).value in ws2values:
                pass
            else:
                ws2.cell(row = B + 1, column = 1).value = (cell.offset(column = + 1).value)
                B += 1

        elif cell.value == "C":
            if ws2.cell(row = C + 1, column = 1).value in ws2values:
                pass
            else:
                ws2.cell(row = C + 1, column = 1).value = (cell.offset(column = + 1).value)
                C += 1

wb2.save('testTemplate.xlsx')
python pandas excel dataframe openpyxl
1 Answer
1 vote

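I don't see the pandas tag in your question, but if you're interested, you can use some of its functions to avoid the loops, speed up the conversion, and get the same result you're looking for.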

import pandas as pd

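# Columns expected in the destination template
cols_template = ["A", "B", "C"]

def concat_missingvals(df):
    # Append an all-NaN frame with the template columns so that any column
    # missing from df is created, then drop the all-NaN rows again.
    out = pd.concat([df, pd.DataFrame(index=range(0, len(df)), columns=cols_template)],
                    ignore_index=True).dropna(how="all")
    return out

df = (
        pd.read_excel("testData.xlsx",
                      usecols=["Source", "Number"])
            .drop_duplicates()  # remove repeated (Source, Number) rows
            .assign(idx=lambda x: x.groupby("Source").cumcount())  # position within each Source
            .pivot(index="Source", columns="idx")
            .transpose()  # one column per Source
            .reset_index(drop=True)
            .rename_axis(None, axis=1)
            .pipe(concat_missingvals)
      )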

# Output:

print(df)

      A     B     C
0  10.1  10.2  10.3
1  10.4  10.5  10.6

You can then use pandas.DataFrame.to_excel to store the resulting DataFrame in a new Excel file.

# set_column is an XlsxWriter worksheet method, so the engine is set explicitly
with pd.ExcelWriter("testData_Retouche.xlsx", engine="xlsxwriter") as writer:
    df.to_excel(writer, index=False, sheet_name="Result")
    col_idx = df.columns.get_loc("A")  # put the column name here
    writer.sheets["Result"].set_column(col_idx, col_idx, 10)  # 10 is the column width
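
If you would rather stay with openpyxl, the check in the question probably fails for two reasons: ws2values is never filled, and the if looks at the still-empty destination cell instead of the value about to be written. Below is a minimal sketch of one way to repair that, not a drop-in replacement. The names TARGET_COLUMNS, group_unique and copy_unique are made up for illustration, and the mapping of A, B, C to destination columns 1, 2, 3 is an assumption (the question writes everything to column 1).

```python
from collections import defaultdict

# Assumed mapping from the key in source column 1 to a destination column.
TARGET_COLUMNS = {"A": 1, "B": 2, "C": 3}


def group_unique(pairs, existing=()):
    """Group values by key, keeping only the first occurrence of each (key, value).

    `existing` holds (key, value) pairs already present in the destination.
    """
    seen = set(existing)
    grouped = defaultdict(list)
    for key, value in pairs:
        if key not in TARGET_COLUMNS or (key, value) in seen:
            continue  # unknown key or duplicate: skip it
        seen.add((key, value))
        grouped[key].append(value)
    return dict(grouped)


def copy_unique(src_path, dst_path):
    # Imported here so group_unique can be used without openpyxl installed.
    from openpyxl import load_workbook

    src = load_workbook(src_path).worksheets[0]
    wb_dst = load_workbook(dst_path)
    dst = wb_dst.worksheets[0]
    last_row = dst.max_row

    # Values already in the destination count as "seen".
    existing = {
        (key, dst.cell(row=r, column=col).value)
        for key, col in TARGET_COLUMNS.items()
        for r in range(1, last_row + 1)
    }

    # Each source row is a (key, value) tuple taken from columns 1 and 2.
    pairs = src.iter_rows(min_row=2, min_col=1, max_col=2, values_only=True)
    for key, values in group_unique(pairs, existing).items():
        for offset, value in enumerate(values, start=1):
            dst.cell(row=last_row + offset, column=TARGET_COLUMNS[key], value=value)

    wb_dst.save(dst_path)
```

With the files from the question, the call would be copy_unique('testData.xlsx', 'testTemplate.xlsx'). The duplicate test lives in group_unique, which never touches a workbook, so it can be checked on plain tuples first.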

# Input used (testData.xlsx):
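
For a quick check without the Excel file, the same idea can be run on a small in-memory frame. The column names Source and Number come from the code above; the rows are made-up sample values chosen to match the printed output, with one deliberate duplicate. This sketch pivots directly on idx instead of transposing, which is a small variation on the pipeline above.

```python
import pandas as pd

# Hypothetical stand-in for testData.xlsx; the last row repeats ("A", 10.1).
raw = pd.DataFrame({
    "Source": ["A", "B", "C", "A", "B", "C", "A"],
    "Number": [10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.1],
})

# Drop the repeated (Source, Number) row.
deduped = raw.drop_duplicates()

# Number each value within its Source group: 0 for the first, 1 for the second, ...
deduped = deduped.assign(idx=deduped.groupby("Source").cumcount())

# One column per Source, one row per position within the group.
wide = (
    deduped.pivot(index="idx", columns="Source", values="Number")
    .reset_index(drop=True)
    .rename_axis(None, axis=1)
)
print(wide)
```

The cumcount step is what makes the pivot possible: without a per-group position, several rows would compete for the same cell and pivot would raise an error about duplicate entries.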
