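My script copies data from one workbook to another and sorts it according to its value. I am trying to find a way to remove duplicates. I tried using an if statement to check whether the data already exists in the destination workbook, but it does not work properly. Where am I going wrong?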
from openpyxl import load_workbook
from openpyxl import Workbook
wb = load_workbook('testData.xlsx')
wb2 = load_workbook('testTemplate.xlsx')
ws = wb.worksheets[0]
mr = ws.max_row
ws2 = wb2.worksheets[0]
A = ws2.max_row
B = ws2.max_row
C = ws2.max_row
ws2values = set()
for row in ws.iter_rows(min_row = 2, min_col = 1, max_row = mr, max_col = 2):
    for cell in row:
        if cell.value == "A":
            if ws2.cell(row = A + 1, column = 1).value in ws2values:
                pass
            else:
                ws2.cell(row = A + 1, column = 1).value = (cell.offset(column = + 1).value)
                A += 1
        elif cell.value == "B":
            if ws2.cell(row = B + 1, column = 1).value in ws2values:
                pass
            else:
                ws2.cell(row = B + 1, column = 1).value = (cell.offset(column = + 1).value)
                B += 1
        elif cell.value == "C":
            if ws2.cell(row = C + 1, column = 1).value in ws2values:
                pass
            else:
                ws2.cell(row = C + 1, column = 1).value = (cell.offset(column = + 1).value)
                C += 1
wb2.save('testTemplate.xlsx')
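I don't see a pandas tag in your question, but if you are interested, you can use some of its library functions to avoid the loops, speed up the transformation, and get the same result you are looking for.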
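import pandas as pd

cols_template = ["A", "B", "C"]

def concat_missingvals(df):
    # Append an all-NaN frame that has the template columns, so every template
    # column exists in the result, then drop the rows that are entirely empty.
    out = pd.concat([df, pd.DataFrame(index=range(0, len(df)), columns=cols_template)],
                    ignore_index=True).dropna(how="all")
    return out

df = (
    pd.read_excel("testData.xlsx",
                  usecols=["Source", "Number"])
    .drop_duplicates()                                      # drop repeated (Source, Number) pairs
    .assign(idx=lambda x: x.groupby("Source").cumcount())  # position of each value within its Source
    .pivot(index="Source", columns="idx")
    .transpose()                                            # one column per Source, one row per idx
    .reset_index(drop=True)
    .rename_axis(None, axis=1)
    .pipe(concat_missingvals)
)
print(df)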
      A     B     C
0  10.1  10.2  10.3
1  10.4  10.5  10.6
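To try the idea without an Excel file, here is a minimal sketch on an in-memory frame. The sample rows are made up for illustration (chosen so the result matches the output above), and it uses a slightly shorter variant of the chain that pivots on idx directly rather than transposing.

```python
import pandas as pd

# Hypothetical sample data: the (A, 10.1) pair appears twice on purpose.
raw = pd.DataFrame({
    "Source": ["A", "B", "C", "A", "A", "B", "C"],
    "Number": [10.1, 10.2, 10.3, 10.1, 10.4, 10.5, 10.6],
})

# Keep each (Source, Number) pair once.
deduped = raw.drop_duplicates()

# Number the remaining values within each Source: 0, 1, 2, ...
deduped = deduped.assign(idx=deduped.groupby("Source").cumcount())

# One column per Source, one row per position within that Source.
wide = deduped.pivot(index="idx", columns="Source", values="Number")
wide = wide.rename_axis(index=None, columns=None)

print(wide)
```

Sources with fewer values than the others end up with NaN in the trailing rows, which is usually what you want before writing to a sheet.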
You can then use pandas.DataFrame.to_excel to store the resulting dataframe in a new Excel file.
# set_column is an XlsxWriter worksheet method, so request that engine explicitly
with pd.ExcelWriter("testData_Retouche.xlsx", engine="xlsxwriter") as writer:
    df.to_excel(writer, index=False, sheet_name="Result")
    col_idx = df.columns.get_loc('A')  # put the column name here
    writer.sheets['Result'].set_column(col_idx, col_idx, 10)  # 10 is the column width
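As for why the if check in the question never skips anything: ws2values is created as an empty set and nothing is ever added to it, and the test looks at the destination cell (which is still empty) rather than at the value being copied. If you prefer to stay with openpyxl, a minimal sketch of the fix could look like the following. The names unique_by_source and copy_unique are hypothetical, and writing A, B and C to columns 1, 2 and 3 is an assumption based on the layout of the output above, not something the question states.

```python
def unique_by_source(rows, sources=("A", "B", "C")):
    """Group values by source label, keeping each (source, value) pair once."""
    seen = set()                         # (source, value) pairs already kept
    grouped = {s: [] for s in sources}
    for source, value in rows:
        if source not in grouped or (source, value) in seen:
            continue                     # unknown label or duplicate: skip it
        seen.add((source, value))        # remember it so later repeats are skipped
        grouped[source].append(value)
    return grouped


def copy_unique(src_path="testData.xlsx", dst_path="testTemplate.xlsx"):
    # Imported here so unique_by_source can be used without openpyxl installed.
    from openpyxl import load_workbook

    ws = load_workbook(src_path).worksheets[0]
    wb2 = load_workbook(dst_path)
    ws2 = wb2.worksheets[0]

    # Read (label, number) pairs from columns 1 and 2, skipping the header row.
    rows = ws.iter_rows(min_row=2, min_col=1, max_col=2, values_only=True)
    grouped = unique_by_source(rows)

    start = ws2.max_row + 1              # first free row, as in the question
    # Assumption: A, B and C go to columns 1, 2 and 3 of the template.
    for col, source in enumerate(("A", "B", "C"), start=1):
        for offset, value in enumerate(grouped[source]):
            ws2.cell(row=start + offset, column=col).value = value
    wb2.save(dst_path)
```

Note that this only removes duplicates within the source sheet. To also skip values that are already present in the template, you would seed the seen set from the existing cells of ws2 before the loop.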