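This question is about how openpyxl's delete_rows() function changes the row indices in an Excel file.
Goal: With this code snippet, I want to check a specific column in every row of the sheet. If the cell in that column equals a particular value or is empty, I want to delete the entire row from the sheet.
Problem: When delete_rows() deletes a row, it appears to change the indices of the remaining rows, so the remaining rows are not deleted correctly. (I introduced a row counter that is adjusted depending on whether a deletion occurred, but that does not seem to help.)
Any ideas or suggestions are much appreciated. Thank you!
Link to the sample.xlsx file: https://filebin.net/c4gd9b4kd38burun
In case the link above stops working, here is a screenshot of the sample.xlsx file:
Code snippet: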
#imports
from openpyxl import load_workbook
#load file
excel_file = "sample.xlsx"
workbook = load_workbook(filename=excel_file, data_only=True)
sheet = workbook.active
#deleting first row, and cleaning up headers
sheet.delete_rows(1)
sheet["D1"].value = "check 1"
sheet["E1"].value = "check 2"
#remove all rows that start with years earlier than 2018 or are empty
row_number = 2
for row in sheet.iter_rows(min_col=2, max_col=2, min_row=2, values_only=True):
    for cell in row:
        str_cell = str(cell)
        if str_cell.startswith("2017"):
            sheet.delete_rows(idx=row_number)
            row_number = row_number - 1
        if str_cell.startswith("2016"):
            sheet.delete_rows(idx=row_number)
            row_number = row_number - 1
        if str_cell.startswith("2015"):
            sheet.delete_rows(idx=row_number)
            row_number = row_number - 1
        if str_cell == None:
            sheet.delete_rows(idx=row_number)
            row_number = row_number - 1
    row_number = row_number + 1
#save as new file
workbook.save(filename="sample_test.xlsx")
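After some more research, I came across the following Stack Overflow question: Can't get OpenPyXl to delete rows
User Charlie Clark pointed out that row deletions should be performed in reverse order to avoid indexing problems: deleting from the bottom up only shifts rows that have already been handled. I adapted my code to implement this idea, and it now works as intended. I've attached my working code below in case it points someone else with a similar problem in the right direction.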
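To illustrate why the order matters, here is a minimal sketch that uses a plain Python list as a stand-in for the worksheet rows. It is not openpyxl itself, and the helper names (delete_forward, delete_backward, is_old) and the sample values are made up for illustration.

```python
def delete_forward(values, should_delete):
    """Delete matches while walking forward by index (skips items on purpose)."""
    values = list(values)
    i = 0
    while i < len(values):
        if should_delete(values[i]):
            del values[i]  # everything after position i shifts down by one
        i += 1             # so the item that just moved into position i is never checked
    return values


def delete_backward(values, should_delete):
    """Delete matches while walking from the last index down to the first."""
    values = list(values)
    for i in range(len(values) - 1, -1, -1):
        if should_delete(values[i]):
            del values[i]  # only shifts items that were already checked
    return values


def is_old(value):
    # Hypothetical rule mirroring the question: years before 2018 are removed.
    return value.startswith(("2015", "2016", "2017"))


years = ["2017", "2016", "2019", "2015", "2015", "2020"]
print(delete_forward(years, is_old))   # some old years survive
print(delete_backward(years, is_old))  # only 2019 and 2020 remain
```

In the forward version, every deletion causes the next item to be skipped, which is the same kind of mismatch that seems to happen when rows are deleted while iterating down a sheet.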
Code snippet:
#imports
from openpyxl import load_workbook
#load file
excel_file = "sample.xlsx"
workbook = load_workbook(filename=excel_file, data_only=True)
sheet = workbook.active
#deleting first row, and cleaning up headers
sheet.delete_rows(1)
sheet["D1"].value = "check 1"
sheet["E1"].value = "check 2"
#collect the numbers of all rows that start with years earlier than 2018 or are empty
i = 1
del_rows = []
for row in sheet.iter_rows(min_col=2, max_col=2, min_row=2):
    i += 1
    for cell in row:
        str_cell = str(cell.value)
        if str_cell.startswith("2017"):
            del_rows.append(i)
        elif str_cell.startswith("2016"):
            del_rows.append(i)
        elif str_cell.startswith("2015"):
            del_rows.append(i)
        elif cell.value is None:
            del_rows.append(i)
#delete from the bottom up so that earlier deletions do not shift the remaining targets
for r in reversed(del_rows):
    sheet.delete_rows(r)
#save as new file
workbook.save(filename="sample_test.xlsx")
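For anyone who prefers a single pass, the same reverse-order idea can probably be written without the intermediate list. The following is only a sketch under a few assumptions: the function name delete_old_or_empty_rows, its parameters, and the in-memory sample data are hypothetical and are not taken from sample.xlsx.

```python
from openpyxl import Workbook

OLD_PREFIXES = ("2015", "2016", "2017")  # years to drop, as in the code above


def delete_old_or_empty_rows(sheet, column=2, first_data_row=2):
    """Delete rows whose cell in `column` is empty or starts with an old year."""
    # Walk from the last row up to the first data row, so that deleting a row
    # only shifts rows that have already been checked.
    for row_idx in range(sheet.max_row, first_data_row - 1, -1):
        value = sheet.cell(row=row_idx, column=column).value
        if value is None or str(value).startswith(OLD_PREFIXES):
            sheet.delete_rows(row_idx)


# Small in-memory workbook standing in for sample.xlsx (made-up data).
workbook = Workbook()
sheet = workbook.active
sample_rows = [
    ("id", "date"),
    (1, "2017-03-01"),
    (2, "2016-07-15"),
    (3, "2019-01-20"),
    (4, None),
    (5, "2018-11-02"),
]
for record in sample_rows:
    sheet.append(record)

delete_old_or_empty_rows(sheet)
remaining = list(sheet.iter_rows(values_only=True))
print(remaining)
```

Passing a tuple to str.startswith() checks all three year prefixes in one call, which replaces the chain of if/elif branches.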