I am using the following dataset, df =

       Roll  Marks Grade City
    0   123     40     C  Kol
    1   476     80     B  Bhu
    2   789     20     D  Che
    3   NaN     90     A  Kol
    4   NaN     70     A  Che

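I need to check whether any Roll value is NaN/null and whether any Marks value is below 35. If any such case is found, the error needs to be logged to a CSV file in the following format.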
> Timestamp. Error details
> Current date. Check failed at row no.3,column Roll. Error value:Nan
> Current date. Check failed at row no.2,column Marks. Error value:20
I tried checking with isna().any() and an if condition, then creating and writing the log with csv.writer after the check:

    import csv

    roll = df['Roll'].isna().any()
    if roll:
        print("Roll is missing")
        with open('log.csv', 'w', newline='') as log:
            writer = csv.writer(log)
            writer.writerow(df.columns)
            # This writes every row of df, not just the failing ones
            for v in df.to_numpy():
                writer.writerow(v)

But it does not produce the format above. I am using Python 3.6. Please help.

You can use stack and items to build the DataFrame:

    # One boolean check per column; True marks a failing value
    checks = {'Roll': lambda s: s.isna(), 'Marks': lambda s: s.lt(35)}
    # Boolean mask indexed by (row, column)
    m = df.apply(checks).stack()
    out = pd.DataFrame({'Timestamp': pd.Timestamp('today').normalize(),
                        'Error details': [
                            f'Check failed at row no.{row}, column {col}. Error value: {val}'
                            for (row, col), val in df[list(checks)].stack(dropna=False)[m].items()
                        ]})

Output:

       Timestamp                                               Error details
    0 2024-04-22  Check failed at row no.2, column Marks. Error value: 20.0
    1 2024-04-22    Check failed at row no.3, column Roll. Error value: nan
    2 2024-04-22    Check failed at row no.4, column Roll. Error value: nan
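
The question asks for the errors to end up in a CSV file, which the snippet above stops short of. A minimal sketch of that last step follows, assuming the sample data from the question and the file name log.csv from the asker's attempt. It is a variant that loops over the checks column by column instead of using stack, so the rows come out grouped by column and Marks values keep their integer form. The date format string is an assumption.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Roll': [123, 476, 789, np.nan, np.nan],
    'Marks': [40, 80, 20, 90, 70],
    'Grade': ['C', 'B', 'D', 'A', 'A'],
    'City': ['Kol', 'Bhu', 'Che', 'Kol', 'Che'],
})

# One boolean check per column; True marks a failing value
checks = {'Roll': lambda s: s.isna(), 'Marks': lambda s: s.lt(35)}

# Assumed date format; the question only says "Current date"
today = pd.Timestamp('today').strftime('%Y-%m-%d')

rows = []
for col, check in checks.items():
    failed = check(df[col])  # boolean Series aligned with df's index
    for row, val in df.loc[failed, col].items():
        rows.append({
            'Timestamp': today,
            'Error details': f'Check failed at row no.{row}, column {col}. Error value: {val}',
        })

out = pd.DataFrame(rows, columns=['Timestamp', 'Error details'])

# Write the log; index=False keeps the 0, 1, 2 row labels out of the file
out.to_csv('log.csv', index=False)
```

To append to an existing log across runs instead of overwriting it, to_csv accepts mode='a' together with header=False.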