How can I run validation checks on a DataFrame and log the errors to a CSV file in a specific format?


I am working with the following dataset, df:

       Roll   Marks  Grade  City
    0  123    40     C      Kol
    1  476    80     B      Bhu
    2  789    20     D      Che
    3  NaN    90     A      Kol
    4  NaN    70     A      Che

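I need to check whether any Roll value is NaN/null and whether any Marks value is below 35. If any such cases are found, the errors need to be logged to a CSV file in the following format.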

> Timestamp       Error details
> Current date    Check failed at row no.3, column Roll. Error value: NaN
> Current date    Check failed at row no.2, column Marks. Error value: 20

I tried checking with isna().any() and an if condition, and then creating and writing the log with csv.writer after the check:

roll=df['roll'].isna().any()
if roll:
  print("Roll is missing")
  with open('log.csv','w',newline='') as log:
    writer=csv.writer(log)
    writer.writerow([df].columns)
    for v in df.to_numpy():
      writer.writerow(v)

But this does not produce the format shown above. I am using Python 3.6. Please help.

python pandas dataframe csv logging
1 Answer

You can use a dictionary of check functions, together with stack and items, to create the DataFrame:
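import pandas as pd

# Map each column to a function that returns True where the check fails
checks = {'Roll': lambda s: s.isna(), 'Marks': lambda s: s.lt(35)}

# Boolean mask of failures, indexed by (row, column)
m = df.apply(checks).stack()

# One message per failing cell, stamped with today's date
out = pd.DataFrame({'Timestamp': pd.Timestamp('today').normalize(),
                    'Error details': [
                        f'Check failed at row no.{row}, column {col}. Error value: {val}'
                         for (row, col), val in df[list(checks)].stack(dropna=False)[m].items()
                    ]})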

Output:

   Timestamp                                              Error details
0 2024-04-22  Check failed at row no.2, column Marks. Error value: 20.0
1 2024-04-22    Check failed at row no.3, column Roll. Error value: nan
2 2024-04-22    Check failed at row no.4, column Roll. Error value: nan
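
The question asks for the result to end up in a CSV file, so here is a minimal, self-contained sketch of that last step. It rebuilds the sample data from the question, applies the same two checks with a plain loop instead of stack (an alternative I am swapping in so the example does not depend on how a given pandas version handles the dropna argument), and writes the log with to_csv. The helper name collect_errors and the variable log are hypothetical; the file name log.csv is taken from the question.

```python
import pandas as pd

# Sample data from the question (missing Roll values become NaN)
df = pd.DataFrame({
    "Roll": [123, 476, 789, None, None],
    "Marks": [40, 80, 20, 90, 70],
    "Grade": ["C", "B", "D", "A", "A"],
    "City": ["Kol", "Bhu", "Che", "Kol", "Che"],
})

# Each check returns a boolean Series that is True where the value is invalid
checks = {"Roll": lambda s: s.isna(), "Marks": lambda s: s.lt(35)}


def collect_errors(frame, checks):
    """Return one message per failing cell, ordered by row label.

    Hypothetical helper, not part of pandas.
    """
    records = []
    for col, check in checks.items():
        failed = check(frame[col])
        # Walk only the rows where this column's check failed
        for row, val in frame.loc[failed, col].items():
            records.append(
                (row, f"Check failed at row no.{row}, column {col}. Error value: {val}")
            )
    records.sort(key=lambda rec: rec[0])  # stable sort keeps column order within a row
    return [msg for _, msg in records]


errors = collect_errors(df, checks)

# The scalar timestamp is broadcast to every row of the log
log = pd.DataFrame({
    "Timestamp": pd.Timestamp("today").normalize(),
    "Error details": errors,
})

# index=False keeps the file to the two requested columns
log.to_csv("log.csv", index=False)
```

Because each column is read on its own here, Marks keeps its integer dtype and the failing value is reported as 20 rather than 20.0. If the log should accumulate across runs, passing mode='a' and header=False to to_csv appends rows instead of overwriting the file.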