Note: you can install a specific pandera version with pip install "pandera[polars]" beforehand.
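We are trying a simple validation example with polars. We can't work out what the problem is or what causes it, but whenever any validation fails and the data contains nulls, it raises a polars.exceptions.ComputeError.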
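For example, in the code below, the dummy data has an extract_date column containing a None. If every case_id is a string that can be converted to int, it runs fine, but if any case_id cannot be converted to int, the exception is raised.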
Here is the code:
import pandera.polars as pa
import polars as pl
from datetime import date
import json

class CaseSchema(pa.DataFrameModel):
    case_id: int = pa.Field(nullable=False, unique=True, coerce=True)
    gdwh_portfolio_id: str = pa.Field(nullable=False, unique=True, coerce=True)
    extract_date: date = pa.Field(nullable=True, coerce=True)

    class Config:
        drop_invalid_rows = True

invalid_lf = pl.DataFrame({
    # "case_id": ["1", "2", "3"],
    "case_id": ["1", "2", "abc"],
    "gdwh_portfolio_id": ["d", "e", "f"],
    "extract_date": [date(2024, 1, 1), date(2024, 1, 2), None]
})

try:
    CaseSchema.validate(invalid_lf, lazy=True)
except pa.errors.SchemaErrors as e:
    print(json.dumps(e.message, indent=4))
It fails with: ... in column 'failure_case' for 1 out of 1 values: [{"abc","f",null}]. If you uncomment "case_id": ["1", "2", "3"] and comment out "case_id": ["1", "2", "abc"], it runs fine.
We don't know why it fails when there are null values. If the data contains no nulls, it works fine.
The traceback we get is:
> Traceback (most recent call last):
> File "<frozen runpy>", line 198, in _run_module_as_main
> File "<frozen runpy>", line 88, in _run_code
> File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/erehoba-acc-payments-req/code/Users/ourrehman/dna-payments-and-accounts/data_validation/test.py", line 22, in <module>
> CaseSchema.validate(invalid_lf, lazy=True)
> File "/anaconda/envs/pandera-polars/lib/python3.11/site-packages/pandera/api/dataframe/model.py", line 289, in validate
> cls.to_schema().validate(
> File "/anaconda/envs/pandera-polars/lib/python3.11/site-packages/pandera/api/polars/container.py", line 58, in validate
> output = self.get_backend(check_obj).validate(
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> File "/anaconda/envs/pandera-polars/lib/python3.11/site-packages/pandera/backends/polars/container.py", line 65, in validate
> check_obj = parser(check_obj, *args)
> ^^^^^^^^^^^^^^^^^^^^^^^^
> File "/anaconda/envs/pandera-polars/lib/python3.11/site-packages/pandera/backends/polars/container.py", line 398, in coerce_dtype
> check_obj = self._coerce_dtype_helper(check_obj, schema)
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> File "/anaconda/envs/pandera-polars/lib/python3.11/site-packages/pandera/backends/polars/container.py", line 486, in _coerce_dtype_helper
> raise SchemaErrors(
> ^^^^^^^^^^^^^
> File "/anaconda/envs/pandera-polars/lib/python3.11/site-packages/pandera/errors.py", line 183, in __init__
> ).failure_cases_metadata(schema.name, schema_errors)
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> File "/anaconda/envs/pandera-polars/lib/python3.11/site-packages/pandera/backends/polars/base.py", line 173, in failure_cases_metadata
> ).cast(
> ^^^^^
> File "/anaconda/envs/pandera-polars/lib/python3.11/site-packages/polars/dataframe/frame.py", line 6624, in cast
> return self.lazy().cast(dtypes, strict=strict).collect(_eager=True)
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> File "/anaconda/envs/pandera-polars/lib/python3.11/site-packages/polars/lazyframe/frame.py", line 1810, in collect
> return wrap_df(ldf.collect())
> ^^^^^^^^^^^^^
> polars.exceptions.ComputeError: conversion from `struct[3]` to `str` failed in column 'failure_case' for 1 out of 1 values: [{"abc","f",null}]
It should work for columns that contain nulls and have nullable=True set.

pandera: 0.19.0b3, polars: 0.20.23, Python: 3.11
This is a bug in 0.19.0b3. I created an issue: https://github.com/unionai-oss/pandera/issues/1607
A PR will fix the issue, but the 0.19.0 release is also available now.
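Since the problem is tied to a specific release, it can help to check which versions an environment actually has before digging into a schema. Below is a minimal sketch using only the standard library's importlib.metadata; the helper names and the KNOWN_BAD set are hypothetical, and the set lists only the release reported above rather than an authoritative range of affected versions.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Hypothetical list: only the release reported in this thread.
KNOWN_BAD = {"0.19.0b3"}


def installed_versions(packages=("pandera", "polars")):
    """Collect versions in the same shape as the bug report above."""
    info = {"python": ".".join(map(str, sys.version_info[:3]))}
    for name in packages:
        try:
            info[name] = version(name)
        except PackageNotFoundError:
            info[name] = None  # package not installed in this environment
    return info


def check_pandera(info):
    """Flag the pandera release reported as buggy in this thread."""
    if info.get("pandera") in KNOWN_BAD:
        return "affected: upgrade pandera (for example to 0.19.0)"
    return "ok"


print(installed_versions())
print(check_pandera(installed_versions()))
```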