Null conversion error - polars.exceptions.ComputeError - pandera (0.19.0b3) with Polars


Note: you can install the specific pandera version (a pre-release) with

pip install --pre "pandera[polars]"

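We are trying out a simple validation example with Polars, and we cannot work out what the problem is or what causes it. Whenever any validation fails and the data contains a null, a polars.exceptions.ComputeError is raised.

For example, in the code below, the dummy data has an extract_date column containing None. If every case_id is a string that can be converted to int, it runs fine, but if any case_id is not convertible to int, the exception is raised.

Here is the code: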

import pandera.polars as pa
import polars as pl
from datetime import date
import json

class CaseSchema(pa.DataFrameModel):
    case_id: int = pa.Field(nullable=False, unique=True, coerce=True)
    gdwh_portfolio_id: str = pa.Field(nullable=False, unique=True, coerce=True)
    extract_date: date = pa.Field(nullable=True, coerce=True)

    class Config:
        drop_invalid_rows = True

invalid_lf = pl.DataFrame({
    #"case_id": ["1", "2", "3"],
    "case_id": ["1", "2", "abc"],
    "gdwh_portfolio_id": ["d", "e", "f"],
    "extract_date": [date(2024,1,1), date(2024,1,2), None]
})

try:
    CaseSchema.validate(invalid_lf, lazy=True)
except pa.errors.SchemaErrors as e:
    print(json.dumps(e.message, indent=4))

It gives: 'failure_case' for 1 out of 1 values: [{"abc","f",null}]

If you uncomment "case_id": ["1", "2", "3"] and comment out "case_id": ["1", "2", "abc"], it runs fine.
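To see which case_id values trigger the failing branch without going through pandera at all, a minimal plain-Python sketch follows. The helper name non_int_convertible is hypothetical, and Python's int() parsing rules are only an approximation of the string-to-integer cast that Polars performs (for example, int() tolerates surrounding whitespace), so treat it as a rough pre-check rather than a reproduction of the coercion.

```python
def non_int_convertible(values):
    """Return the values that Python's int() cannot parse.

    Hypothetical helper for illustration; it approximates, but does not
    replicate, the str -> int coercion applied to the case_id column.
    """
    bad = []
    for value in values:
        try:
            int(value)
        except (TypeError, ValueError):
            # TypeError covers None, ValueError covers strings like "abc".
            bad.append(value)
    return bad


# The two variants of the case_id column from the question.
passing_ids = ["1", "2", "3"]
failing_ids = ["1", "2", "abc"]

print(non_int_convertible(passing_ids))
print(non_int_convertible(failing_ids))
```

Only the second list contains a value that cannot be coerced, which matches the case where the exception appears.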

We are not sure why it fails when there are null values. If there are no nulls in the data, it works fine.

The traceback we get is:


> Traceback (most recent call last):
>   File "<frozen runpy>", line 198, in _run_module_as_main
>   File "<frozen runpy>", line 88, in _run_code
>   File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/erehoba-acc-payments-req/code/Users/ourrehman/dna-payments-and-accounts/data_validation/test.py", line 22, in <module>
>     CaseSchema.validate(invalid_lf, lazy=True)
>   File "/anaconda/envs/pandera-polars/lib/python3.11/site-packages/pandera/api/dataframe/model.py", line 289, in validate
>     cls.to_schema().validate(
>   File "/anaconda/envs/pandera-polars/lib/python3.11/site-packages/pandera/api/polars/container.py", line 58, in validate
>     output = self.get_backend(check_obj).validate(
>              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/anaconda/envs/pandera-polars/lib/python3.11/site-packages/pandera/backends/polars/container.py", line 65, in validate
>     check_obj = parser(check_obj, *args)
>                 ^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/anaconda/envs/pandera-polars/lib/python3.11/site-packages/pandera/backends/polars/container.py", line 398, in coerce_dtype
>     check_obj = self._coerce_dtype_helper(check_obj, schema)
>                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/anaconda/envs/pandera-polars/lib/python3.11/site-packages/pandera/backends/polars/container.py", line 486, in _coerce_dtype_helper
>     raise SchemaErrors(
>           ^^^^^^^^^^^^^
>   File "/anaconda/envs/pandera-polars/lib/python3.11/site-packages/pandera/errors.py", line 183, in __init__
>     ).failure_cases_metadata(schema.name, schema_errors)
>       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/anaconda/envs/pandera-polars/lib/python3.11/site-packages/pandera/backends/polars/base.py", line 173, in failure_cases_metadata
>     ).cast(
>       ^^^^^
>   File "/anaconda/envs/pandera-polars/lib/python3.11/site-packages/polars/dataframe/frame.py", line 6624, in cast
>     return self.lazy().cast(dtypes, strict=strict).collect(_eager=True)
>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/anaconda/envs/pandera-polars/lib/python3.11/site-packages/polars/lazyframe/frame.py", line 1810, in collect
>     return wrap_df(ldf.collect())
>                    ^^^^^^^^^^^^^
> polars.exceptions.ComputeError: conversion from `struct[3]` to `str` failed in column 'failure_case' for 1 out of 1 values: [{"abc","f",null}]

Expected behavior

It should work for columns that contain nulls and are declared with nullable=True.

Versions

pandera: 0.19.0b3, polars: 0.20.23, Python: 3.11

python python-polars pandera
1 Answer

This is a bug in 0.19.0b3. I created an issue: https://github.com/unionai-oss/pandera/issues/1607

A PR will fix the issue, but the 0.19.0 release is also available now.
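To check whether an environment is still on a pre-release, a small sketch using only the standard library is shown below. The function names are hypothetical. Only 0.19.0b3 is confirmed above as affected; treating every 0.19.0 alpha, beta, or release candidate as suspect is an assumption made here for caution.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def is_0_19_0_prerelease(ver: str) -> bool:
    """True for strings such as '0.19.0b3' or '0.19.0rc1', False for '0.19.0'.

    Assumption: any 0.19.0 pre-release is flagged, although only 0.19.0b3
    is known from the answer to have the bug.
    """
    return re.fullmatch(r"0\.19\.0(a|b|rc)\d+", ver) is not None


def installed_pandera_version():
    """Return the installed pandera version string, or None if it is absent."""
    try:
        return version("pandera")
    except PackageNotFoundError:
        return None


current = installed_pandera_version()
if current is not None and is_0_19_0_prerelease(current):
    print(f"pandera {current} is a 0.19.0 pre-release; consider upgrading")
```

If the check fires, upgrading with pip install --upgrade "pandera[polars]" should pull in the stable release.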
