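Is there a way to check data types other than the standard built-in ones (str, float, int)? I tried the following, but it raises an error because Pandera does not recognize the LineString data type. Any suggestions?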
import pandera as pa
from shapely.geometry import LineString

class Schema(pa.DataFrameModel):
    id: int = pa.Field(unique=True, nullable=False)
    geometry: LineString = pa.Field(nullable=False)

try:
    Schema.validate(some_df, lazy=True)
except pa.errors.SchemaErrors as err:
    errors = err.failure_cases
    error_data = err.data
Would a Pandera extension be feasible? If so, how would I implement it?
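Pandera is a great data validation library for Python, but out of the box its type checks cover simple data types and structures, and it does not recognize a class such as Shapely's LineString as a column dtype. To support a special type like this, you may need to write your own validation logic and attach it to the column as a custom check: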
import geopandas as gpd
import pandera as pa
from shapely.geometry import LineString

# Sample GeoDataFrame
data = {
    "id": [1, 2, 3],
    "geometry": [
        LineString([(0, 0), (1, 1)]),
        LineString([(1, 1), (2, 2)]),
        LineString([(2, 2), (3, 3)]),
    ],
}
df = gpd.GeoDataFrame(data)

# Custom check: receives one value at a time and returns True for a LineString
def is_line_string(geom):
    return isinstance(geom, LineString)

# A Check has to be attached to a pa.Column; nullable is a Column argument
schema = pa.DataFrameSchema({
    "id": pa.Column(int, nullable=False),
    "geometry": pa.Column(
        checks=pa.Check(is_line_string, element_wise=True, error="not a LineString"),
        nullable=False,
    ),
})

schema.validate(df)  # raises a SchemaError if validation fails