Converting a GeoDataFrame to Polars with from_pandas fails with ArrowTypeError: Did not pass numpy.dtype object


I am trying to convert a GeoDataFrame to a Polars DataFrame using from_pandas. It raises an ArrowTypeError: Did not pass numpy.dtype object exception.

The expected result is a Polars DataFrame in which the geometry column is typed as pl.Object.

I am aware of https://github.com/geopolars/geopolars (alpha) and https://github.com/pola-rs/polars/issues/1830, and I am fine with the shapely objects simply being represented as pl.Object for now.

Here is a minimal example that demonstrates the problem:

## Minimal example displaying the issue
import geopandas as gpd
print("geopandas version: ", gpd.__version__)
import geodatasets
print("geodatasets version: ", geodatasets.__version__)
import polars as pl
print("polars version: ", pl.__version__)

gdf = gpd.GeoDataFrame.from_file(geodatasets.get_path("nybb"))
print("\nOriginal GeoDataFrame")
print(gdf.dtypes)
print(gdf.head())

print("\nGeoDataFrame to Polars without geometry")
print(pl.from_pandas(gdf.drop("geometry", axis=1)).head())

try:
    print("\nGeoDataFrame to Polars naiive") 
    print(pl.from_pandas(gdf).head())
except Exception as e:
    print(e)

try:
    print("\nGeoDataFrame to Polars with schema override") 
    print(pl.from_pandas(gdf, schema_overrides={"geometry": pl.Object}).head())
except Exception as e:
    print(e)

# again to print stack trace
pl.from_pandas(gdf).head()

Output:

geopandas version:  0.14.4
geodatasets version:  2023.12.0
polars version:  0.20.23

Original GeoDataFrame
BoroCode         int64
BoroName        object
Shape_Leng     float64
Shape_Area     float64
geometry      geometry
dtype: object
   BoroCode       BoroName     Shape_Leng    Shape_Area  \
0         5  Staten Island  330470.010332  1.623820e+09   
1         4         Queens  896344.047763  3.045213e+09   
2         3       Brooklyn  741080.523166  1.937479e+09   
3         1      Manhattan  359299.096471  6.364715e+08   
4         2          Bronx  464392.991824  1.186925e+09   

                                            geometry  
0  MULTIPOLYGON (((970217.022 145643.332, 970227....  
1  MULTIPOLYGON (((1029606.077 156073.814, 102957...  
2  MULTIPOLYGON (((1021176.479 151374.797, 102100...  
3  MULTIPOLYGON (((981219.056 188655.316, 980940....  
4  MULTIPOLYGON (((1012821.806 229228.265, 101278...  

GeoDataFrame to Polars without geometry
shape: (5, 4)
┌──────────┬───────────────┬───────────────┬────────────┐
│ BoroCode ┆ BoroName      ┆ Shape_Leng    ┆ Shape_Area │
│ ---      ┆ ---           ┆ ---           ┆ ---        │
│ i64      ┆ str           ┆ f64           ┆ f64        │
╞══════════╪═══════════════╪═══════════════╪════════════╡
│ 5        ┆ Staten Island ┆ 330470.010332 ┆ 1.6238e9   │
│ 4        ┆ Queens        ┆ 896344.047763 ┆ 3.0452e9   │
│ 3        ┆ Brooklyn      ┆ 741080.523166 ┆ 1.9375e9   │
│ 1        ┆ Manhattan     ┆ 359299.096471 ┆ 6.3647e8   │
│ 2        ┆ Bronx         ┆ 464392.991824 ┆ 1.1869e9   │
└──────────┴───────────────┴───────────────┴────────────┘

GeoDataFrame to Polars naiive
Did not pass numpy.dtype object

GeoDataFrame to Polars with schema override
Did not pass numpy.dtype object

Stack trace (identical with and without schema_overrides):

---------------------------------------------------------------------------
ArrowTypeError                            Traceback (most recent call last)
Cell In[59], line 27
     24     print(e)
     26 # again to print stack trace
---> 27 pl.from_pandas(gdf).head()

File c:\Users\...\polars\convert.py:571, in from_pandas(data, schema_overrides, rechunk, nan_to_null, include_index)
    568     return wrap_s(pandas_to_pyseries("", data, nan_to_null=nan_to_null))
    569 elif isinstance(data, pd.DataFrame):
    570     return wrap_df(
--> 571         pandas_to_pydf(
    572             data,
    573             schema_overrides=schema_overrides,
    574             rechunk=rechunk,
    575             nan_to_null=nan_to_null,
    576             include_index=include_index,
    577         )
    578     )
    579 else:
    580     msg = f"expected pandas DataFrame or Series, got {type(data).__name__!r}"

File c:\Users\...\polars\_utils\construction\dataframe.py:1032, in pandas_to_pydf(data, schema, schema_overrides, strict, rechunk, nan_to_null, include_index)
   1025         arrow_dict[str(idxcol)] = plc.pandas_series_to_arrow(
   1026             data.index.get_level_values(idxcol),
   1027             nan_to_null=nan_to_null,
   1028             length=length,
   1029         )
   1031 for col in data.columns:
-> 1032     arrow_dict[str(col)] = plc.pandas_series_to_arrow(
   1033         data[col], nan_to_null=nan_to_null, length=length
   1034     )
   1036 arrow_table = pa.table(arrow_dict)
   1037 return arrow_to_pydf(
   1038     arrow_table,
   1039     schema=schema,
   (...)
   1042     rechunk=rechunk,
   1043 )

File c:\Users\...\polars\_utils\construction\other.py:97, in pandas_series_to_arrow(values, length, nan_to_null)
     95     return pa.array(values, from_pandas=nan_to_null)
     96 elif dtype:
---> 97     return pa.array(values, from_pandas=nan_to_null)
     98 else:
     99     # Pandas Series is actually a Pandas DataFrame when the original DataFrame
    100     # contains duplicated columns and a duplicated column is requested with df["a"].
    101     msg = "duplicate column names found: "

File c:\Users\...\pyarrow\array.pxi:323, in pyarrow.lib.array()

File c:\Users\...\pyarrow\array.pxi:79, in pyarrow.lib._ndarray_to_array()

File c:\Users\...\pyarrow\array.pxi:67, in pyarrow.lib._ndarray_to_type()

File c:\Users\...\pyarrow\error.pxi:123, in pyarrow.lib.check_status()

ArrowTypeError: Did not pass numpy.dtype object

Tags: python dataframe geopandas python-polars pyarrow
1 Answer

polars.from_pandas does not currently support GeoPandas.

https://docs.pola.rs/py-polars/html/reference/api/polars.from_pandas.html#polars-from-pandas

Construct a Polars DataFrame or Series from a pandas DataFrame, Series, or Index.

Function signature:

polars.from_pandas(
    data: pd.DataFrame | pd.Series[Any] | pd.Index[Any] | pd.DatetimeIndex,
    *,
    schema_overrides: SchemaDict | None = None,
    rechunk: bool = True,
    nan_to_null: bool = True,
    include_index: bool = False,
) → DataFrame | Series

You may need to convert the GeoDataFrame to a plain pandas DataFrame first, and then convert that to a Polars DataFrame.
