ArrowInvalid: Could not convert <POINT (266000 4016000)> with type Point: did not recognize Python value type when inferring an Arrow data type


I have a Spark DataFrame that I converted to pandas-on-Spark with dataframe.pandas_api(), but when I run this code:

# Import necessary libraries
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Convert the pandas-on-Spark DataFrame to a GeoDataFrame
model_gdf = gpd.GeoDataFrame(
    df_new,
    crs=sweden_gdf.crs,
    geometry=df_new.apply(lambda row: Point(row['x']*1000, row['y']*1000), axis=1)
)

I get this error:

ArrowInvalid: Could not convert <POINT (266000 4016000)> with type Point: did not recognize Python value type when inferring an Arrow data type
python pyspark geopandas pyarrow
1 answer

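Unfortunately, Spark DataFrames (including pandas-on-Spark DataFrames) don't support complex Python objects such as shapely geometries. The row-wise apply() in the question returns Point objects that Spark has to pass through Arrow, and Arrow cannot infer a data type for them, hence the ArrowInvalid error. So you have to convert the data to a pandas DataFrame, turn that into a GeoDataFrame, do all the geometric calculations there, and then convert the result back to a Spark DataFrame.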

import geopandas as gpd
import pandas as pd
from shapely.geometry import Point
from pyspark.sql import SparkSession

# Create a Spark session
spark = SparkSession.builder.appName("YourAppName").getOrCreate()

# Convert the Spark DataFrame to a pandas DataFrame
# (if df_new is a pandas-on-Spark DataFrame, use df_new.to_pandas() instead)
df_new_pd = df_new.toPandas()

# Convert the Pandas DataFrame to a GeoDataFrame
# Adjust the projection system (CRS) as per your requirements
model_gdf = gpd.GeoDataFrame(
    df_new_pd,
    crs=sweden_gdf.crs,  # Assuming sweden_gdf is already defined with a CRS
    geometry=df_new_pd.apply(lambda row: Point(row['x']*1000, row['y']*1000), axis=1)
)

# Perform your geometric calculations here on model_gdf

# Once calculations are done, convert the GeoDataFrame back to a plain pandas DataFrame.
# The geometry column is dropped because Spark cannot store shapely objects.
result_pd = pd.DataFrame(model_gdf.drop(columns='geometry'))

# Convert Pandas DataFrame back to Spark DataFrame
result_df = spark.createDataFrame(result_pd)
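The code above drops the geometry column before going back to Spark. If the geometries need to survive the round trip, one option is to store them as WKT strings. The sketch below assumes geopandas 0.9 or later (for GeoSeries.to_wkt / GeoSeries.from_wkt); the sample data is hypothetical, "EPSG:3006" is only an assumed stand-in for sweden_gdf.crs, and Spark is left out so it runs locally. It also swaps the row-wise apply() for gpd.points_from_xy, which builds all points in one vectorized call.

```python
import geopandas as gpd
import pandas as pd

# Hypothetical sample data standing in for df_new_pd.
df_new_pd = pd.DataFrame({"id": [1, 2], "x": [266.0, 266.5], "y": [4016.0, 4016.5]})

# Vectorized alternative to the row-wise apply(): build all points at once.
# "EPSG:3006" (SWEREF 99 TM) is an assumed stand-in for sweden_gdf.crs.
model_gdf = gpd.GeoDataFrame(
    df_new_pd,
    geometry=gpd.points_from_xy(df_new_pd["x"] * 1000, df_new_pd["y"] * 1000),
    crs="EPSG:3006",
)

# Example geometric calculation: distance from every point to the first one.
model_gdf["dist_to_first"] = model_gdf.geometry.distance(model_gdf.geometry.iloc[0])

# Keep the geometry as WKT text instead of dropping it; Spark can store strings.
result_pd = pd.DataFrame(model_gdf.drop(columns="geometry"))
result_pd["geometry_wkt"] = model_gdf.geometry.to_wkt()

# result_pd now holds only plain types, so spark.createDataFrame(result_pd)
# can infer a schema for it.

# Later (e.g. after collecting back to pandas), rebuild the geometries from WKT.
restored = gpd.GeoDataFrame(
    result_pd.drop(columns="geometry_wkt"),
    geometry=gpd.GeoSeries.from_wkt(result_pd["geometry_wkt"]),
    crs="EPSG:3006",
)
```

WKT keeps the column human-readable; GeoSeries.to_wkb() is a more compact binary alternative if readability doesn't matter.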