I have a Spark DataFrame that I converted to a pandas-on-Spark DataFrame using pandas_api(), but when I run the following code:
# Import necessary libraries
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point
# Convert the Pandas DataFrame to a GeoDataFrame
model_gdf = gpd.GeoDataFrame(
    df_new,
    crs=sweden_gdf.crs,
    geometry=df_new.apply(lambda row: Point(row['x']*1000, row['y']*1000), axis=1)
)
I get this error:
ArrowInvalid: Could not convert <POINT (266000 4016000)> with type Point: did not recognize Python value type when inferring an Arrow data type
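Unfortunately, Spark DataFrames (including pandas-on-Spark) do not support complex Python object types such as Shapely geometries. Arrow cannot infer a data type for a Point, which is what the error is reporting. You therefore have to convert the data to a plain pandas DataFrame, turn that into a GeoDataFrame, perform all geometric calculations there, and then convert the result back to a Spark DataFrame. Note that converting to pandas collects all the data onto the driver, so it has to fit in the driver's memory.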
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point
from pyspark.sql import SparkSession
# Create a Spark session
spark = SparkSession.builder.appName("YourAppName").getOrCreate()
# Convert Spark DataFrame to Pandas DataFrame
# (if df_new is a pandas-on-Spark DataFrame, use df_new.to_pandas() instead)
df_new_pd = df_new.toPandas()
# Convert the Pandas DataFrame to a GeoDataFrame
# Adjust the projection system (CRS) as per your requirements
model_gdf = gpd.GeoDataFrame(
    df_new_pd,
    crs=sweden_gdf.crs,  # Assuming sweden_gdf is already defined with a CRS
    geometry=df_new_pd.apply(
        lambda row: Point(row['x']*1000, row['y']*1000),
        axis=1
    )
)
# Perform your geometric calculations here on model_gdf
# Once calculations are done, convert GeoDataFrame back to Pandas DataFrame
result_pd = pd.DataFrame(model_gdf.drop(columns='geometry'))
# Convert Pandas DataFrame back to Spark DataFrame
result_df = spark.createDataFrame(result_pd)
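
The code above drops the geometry column before going back to Spark. If the geometries themselves are needed on the Spark side, one option is to serialize them to WKT strings, which Arrow can handle as ordinary text. The following is a minimal sketch of that idea, not part of the original answer: the helper name build_points_as_wkt, the column name geometry_wkt, the sample data, and the CRS "EPSG:3006" (a stand-in for sweden_gdf.crs) are all assumptions made for illustration. It also uses gpd.points_from_xy, which builds the points in a vectorized way instead of a row-wise apply.

```python
import geopandas as gpd
import pandas as pd


def build_points_as_wkt(pdf: pd.DataFrame, crs, scale: float = 1000.0) -> pd.DataFrame:
    """Build point geometries from the x/y columns and return them as WKT strings.

    Hypothetical helper for illustration; `pdf` is a plain pandas DataFrame.
    """
    gdf = gpd.GeoDataFrame(
        pdf.copy(),
        geometry=gpd.points_from_xy(pdf["x"] * scale, pdf["y"] * scale),
        crs=crs,
    )

    # Geometric calculations on gdf would go here.

    # Replace the geometry objects with their WKT text so that the result
    # contains only types that Spark/Arrow can convert.
    out = pd.DataFrame(gdf.drop(columns="geometry"))
    out["geometry_wkt"] = gdf.geometry.to_wkt()
    return out


# Made-up sample data; "EPSG:3006" is an assumed placeholder for sweden_gdf.crs.
sample = pd.DataFrame({"x": [266, 267], "y": [4016, 4017]})
result = build_points_as_wkt(sample, crs="EPSG:3006")
print(result)
```

The returned DataFrame holds only numbers and strings, so it can be passed to spark.createDataFrame(result) in the same way as result_pd above. The WKT column can later be parsed back into geometries with shapely.wkt.loads if the data returns to pandas.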