Running an Apache Sedona spatial join and converting the result to a DF raises a version error?

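I am running the official example notebook from Apache Sedona. I can load all the data and print it as DFs, but as soon as I run the spatial join with RDDs and convert the result back to a DF: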

airports_rdd = Adapter.toSpatialRdd(airports_df, "geometry")
# Drop the duplicate name column in countries_df
countries_df = countries_df.drop("NAME")
countries_rdd = Adapter.toSpatialRdd(countries_df, "geometry")

airports_rdd.analyze()
countries_rdd.analyze()

# 4 is the num partitions used in spatial partitioning. This is an optional parameter
airports_rdd.spatialPartitioning(GridType.KDBTREE, 4)
countries_rdd.spatialPartitioning(airports_rdd.getPartitioner())

buildOnSpatialPartitionedRDD = True
usingIndex = True
considerBoundaryIntersection = True
airports_rdd.buildIndex(IndexType.QUADTREE, buildOnSpatialPartitionedRDD)

result_pair_rdd = JoinQueryRaw.SpatialJoinQueryFlat(airports_rdd, countries_rdd, usingIndex, considerBoundaryIntersection)

result2 = Adapter.toDf(result_pair_rdd, countries_rdd.fieldNames, airports.fieldNames, sedona)

result2.createOrReplaceTempView("join_result_with_all_cols")
# Select the columns needed in the join
result2 = sedona.sql("SELECT leftgeometry as country_geom, NAME_EN, rightgeometry as airport_geom, name FROM join_result_with_all_cols")

I get this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_279305/309031602.py in <module>
     18 result_pair_rdd = JoinQueryRaw.SpatialJoinQueryFlat(airports_rdd, countries_rdd, usingIndex, considerBoundaryIntersection)
     19 
---> 20 result2 = Adapter.toDf(result_pair_rdd, countries_rdd.fieldNames, airports.fieldNames, sedona)
     21 
     22 result2.createOrReplaceTempView("join_result_with_all_cols")

~/clone/ext/public/python/apachesedona/1/5/1/dist/lib/python3.10/sedona/core/jvm/config.py in applier(*args, **kwargs)
     55                     f"please use version higher than {version}"
     56                 )
---> 57                 raise AttributeError(f"Not available before {version} sedona version")
     58             result = function(*args, **kwargs)
     59             return result

AttributeError: Not available before 1.0.0 sedona version

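Is this expected in the official notebook (it looks like it was updated less than two weeks ago)? If not, how do I fix it? I am running Apache Sedona 1.5.1, Scala 2.12.18, and PySpark 3.2.1, so my Sedona version is definitely higher than the 1.0.0 the error asks for.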

apache-spark pyspark apache-sedona
1 Answer

This is a known bug and we are working on a fix. To work around it, make sure you are using the shaded jar: https://github.com/apache/sedona/issues/1247
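A rough sketch of what "use the shaded jar" can look like when the session is created from Python. The Maven coordinates are an assumption: they are the ones I believe the Sedona 1.5.1 setup docs list for Spark 3.0 to 3.3 with Scala 2.12, so check them against your own environment. `shaded_packages` and `create_sedona` are hypothetical helper names, not part of Sedona.

```python
def shaded_packages(spark_compat="3.0", scala="2.12",
                    sedona_version="1.5.1", geotools_version="1.5.1-28.2"):
    """Build a spark.jars.packages value that points at the shaded Sedona jar.

    All default values are assumptions for Sedona 1.5.1 on Spark 3.0-3.3.
    """
    sedona_jar = (
        "org.apache.sedona:"
        f"sedona-spark-shaded-{spark_compat}_{scala}:{sedona_version}"
    )
    geotools_jar = f"org.datasyslab:geotools-wrapper:{geotools_version}"
    return f"{sedona_jar},{geotools_jar}"


def create_sedona():
    """Create a Sedona-enabled session that pulls in the shaded jar."""
    # Imported lazily so shaded_packages() works without Sedona installed.
    from sedona.spark import SedonaContext

    config = (
        SedonaContext.builder()
        .config("spark.jars.packages", shaded_packages())
        .getOrCreate()
    )
    return SedonaContext.create(config)
```

The package list only takes effect when the JVM starts, so restart the notebook kernel before calling `create_sedona()`. If you supply jars some other way (for example a cluster-level install), the same idea applies: the shaded artifact should be the one on the classpath.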

Alternatively, just use the SQL-based spatial join.
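A minimal sketch of the SQL route, assuming `airports_df` and `countries_df` are the DataFrames from the question and that both have a `geometry` column. The view names `country` and `airport` and the helper names `build_join_sql` and `sql_spatial_join` are my own choices for illustration. It never goes through `Adapter.toDf`, so it does not hit the version check in the traceback.

```python
def build_join_sql(country_view="country", airport_view="airport"):
    """Return a Sedona SQL query joining airports to the country containing them."""
    return (
        "SELECT c.geometry AS country_geom, c.NAME_EN, "
        "a.geometry AS airport_geom, a.name "
        f"FROM {country_view} c, {airport_view} a "
        "WHERE ST_Contains(c.geometry, a.geometry)"
    )


def sql_spatial_join(sedona, countries_df, airports_df):
    """Register both DataFrames as temp views and run the join in SQL."""
    countries_df.createOrReplaceTempView("country")
    airports_df.createOrReplaceTempView("airport")
    return sedona.sql(build_join_sql())
```

Usage would be something like `result = sql_spatial_join(sedona, countries_df, airports_df)` followed by `result.show()`. The selected columns mirror the final `SELECT` in the question, so downstream cells should keep working.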
