I'm running the notebook from Apache Sedona here. I can load all the data and print it as DataFrames. But as soon as I do the spatial join with RDDs:
airports_rdd = Adapter.toSpatialRdd(airports_df, "geometry")
# Drop the duplicate name column in countries_df
countries_df = countries_df.drop("NAME")
countries_rdd = Adapter.toSpatialRdd(countries_df, "geometry")
airports_rdd.analyze()
countries_rdd.analyze()
# 4 is the num partitions used in spatial partitioning. This is an optional parameter
airports_rdd.spatialPartitioning(GridType.KDBTREE, 4)
countries_rdd.spatialPartitioning(airports_rdd.getPartitioner())
buildOnSpatialPartitionedRDD = True
usingIndex = True
considerBoundaryIntersection = True
airports_rdd.buildIndex(IndexType.QUADTREE, buildOnSpatialPartitionedRDD)
result_pair_rdd = JoinQueryRaw.SpatialJoinQueryFlat(airports_rdd, countries_rdd, usingIndex, considerBoundaryIntersection)
result2 = Adapter.toDf(result_pair_rdd, countries_rdd.fieldNames, airports.fieldNames, sedona)
result2.createOrReplaceTempView("join_result_with_all_cols")
# Select the columns needed in the join
result2 = sedona.sql("SELECT leftgeometry as country_geom, NAME_EN, rightgeometry as airport_geom, name FROM join_result_with_all_cols")
I get this error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/tmp/ipykernel_279305/309031602.py in <module>
18 result_pair_rdd = JoinQueryRaw.SpatialJoinQueryFlat(airports_rdd, countries_rdd, usingIndex, considerBoundaryIntersection)
19
---> 20 result2 = Adapter.toDf(result_pair_rdd, countries_rdd.fieldNames, airports.fieldNames, sedona)
21
22 result2.createOrReplaceTempView("join_result_with_all_cols")
~/clone/ext/public/python/apachesedona/1/5/1/dist/lib/python3.10/sedona/core/jvm/config.py in applier(*args, **kwargs)
55 f"please use version higher than {version}"
56 )
---> 57 raise AttributeError(f"Not available before {version} sedona version")
58 result = function(*args, **kwargs)
59 return result
AttributeError: Not available before 1.0.0 sedona version
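Is this expected in the official notebook (which appears to have been updated less than two weeks ago)? If not, how can I fix it? I'm running Apache Sedona 1.5.1, Scala 2.12.18, and PySpark 3.2.1, so my Sedona version is definitely higher than the 1.0.0 the error asks for.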
This is a known bug that we are working on. To fix it, make sure you use the shaded jar: https://github.com/apache/sedona/issues/1247
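For the shaded-jar route, here is a minimal sketch of pinning the shaded artifact when the session is created. The Maven coordinates are assumptions for Sedona 1.5.1 on Spark 3.0 to 3.3 with Scala 2.12, so check them against the Sedona installation docs for your versions. `shaded_packages` and `create_sedona` are hypothetical helper names, not part of Sedona.

```python
def shaded_packages(sedona_version="1.5.1", spark_compat="3.0",
                    scala_version="2.12", geotools_version="1.5.1-28.2"):
    """Build a spark.jars.packages value that points at the *shaded* Sedona jar.

    All default versions are assumptions; adjust them to your environment.
    """
    return ",".join([
        f"org.apache.sedona:sedona-spark-shaded-{spark_compat}_{scala_version}:{sedona_version}",
        f"org.datasyslab:geotools-wrapper:{geotools_version}",
    ])


def create_sedona():
    # Imported lazily so shaded_packages() works without Sedona installed.
    from sedona.spark import SedonaContext

    config = (
        SedonaContext.builder()
        .config("spark.jars.packages", shaded_packages())
        .getOrCreate()
    )
    return SedonaContext.create(config)
```

`spark.jars.packages` only takes effect when the JVM starts, so restart the notebook kernel before calling `create_sedona()` if a Spark session is already running.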
Alternatively, just use the SQL-based spatial join.
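A rough sketch of the SQL-based join, assuming the `sedona` session, `countries_df`, and `airports_df` from the question are already defined. The view names `country` and `airport` and both helper names are made up for illustration. `ST_Contains` is a Sedona SQL function. The selected columns mirror the ones in the question's final query.

```python
def build_join_sql(countries_view="country", airports_view="airport"):
    """Return a Sedona SQL query that pairs each airport with its containing country."""
    return (
        "SELECT c.geometry AS country_geom, c.NAME_EN, "
        "a.geometry AS airport_geom, a.name "
        f"FROM {countries_view} c, {airports_view} a "
        "WHERE ST_Contains(c.geometry, a.geometry)"
    )


def sql_spatial_join(sedona, countries_df, airports_df):
    # Register both DataFrames as temp views so they can be referenced in SQL.
    countries_df.createOrReplaceTempView("country")
    airports_df.createOrReplaceTempView("airport")
    # The join goes through Spark SQL, not through Adapter.toDf on a pair RDD,
    # which is the call that raised the AttributeError in the question.
    return sedona.sql(build_join_sql())


# Usage inside the notebook (needs a live Sedona session):
# result = sql_spatial_join(sedona, countries_df, airports_df)
# result.show()
```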