I am trying to use Apache Sedona from Python, specifically with PySpark 3.5.0 and Python 3.11.6. During setup, however, I run into an unresolved-dependency error. The relevant part of the error message is:
:::: WARNINGS
module not found: edu.ucar#cdm-core;5.4.2
==== local-m2-cache: tried
file:/.m2/repository/edu/ucar/cdm-core/5.4.2/cdm-core-5.4.2.pom
-- artifact edu.ucar#cdm-core;5.4.2!cdm-core.jar:
file:/.m2/repository/edu/ucar/cdm-core/5.4.2/cdm-core-5.4.2.jar
==== local-ivy-cache: tried
/.ivy2/local/edu.ucar/cdm-core/5.4.2/ivys/ivy.xml
-- artifact edu.ucar#cdm-core;5.4.2!cdm-core.jar:
/.ivy2/local/edu.ucar/cdm-core/5.4.2/jars/cdm-core.jar
==== central: tried
https://repo1.maven.org/maven2/edu/ucar/cdm-core/5.4.2/cdm-core-5.4.2.pom
-- artifact edu.ucar#cdm-core;5.4.2!cdm-core.jar:
https://repo1.maven.org/maven2/edu/ucar/cdm-core/5.4.2/cdm-core-5.4.2.jar
==== spark-packages: tried
https://repos.spark-packages.org/edu/ucar/cdm-core/5.4.2/cdm-core-5.4.2.pom
-- artifact edu.ucar#cdm-core;5.4.2!cdm-core.jar:
https://repos.spark-packages.org/edu/ucar/cdm-core/5.4.2/cdm-core-5.4.2.jar
::::::::::::::::::::::::::::::::::::::::::::::
:: UNRESOLVED DEPENDENCIES ::
::::::::::::::::::::::::::::::::::::::::::::::
:: edu.ucar#cdm-core;5.4.2: not found
::::::::::::::::::::::::::::::::::::::::::::::
The code I am using is:
from pyspark.sql import SparkSession
from pyspark import StorageLevel
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point
from shapely.geometry import Polygon
from sedona.spark import *
from sedona.core.geom.envelope import Envelope
config = (
    SedonaContext.builder()
    .config(
        'spark.jars.packages',
        'org.apache.sedona:sedona-spark-shaded-3.4_2.12:1.5.1,'
        'org.datasyslab:geotools-wrapper:1.5.1-28.2',
    )
    .getOrCreate()
)
sedona = SedonaContext.create(config)
sc = sedona.sparkContext
print("Sedona context is", sc)
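I followed the official documentation, but a dependency still fails to resolve, possibly because a package or a configuration setting is missing. The documentation does not give an exhaustive list of the dependencies needed for a working setup. Which additional configuration or packages are needed to resolve this and get Apache Sedona running with PySpark 3.5.0?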
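You can try specifying the repository from which the jar files can be downloaded. Here the repository setting (spark.jars.repositories) tells Spark which Maven repository to search, and cdm-core is added explicitly to the package list.
cdm-core can be found here:
https://mvnrepository.com/artifact/edu.ucar/cdm-core/5.4.2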
from sedona.spark import *

# Comma-separated Maven coordinates (group:artifact:version).
# edu.ucar:cdm-core:5.4.2 is the dependency reported as unresolved.
packages = (
    "org.apache.sedona:sedona-spark-shaded-3.4_2.12:1.5.1,"
    "org.datasyslab:geotools-wrapper:1.5.1-28.2,"
    "edu.ucar:cdm-core:5.4.2"
)
# Repository that Spark searches when resolving the packages above.
repository = "https://repo1.maven.org/maven2"

config = (
    SedonaContext.builder()
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.kryo.registrator", SedonaKryoRegistrator.getName)
    .config("spark.jars.packages", packages)
    .config("spark.jars.repositories", repository)
    .getOrCreate()
)
sedona = SedonaContext.create(config)
sc = sedona.sparkContext
print("Sedona context is ", sc)
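Since both settings are plain comma-separated strings, it may be easier to build them from lists. The following is only a sketch: `ivy_to_maven` and `join_config` are hypothetical helper names, not part of Sedona or PySpark. Note also that the error log shows Maven Central was already searched, so a second repository may be needed. The Unidata URL below is an assumption on my part and should be checked against the mvnrepository page linked above before relying on it.

```python
from typing import Iterable


def ivy_to_maven(coordinate: str) -> str:
    """Convert Ivy's 'org#name;rev' notation (as printed in the error log)
    to the Maven 'group:artifact:version' form used by spark.jars.packages."""
    org, rest = coordinate.split("#", 1)
    name, rev = rest.split(";", 1)
    return f"{org}:{name}:{rev}"


def join_config(values: Iterable[str]) -> str:
    """Join values with commas, stripping whitespace and dropping empty entries."""
    return ",".join(v.strip() for v in values if v.strip())


packages = join_config([
    "org.apache.sedona:sedona-spark-shaded-3.4_2.12:1.5.1",
    "org.datasyslab:geotools-wrapper:1.5.1-28.2",
    ivy_to_maven("edu.ucar#cdm-core;5.4.2"),  # the unresolved dependency
])

repositories = join_config([
    "https://repo1.maven.org/maven2",
    # Assumption: Unidata's own repository hosts edu.ucar artifacts; verify first.
    "https://artifacts.unidata.ucar.edu/repository/unidata-all",
])
```

The resulting strings would then be passed as `.config("spark.jars.packages", packages)` and `.config("spark.jars.repositories", repositories)` in the builder chain.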