Unresolved dependencies when trying to access the Apache Sedona context with PySpark

Problem description · 0 votes · 1 answer

I am trying to use Apache Sedona with Python, specifically with PySpark 3.5.0 and Python 3.11.6. However, I am running into unresolved-dependency issues during setup. The relevant part of the error message is:

:::: WARNINGS
module not found: edu.ucar#cdm-core;5.4.2

==== local-m2-cache: tried

  file:/.m2/repository/edu/ucar/cdm-core/5.4.2/cdm-core-5.4.2.pom

  -- artifact edu.ucar#cdm-core;5.4.2!cdm-core.jar:

  file:/.m2/repository/edu/ucar/cdm-core/5.4.2/cdm-core-5.4.2.jar

==== local-ivy-cache: tried

  /.ivy2/local/edu.ucar/cdm-core/5.4.2/ivys/ivy.xml

  -- artifact edu.ucar#cdm-core;5.4.2!cdm-core.jar:

  /.ivy2/local/edu.ucar/cdm-core/5.4.2/jars/cdm-core.jar

==== central: tried

  https://repo1.maven.org/maven2/edu/ucar/cdm-core/5.4.2/cdm-core-5.4.2.pom

  -- artifact edu.ucar#cdm-core;5.4.2!cdm-core.jar:

  https://repo1.maven.org/maven2/edu/ucar/cdm-core/5.4.2/cdm-core-5.4.2.jar

==== spark-packages: tried

  https://repos.spark-packages.org/edu/ucar/cdm-core/5.4.2/cdm-core-5.4.2.pom

  -- artifact edu.ucar#cdm-core;5.4.2!cdm-core.jar:

  https://repos.spark-packages.org/edu/ucar/cdm-core/5.4.2/cdm-core-5.4.2.jar

    ::::::::::::::::::::::::::::::::::::::::::::::

    ::          UNRESOLVED DEPENDENCIES         ::

    ::::::::::::::::::::::::::::::::::::::::::::::

    :: edu.ucar#cdm-core;5.4.2: not found

    ::::::::::::::::::::::::::::::::::::::::::::::

The code I am using is as follows:

from pyspark.sql import SparkSession
from pyspark import StorageLevel
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point
from shapely.geometry import Polygon

from sedona.spark import *
from sedona.core.geom.envelope import Envelope

config = SedonaContext.builder(). \
    config('spark.jars.packages',
           'org.apache.sedona:sedona-spark-shaded-3.4_2.12:1.5.1,'
           'org.datasyslab:geotools-wrapper:1.5.1-28.2'). \
    getOrCreate()

sedona = SedonaContext.create(config)
sc = sedona.sparkContext

# Note: "Sedona context is " + sc raises a TypeError (str + SparkContext),
# so pass sc as a separate print argument instead.
print("Sedona context is", sc)

I followed the official documentation, but there appears to be an unresolved-dependency problem, possibly related to a missing package or configuration. The official documentation does not provide an exhaustive list of the dependencies required for a successful setup. Could you help clarify what additional configuration or packages might be needed to resolve this issue and successfully set up Apache Sedona with PySpark 3.5.0?

apache-spark pyspark geopandas geotools apache-sedona
1 Answer

0 votes

You can try specifying a repository from which the jar file can be downloaded; the `spark.jars.repositories` configuration tells Spark's Maven/Ivy resolver where to look.

cdm-core is listed at

https://mvnrepository.com/artifact/edu.ucar/cdm-core/5.4.2
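As a quick sanity check before touching Spark, note that Maven coordinates map to a predictable path under a repository root, so you can build the exact URL Ivy will try and open it in a browser to confirm the artifact really exists there. A minimal sketch (the helper name `maven_central_url` is mine, not part of any library):

```python
def maven_central_url(coords: str,
                      repo: str = "https://repo1.maven.org/maven2") -> str:
    """Build the jar URL for a 'group:artifact:version' Maven coordinate.

    Maven layout: <repo>/<group with dots as slashes>/<artifact>/<version>/
                  <artifact>-<version>.jar
    """
    group, artifact, version = coords.split(":")
    return (f"{repo}/{group.replace('.', '/')}/"
            f"{artifact}/{version}/{artifact}-{version}.jar")

# Matches the URL shown under "==== central: tried" in the error log above.
print(maven_central_url("edu.ucar:cdm-core:5.4.2"))
```

If that URL returns 404, adding the coordinate to `spark.jars.packages` alone cannot work, and you need to point `spark.jars.repositories` at a repository that actually hosts the artifact.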

from sedona.spark import *

packages = "org.apache.sedona:sedona-spark-shaded-3.4_2.12:1.5.1," \
           "org.datasyslab:geotools-wrapper:1.5.1-28.2," \
           "edu.ucar:cdm-core:5.4.2"

repository = "https://repo1.maven.org/maven2"


config = SedonaContext.builder()\
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")\
    .config("spark.kryo.registrator", SedonaKryoRegistrator.getName) \
    .config("spark.jars.packages", packages) \
    .config("spark.jars.repositories", repository) \
    .getOrCreate()

sedona = SedonaContext.create(config)
sc = sedona.sparkContext

print("Sedona context is", sc)
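One small pitfall worth guarding against: `spark.jars.packages` must be a comma-separated list with no trailing comma, since a stray empty coordinate makes the resolver fail. Building the string from a list sidesteps this entirely (a sketch, independent of Spark):

```python
# Joining coordinates from a list guarantees no trailing comma,
# which a hand-concatenated string can easily pick up.
coordinates = [
    "org.apache.sedona:sedona-spark-shaded-3.4_2.12:1.5.1",
    "org.datasyslab:geotools-wrapper:1.5.1-28.2",
    "edu.ucar:cdm-core:5.4.2",
]
packages = ",".join(coordinates)
print(packages)
```

The resulting `packages` string can then be passed to `.config("spark.jars.packages", packages)` exactly as in the snippet above.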