Cannot instantiate EventHubsSourceProvider in Azure Databricks

Problem description

I am trying to read data from Event Hubs in Azure Databricks using the following code.

from pyspark.sql.functions import *
from pyspark.sql.types import *

NAMESPACE_NAME = "*myEventHub*"
KEY_NAME = "*MyPolicyName*"
KEY_VALUE = "*MySharedAccessKey*"

connectionString = "Endpoint=sb://{0}.servicebus.windows.net/;SharedAccessKeyName={1};SharedAccessKey={2};EntityPath=ingestion".format(NAMESPACE_NAME, KEY_NAME, KEY_VALUE)

ehConf = {}
ehConf['eventhubs.connectionString'] = connectionString

df = spark \
  .readStream \
  .format("eventhubs") \
  .options(**ehConf) \
  .load()

It throws this error:

java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.streaming.eventhubs.EventHubsSourceProvider could not be instantiated

I am using Databricks Runtime 14.2 (Scala 2.12, Spark 3.5.0) with the com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.22 package installed. Could someone please help me resolve this error?

azure azure-databricks azure-eventhub
1 Answer

Uninstall the current package and restart the cluster. Install the package again, then run the code with an encrypted connection string, as shown below:

connectionString = "Endpoint=sb://{0}.servicebus.windows.net/;SharedAccessKeyName={1};SharedAccessKey={2};EntityPath=jgsevents".format(NAMESPACE_NAME, KEY_NAME, KEY_VALUE)

ehConf = {}
ehConf['eventhubs.connectionString'] = sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connectionString)
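For reference, the same `ehConf` dict can also carry other connector options. The sketch below is illustrative, not part of the fix: the keys `eventhubs.consumerGroup` and `eventhubs.startingPosition` follow the option names documented by the azure-event-hubs-spark connector, and the connection string is a placeholder.

```python
import json

# Placeholder connection string, for illustration only
connectionString = (
    "Endpoint=sb://myEventHub.servicebus.windows.net/;"
    "SharedAccessKeyName=MyPolicyName;"
    "SharedAccessKey=MySharedAccessKey;EntityPath=ingestion"
)

ehConf = {
    # The connector reads the connection string from this key
    "eventhubs.connectionString": connectionString,
    # Consume from a specific consumer group (illustrative value)
    "eventhubs.consumerGroup": "$Default",
}

# The connector accepts the starting position as a JSON-encoded object;
# offset "-1" means start from the beginning of the stream.
startingEventPosition = {
    "offset": "-1",
    "seqNo": -1,           # ignored when offset is set
    "enqueuedTime": None,  # ignored when offset is set
    "isInclusive": True,
}
ehConf["eventhubs.startingPosition"] = json.dumps(startingEventPosition)
```

These extra options are passed to `readStream.options(**ehConf)` exactly like the connection string above.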

Output:

df.select(col("body").cast("string")).display()

body
[ { "key1": "value1", "key2": "value2", "key3": "value3", "nestedKey": { "nestedKey1": "nestedValue1" }, "arrayKey": [ "arrayValue1", "arrayValue2" ] } ]
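Once the `body` column is cast to a string, each record holds a JSON payload like the one shown above. As a minimal sketch (outside Spark, using only Python's standard `json` module, with a sample payload copied from the output), parsing such a record might look like:

```python
import json

# Sample payload matching the output shown above
body = (
    '[ { "key1": "value1", "key2": "value2", "key3": "value3", '
    '"nestedKey": { "nestedKey1": "nestedValue1" }, '
    '"arrayKey": [ "arrayValue1", "arrayValue2" ] } ]'
)

# The payload is a JSON array; take the first record
records = json.loads(body)
first = records[0]

print(first["key1"])                     # value1
print(first["nestedKey"]["nestedKey1"])  # nestedValue1
print(first["arrayKey"])                 # ['arrayValue1', 'arrayValue2']
```

In a streaming job the same parsing would typically be done with `from_json` and an explicit schema rather than per-row Python.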

