Why can't Databricks Python read from my Azure Data Lake Storage Gen1?

Problem description (Votes: 2, Answers: 1)

I am trying to read the file mydir/mycsv.csv from Azure Data Lake Storage Gen1 in a Databricks notebook, using syntax inspired by the documentation:

configs = {"dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
           "dfs.adls.oauth2.client.id": "123abc-1e42-31415-9265-12345678",
           "dfs.adls.oauth2.credential": dbutils.secrets.get(scope = "adla", key = "adlamaywork"),
           "dfs.adls.oauth2.refresh.url": "https://login.microsoftonline.com/123456abc-2718-aaaa-9999-42424242abc/oauth2/token"}

dbutils.fs.mount(
  source = "adl://myadls.azuredatalakestore.net/mydir",
  mount_point = "/mnt/adls",
  extra_configs = configs)

post_processed = spark.read.csv("/mnt/adls/mycsv.csv")

# write the first 10 rows to a local DBFS path via pandas
post_processed.limit(10).toPandas().to_csv("/dbfs/processed.csv")

dbutils.fs.unmount("/mnt/adls")

My client 123abc-1e42-31415-9265-12345678 has access to the Data Lake Storage account myadls, and I have created the secret with

databricks secrets put --scope adla --key adlamaywork
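
As a sanity check, the scope and key can also be inspected from the notebook itself; a minimal sketch using the same scope and key names as above (Databricks redacts the secret value if it is printed):

# list the keys registered under the "adla" secret scope
for s in dbutils.secrets.list("adla"):
    print(s.key)

# fetch the credential the mount configuration will use
credential = dbutils.secrets.get(scope="adla", key="adlamaywork")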

When I execute the pyspark code above in a Databricks notebook, I get the following as soon as the csv file is accessed with spark.read.csv:

com.microsoft.azure.datalake.store.ADLException: Error getting info for file /mydir/mycsv.csv

When navigating dbfs with dbfs ls dbfs:/mnt/adls, the parent mount point seems to be there, but I get

Error: b'{"error_code":"IO_ERROR","message":"Error fetching access token\nLast encountered exception thrown after 1 tries [HTTP0(null)]"}'
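
One thing worth ruling out with errors like this is a stale mount left over from an earlier attempt: an existing mount keeps the credentials it was created with, and a new dbutils.fs.mount call to the same mount point will typically fail rather than refresh it. A minimal cleanup sketch, assuming the /mnt/adls mount point from above:

# show current mounts and their backing sources
for m in dbutils.fs.mounts():
    print(m.mountPoint, m.source)

# drop a leftover mount before mounting again with fresh configs
if any(m.mountPoint == "/mnt/adls" for m in dbutils.fs.mounts()):
    dbutils.fs.unmount("/mnt/adls")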

What am I doing wrong?

python pyspark azure-data-lake databricks azure-databricks
1 Answer
0 votes

If you don't necessarily need to mount the directory into dbfs, you could try reading directly from adls like this:

# OAuth client-credential settings for direct adl:// access
spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
spark.conf.set("dfs.adls.oauth2.access.token.provider", "org.apache.hadoop.fs.adls.oauth2.ConfCredentialBasedAccessTokenProvider")
spark.conf.set("dfs.adls.oauth2.client.id", "123abc-1e42-31415-9265-12345678")
spark.conf.set("dfs.adls.oauth2.credential", dbutils.secrets.get(scope = "adla", key = "adlamaywork"))
spark.conf.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/123456abc-2718-aaaa-9999-42424242abc/oauth2/token")

# read the file straight from the adl:// URI, no mount required
csvFile = "adl://myadls.azuredatalakestore.net/mydir/mycsv.csv"

df = spark.read.format('csv').options(header='true', inferschema='true').load(csvFile)
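
With the DataFrame read directly from adl://, the rest of the original flow (writing the first rows out to DBFS) can follow the same pattern; a short sketch, assuming pandas is available on the driver:

# take a small sample and write it through the local /dbfs path via pandas
df.limit(10).toPandas().to_csv("/dbfs/processed.csv", index=False)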