I have an Azure Databricks standard cluster (Databricks 6.4, which includes Apache Spark 2.4.5 and Scala 2.11) configured with Active Directory credential passthrough to support querying an Azure Data Lake Gen 2 storage account.
ADLS is mounted via Python:
configs = {
  "fs.azure.account.auth.type": "CustomAccessToken",
  "fs.azure.account.custom.token.provider.class":
    spark.conf.get("spark.databricks.passthrough.adls.gen2.tokenProviderClassName")
}

dbutils.fs.mount(
  source = "abfss://[email protected]/",
  mount_point = "/mnt/taxi",
  extra_configs = configs)
Using {SparkR} in a Databricks notebook returns results:
taxiall <- read.df("/mnt/taxi/yellow", source = "parquet")
collect(mean(rollup(taxiall, "vendorID", "puLocationId"), "totalAmount"))
Using {sparklyr}, however, runs into a token problem:
library(sparklyr)
library(dplyr)
sc <- spark_connect(method = "databricks")
yellow_taxi <- spark_read_parquet(sc = sc, path = "/mnt/taxi/yellow")

yellow_taxi %>%
  group_by(vendorID, puLocationId) %>%
  summarise(avgFare = mean(totalAmount), n = n()) ->
  fares

collect(fares)
Error: com.databricks.backend.daemon.data.client.adl.AzureCredentialNotFoundException: Could not find ADLS Gen2 Token

Is anything else needed to make sparklyr work with credential passthrough?