I am trying to read a DataFrame created from nested JSON. For some reason I cannot read one of the nested keys, and I get an error:
val df=spark.read.json(Seq("""
[{
"testPlans": [{
"attr1": "abc"
}, {
"attr2": "bac",
"attr3": [{
"uniqueId": "111"
}]
}]
}]""").toDS())
df.select("testPlans.attr1","testPlans.attr2","testPlans.attr3.uniqueId").show()
Error message: org.apache.spark.sql.AnalysisException: cannot resolve 'testPlans.`attr3`['uniqueId']' due to data type mismatch: argument 2 requires integral type, however, ''uniqueId'' is of string type.;
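One way to see where this error comes from is to inspect the types Spark infers. The sketch below is a minimal check that assumes Spark 2.2 or later and a local SparkSession; the master setting and app name are illustrative only (in spark-shell, `getOrCreate()` simply returns the existing session). It rebuilds the same DataFrame and tests whether `testPlans.attr3` is an array of arrays.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{ArrayType, StructType}

// Illustrative local session; in spark-shell getOrCreate() returns the existing one.
val spark = SparkSession.builder().master("local[*]").appName("nested-json-schema").getOrCreate()
import spark.implicits._

// Same data as in the question, compacted onto fewer lines.
val df = spark.read.json(Seq("""
[{
  "testPlans": [{"attr1": "abc"}, {"attr2": "bac", "attr3": [{"uniqueId": "111"}]}]
}]""").toDS())

// Shows testPlans as an array of structs with fields attr1, attr2 and attr3.
df.printSchema()

// Selecting a field through an array of structs yields an array of that field,
// and attr3 is itself an array, so the result should be an array of arrays.
val attr3Type = df.select($"testPlans.attr3").schema.head.dataType
println(attr3Type.simpleString)

val isArrayOfArrays = attr3Type match {
  case ArrayType(ArrayType(_: StructType, _), _) => true
  case _                                         => false
}
```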
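The error occurs because testPlans is an array of structs, so testPlans.attr3 is already an array of arrays. Spark can pull a struct field out of only one level of array, so it treats .uniqueId as an array index and complains that the index is not an integer. To access the nested array, you first need to explode it: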
import org.apache.spark.sql.functions._
import spark.implicits._ // for the $"..." column syntax (already in scope in spark-shell)
df.withColumn("attr3", explode($"testPlans.attr3"))
.select("testPlans.attr1", "testPlans.attr2", "attr3.uniqueId")
.show()
+-----------+-----------+--------+
| attr1| attr2|uniqueId|
+-----------+-----------+--------+
|[abc, null]|[null, bac]| null|
|[abc, null]|[null, bac]| [111]|
+-----------+-----------+--------+
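The result above still holds arrays in attr1 and attr2, because only attr3 was exploded. If you want one row per test plan with plain values, an alternative is to explode testPlans itself first and then explode each plan's attr3. This is a sketch assuming Spark 2.2 or later (for `explode_outer` and `spark.read.json` on a `Dataset[String]`) and a local SparkSession; the session settings and app name are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{explode, explode_outer}

// Illustrative local session; in spark-shell getOrCreate() returns the existing one.
val spark = SparkSession.builder().master("local[*]").appName("nested-json-explode").getOrCreate()
import spark.implicits._

val df = spark.read.json(Seq("""
[{
  "testPlans": [{"attr1": "abc"}, {"attr2": "bac", "attr3": [{"uniqueId": "111"}]}]
}]""").toDS())

// Step 1: one row per element of testPlans.
val plans = df.select(explode($"testPlans").as("plan"))

// Step 2: one row per element of plan.attr3.
// explode_outer keeps plans whose attr3 is null (plain explode would drop them).
val flat = plans
  .select($"plan.attr1", $"plan.attr2", explode_outer($"plan.attr3").as("a3"))
  .select($"attr1", $"attr2", $"a3.uniqueId")

flat.show()
```

With the sample data this should produce two rows, (abc, null, null) and (null, bac, 111). `explode_outer` is used for attr3 because the first plan has no attr3; with plain `explode` that plan's row would disappear.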