我有一个应该分解的架构,下面是该架构
|-- CaseNumber: string (nullable = true)
|-- Customers: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- Contacts: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- FirstName: string (nullable = true)
| | | | |-- LastName: string (nullable = true)
我希望我的模式是这样,
|-- CaseNumber: string (nullable = true)
|-- FirstName: string (nullable = true)
|-- LastName: string (nullable = true)
或
+----------+---------------------+
| CaseNumber| FirstName| LastName|
+----------+---------------------+
| 1 | aa | bb |
+----------|-----------|---------|
| 2 | cc | dd |
+------------------------------- |
我是数据砖的新手,不胜感激。谢谢
这是一种无需使用爆炸命令即可解决它的方法-
case class MyCase(val Customers = Array[Customer](), CaseNumber : String
)
case class Customers(val Contacts = Array[Contacts]()
)
case class Contacts(val Firstname:String, val LastName:String
)
val dataset = // dataframe.as[MyCase]
dataset.map{ mycase =>
// return a Seq of tuples like - (mycase.caseNumber, //read customer's contract's first and last name )
//one row per first and last names, repeat mycase.caseNumber .. basically a loop
}.flatmap(identity)