How to explode a DataFrame schema in Databricks

Question (votes: 0, answers: 1)

I have a schema that needs to be exploded (flattened). Here is the schema:

 |-- CaseNumber: string (nullable = true)
 |-- Customers: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Contacts: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- FirstName: string (nullable = true)
 |    |    |    |    |-- LastName: string (nullable = true)

I want my schema to look like this:

|-- CaseNumber: string (nullable = true)
|-- FirstName: string (nullable = true)
|-- LastName: string (nullable = true)

+----------+---------+--------+
|CaseNumber|FirstName|LastName|
+----------+---------+--------+
|         1|       aa|      bb|
|         2|       cc|      dd|
+----------+---------+--------+

I am new to Databricks, so any help would be appreciated. Thanks.

please click this for sample data

scala dataframe apache-spark distributed-computing databricks
1 Answer
0 votes

Here is one way to solve it without using the explode command:

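import spark.implicits._ // assumes a SparkSession named `spark`, as in a Databricks notebook

// Case classes mirroring the nested schema; field names must match the column names
case class Contact(FirstName: String, LastName: String)
case class Customer(Contacts: Seq[Contact])
case class MyCase(CaseNumber: String, Customers: Seq[Customer])

val dataset = dataframe.as[MyCase] // `dataframe` is the DataFrame with the schema above

// Emit one (CaseNumber, FirstName, LastName) row per contact, repeating CaseNumber.
// Option(...) guards against null arrays, since the schema marks them nullable.
val result = dataset.flatMap { mycase =>
  for {
    customer <- Option(mycase.Customers).getOrElse(Seq.empty)
    contact  <- Option(customer.Contacts).getOrElse(Seq.empty)
  } yield (mycase.CaseNumber, contact.FirstName, contact.LastName)
}.toDF("CaseNumber", "FirstName", "LastName")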
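
For comparison, the same flattening can also be written with the built-in explode function, which is what the question title asks about. The following is only a minimal sketch: the case classes (Contact, Customer, CaseRecord) and the local SparkSession are assumptions used to build sample data matching the schema in the question, and are not part of the original post. In a Databricks notebook you would start from your own DataFrame instead.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical case classes, used only to build sample data with the question's schema.
case class Contact(FirstName: String, LastName: String)
case class Customer(Contacts: Seq[Contact])
case class CaseRecord(CaseNumber: String, Customers: Seq[Customer])

// In a Databricks notebook a session already exists and getOrCreate() reuses it.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("explode-sketch")
  .getOrCreate()
import spark.implicits._

// Sample rows taken from the table in the question.
val df = Seq(
  CaseRecord("1", Seq(Customer(Seq(Contact("aa", "bb"))))),
  CaseRecord("2", Seq(Customer(Seq(Contact("cc", "dd")))))
).toDF()

val flat = df
  .withColumn("customer", explode(col("Customers")))        // one row per customer
  .withColumn("contact", explode(col("customer.Contacts"))) // one row per contact
  .select(col("CaseNumber"), col("contact.FirstName"), col("contact.LastName"))

flat.printSchema()
flat.show()
```

Note that explode drops rows whose array is null or empty. If cases without customers or contacts should still appear (with null names), explode_outer from the same package can be used in place of explode.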
