我希望有效地在Spark中使用Scala动态压平镶木地板文件。我想知道实现这一目标的有效方法。
镶木地板文件包含多个深度级别的多个阵列和结构类型嵌套。镶木地板文件架构将来可能会发生变化,因此我无法对任何属性进行硬编码。期望的最终结果是展平的分隔文件。
使用flatmap和递归爆炸工作的解决方案会是什么?
示例架构:
|-- exCar: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- exCarOne: string (nullable = true)
| | |-- exCarTwo: string (nullable = true)
| | |-- exCarThree: string (nullable = true)
|-- exProduct: string (nullable = true)
|-- exName: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- exNameOne: string (nullable = true)
| | |-- exNameTwo: string (nullable = true)
| | |-- exNameThree: string (nullable = true)
| | |-- exNameFour: string (nullable = true)
| | |-- exNameCode: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- exNameCodeOne: string (nullable = true)
| | | | |-- exNameCodeTwo: string (nullable = true)
| | |-- exColor: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- exColorOne: string (nullable = true)
| | | | |-- exColorTwo: string (nullable = true)
| | | | |-- exWheelColor: array (nullable = true)
| | | | | |-- element: struct (containsNull = true)
| | | | | | |-- exWheelColorOne: string (nullable = true)
| | | | | | |-- exWheelColorTwo: string (nullable = true)
| | | | | | |--exWheelColorThree: string (nullable =true)
| | |-- exGlass: string (nullable = true)
|-- exDetails: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- exBill: string (nullable = true)
| | |-- exAccount: string (nullable = true)
| | |-- exLoan: string (nullable = true)
| | |-- exRate: string (nullable = true)
期望的输出架构:
exCar.exCarOne
exCar.exCarTwo
exCar.exCarThree
exProduct
exName.exNameOne
exName.exNameTwo
exName.exNameThree
exName.exNameFour
exName.exNameCode.exNameCodeOne
exName.exNameCode.exNameCodeTwo
exName.exColor.exColorOne
exName.exColor.exColorTwo
exName.exColor.exWheelColor.exWheelColorOne
exName.exColor.exWheelColor.exWheelColorTwo
exName.exColor.exWheelColor.exWheelColorThree
exName.exGlass
exDetails.exBill
exDetails.exAccount
exDetails.exLoan
exDetails.exRate
有两件事需要做:
1)从最外层的嵌套数组中爆炸数组列到里面的数组:爆炸exName
(给你很多行包含json,包含exColor
),然后exColor
然后你爆炸,允许你访问exWheelColor
等。
2)将嵌套的json投影到单独的列。