Convert a DataFrame to nested JSON in Scala


I have a dataframe as shown below. The suffixes *1, *2 encode the JSON nesting level of each column, and "->" separates a parent node from its child node:

dataframe.show
+----+------+-----+------+--------------+----------------+-----------------------+--------------+
|id*1|name*1|ppu*1|type*1|toppings1->id2|toppings1->type2|batters1->batter2->id*3|batter2->type3|
+----+------+-----+------+--------------+----------------+-----------------------+--------------+
|0001|  Cake| 0.55| donut|          5001|            None|                   1001|       Regular|
|0001|  Cake| 0.55| donut|          5002|          Glazed|                   1002|     Chocolate|
+----+------+-----+------+--------------+----------------+-----------------------+--------------+

I need the output as nested JSON like the following:

{
"id": "0001",
"type": "donut",
"name": "Cake",
"ppu": 0.55,
"batters":
    {
        "batter":
            [
                { "id": "1001", "type": "Regular" },
                { "id": "1002", "type": "Chocolate" },
                { "id": "1003", "type": "Blueberry" },
                { "id": "1004", "type": "Devil's Food" }
            ]
    },
"topping":
    [
        { "id": "5001", "type": "None" },
        { "id": "5002", "type": "Glazed" },
        { "id": "5005", "type": "Sugar" },
        { "id": "5007", "type": "Powdered Sugar" },
        { "id": "5006", "type": "Chocolate with Sprinkles" },
        { "id": "5003", "type": "Chocolate" },
        { "id": "5004", "type": "Maple" }
    ]
}

I tried converting the dataframe with

dataframe.toJSON

but this gives me the wrong output. How can I iterate over the dataframe and create nested JSON as shown above?

json dataframe scala apache-spark
1 Answer

Step 1: combine the id and type columns of each child level into structs:

import org.apache.spark.sql._
import org.apache.spark.sql.functions._

val df = ...
val df1 = df.withColumn("topping", struct(col("toppings1->id2").as("id"), col("toppings1->type2").as("type")))
            .withColumn("batters", struct(col("batters1->batter2->id*3").as("id"), col("batter2->type3").as("type")))

Result:

root
 |-- id*1: string (nullable = true)
 |-- name*1: string (nullable = true)
 |-- ppu*1: string (nullable = true)
 |-- type*1: string (nullable = true)
 |-- toppings1->id2: string (nullable = true)
 |-- toppings1->type2: string (nullable = true)
 |-- batters1->batter2->id*3: string (nullable = true)
 |-- batter2->type3: string (nullable = true)
 |-- topping: struct (nullable = false)
 |    |-- id: string (nullable = true)
 |    |-- type: string (nullable = true)
 |-- batters: struct (nullable = false)
 |    |-- id: string (nullable = true)
 |    |-- type: string (nullable = true)

Step 2: group by id*1, taking the first value of the parent-level columns and collecting the child structs into lists:

val df2 = df1.groupBy("id*1")
                         .agg(first("name*1").as("name"),
                              first("ppu*1").as("ppu"),
                              first("type*1").as("type"),
                              collect_list("topping").as("toppings"),
                              collect_list("batters").as("batters"))
                         .withColumnRenamed("id*1", "id")

Result:

root
 |-- id: string (nullable = true)
 |-- name: string (nullable = true)
 |-- ppu: string (nullable = true)
 |-- type: string (nullable = true)
 |-- toppings: array (nullable = false)
 |    |-- element: struct (containsNull = false)
 |    |    |-- id: string (nullable = true)
 |    |    |-- type: string (nullable = true)
 |-- batters: array (nullable = false)
 |    |-- element: struct (containsNull = false)
 |    |    |-- id: string (nullable = true)
 |    |    |-- type: string (nullable = true)
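
For intuition, the grouping in step 2 mirrors a plain-Scala groupBy over the flattened rows. A minimal sketch without Spark, using a hypothetical FlatRow case class in place of the dataframe rows:

```scala
// Hypothetical flattened row, one field per dataframe column.
case class FlatRow(id: String, name: String, ppu: String, typ: String,
                   toppingId: String, toppingType: String,
                   batterId: String, batterType: String)

val rows = Seq(
  FlatRow("0001", "Cake", "0.55", "donut", "5001", "None",   "1001", "Regular"),
  FlatRow("0001", "Cake", "0.55", "donut", "5002", "Glazed", "1002", "Chocolate")
)

// groupBy(_.id) plays the role of groupBy("id*1"); taking rs.head mirrors
// first(...), and mapping each group to a list of (id, type) pairs mirrors
// collect_list(...) over the structs built in step 1.
val nested = rows.groupBy(_.id).map { case (id, rs) =>
  val head = rs.head
  (id, head.name, head.ppu, head.typ,
   rs.map(r => (r.toppingId, r.toppingType)),  // collect_list("topping")
   rs.map(r => (r.batterId, r.batterType)))    // collect_list("batters")
}
```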

Step 3: convert to JSON:

df2.select(to_json(struct("id", "name", "ppu", "type", "toppings", "batters"))).show(truncate=false)

Result:

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|to_json(struct(id, name, ppu, type, toppings, batters))                                                                                                                                                   |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{"id":"0001","name":"Cake","ppu":"0.55","type":"donut","toppings":[{"id":"5001","type":"None"},{"id":"5002","type":"Glazed"}],"batters":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"}]}|
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
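
One caveat: the JSON produced above keeps batters as a flat array, whereas the target output wraps it in an extra "batter" key. In Spark that extra level can be added with one more struct, e.g. struct(col("batters").as("batter")), before calling to_json. A plain-Scala rendering sketch of the intended final shape (illustrative values, no Spark):

```scala
// Render one child node as a JSON object.
def leaf(id: String, typ: String): String = s"""{"id":"$id","type":"$typ"}"""

val toppings = Seq(("5001", "None"), ("5002", "Glazed")).map { case (i, t) => leaf(i, t) }
val batters  = Seq(("1001", "Regular"), ("1002", "Chocolate")).map { case (i, t) => leaf(i, t) }

// Note the extra "batter" level inside "batters", matching the asked-for JSON.
val json =
  s"""{"id":"0001","type":"donut","name":"Cake","ppu":0.55,""" +
  s""""batters":{"batter":[${batters.mkString(",")}]},""" +
  s""""topping":[${toppings.mkString(",")}]}"""
```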