How to parse a JSON array into multiple columns - PySpark


I have a table like the one below:

job_id job_score job_status job_details
job_a 234.5 Active [{"xyz": "", "abc": "", "def": "avg_xyz_value", "ghi": "", "jkl": "XYZDatafeed110m2211V1", "mno": {"uvw": "XYZ_TAG_30201.PV"}, "pqr": 0.26911367973842457, "stu": 0.1234}, {"xyz": "", "abc": "", "def": "avg_xyz_value", "ghi": "", "jkl": "XYZDatafeed110m2211V1", "mno": {"uvw": "XYZ_TAG_30202.PV"}, "pqr": 0.16911367973842457, "stu": 0.3623}, {"xyz": "", "abc": "", "def": "avg_xyz_value", "ghi": "", "jkl": "XYZDatafeed110m2211V1", "mno": {"uvw": "XYZ_TAG_30203.PV"}, "pqr": 0.36911367973842457, "stu": 0.2345}, {"xyz": "", "abc": "", "def": "avg_xyz_value", "ghi": "", "jkl": "XYZDatafeed110m2211V1", "mno": {"uvw": "XYZ_TAG_30204.PV"}, "pqr": 0.46911367973842457, "stu": 0.9345}, {"xyz": "", "abc": "", "def": "avg_xyz_value", "ghi": "", "jkl": "XYZDatafeed110m2211V1", "mno": {"uvw": "XYZ_TAG_30201.PV"}, "pqr": 0.56911367973842457, "stu": 0.5345}]

I want to parse the data into multiple columns, with one row per element of the JSON array, as shown below:

job_id job_score job_status uvw stu
XYZDatafeed110m2211V1 234.5 Active XYZ_TAG_30201.PV 0.1234
XYZDatafeed110m2211V1 234.5 Active XYZ_TAG_30202.PV 0.3623
XYZDatafeed110m2211V1 234.5 Active XYZ_TAG_30203.PV 0.2345
XYZDatafeed110m2211V1 234.5 Active XYZ_TAG_30204.PV 0.9345
XYZDatafeed110m2211V1 234.5 Active XYZ_TAG_30201.PV 0.5345

How can I do this in PySpark?

Any help is much appreciated!

pyspark
1 Answer

Try the code below.

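from pyspark.sql.functions import expr

# Parse job_details as an array of string-to-string maps, build one struct per
# array element, then expand the structs into rows and columns with inline().
# With this schema the nested "mno" object arrives as a JSON string, hence the
# second from_json call; "stu" likewise comes back as a string.
(
    df
    .withColumn(
        "job_details",
        expr("""
          transform(
            from_json(
              job_details,
              'array<map<string,string>>'
            ),
            e -> named_struct(
                  'job_id', e['jkl'],
                  'job_score', job_score,
                  'job_status', job_status,
                  'uvw', from_json(e['mno'], 'map<string,string>')['uvw'],
                  'stu', e['stu']
                )
          )
        """)
    )
    .selectExpr("inline(job_details)")
    .show(truncate=False)
)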

+---------------------+---------+----------+----------------+------+
|job_id               |job_score|job_status|uvw             |stu   |
+---------------------+---------+----------+----------------+------+
|XYZDatafeed110m2211V1|234.5    |Active    |XYZ_TAG_30201.PV|0.1234|
|XYZDatafeed110m2211V1|234.5    |Active    |XYZ_TAG_30202.PV|0.3623|
|XYZDatafeed110m2211V1|234.5    |Active    |XYZ_TAG_30203.PV|0.2345|
|XYZDatafeed110m2211V1|234.5    |Active    |XYZ_TAG_30204.PV|0.9345|
|XYZDatafeed110m2211V1|234.5    |Active    |XYZ_TAG_30201.PV|0.5345|
+---------------------+---------+----------+----------------+------+
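
To sanity-check the expected rows on a small sample without a Spark session, the same flattening can be sketched in plain Python. This is only an illustration of the logic, not part of the answer above: the helper name flatten_job_row and the two-element sample are assumptions, and the sample keeps only the keys that the output uses.

```python
import json


def flatten_job_row(row):
    """Yield one flat dict per element of the row's job_details JSON array.

    Hypothetical helper: mirrors the transform + inline step of the Spark answer.
    """
    for item in json.loads(row["job_details"]):
        yield {
            "job_id": item["jkl"],             # job_id is taken from "jkl"
            "job_score": row["job_score"],     # carried over from the parent row
            "job_status": row["job_status"],   # carried over from the parent row
            "uvw": item["mno"]["uvw"],         # nested object under "mno"
            "stu": item["stu"],
        }


# Assumed sample: the first two array elements, reduced to the keys in use.
sample_row = {
    "job_id": "job_a",
    "job_score": 234.5,
    "job_status": "Active",
    "job_details": json.dumps([
        {"jkl": "XYZDatafeed110m2211V1", "mno": {"uvw": "XYZ_TAG_30201.PV"}, "stu": 0.1234},
        {"jkl": "XYZDatafeed110m2211V1", "mno": {"uvw": "XYZ_TAG_30202.PV"}, "stu": 0.3623},
    ]),
}

flat_rows = list(flatten_job_row(sample_row))
for flat in flat_rows:
    print(flat)
```

One difference to keep in mind: here stu stays a float, whereas the Spark code parses with 'map<string,string>', so there stu is a string column and would need a cast if a numeric type is required.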