I have a table like the one below.
job_id | job_score | job_status | job_details |
---|---|---|---|
job_a | 234.5 | Active | [{"xyz": "", "abc": "", "def": "avg_xyz_value", "ghi": "", "jkl": "XYZDatafeed110m2211V1", "mno": {"uvw": "XYZ_TAG_30201.PV"}, "pqr": 0.26911367973842457, "stu": 0.1234}, {"xyz": "", "abc": "", "def": "avg_xyz_value", "ghi": "", "jkl": "XYZDatafeed110m2211V1", "mno": {"uvw": "XYZ_TAG_30202.PV"}, "pqr": 0.16911367973842457, "stu": 0.3623}, {"xyz": "", "abc": "", "def": "avg_xyz_value", "ghi": "", "jkl": "XYZDatafeed110m2211V1", "mno": {"uvw": "XYZ_TAG_30203.PV"}, "pqr": 0.36911367973842457, "stu": 0.2345}, {"xyz": "", "abc": "", "def": "avg_xyz_value", "ghi": "", "jkl": "XYZDatafeed110m2211V1", "mno": {"uvw": "XYZ_TAG_30204.PV"}, "pqr": 0.46911367973842457, "stu": 0.9345}, {"xyz": "", "abc": "", "def": "avg_xyz_value", "ghi": "", "jkl": "XYZDatafeed110m2211V1", "mno": {"uvw": "XYZ_TAG_30201.PV"}, "pqr": 0.56911367973842457, "stu": 0.5345}] |
I want to parse the data into multiple columns, one row per array element, as shown below.
job_id | job_score | job_status | uvw | stu |
---|---|---|---|---|
XYZDatafeed110m2211V1 | 234.5 | Active | XYZ_TAG_30201.PV | 0.1234 |
XYZDatafeed110m2211V1 | 234.5 | Active | XYZ_TAG_30202.PV | 0.3623 |
XYZDatafeed110m2211V1 | 234.5 | Active | XYZ_TAG_30203.PV | 0.2345 |
XYZDatafeed110m2211V1 | 234.5 | Active | XYZ_TAG_30204.PV | 0.9345 |
XYZDatafeed110m2211V1 | 234.5 | Active | XYZ_TAG_30201.PV | 0.5345 |
How can I do this in PySpark?
Any help is much appreciated!
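Check the code below. It parses job_details with from_json, builds one struct per array element with transform, and expands the resulting array into rows with inline.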
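from pyspark.sql.functions import expr

(
    df
    .withColumn(
        "job_details",
        # Parse the JSON string into an array of maps, then build one struct
        # per element. The nested 'mno' object comes back as a JSON string,
        # so it is parsed a second time to reach 'uvw'.
        expr("""
            transform(
                from_json(job_details, 'array<map<string,string>>'),
                e -> named_struct(
                    'job_id', e['jkl'],
                    'job_score', job_score,
                    'job_status', job_status,
                    'uvw', from_json(e['mno'], 'map<string,string>')['uvw'],
                    'stu', e['stu']
                )
            )
        """),
    )
    # inline() expands the array of structs into one row per struct.
    .selectExpr("inline(job_details)")
    .show(truncate=False)
)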
+---------------------+---------+----------+----------------+------+
|job_id |job_score|job_status|uvw |stu |
+---------------------+---------+----------+----------------+------+
|XYZDatafeed110m2211V1|234.5 |Active |XYZ_TAG_30201.PV|0.1234|
|XYZDatafeed110m2211V1|234.5 |Active |XYZ_TAG_30202.PV|0.3623|
|XYZDatafeed110m2211V1|234.5 |Active |XYZ_TAG_30203.PV|0.2345|
|XYZDatafeed110m2211V1|234.5 |Active |XYZ_TAG_30204.PV|0.9345|
|XYZDatafeed110m2211V1|234.5 |Active |XYZ_TAG_30201.PV|0.5345|
+---------------------+---------+----------+----------------+------+
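To sanity-check the expected rows without a Spark session, the same per-row logic can be sketched in plain Python. This is only an illustration of what the expression computes, not part of the Spark solution. The helper name `flatten_job_row` and the two-element sample are made up for the example, and the sample keeps only the keys the output needs.

```python
import json


def flatten_job_row(job_score, job_status, job_details):
    """Return one output dict per element of the job_details JSON array.

    Hypothetical helper that mirrors the Spark expression for a single row.
    """
    rows = []
    for e in json.loads(job_details):
        rows.append({
            "job_id": e["jkl"],        # job_id is taken from the 'jkl' key
            "job_score": job_score,    # repeated from the parent row
            "job_status": job_status,  # repeated from the parent row
            "uvw": e["mno"]["uvw"],    # nested object under 'mno'
            "stu": e["stu"],
        })
    return rows


# Shortened sample: two elements, only the keys used above.
sample = json.dumps([
    {"jkl": "XYZDatafeed110m2211V1", "mno": {"uvw": "XYZ_TAG_30201.PV"}, "stu": 0.1234},
    {"jkl": "XYZDatafeed110m2211V1", "mno": {"uvw": "XYZ_TAG_30202.PV"}, "stu": 0.3623},
])

rows = flatten_job_row(234.5, "Active", sample)
for row in rows:
    print(row)
```

One difference to keep in mind: here `stu` stays a float, whereas the Spark version reads every value through `map<string,string>`, so `stu` comes out as a string there and may need a cast to double.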