Create a PySpark DataFrame from a single row of another DataFrame


I have this DataFrame:

df_json = spark.sql("select id, type, json from process.base_json where ...")  # filter condition omitted in the original post
df_json.show(truncate=False)

+--------+--------+---------------------------------------------------------------------------------+
|  id    |  type  |                               json                                              |
+--------+--------+---------------------------------------------------------------------------------+
|23016494|TAX     |{"Id":"253","RESULT":{"DATA":{"response":[{"message":"ID 253 invalid"}]}}}       |
|23020867|WARRANTY|{"Id":"108","RESULT":{"DATA":{"result":[{"message":"Nomatches"}]}},"Type":"ID"}  |
|23021055|WARRANTY|{"Id":"332","RESULT":{"DATA":{"detail":{"cre":"BANK","nid":"332"}]}},"Type":"ID"}|
|23016497|TAX     |{"Id":"643","RESULT":{"DATA":{"registry":[{"dv":"5","st":"ACT","name":"MAY"}]}}} |
+--------+--------+---------------------------------------------------------------------------------+

I want to create a new DataFrame for each row of this DataFrame, so that I can parse each JSON string individually.

df_json1
+--------+--------+---------------------------------------------------------------------------------+
|  id    |  type  |                               json                                              |
+--------+--------+---------------------------------------------------------------------------------+
|23016494|TAX     |{"Id":"253","RESULT":{"DATA":{"response":[{"message":"ID 253 invalid"}]}}}       |
+--------+--------+---------------------------------------------------------------------------------+

df_json2
+--------+--------+---------------------------------------------------------------------------------+
|  id    |  type  |                               json                                              |
+--------+--------+---------------------------------------------------------------------------------+
|23020867|WARRANTY|{"Id":"108","RESULT":{"DATA":{"result":[{"message":"Nomatches"}]}},"Type":"ID"}  |
+--------+--------+---------------------------------------------------------------------------------+

df_json3
+--------+--------+---------------------------------------------------------------------------------+
|  id    |  type  |                               json                                              |
+--------+--------+---------------------------------------------------------------------------------+
|23021055|WARRANTY|{"Id":"332","RESULT":{"DATA":{"detail":{"cre":"BANK","nid":"332"}]}},"Type":"ID"}|
+--------+--------+---------------------------------------------------------------------------------+

df_json4
+--------+--------+---------------------------------------------------------------------------------+
|  id    |  type  |                               json                                              |
+--------+--------+---------------------------------------------------------------------------------+
|23016497|TAX     |{"Id":"643","RESULT":{"DATA":{"registry":[{"dv":"5","st":"ACT","name":"MAY"}]}}} |
+--------+--------+---------------------------------------------------------------------------------+
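
One possible way to get the per-row DataFrames shown above is sketched here. It is a minimal sketch, not a confirmed answer from the thread. The function name `split_into_row_frames` and the parameter `key_col` are hypothetical names introduced for illustration. It assumes `spark` is the active SparkSession, that `id` is unique per row, and that the DataFrame is small enough to collect onto the driver.

```python
def split_into_row_frames(spark, df, key_col="id"):
    """Return a dict mapping each row's key value to a one-row DataFrame.

    `spark` is the active SparkSession and `df` the DataFrame to split.
    Assumption: values in `key_col` are unique; a repeated key would
    overwrite the earlier entry.
    """
    frames = {}
    # collect() pulls every row to the driver, so this only suits small inputs.
    for row in df.collect():
        # Reuse the parent schema so each one-row frame keeps the same
        # column names and types as the original DataFrame.
        frames[row[key_col]] = spark.createDataFrame([row], schema=df.schema)
    return frames
```

With the data from the question this would be called as `frames = split_into_row_frames(spark, df_json)`, after which `frames[23016494]` plays the role of `df_json1`. Each one-row frame could then be parsed on its own, for example with `spark.read.json(frames[23016494].rdd.map(lambda r: r["json"]))`, which lets Spark infer a schema for that single JSON string. For larger inputs, filtering per key with `df_json.filter(df_json.id == some_id)` avoids collecting everything onto the driver, at the cost of one Spark job per row.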

json dataframe pyspark