Remove a field from a nested array of JSON objects with key-value pairs using PySpark


Using PySpark, I want to remove the id field from every object in a nested array of JSON objects with key-value pairs (the empval column).

Input

+----------+--------+----------------------------------------------------------------------------------------------------------+
| empno    | empcode| empval                                                                                                   |
+----------+--------+----------------------------------------------------------------------------------------------------------+
| employee1| 100DRE | [{"id": "123", "key1": "value1", "key2": "value2"}, {"id": "234", "key1": "te", "key2": "value2"}, {"id": "345", "key1": "grtregert", "key2": "value2"}] |
+----------+--------+----------------------------------------------------------------------------------------------------------+

Expected output

+----------+--------+---------------------------------------------------------------------------------------------------------------------+
| empno    | empcode| newColumn                                                                                                           |
+----------+--------+---------------------------------------------------------------------------------------------------------------------+
| employee1| 100DRE | [{"key1": "value1", "key2": "value2"}, {"key1": "te", "key2": "value2"}, {"key1": "grtregert", "key2": "value2"}]|
+----------+--------+---------------------------------------------------------------------------------------------------------------------+

Tags: pyspark, databricks

1 Answer

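Simple: use the from_json function and pass it the desired schema, array<struct<key1: string, key2: string>>. Because the schema lists only key1 and key2, the id field is dropped during parsing; to_json then serializes the result back to a JSON string.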

df.show(truncate=False)
+---------+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
|empno    |empcode|empval                                                                                                                                                  |
+---------+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
|employee1|100DRE |[{"id": "123", "key1": "value1", "key2": "value2"}, {"id": "234", "key1": "te", "key2": "value2"}, {"id": "345", "key1": "grtregert", "key2": "value2"}]|
+---------+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
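# Parse the JSON string with a schema that omits "id", then serialize it back to JSON.
# The outer parentheses let the chained calls span several lines.
(
    df
    .selectExpr(
        "empno",
        "empcode",
        "to_json(from_json(empval, 'array<struct<key1: string, key2: string>>')) AS newColumn",
    )
    .show(truncate=False)
)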
+---------+-------+------------------------------------------------------------------------------------------------------+
|empno    |empcode|newColumn                                                                                             |
+---------+-------+------------------------------------------------------------------------------------------------------+
|employee1|100DRE |[{"key1":"value1","key2":"value2"},{"key1":"te","key2":"value2"},{"key1":"grtregert","key2":"value2"}]|
+---------+-------+------------------------------------------------------------------------------------------------------+
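
The schema approach above needs the remaining keys (key1, key2) to be known in advance. When they are not, one alternative is to drop the field with plain Python. The following is only a minimal sketch using the standard json module; the helper name drop_field and its field parameter are hypothetical and do not come from the question or from Spark. It strips one key from every object in a JSON array string and writes compact JSON, similar to the whitespace-free output of to_json shown above.

```python
import json


def drop_field(json_array: str, field: str = "id") -> str:
    """Return the JSON array string with `field` removed from every object.

    Hypothetical helper for illustration; not part of PySpark.
    """
    records = json.loads(json_array)
    # Keep every key except the one being dropped, preserving key order
    cleaned = [{k: v for k, v in rec.items() if k != field} for rec in records]
    # Compact separators give whitespace-free output, like to_json
    return json.dumps(cleaned, separators=(",", ":"))


empval = (
    '[{"id": "123", "key1": "value1", "key2": "value2"}, '
    '{"id": "234", "key1": "te", "key2": "value2"}]'
)
print(drop_field(empval))
# prints [{"key1":"value1","key2":"value2"},{"key1":"te","key2":"value2"}]
```

A function like this could be wrapped with pyspark.sql.functions.udf and applied to the empval column, although built-in functions such as from_json and to_json generally perform better than Python UDFs, so the schema-based answer remains preferable when the keys are known.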