Convert array columns into an array of structs in a PySpark DataFrame


In PySpark, I have a DataFrame with 3 columns:

| str1      | array_of_str1        | array_of_str2  |
+-----------+----------------------+----------------+
| John      | [Size, Color]        | [M, Black]     |
| Tom       | [Size, Color]        | [L, White]     |
| Matteo    | [Size, Color]        | [M, Red]       |

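I want to add an array column whose elements are structs with 3 fields: str1 plus the matching elements of the two arrays, paired by position: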

| str1      | array_of_str1        | array_of_str2  | concat_result                             |
+-----------+----------------------+----------------+-------------------------------------------+
| John      | [Size, Color]        | [M, Black]     | [[John, Size, M], [John, Color, Black]]   |
| Tom       | [Size, Color]        | [L, White]     | [[Tom, Size, L], [Tom, Color, White]]     |
| Matteo    | [Size, Color]        | [M, Red]       | [[Matteo, Size, M], [Matteo, Color, Red]] |
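To pin down the expected per-row result, here is a plain-Python sketch (not Spark code) of what concat_result should hold for each row. Tuples stand in for Spark struct values, and the helper name concat_result is just an illustration.

```python
def concat_result(str1, keys, values):
    """Pair keys and values by position, prefixing each pair with str1.

    Tuples stand in for Spark struct values; zip() stops at the shorter list.
    """
    return [(str1, k, v) for k, v in zip(keys, values)]


rows = [
    ("John", ["Size", "Color"], ["M", "Black"]),
    ("Tom", ["Size", "Color"], ["L", "White"]),
    ("Matteo", ["Size", "Color"], ["M", "Red"]),
]

for str1, keys, values in rows:
    print(str1, concat_result(str1, keys, values))
```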
Answer

The code below adds a concat_result column to the existing DataFrame.

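# Each element is a 3-item array (not a struct); positions 0 and 1 are hard-coded, so only 2-element arrays are handled
dataframe.selectExpr("*", "array(array(str1, array_of_str1[0], array_of_str2[0]), array(str1, array_of_str1[1], array_of_str2[1])) as concat_result").show(truncate=False)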