In PySpark, I have a DataFrame with 3 columns:
| str1 | array_of_str1 | array_of_str2 |
+-----------+----------------------+----------------+
| John | [Size, Color] | [M, Black] |
| Tom | [Size, Color] | [L, White] |
| Matteo | [Size, Color] | [M, Red] |
I want to add an array column whose elements combine the 3 columns as a struct type:
| str1 | array_of_str1 | array_of_str2 | concat_result |
+-----------+----------------------+----------------+-----------------------------------------------+
| John | [Size, Color] | [M, Black] | [[[John, Size , M], [John, Color, Black]]] |
| Tom | [Size, Color] | [L, White] | [[[Tom, Size , L], [Tom, Color, White]]] |
| Matteo | [Size, Color] | [M, Red] | [[[Matteo, Size , M], [Matteo, Color, Red]]] |
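The code below adds the concat_result column to the existing DataFrame. It builds each inner element with array() and hard-codes indices 0 and 1, so it assumes both array columns always hold exactly two entries; replace the inner array(...) calls with struct(...) if you need a true struct type.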
dataframe.selectExpr("*", "array(array(str1, array_of_str1[0], array_of_str2[0]), array(str1, array_of_str1[1], array_of_str2[1])) as concat_result").show(truncate=False)
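If the arrays can hold more than two entries, writing every index by hand gets tedious. Below is a minimal sketch that generates the same SQL expression string for a given number of entries. The helper name build_concat_expr and its parameters are hypothetical (they are not part of PySpark), and it assumes you already know the array length n and that both arrays have that length in every row.

```python
def build_concat_expr(n, key="str1", names="array_of_str1",
                      values="array_of_str2", alias="concat_result"):
    """Build the selectExpr string for n positions (hypothetical helper).

    Assumes both array columns contain at least n elements in every row.
    """
    # One inner array(...) per position, e.g. array(str1, array_of_str1[0], array_of_str2[0])
    rows = [f"array({key}, {names}[{i}], {values}[{i}])" for i in range(n)]
    # Wrap the inner arrays in an outer array and name the resulting column
    return f"array({', '.join(rows)}) as {alias}"


# For n=2 this yields the same expression as the hand-written one above
print(build_concat_expr(2))
```

The generated string can be passed straight to selectExpr, for example dataframe.selectExpr("*", build_concat_expr(2)).show(truncate=False).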