Efficiently transposing/exploding Spark DataFrame columns into rows in a new table/DataFrame format [pyspark]

Question · Votes: 1 · Answers: 1

How can I efficiently explode a PySpark DataFrame like this:

+----+-------+------+------+
| id |sport  |travel| work |
+----+-------+------+------+
| 1  | 0.2   | 0.4  | 0.6  |
+----+-------+------+------+
| 2  | 0.7   | 0.9  | 0.5  |
+----+-------+------+------+

The output I want is:

+------+--------+  
| c_id | score  |  
+------+--------+  
| 1    | 0.2    |  
+------+--------+  
| 1    | 0.4    |  
+------+--------+  
| 1    | 0.6    |  
+------+--------+  
| 2    | 0.7    |  
+------+--------+  
| 2    | 0.9    |  
+------+--------+  
| 2    | 0.5    |  
+------+--------+  
python apache-spark pyspark explode
1 answer

Votes: 0

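First, put the three value columns (sport, travel, work) into a single array, then apply arrays_zip, then explode so that each array element becomes its own row. Unpack the resulting struct with .*, and finally select the unpacked columns and rename them (id to c_id, the value to score).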

© www.soinside.com 2019 - 2024. All rights reserved.