How can I efficiently explode a PySpark DataFrame like this:
+----+-------+--------+------+
| id | sport | travel | work |
+----+-------+--------+------+
| 1  | 0.2   | 0.4    | 0.6  |
+----+-------+--------+------+
| 2  | 0.7   | 0.9    | 0.5  |
+----+-------+--------+------+
The output I want is this:
+------+-------+
| c_id | score |
+------+-------+
| 1    | 0.2   |
+------+-------+
| 1    | 0.4   |
+------+-------+
| 1    | 0.6   |
+------+-------+
| 2    | 0.7   |
+------+-------+
| 2    | 0.9   |
+------+-------+
| 2    | 0.5   |
+------+-------+
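First, put the three columns into a single array with `array`, pass that to `arrays_zip`, then `explode` the result so each element becomes its own row. Unpack the resulting struct with `.*`, then `select` and rename the unpacked columns.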