How to put distinct values into one column using PySpark

Question (votes: 0, answers: 1)

I want to collapse the distinct values of each column into a single row. For example, given:

NV   Q  value 1  value 2  value 3  value 4
234  1  10       0        0        0
234  2  0        15       0        0
234  3  0        0        20       0
234  4  0        0        0        25

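Depending on Q, the non-zero value sits in a different one of the value 1 to value 4 columns.

I would like the result to look like the table below, with or without the Q column:

NV   Q     value 1  value 2  value 3  value 4
234  1234  10       15       20       25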
apache-spark pyspark
1 Answer
0 votes

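Try the code below (Scala, pasted into spark-shell; the FILTER higher-order function needs Spark 2.4 or later).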

df
.selectExpr(
    "CONCAT_WS('', COLLECT_SET(NV)) as NV",
    "CONCAT_WS('', COLLECT_SET(Q)) as Q",
    "CONCAT_WS('', FILTER(COLLECT_SET(`value 1`), elem -> elem <> 0)) AS `value 1`",
    "CONCAT_WS('', FILTER(COLLECT_SET(`value 2`), elem -> elem <> 0)) AS `value 2`",
    "CONCAT_WS('', FILTER(COLLECT_SET(`value 3`), elem -> elem <> 0)) AS `value 3`",
    "CONCAT_WS('', FILTER(COLLECT_SET(`value 4`), elem -> elem <> 0)) AS `value 4`"
).show(false)

// Exiting paste mode, now interpreting.

+---+----+-------+-------+-------+-------+
|NV |Q   |value 1|value 2|value 3|value 4|
+---+----+-------+-------+-------+-------+
|234|1234|10     |15     |20     |25     |
+---+----+-------+-------+-------+-------+
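
Since the question asks for PySpark, here is a minimal sketch of the same idea in Python. The names `collapse_exprs` and `demo` are hypothetical helpers introduced only for illustration. It also makes two assumptions that go beyond the answer above: the collected arrays are cast to `ARRAY<STRING>` explicitly before `CONCAT_WS`, and the Q values are passed through `ARRAY_SORT` because `COLLECT_SET` does not guarantee element order.

```python
def collapse_exprs(key_col, q_col, value_cols):
    """Build Spark SQL expression strings that collapse all rows into one.

    Hypothetical helper: column names are wrapped in backticks so that
    names containing spaces (such as `value 1`) are accepted.
    """
    def quoted(name):
        return f"`{name}`"

    exprs = [
        # Join the distinct key values into one string.
        f"CONCAT_WS('', CAST(COLLECT_SET({quoted(key_col)}) AS ARRAY<STRING>)) "
        f"AS {quoted(key_col)}",
        # Sort first (assumption): COLLECT_SET gives no ordering guarantee.
        f"CONCAT_WS('', CAST(ARRAY_SORT(COLLECT_SET({quoted(q_col)})) AS ARRAY<STRING>)) "
        f"AS {quoted(q_col)}",
    ]
    for col in value_cols:
        # Drop the zero placeholders and keep the remaining distinct values.
        exprs.append(
            f"CONCAT_WS('', CAST(FILTER(COLLECT_SET({quoted(col)}), x -> x <> 0) "
            f"AS ARRAY<STRING>)) AS {quoted(col)}"
        )
    return exprs


def demo():
    """Run the expressions on the sample data from the question (needs pyspark)."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]").appName("collapse-demo").getOrCreate()
    )
    cols = ["NV", "Q", "value 1", "value 2", "value 3", "value 4"]
    rows = [
        (234, 1, 10, 0, 0, 0),
        (234, 2, 0, 15, 0, 0),
        (234, 3, 0, 0, 20, 0),
        (234, 4, 0, 0, 0, 25),
    ]
    df = spark.createDataFrame(rows, cols)
    # Aggregates without groupBy collapse the whole DataFrame into one row.
    df.selectExpr(*collapse_exprs("NV", "Q", cols[2:])).show(truncate=False)
    spark.stop()
```

Calling `demo()` in an environment with pyspark installed runs the aggregation on the four sample rows. If the data holds more than one NV, the same expression strings could probably be applied per group instead, for example through `groupBy("NV").agg(...)` with `pyspark.sql.functions.expr`, dropping the NV expression from the list.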
