PySpark: help changing this code to subtract different rows

Question (votes: 0, answers: 1)

When week > 1, I want to subtract the current `value2` row from the previous `value1` row.

Here is my data:

'''

from pyspark.sql import SparkSession
from pyspark.sql import functions as f

# Create a session (the original snippet assumes `spark` already exists)
spark = SparkSession.builder.getOrCreate()

data = [
    (1, 1, 1),
    (2, 0, 5),
    (3, 0, 10),
    (4, 0, 20),
    (5, 0, 30),
    (6, 0, 40)
]
columns = ["week", "value1", "value2"]
df = spark.createDataFrame(data, columns)

'''

Here is the code:

from pyspark.sql.window import Window  # missing import in the original snippet

w = Window.orderBy("week")

df2 = df.withColumn(
    'value1',
    f.when(
        f.col('week') > 1,
        f.sum(f.lag(df['value1']).over(w) - df['value2']).over(w)
    ).otherwise(f.col('value1'))
)

The current code subtracts the previous `value1` row from the current `value2` row, but I want to switch it so that the current `value2` row is subtracted from the previous `value1` row.

The answer should look like this:

+----+------+------+
|week|value1|value2|
+----+------+------+
|1   |1     |1     |
|2   |4     |5     |
|3   |6     |10    |
|4   |14    |20    |
|5   |16    |30    |
|6   |24    |40    |
+----+------+------+
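For reference, the expected column follows the recurrence value1[n] = |value1[n-1] - value2[n]| for week > 1 (the absolute value is implied by the expected output, e.g. week 2 gives |1 - 5| = 4). A quick plain-Python check reproduces the table above:

```python
# Plain-Python check of the recurrence value1[n] = |value1[n-1] - value2[n]|
value2 = [1, 5, 10, 20, 30, 40]
value1 = [1]  # week 1 keeps its original value1
for v2 in value2[1:]:
    value1.append(abs(value1[-1] - v2))
print(value1)  # → [1, 4, 6, 14, 16, 24]
```

This matches the expected value1 column exactly, which also shows the problem is recursive: each row depends on the previously *computed* value1, not on the original column, so a plain `lag` over the source column cannot produce it.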

Thanks to everyone who helps out here, I really appreciate it!

pyspark

1 Answer

0 votes

Simple: use the `row_number` function to identify even and odd rows, multiply the even rows' `value2` by `-1`, then apply `abs(running sum)` to that column to get the desired output.

from pyspark.sql.functions import expr  # missing import in the original snippet

df.withColumn(
    "value1",
    expr("""
        ABS(
            CASE WHEN week > 1 THEN
                SUM(
                    CASE WHEN (ROW_NUMBER() OVER(ORDER BY week) % 2 == 0) THEN
                        -1 * value2
                    ELSE
                        value2
                    END
                ) OVER(ORDER BY week)
            ELSE
                value1
            END
        )
    """)
).show(truncate=False)
+----+------+------+
|week|value1|value2|
+----+------+------+
|1   |1     |1     |
|2   |4     |5     |
|3   |6     |10    |
|4   |14    |20    |
|5   |16    |30    |
|6   |24    |40    |
+----+------+------+