When week > 1, I want to subtract the previous value1 row from the current value2 row.
Here is my data:
'''
from pyspark.sql import SparkSession
from pyspark.sql import functions as f
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

data = [
    (1, 1, 1),
    (2, 0, 5),
    (3, 0, 10),
    (4, 0, 20),
    (5, 0, 30),
    (6, 0, 40),
]
columns = ["week", "value1", "value2"]
df = spark.createDataFrame(data, columns)
'''
Here is the code:
'''
w = Window.orderBy("week")

df2 = df.withColumn(
    "value1",
    f.when(
        f.col("week") > 1,
        # lag(value1) - value2: previous (original) value1 minus current value2
        f.sum(f.lag(df["value1"]).over(w) - df["value2"]).over(w),
    ).otherwise(f.col("value1")),
)
'''
The current code subtracts the current value2 row from the previous value1 row, but I want to switch it to subtract the previous value1 row from the current value2 row.
The expected output is:
+----+------+------+
|week|value1|value2|
+----+------+------+
|1 |1 |1 |
|2 |4 |5 |
|3 |6 |10 |
|4 |14 |20 |
|5 |16 |30 |
|6 |24 |40 |
+----+------+------+
Thanks to everyone who helps out here, I really appreciate it!
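The expected column follows the recurrence value1[n] = value2[n] - value1[n-1], seeded with week 1's value1. A quick plain-Python sketch (independent of Spark) that reproduces the table above:

```python
# value2 column from the sample data, in week order
value2 = [1, 5, 10, 20, 30, 40]

# seed with week 1's value1, then apply value1[n] = value2[n] - value1[n-1]
value1 = [1]
for v2 in value2[1:]:
    value1.append(v2 - value1[-1])

print(value1)  # [1, 4, 6, 14, 16, 24]
```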
Simple: use the row_number function to flag even- and odd-numbered rows, multiply value2 on the even rows by -1, and then apply abs(running sum) to that column to get the desired output.
'''
from pyspark.sql.functions import expr

df.withColumn(
    "value1",
    expr("""
        ABS(
            CASE WHEN week > 1 THEN
                SUM(
                    CASE WHEN (ROW_NUMBER() OVER (ORDER BY week) % 2 = 0) THEN
                        -1 * value2
                    ELSE
                        value2
                    END
                ) OVER (ORDER BY week)
            ELSE
                value1
            END
        )
    """),
).show(truncate=False)
'''
+----+------+------+
|week|value1|value2|
+----+------+------+
|1 |1 |1 |
|2 |4 |5 |
|3 |6 |10 |
|4 |14 |20 |
|5 |16 |30 |
|6 |24 |40 |
+----+------+------+
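Why the sign-flip-plus-ABS trick works: unrolling value1[n] = value2[n] - value1[n-1] turns the running total into an alternating sum of value2 (it also relies on week 1's value2 equaling its value1 in this data, so row 1 folds into the same sum). The running signed sum alternates in sign each row, and ABS recovers the magnitude. A plain-Python sketch of the same idea, outside Spark:

```python
# value2 column in week order; ROW_NUMBER in the SQL is 1-based,
# so even row numbers correspond to odd 0-based indices here
value2 = [1, 5, 10, 20, 30, 40]
signed = [-v if i % 2 == 1 else v for i, v in enumerate(value2)]

result, total = [], 0
for s in signed:
    total += s          # running signed sum: 1, -4, 6, -14, 16, -24
    result.append(abs(total))

print(result)  # [1, 4, 6, 14, 16, 24]
```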