Values from 3, 6, and 12 months back in PySpark

Question (votes: 0, answers: 1)

How can I select the V1 values at 3-, 6-, and 12-month intervals? Suppose I have a table like the one below:

month   V1
202307  10
202306  20
202305  30
202304  40
202303  50
202302  60
202301  70

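I want to produce a result like the table below, where V2 is the value from 3 months back and V3 is the value from 6 months back.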

month   V1  V2  V3
202307  10  30  60
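
In the expected row above, V2 = 30 is the 202305 value and V3 = 60 is the 202302 value, so the intervals appear to count the latest month as month 1 rather than month 0. Below is a minimal plain-Python sketch of that counting, with no Spark involved. The helper name `month_offset` and the `rows` dictionary are illustrative assumptions, not part of the original question.

```python
def month_offset(latest, month):
    """1-based offset of `month` from `latest`; both are 'yyyyMM' strings."""
    latest_year, latest_month = int(latest[:4]), int(latest[4:])
    year, mon = int(month[:4]), int(month[4:])
    # Whole months between the two dates, plus 1 so the latest month counts as 1.
    return (latest_year - year) * 12 + (latest_month - mon) + 1


# Sample data copied from the question's table.
rows = {
    "202307": 10, "202306": 20, "202305": 30, "202304": 40,
    "202303": 50, "202302": 60, "202301": 70,
}

latest = max(rows)  # 'yyyyMM' strings sort chronologically
by_offset = {month_offset(latest, m): v for m, v in rows.items()}

# Offsets 1, 3, and 6 correspond to V1, V2, and V3 in the expected output.
result = (latest, by_offset.get(1), by_offset.get(3), by_offset.get(6))
print(result)  # ('202307', 10, 30, 60)
```

With only seven months of data, `by_offset.get(12)` returns `None`, so a 12-month column would be empty for this sample.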
pyspark
1 Answer
0 votes

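See the code below. It computes each row's month offset from the latest month (counting the latest month as 1), then picks the V1 values at offsets 1, 3, and 6.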

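// Scala Spark syntax. Assumes df has a string column `month` (yyyyMM) and a column `v1`.
import org.apache.spark.sql.functions.expr

df
  // Latest month in the data, parsed as a date (e.g. 2023-07-01).
  .withColumn(
    "first_date",
    expr("""FIRST(TO_DATE(month, 'yyyyMM')) OVER (ORDER BY month DESC)""")
  )
  // 1-based month offset from the latest month: 202307 -> 1, 202305 -> 3, 202302 -> 6.
  .withColumn(
    "diff",
    expr("""CAST(MONTHS_BETWEEN(first_date, TO_DATE(month, 'yyyyMM')) AS INT) + 1""")
  )
  // Collect all rows into one array, keeping only offset 1 and multiples of 3.
  .selectExpr(
    "FILTER(collect_list(struct(first_date, v1, diff)), e -> e.diff == 1 OR e.diff % 3 == 0) AS list"
  )
  // Pick the value at each offset; a 12-month column would use e.diff == 12 in the same way.
  .selectExpr(
    "DATE_FORMAT(list.first_date[0], 'yyyyMM') AS month", // back to yyyyMM, as in the expected table
    "FILTER(list, e -> e.diff == 1).v1[0] AS V1",
    "FILTER(list, e -> e.diff == 3).v1[0] AS V2",
    "FILTER(list, e -> e.diff == 6).v1[0] AS V3"
  )
  .show(false)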
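
Since the question is tagged pyspark and the snippet above uses Scala syntax, here is a hedged PySpark sketch of the same idea. It swaps the collect-and-filter step for conditional aggregation (`MAX(CASE WHEN ...)`), which is a different technique from the answer's, and it assumes the DataFrame has columns named `month` and `v1`. The function name `add_interval_columns` and its `offsets` parameter are hypothetical.

```python
def add_interval_columns(df, offsets=(("V1", 1), ("V2", 3), ("V3", 6))):
    """Return a one-row DataFrame with the v1 values at the given month offsets.

    `df` is assumed to be a PySpark DataFrame with columns `month` (yyyyMM)
    and `v1`. Offsets are 1-based: the latest month has offset 1.
    """
    # The cast lets this work whether `month` is stored as a string or an integer.
    as_date = "TO_DATE(CAST(month AS STRING), 'yyyyMM')"

    # Step 1: 1-based month offset of every row from the latest month in the data.
    with_diff = df.selectExpr(
        "month",
        "v1",
        f"CAST(MONTHS_BETWEEN(MAX({as_date}) OVER (), {as_date}) AS INT) + 1 AS diff",
    )

    # Step 2: one global aggregate row; each column keeps the v1 value at its offset.
    picks = [
        f"MAX(CASE WHEN diff = {n} THEN v1 END) AS {name}"
        for name, n in offsets
    ]
    return with_diff.selectExpr("MAX(month) AS month", *picks)
```

Usage would look like `add_interval_columns(df).show(truncate=False)`. Passing `offsets=(("V1", 1), ("V2", 3), ("V3", 6), ("V4", 12))` adds a 12-month column, which is NULL when the data does not reach that far back. The empty `OVER ()` window pulls all rows into a single partition, so this sketch suits small tables like the one in the question.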