IN聚合,基于SUM,使用Scala在Apache Spark Dataframe中选择特定的行值

问题描述 投票:0回答:1

我想在下面的数据集中作为SUM(QUANTITY)列进行聚合,并且基于SUM,要选择相关的行值,如下所述:

输入:

val df = Seq(
  ("Acc1","111","1111","Sec1",-4000),
  ("Acc2","222","2222","Sec1",2000),
  ("Acc3","333","3333","Sec1",1000),
  ("Acc4","444","4444","Sec1",-10000)
  ).toDF("ACCOUNT_NO","LONG_IND","SHORT_IND","SECURITY_ID","QUANTITY")

How to aggregation based on the SUM(QUANTITY) such that, in final result
  (a) if Sum is negative, the row values with the maximum negative value (-10000 in this case) should take precedence
  (b) if Sum is positive, the row values with the maximum positive value (2000) should take precedence 

  In this case, SUM is negative = -4000+2000+1000-10000 = -11000, so case (a) should take precedence in above dataset, to give result as below:

Desired Output after aggregation:  
+-----------+----------+--------+---------+--------+
|SECURITY_ID|ACCOUNT_NO|LONG_IND|SHORT_IND|QUANTITY|
+-----------+----------+--------+---------+--------+
|       SEC1|      ACC4|     444|     4444|  -11000|
+-----------+----------+--------+---------+--------+

尝试的方法:

val resultDF = df
    .groupBy("SECURITY_ID")
    .agg(
      max($"SECURITY_ID").as("SECURITY_ID"),
      max($"ACCOUNT_NO").as("ACCOUNT_NO"),
      max(when($"SUM_QUANTITY" > 0, $"LONG_IND")).as("LONG_IND"),
      max(when($"SUM_QUANTITY" < 0, $"SHORT_IND")).as("SHORT_IND"),
      sum($"QUANTITY").cast("Long").as("SUM_QUANTITY")
    )
    .toDF("SECURITY_ID", "ACCOUNT_NO","LONG_IND","SHORT_IND","QUANTITY")

是否可以通过某种方式使用RANK来获得此结果?

scala apache apache-spark hadoop
1个回答
0
投票

是的,您可以将窗口函数与row_number()结合使用以实现所需的功能:

val windowAsc = Window.partitionBy($"SECURITY_ID").orderBy($"QUANTITY".asc)
val windowDesc = Window.partitionBy($"SECURITY_ID").orderBy($"QUANTITY".desc)

df
  .withColumn("sum",sum($"QUANTITY").over(window))
  .withColumn("rnb",when($"sum"<0,row_number().over(windowAsc)).otherwise(row_number().over(windowDesc)))
  .where($"rnb"===1)
  .withColumn("QUANTITY",$"sum")
  .drop("rnb","sum")
  .show()

给予:

+----------+--------+---------+-----------+--------+
|ACCOUNT_NO|LONG_IND|SHORT_IND|SECURITY_ID|QUANTITY|
+----------+--------+---------+-----------+--------+
|      Acc4|     444|     4444|       Sec1|  -11000|
+----------+--------+---------+-----------+--------+
© www.soinside.com 2019 - 2024. All rights reserved.