How to aggregate values into a list of maps after groupBy?

Problem description

I have a table like this:

id  | fruit  | buy_time
------------------------
1   | apple  | 100
1   | banana | 105        
2   | grapes | 102
2   | orange | 101
2   | apple  | 110

My expected output (a list of maps per id):

id  | buy_info
------------------------
1   | [{"fruit": "apple", "time": 100}, {"fruit": "banana", "time": 105}]
2   | [{"fruit": "orange", "time": 101}, {"fruit": "grapes", "time": 102}, {"fruit": "apple", "time": 110}]

scala dataframe apache-spark apache-spark-sql
1 Answer

Use .groupBy together with the to_json (Spark 2.4+), collect_list, and struct functions.

Example:

import org.apache.spark.sql.functions._
val df = Seq((1,"apple",100),(1,"banana",105),(2,"grapes",102),(2,"orange",101),(2,"apple",110)).toDF("id","fruit","buy_time")

df.groupBy("id").agg(to_json(collect_list(struct(col("fruit"),col("buy_time").alias("time")))).alias("buy_info")).show(10,false)
//+---+------------------------------------------------------------------------------------------+
//|id |buy_info                                                                                  |
//+---+------------------------------------------------------------------------------------------+
//|1  |[{"fruit":"apple","time":100},{"fruit":"banana","time":105}]                              |
//|2  |[{"fruit":"grapes","time":102},{"fruit":"orange","time":101},{"fruit":"apple","time":101}]|
//+---+------------------------------------------------------------------------------------------+
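If the order of the entries inside each list matters (the expected output for id 2 is sorted by buy time), a minimal sketch along these lines can sort each collected list before serializing it. This is not part of the original answer: it assumes the sort key is placed first in the struct so that sort_array (which compares structs field by field) orders by time, which also changes the JSON key order to time-first.

df.groupBy("id")
  .agg(to_json(
      // sort_array orders the structs by their first field ("time"), then "fruit"
      sort_array(collect_list(struct(col("buy_time").alias("time"), col("fruit"))))
    ).alias("buy_info"))
  .show(10, false)
// Expected shape of the result, e.g. for id 2:
// [{"time":101,"fruit":"orange"},{"time":102,"fruit":"grapes"},{"time":110,"fruit":"apple"}]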