I have a table like this:
id | fruit | buy_time
------------------------
1 | apple | 100
1 | banana | 105
2 | grapes | 102
2 | orange | 101
2 | apple | 110
My expected output (a list of maps, grouped by id):
id | buy_info
------------------------
1 | [{"fruit": "apple", "time": 100}, {"fruit": "banana", "time": 105}]
2 | [{"fruit": "orange", "time": 101}, {"fruit": "grapes", "time": 102}, {"fruit": "apple", "time": 110}]
Use .groupBy with collect_list + struct, then to_json (Spark 2.4+) to serialize the collected array.
Example:
import org.apache.spark.sql.functions._
// note: the last row is (2, "apple", 110) to match the question's table
val df = Seq((1, "apple", 100), (1, "banana", 105), (2, "grapes", 102), (2, "orange", 101), (2, "apple", 110))
  .toDF("id", "fruit", "buy_time")

df.groupBy("id")
  .agg(to_json(collect_list(struct(col("fruit"), col("buy_time").alias("time")))).alias("buy_info"))
  .show(10, false)
//+---+------------------------------------------------------------------------------------------+
//|id |buy_info |
//+---+------------------------------------------------------------------------------------------+
//|1 |[{"fruit":"apple","time":100},{"fruit":"banana","time":105}] |
//|2  |[{"fruit":"grapes","time":102},{"fruit":"orange","time":101},{"fruit":"apple","time":110}]|
//+---+------------------------------------------------------------------------------------------+
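Note that collect_list gives no ordering guarantee after a shuffle, and the expected output above lists each array in ascending time order. A sketch of one way to get that, using sort_array: Spark compares structs field by field from left to right, so putting time first in the struct makes sort_array order each array by purchase time (the trade-off is that the JSON keys come out as {"time": ..., "fruit": ...}). The local SparkSession setup here is just for a runnable example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SortedBuyInfo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("sorted-buy-info").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "apple", 100), (1, "banana", 105), (2, "grapes", 102), (2, "orange", 101), (2, "apple", 110))
      .toDF("id", "fruit", "buy_time")

    // time is the first struct field, so sort_array orders by time ascending
    df.groupBy("id")
      .agg(to_json(sort_array(collect_list(struct(col("buy_time").alias("time"), col("fruit"))))).alias("buy_info"))
      .show(10, false)

    spark.stop()
  }
}
```

If you need the exact key order {"fruit": ..., "time": ...} from the question, you can sort first and then rebuild the structs with transform before calling to_json.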