Spark: Transform user attribute rows into columns per UserID [duplicate]

Question · Votes: 0 · Answers: 1

This question already has an answer here:

I have a DataFrame that looks like this:

+------+------------+------------------+
|UserID|Attribute   | Value            |
+------+------------+------------------+
|123   |  City      | San Francisco    |
|123   |  Lang      | English          |
|111   |  Lang      | French           |
|111   |  Age       | 23               |
|111   |  Gender    | Female           |
+------+------------+------------------+

So I have several different attributes, some of which may be null for certain users (the set of attributes is limited, say at most 20).

I want to transform this DataFrame into:

+-----+--------------+---------+-----+--------+
|User |City          | Lang    | Age | Gender |
+-----+--------------+---------+-----+--------+
|123  |San Francisco | English | NULL| NULL   |
|111  |          NULL| French  | 23  | Female |
+-----+--------------+---------+-----+--------+

I'm quite new to Spark and Scala.

scala apache-spark spark-dataframe
1 Answer

2 votes

You can use pivot to get the desired output:

import org.apache.spark.sql.functions._

df.groupBy("UserID")
  .pivot("Attribute")
  .agg(first("Value"))   // first() picks the single Value per (UserID, Attribute) pair
  .show()

This gives you the desired output (note that the pivoted columns come out in alphabetical order):

+------+----+-------------+------+-------+
|UserID| Age|         City|Gender|   Lang|
+------+----+-------------+------+-------+
|   111|  23|         null|Female| French|
|   123|null|San Francisco|  null|English|
+------+----+-------------+------+-------+
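If you want the columns in a fixed order (and to avoid the extra job Spark runs to compute the distinct attribute values), you can pass an explicit value list to `pivot`, e.g. `pivot("Attribute", Seq("City", "Lang", "Age", "Gender"))`. As a self-contained illustration of what the pivot itself does, here is a minimal sketch in plain Scala collections (no Spark), using the sample rows from the question; `None` plays the role of NULL:

```scala
// Sample (UserID, Attribute, Value) rows, as in the question's input table.
val rows = Seq(
  ("123", "City",   "San Francisco"),
  ("123", "Lang",   "English"),
  ("111", "Lang",   "French"),
  ("111", "Age",    "23"),
  ("111", "Gender", "Female")
)

// Fixed column order, mirroring pivot("Attribute", Seq(...)) in Spark.
val attributes = Seq("City", "Lang", "Age", "Gender")

// Group by user, then look up each attribute; missing ones become None (NULL).
val pivoted: Map[String, Seq[Option[String]]] =
  rows.groupBy(_._1).map { case (user, rs) =>
    val byAttr = rs.map(r => r._2 -> r._3).toMap
    user -> attributes.map(byAttr.get)
  }

// pivoted("123") == Seq(Some("San Francisco"), Some("English"), None, None)
// pivoted("111") == Seq(None, Some("French"), Some("23"), Some("Female"))
```

This is only a sketch of the semantics; on real data you would stay with the Spark `pivot` shown above, which does the same grouping in a distributed fashion.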