I have a DataFrame that looks like this:
+------+------------+------------------+
|UserID|Attribute   |Value             |
+------+------------+------------------+
|123   |City        |San Francisco     |
|123   |Lang        |English           |
|111   |Lang        |French            |
|111   |Age         |23                |
|111   |Gender      |Female            |
+------+------------+------------------+
So I have a handful of distinct attributes (a limited set, say at most 20), and some of them can be null for a given user. I want to transform this DataFrame into:
+-----+--------------+---------+-----+--------+
|User |City          | Lang    | Age | Gender |
+-----+--------------+---------+-----+--------+
|123  |San Francisco | English | NULL| NULL   |
|111  |NULL          | French  | 23  | Female |
+-----+--------------+---------+-----+--------+
I'm quite new to Spark and Scala.
You can use pivot to get the desired output:
import org.apache.spark.sql.functions._

df.groupBy("UserID")      // one output row per user
  .pivot("Attribute")     // distinct Attribute values become columns
  .agg(first("Value"))    // take the first Value per (user, attribute) pair
  .show()
This produces the desired output:
+------+----+-------------+------+-------+
|UserID| Age| City|Gender| Lang|
+------+----+-------------+------+-------+
| 111| 23| null|Female| French|
| 123|null|San Francisco| null|English|
+------+----+-------------+------+-------+
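If it helps to see what the pivot is computing without a Spark cluster, here is a minimal plain-Python sketch of the same groupBy/pivot/first semantics. The function name `pivot_first` and the sample rows are just for illustration, not part of any Spark API:

```python
# Sample rows as (user_id, attribute, value) tuples, mirroring the question's data.
rows = [
    ("123", "City", "San Francisco"),
    ("123", "Lang", "English"),
    ("111", "Lang", "French"),
    ("111", "Age", "23"),
    ("111", "Gender", "Female"),
]

def pivot_first(rows):
    # The distinct attribute values become the output columns,
    # just like Spark's pivot("Attribute").
    columns = sorted({attr for _, attr, _ in rows})
    result = {}
    for user, attr, value in rows:
        # Missing attributes stay None (Spark shows them as null).
        row = result.setdefault(user, {c: None for c in columns})
        # first("Value"): keep the first value seen per (user, attribute).
        if row[attr] is None:
            row[attr] = value
    return result

pivoted = pivot_first(rows)
print(pivoted["123"])  # {'Age': None, 'City': 'San Francisco', 'Gender': None, 'Lang': 'English'}
```

Note that in Spark itself, if you already know the full attribute set, you can pass it explicitly as `pivot("Attribute", Seq("Age", "City", "Gender", "Lang"))`, which both fixes the column order and avoids the extra pass Spark otherwise needs to discover the distinct values.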