Reverse group-by in PySpark?

Sample data:

+-----------+------------+---------+
|City       |Continent   |    Price|
+-----------+------------+---------+
|     A     |  Asia      |      100| 
|     B     |  Asia      |      110|
|     C     |  Africa    |       60|
|     D     |  Europe    |      170|
|     E     |  Europe    |       90|
|     F     |  Africa    |      100|
+-----------+------------+---------+

Output: for the second column, I know we can simply use

df.groupby("Continent").agg({'Price':'avg'})

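But how do we compute the third column? For each continent, the third column takes the cities that do not belong to that continent and computes their average price.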

Expected output:

    +-----------+--------------+----------------------------------------------+
    |Continent  | Average Price|Average Price for cities not in this continent|
    +-----------+--------------+----------------------------------------------+
    | Asia      |           105|                                           105|
    | Africa    |            80|                                         117.5|
    | Europe    |           130|                                          92.5|
    +-----------+--------------+----------------------------------------------+
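A minimal plain-Python sketch (not Spark) that applies the definition literally, to confirm the expected numbers: for each continent, average the prices of its own cities, then average the prices of every city whose continent differs. The `rows` list simply re-types the sample table, and `average` is a hypothetical helper introduced only for this example.

```python
# Sample rows re-typed from the question's table: (City, Continent, Price)
rows = [("A", "Asia", 100), ("B", "Asia", 110), ("C", "Africa", 60),
        ("D", "Europe", 170), ("E", "Europe", 90), ("F", "Africa", 100)]


def average(values):
    """Arithmetic mean, or None for an empty list (hypothetical helper)."""
    return sum(values) / len(values) if values else None


expected = {}
# dict.fromkeys keeps continents in first-appearance order, without duplicates
for continent in dict.fromkeys(c for _, c, _ in rows):
    inside = [price for _, c, price in rows if c == continent]
    outside = [price for _, c, price in rows if c != continent]
    expected[continent] = (average(inside), average(outside))

for continent, (avg_in, avg_out) in expected.items():
    print(continent, avg_in, avg_out)
# Asia 105.0 105.0
# Africa 80.0 117.5
# Europe 130.0 92.5
```

For example, Africa's third column is (100 + 110 + 170 + 90) / 4 = 117.5, the mean over the four Asian and European cities.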
apache-spark pyspark-sql
1 Answer
    >>> from pyspark.sql.functions import col,avg
    >>> df.show()
    +----+---------+-----+
    |City|Continent|Price|
    +----+---------+-----+
    |   A|     Asia|  100|
    |   B|     Asia|  110|
    |   C|   Africa|   60|
    |   D|   Europe|  170|
    |   E|   Europe|   90|
    |   F|   Africa|  100|
    +----+---------+-----+

    >>> # Self-join: pair every city with every city from a different continent
    >>> df1 = df.alias("a").join(df.alias("b"), col("a.Continent") != col("b.Continent"), "left") \
    ...     .select(col("a.*"), col("b.Price").alias("b_price"))
    >>> df1.groupBy("Continent").agg(avg(col("Price")).alias("Average Price"),
    ...     avg(col("b_price")).alias("Average Price for cities not in this continent")).show()
    +---------+-------------+----------------------------------------------+
    |Continent|Average Price|Average Price for cities not in this continent|
    +---------+-------------+----------------------------------------------+
    |   Europe|        130.0|                                          92.5|
    |   Africa|         80.0|                                         117.5|
    |     Asia|        105.0|                                         105.0|
    +---------+-------------+----------------------------------------------+