How to group by two columns and calculate totals and averages per group using PySpark

Problem description

I have the following DataFrame and, using PySpark, I am trying to get:

  1. Total Fare per Pick
  2. Total Tip per Pick
  3. Average Drag per Pick
  4. Average Drag per Drop
Pick  Drop  Fare   Tip    Drag
1     1     4.00   4.00   1.00
1     2     5.00   10.00  8.00
1     2     5.00   15.00  12.00
3     2     11.00  12.00  17.00
3     5     41.00  25.00  13.00
4     6     50.00  70.00  2.00

My query so far looks like this:

from pyspark.sql import functions as func
from pyspark.sql.functions import desc

df.groupBy('Pick', 'Drop') \
    .agg(
        func.sum('Fare').alias('FarePick'),
        func.sum('Tip').alias('TipPick'),
        func.avg('Drag').alias('AvgDragPick'),
        func.avg('Drag').alias('AvgDragDrop')) \
    .orderBy('Pick').show()

However, this doesn't look right to me. I'm stuck on (4) in particular, because grouping by both columns at once doesn't seem correct. Can anyone suggest a correction?

python pyspark apache-spark-sql
1 Answer

I put your table data into a data variable and split the work into the 4 steps.

from pyspark.sql import SparkSession
from pyspark.sql import functions as func

spark = SparkSession.builder \
    .appName("testSession") \
    .getOrCreate()

data = [
    (1, 1, 4.00, 4.00, 1.00),
    (1, 2, 5.00, 10.00, 8.00),
    (1, 2, 5.00, 15.00, 12.00),
    (3, 2, 11.00, 12.00, 17.00),
    (3, 5, 41.00, 25.00, 13.00),
    (4, 6, 50.00, 70.00, 2.00)
]

columns = ["Pick", "Drop", "Fare", "Tip", "Drag"]
df = spark.createDataFrame(data, columns)

# Steps 1, 2 and 3: total Fare, total Tip and average Drag per Pick
df.groupBy('Pick').agg(
    func.sum('Fare').alias('TotalFarePick'),
    func.sum('Tip').alias('TotalTipPick'),
    func.avg('Drag').alias('AvgDragPick')
).orderBy('Pick').show()

# Step 4: average Drag per Drop
df.groupBy('Drop').agg(
    func.avg('Drag').alias('AvgDragDrop')
).orderBy('Drop').show()

spark.stop()

Output of the two tables:

+----+-------------+------------+-----------+
|Pick|TotalFarePick|TotalTipPick|AvgDragPick|
+----+-------------+------------+-----------+
|   1|         14.0|        29.0|        7.0|
|   3|         52.0|        37.0|       15.0|
|   4|         50.0|        70.0|        2.0|
+----+-------------+------------+-----------+

+----+------------------+
|Drop|       AvgDragDrop|
+----+------------------+
|   1|               1.0|
|   2|12.333333333333334|
|   5|              13.0|
|   6|               2.0|
+----+------------------+
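As a sanity check, Drop = 2 appears in three rows with Drag values 8.00, 12.00 and 17.00, so AvgDragDrop = (8 + 12 + 17) / 3 ≈ 12.33, which matches the second table.

If you would rather have all four metrics in a single result instead of two separate tables, window functions are one option. This is only a minimal sketch of that idea, reusing the same column names as above:

from pyspark.sql import functions as func
from pyspark.sql.window import Window

# One partition of the rows by Pick, another by Drop
pick_window = Window.partitionBy('Pick')
drop_window = Window.partitionBy('Drop')

# Attach the Pick-level and Drop-level aggregates to every row
combined = df \
    .withColumn('TotalFarePick', func.sum('Fare').over(pick_window)) \
    .withColumn('TotalTipPick', func.sum('Tip').over(pick_window)) \
    .withColumn('AvgDragPick', func.avg('Drag').over(pick_window)) \
    .withColumn('AvgDragDrop', func.avg('Drag').over(drop_window))

# Keep one row per (Pick, Drop) pair
combined.select('Pick', 'Drop', 'TotalFarePick', 'TotalTipPick',
                'AvgDragPick', 'AvgDragDrop') \
    .distinct().orderBy('Pick', 'Drop').show()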