How do I group by multiple columns in a pandas DataFrame?

Question

I'm working with board game data from BoardGameGeek, and I want to build a DataFrame that groups the board games by minimum player count and by category.

Here are the column names: ['name', 'category', 'playtime', 'playtime_num', 'avg_rating', 'num_ratings', 'min_players'].

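First, I created a new column called 'support_solo' based on 'min_players' that indicates whether a board game supports solo play: 'Supports solo' or 'Does not support solo'.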
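The question does not show how 'support_solo' was built, so the following is only a minimal sketch of one way to do it. It assumes that a game supports solo play when min_players equals 1, and both the toy data and the label strings are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the `popular` DataFrame from the question (values are invented).
popular = pd.DataFrame({
    "name": ["Game A", "Game B", "Game C"],
    "min_players": [1, 2, 1],
})

# Assumption: min_players == 1 means the game can be played solo.
popular["support_solo"] = np.where(
    popular["min_players"] == 1,
    "Supports solo",
    "Does not support solo",
)
```

np.where keeps the column creation vectorized. A boolean column (popular["min_players"] == 1) would work just as well as a grouping key if the text labels are not needed.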

Then I created a groupby object:

grouped = popular.groupby(['support_solo', 'category'])

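After that, I call basic aggregation functions to get, for each category within each solo/non-solo group, the number of games and the mean of other fields such as playtime. However, I'm struggling to get the game with the most ratings in each category. I used a helper function and a dictionary of all the groupby aggregations: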

def game_with_highest_ratings(group):
    max_ratings_index = group['num_ratings'].idxmax()
    return group.loc[max_ratings_index, 'name']

aggregations = {
    'name': 'count', # total number of games in each category
    'num_ratings': game_with_highest_ratings, # game with the most ratings in each category
    'avg_rating': 'mean', # average rating of games in each category
    'playtime_num': 'mean', # average playtime of games in each category
}

grouped_result = grouped.agg(aggregations)

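I keep getting KeyError: 'num_ratings' and I don't know how to fix it. I've already checked that the column names are correct. How can I solve this, or is there another way to do it?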

Answer 1

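When agg is given a dictionary, each function receives only the Series for its column, not the full DataFrame. Inside game_with_highest_ratings, group['num_ratings'] is therefore a label lookup on a Series, which is what raises the KeyError.

The most efficient workaround is probably to aggregate with idxmax and post-process the result.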
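To see what a dictionary-style agg actually hands to a custom function, here is a small sketch. The DataFrame and the record_type helper are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Invented toy data, not the asker's dataset.
df = pd.DataFrame({
    "category": ["x", "x", "y"],
    "name": ["Game A", "Game B", "Game C"],
    "num_ratings": [10, 30, 20],
})

seen_types = []

def record_type(values):
    # Record the type of object pandas passes in, then aggregate normally.
    seen_types.append(type(values).__name__)
    return values.max()

result = df.groupby("category").agg({"num_ratings": record_type})

# Indexing a Series with a column name is a label lookup, so it fails.
raised = False
try:
    df["num_ratings"]["num_ratings"]
except KeyError:
    raised = True
```

Every entry in seen_types is "Series", so the helper never sees the 'name' column. That is why the idxmax-based dictionary shown next stores the row label of the most-rated game and looks the name up afterwards.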

aggregations = {
    'name': 'count', # total number of games in each category
    'num_ratings': 'idxmax', # game with the most ratings in each category
    'avg_rating': 'mean', # average rating of games in each category
    'playtime_num': 'mean', # average playtime of games in each category
}

grouped_result = grouped.agg(aggregations)
grouped_result['num_ratings'] = grouped_result['num_ratings'].map(popular['name'])
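For reference, here is the same approach as a self-contained sketch on invented data, so it can be run without the BoardGameGeek dataset. The final rename is an optional addition of mine, not part of the answer above.

```python
import pandas as pd

# Invented toy data with the columns used in the question.
popular = pd.DataFrame({
    "name": ["Game A", "Game B", "Game C", "Game D"],
    "category": ["Strategy", "Strategy", "Party", "Party"],
    "support_solo": [
        "Supports solo", "Supports solo",
        "Does not support solo", "Does not support solo",
    ],
    "num_ratings": [100, 250, 80, 40],
    "avg_rating": [7.0, 8.0, 6.0, 6.5],
    "playtime_num": [60, 90, 30, 20],
})

grouped = popular.groupby(["support_solo", "category"])

grouped_result = grouped.agg({
    "name": "count",          # number of games in each group
    "num_ratings": "idxmax",  # row label of the most-rated game
    "avg_rating": "mean",
    "playtime_num": "mean",
})

# Translate the row labels returned by idxmax into game names.
grouped_result["num_ratings"] = grouped_result["num_ratings"].map(popular["name"])

# Optional: clearer column names (my own choice, purely illustrative).
grouped_result = grouped_result.rename(
    columns={"name": "n_games", "num_ratings": "most_rated_game"}
)
```

Note that the map step relies on popular having a unique index, because idxmax returns index labels. If the index contains duplicates, calling popular.reset_index(drop=True) before grouping should avoid ambiguous lookups.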
