Pandas running total based on groupby and sum across multiple columns


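I am working with the dataset below and am having trouble computing a total score per team ID. A team can play either at home or away, and I would like to compute its total points scored.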

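I have managed to create running totals based on home_id and away_id, as well as running averages based on the home/away IDs, but I am struggling to compute them across both columns together.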

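For example, if a team scores 1 as the home side in game 1 and then scores 3 as the away side in game 2, I would like a column showing that it has scored 4 in total in the dataset so far.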

My code so far:

import pandas as pd

game_data = pd.read_csv('game_data.csv')

game_data['home_avg_home_games'] = game_data.groupby('home_id')['home_score'].transform(lambda x: x.rolling(165, min_periods = 0).mean())
game_data['home_avg_against_home_games'] = game_data.groupby('home_id')['away_score'].transform(lambda x: x.rolling(165, min_periods = 0).mean())

game_data['away_avg_away_games'] = game_data.groupby('away_id')['away_score'].transform(lambda x: x.rolling(165, min_periods = 0).mean())
game_data['away_avg_against_away_games'] = game_data.groupby('away_id')['home_score'].transform(lambda x: x.rolling(165, min_periods = 0).mean())

game_data['scored_home_total'] = game_data.groupby('home_id')['home_score'].cumsum()
game_data['scored_away_total'] = game_data.groupby('away_id')['away_score'].cumsum()

game_id away_id home_id away_score home_score home_avg_home_games home_avg_against_home_games away_avg_away_games away_avg_against_away_games
446877 138 134 1 4 4 1 1 4
446911 141 139 5 3 3 5 5 3
446873 121 118 3 4 4 3 3 4
446875 137 158 12 3 3 12 12 3
446872 142 110 2 3 3 2 2 3
446876 136 140 2 3 3 2 2 3
446874 143 113 2 6 6 2 2 6
446879 120 144 4 3 3 4 4 3
446871 119 135 15 0 0 15 15 0
446878 141 139 5 3 3 5 5 3
446869 115 109 10 5 5 10 10 5
446889 112 108 9 0 0 9 9 0
446868 145 133 4 3 3 4 4 3
446870 117 147 5 3 3 5 5 3
446867 111 114 6 2 2 6 6 2
446896 121 118 2 0 2 2.5 2.5 2
446910 138 134 5 6 5 3 3 5
446887 141 139 2 3 3 4 4 3
446883 116 146 8 7 7 8 8 7
446886 136 140 10 2 2.5 6 6 2.5
446885 137 158 2 1 2 7 7 2
446882 115 109 6 11 8 8 8 8
446880 112 108 6 1 0.5 7.5 7.5 0.5
446881 145 133 5 4 3.5 4.5 4.5 3.5
446884 119 135 3 0 0 9 9 0
446901 141 139 3 5 3.5 3.75 3.75 3.5
446898 137 158 3 4 2.666666667 5.666666667 5.666666667 2.666666667
446899 136 140 9 5 3.333333333 7 7 3.333333333
446891 115 109 4 3 6.333333333 6.666666667 6.666666667 6.333333333

My desired output is:

game_id away_id home_id away_score home_score home_avg_for_home_games home_avg_against_home_games away_avg_away_games away_avg_against_away_games home_total_score away_total_score
446877 1 2 1 4 4 1 1 4 4 1
446911 2 3 5 3 3 5 5 3 3 5
446873 1 3 3 4 3.5 4 2 4 7 4
python pandas dataframe data-cleaning
1 Answer

I created a new column named total_score that, for each row, adds the home team's running home_score total to the away team's running away_score total.

import pandas as pd

data = {
    'game_id': [446877, 446911, 446873, 446875, 446872, 446876, 446874, 446879, 446871, 446878, 446869, 446889, 446868, 446870, 446867, 446896, 446910, 446887, 446883, 446886, 446885, 446882, 446880, 446881, 446884, 446901, 446898, 446899, 446891],
    'away_id': [138, 141, 121, 137, 142, 136, 143, 120, 119, 141, 115, 112, 145, 117, 111, 121, 138, 141, 116, 136, 137, 115, 112, 145, 119, 141, 137, 136, 115],
    'home_id': [134, 139, 118, 158, 110, 140, 113, 144, 135, 139, 109, 108, 133, 147, 114, 118, 134, 139, 146, 140, 158, 109, 108, 133, 135, 139, 158, 140, 109],
    'away_score': [1, 5, 3, 12, 2, 2, 2, 4, 15, 5, 10, 9, 4, 5, 6, 2, 5, 3, 8, 10, 2, 6, 6, 5, 3, 3, 3, 9, 4],
    'home_score': [4, 3, 4, 3, 3, 3, 6, 3, 0, 3, 5, 0, 3, 3, 2, 0, 6, 3, 7, 2, 1, 11, 1, 4, 0, 5, 4, 5, 3] }

df = pd.DataFrame(data)

df['total_score'] = df.groupby('home_id')['home_score'].cumsum().fillna(0) + df.groupby('away_id')['away_score'].cumsum().fillna(0)

print(df)

The final output looks like this:

    game_id  away_id  home_id  away_score  home_score  total_score
0    446877      138      134           1           4            5
1    446911      141      139           5           3            8
2    446873      121      118           3           4            7
3    446875      137      158          12           3           15
4    446872      142      110           2           3            5
5    446876      136      140           2           3            5
6    446874      143      113           2           6            8
7    446879      120      144           4           3            7
8    446871      119      135          15           0           15
9    446878      141      139           5           3           16
10   446869      115      109          10           5           15
11   446889      112      108           9           0            9
12   446868      145      133           4           3            7
13   446870      117      147           5           3            8
14   446867      111      114           6           2            8
15   446896      121      118           2           0            9
16   446910      138      134           5           6           16
17   446887      141      139           3           3           22
18   446883      116      146           8           7           15
19   446886      136      140          10           2           17
20   446885      137      158           2           1           18
21   446882      115      109           6          11           32
22   446880      112      108           6           1           16
23   446881      145      133           5           4           16
24   446884      119      135           3           0           18
25   446901      141      139           3           5           30
26   446898      137      158           3           4           25
27   446899      136      140           9           5           31
28   446891      115      109           4           3           39
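
Note that total_score in the output above mixes two different teams in every row: it adds the home side's home-only running total to the away side's away-only running total. If the goal is the one described in the question, a running total per team that counts both its home and away games, one possible approach is to stack the two sides into a long table with one row per team per game, take a cumulative sum per team, and write the result back. The following is only a minimal sketch on made-up toy data; the names team_id, score, side and running_total are illustrative, the output column names are taken from the question's desired header, and it assumes the rows are already in chronological order.

```python
import pandas as pd

# Toy data (hypothetical): team 1 is home in game 1, away in game 2, home in game 3.
games = pd.DataFrame({
    "game_id":    [1, 2, 3],
    "home_id":    [1, 2, 1],
    "away_id":    [2, 1, 2],
    "home_score": [1, 0, 2],
    "away_score": [5, 3, 4],
})

# One row per (game, team): rename both sides to common column names.
home = games[["home_id", "home_score"]].rename(
    columns={"home_id": "team_id", "home_score": "score"}
).assign(side="home")
away = games[["away_id", "away_score"]].rename(
    columns={"away_id": "team_id", "away_score": "score"}
).assign(side="away")

# Restore the original game order (assumed chronological) before accumulating,
# otherwise every home game would be summed before any away game.
long = pd.concat([home, away]).sort_index(kind="mergesort")

# Running total per team across home and away games, current game included.
long["running_total"] = long.groupby("team_id")["score"].cumsum()

# Write the totals back; assignment aligns on the original row index.
games["home_total_score"] = long.loc[long["side"] == "home", "running_total"]
games["away_total_score"] = long.loc[long["side"] == "away", "running_total"]

print(games)
```

In this toy data, team 1 scores 1 at home in the first game and 3 away in the second, so its away_total_score in the second row is 4, which matches the example given in the question. The same long table could also feed a per-team running average by swapping cumsum() for something like expanding().mean() inside a transform.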