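I'm working with the dataset below and am having trouble computing running totals per team ID. A team can play either at home or away, and I want each team's total points scored across both.
I've managed to create running totals and running averages keyed on home_id and on away_id separately, but I'm struggling to compute them across both columns together.
For example, if a team scores 1 as the home team in game 1 and then scores 3 as the away team in game 2, I want a column showing that the team has scored 4 in total so far in the dataset.
My code so far is: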
import pandas as pd
game_data = pd.read_csv('game_data.csv')
game_data['home_avg_home_games'] = game_data.groupby('home_id')['home_score'].transform(lambda x: x.rolling(165, min_periods = 0).mean())
game_data['home_avg_against_home_games'] = game_data.groupby('home_id')['away_score'].transform(lambda x: x.rolling(165, min_periods = 0).mean())
game_data['away_avg_away_games'] = game_data.groupby('away_id')['away_score'].transform(lambda x: x.rolling(165, min_periods = 0).mean())
game_data['away_avg_against_away_games'] = game_data.groupby('away_id')['home_score'].transform(lambda x: x.rolling(165, min_periods = 0).mean())
game_data['scored_home_total'] = game_data.groupby('home_id')['home_score'].cumsum()
game_data['scored_away_total'] = game_data.groupby('away_id')['away_score'].cumsum()
game_id | away_id | home_id | away_score | home_score | home_avg_home_games | home_avg_against_home_games | away_avg_away_games | away_avg_against_away_games |
---|---|---|---|---|---|---|---|---|
446877 | 138 | 134 | 1 | 4 | 4 | 1 | 1 | 4 |
446911 | 141 | 139 | 5 | 3 | 3 | 5 | 5 | 3 |
446873 | 121 | 118 | 3 | 4 | 4 | 3 | 3 | 4 |
446875 | 137 | 158 | 12 | 3 | 3 | 12 | 12 | 3 |
446872 | 142 | 110 | 2 | 3 | 3 | 2 | 2 | 3 |
446876 | 136 | 140 | 2 | 3 | 3 | 2 | 2 | 3 |
446874 | 143 | 113 | 2 | 6 | 6 | 2 | 2 | 6 |
446879 | 120 | 144 | 4 | 3 | 3 | 4 | 4 | 3 |
446871 | 119 | 135 | 15 | 0 | 0 | 15 | 15 | 0 |
446878 | 141 | 139 | 5 | 3 | 3 | 5 | 5 | 3 |
446869 | 115 | 109 | 10 | 5 | 5 | 10 | 10 | 5 |
446889 | 112 | 108 | 9 | 0 | 0 | 9 | 9 | 0 |
446868 | 145 | 133 | 4 | 3 | 3 | 4 | 4 | 3 |
446870 | 117 | 147 | 5 | 3 | 3 | 5 | 5 | 3 |
446867 | 111 | 114 | 6 | 2 | 2 | 6 | 6 | 2 |
446896 | 121 | 118 | 2 | 0 | 2 | 2.5 | 2.5 | 2 |
446910 | 138 | 134 | 5 | 6 | 5 | 3 | 3 | 5 |
446887 | 141 | 139 | 2 | 3 | 3 | 4 | 4 | 3 |
446883 | 116 | 146 | 8 | 7 | 7 | 8 | 8 | 7 |
446886 | 136 | 140 | 10 | 2 | 2.5 | 6 | 6 | 2.5 |
446885 | 137 | 158 | 2 | 1 | 2 | 7 | 7 | 2 |
446882 | 115 | 109 | 6 | 11 | 8 | 8 | 8 | 8 |
446880 | 112 | 108 | 6 | 1 | 0.5 | 7.5 | 7.5 | 0.5 |
446881 | 145 | 133 | 5 | 4 | 3.5 | 4.5 | 4.5 | 3.5 |
446884 | 119 | 135 | 3 | 0 | 0 | 9 | 9 | 0 |
446901 | 141 | 139 | 3 | 5 | 3.5 | 3.75 | 3.75 | 3.5 |
446898 | 137 | 158 | 3 | 4 | 2.666666667 | 5.666666667 | 5.666666667 | 2.666666667 |
446899 | 136 | 140 | 9 | 5 | 3.333333333 | 7 | 7 | 3.333333333 |
446891 | 115 | 109 | 4 | 3 | 6.333333333 | 6.666666667 | 6.666666667 | 6.333333333 |
My desired output is:
game_id | away_id | home_id | away_score | home_score | home_avg_for_home_games | home_avg_against_home_games | away_avg_away_games | away_avg_against_away_games | home_total_score | away_total_score |
---|---|---|---|---|---|---|---|---|---|---|
446877 | 1 | 2 | 1 | 4 | 4 | 1 | 1 | 4 | 4 | 1 |
446911 | 2 | 3 | 5 | 3 | 3 | 5 | 5 | 3 | 3 | 5 |
446873 | 1 | 3 | 3 | 4 | 3.5 | 4 | 2 | 4 | 7 | 4 |
I created a new column named total_score that adds, for each row, the home team's running home_score total to the away team's running away_score total:
import pandas as pd
data = {
'game_id': [446877, 446911, 446873, 446875, 446872, 446876, 446874, 446879, 446871, 446878, 446869, 446889, 446868, 446870, 446867, 446896, 446910, 446887, 446883, 446886, 446885, 446882, 446880, 446881, 446884, 446901, 446898, 446899, 446891],
'away_id': [138, 141, 121, 137, 142, 136, 143, 120, 119, 141, 115, 112, 145, 117, 111, 121, 138, 141, 116, 136, 137, 115, 112, 145, 119, 141, 137, 136, 115],
'home_id': [134, 139, 118, 158, 110, 140, 113, 144, 135, 139, 109, 108, 133, 147, 114, 118, 134, 139, 146, 140, 158, 109, 108, 133, 135, 139, 158, 140, 109],
'away_score': [1, 5, 3, 12, 2, 2, 2, 4, 15, 5, 10, 9, 4, 5, 6, 2, 5, 3, 8, 10, 2, 6, 6, 5, 3, 3, 3, 9, 4],
'home_score': [4, 3, 4, 3, 3, 3, 6, 3, 0, 3, 5, 0, 3, 3, 2, 0, 6, 3, 7, 2, 1, 11, 1, 4, 0, 5, 4, 5, 3] }
df = pd.DataFrame(data)
df['total_score'] = df.groupby('home_id')['home_score'].cumsum().fillna(0) + df.groupby('away_id')['away_score'].cumsum().fillna(0)
print(df)
The final output looks like this:
game_id away_id home_id away_score home_score total_score
0 446877 138 134 1 4 5
1 446911 141 139 5 3 8
2 446873 121 118 3 4 7
3 446875 137 158 12 3 15
4 446872 142 110 2 3 5
5 446876 136 140 2 3 5
6 446874 143 113 2 6 8
7 446879 120 144 4 3 7
8 446871 119 135 15 0 15
9 446878 141 139 5 3 16
10 446869 115 109 10 5 15
11 446889 112 108 9 0 9
12 446868 145 133 4 3 7
13 446870 117 147 5 3 8
14 446867 111 114 6 2 8
15 446896 121 118 2 0 9
16 446910 138 134 5 6 16
17 446887 141 139 3 3 22
18 446883 116 146 8 7 15
19 446886 136 140 10 2 17
20 446885 137 158 2 1 18
21 446882 115 109 6 11 32
22 446880 112 108 6 1 16
23 446881 145 133 5 4 16
24 446884 119 135 3 0 18
25 446901 141 139 3 5 30
26 446898 137 158 3 4 25
27 446899 136 140 9 5 31
28 446891 115 109 4 3 39
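Note that total_score combines two different teams on each row. If the goal is instead each team's own running total regardless of venue, as in the question's example (1 at home, then 3 away, giving 4), one possible approach is to stack the home and away appearances into a long table, take a cumulative sum per team, and spread the result back onto the games. This is only a sketch: the helper name add_team_running_totals and the intermediate columns row, side, team_id and score are made up for illustration, and it assumes the rows are already in chronological order.

```python
import pandas as pd


def add_team_running_totals(df):
    """Return a copy of df with home_total_score and away_total_score.

    Assumes df has home_id, away_id, home_score, away_score columns and
    that rows are already sorted in the order the games were played.
    """
    out = df.reset_index(drop=True)
    rows = out.index.to_numpy()

    # One record per (game, team): the home appearance and the away appearance.
    home = pd.DataFrame({
        "row": rows,
        "side": "home",
        "team_id": out["home_id"].to_numpy(),
        "score": out["home_score"].to_numpy(),
    })
    away = pd.DataFrame({
        "row": rows,
        "side": "away",
        "team_id": out["away_id"].to_numpy(),
        "score": out["away_score"].to_numpy(),
    })
    long = pd.concat([home, away], ignore_index=True)

    # Restore game order so the cumulative sum follows the schedule.
    long = long.sort_values("row", kind="stable")
    long["running_total"] = long.groupby("team_id")["score"].cumsum()

    # Back to one row per game, with one column per side.
    wide = long.pivot(index="row", columns="side", values="running_total")
    out["home_total_score"] = wide["home"].to_numpy()
    out["away_total_score"] = wide["away"].to_numpy()
    return out


# The question's example: team 1 scores 1 at home, then 3 away.
games = pd.DataFrame({
    "home_id": [1, 2],
    "away_id": [2, 1],
    "home_score": [1, 5],
    "away_score": [2, 3],
})
result = add_team_running_totals(games)
print(result)
```

In the second game, away_total_score for team 1 comes out as 4 (1 scored at home plus 3 scored away). Calling add_team_running_totals(df) on the frame above should add the same two columns to the full dataset.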