Update (original question here): I need the mean of total goals across head-to-head (H2H) matches.
Home    Away    Home_goals  Away_goals
--------------------------------------
Team 1  Team 2  2           1
Team 3  Team 4  3           5
Team 2  Team 1  5           3
Team 4  Team 3  1           5
Expected output:
Home    Away    Home_goals  Away_goals  Mean
------------------------------------------------------
Team 1  Team 2  2           1           5.5  ((2+1+5+3)/2)
Team 3  Team 4  3           5           7    ((3+5+1+5)/2)
Team 2  Team 1  5           3           5.5  ((2+1+5+3)/2)
Team 4  Team 3  1           5           7    ((3+5+1+5)/2)
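The code below works fine, but I have run into another problem. If there are n matches between Team 1 and Team 2, I want the mean computed from n-1 of them (excluding the last one). Can I modify the code below to do that?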
a = np.sort(df[["Home", "Away"]], axis=1)
df['Mean'] = (pd.DataFrame(a, index=df.index)
                .assign(sum = df[['Home_goals','Away_goals']].sum(axis=1))
                .groupby([0,1])['sum']
                .transform('mean'))
Thank you.
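You can use iloc inside a lambda function passed to GroupBy.transform. Slicing with x.iloc[:-1] drops the last row of each group before the mean is taken: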
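# Sort each Home/Away pair so both fixture orders share one group key.
a = np.sort(df[["Home", "Away"]], axis=1)
df['Mean'] = (pd.DataFrame(a, index=df.index)
                .assign(sum = df[['Home_goals','Away_goals']].sum(axis=1))
                .groupby([0,1])['sum']
                # Average all but the last match of each pairing.
                .transform(lambda x: x.iloc[:-1].mean()))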
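To try this end to end, here is a minimal sketch on made-up data: the question's four matches, one extra Team 1 vs Team 2 match, and a hypothetical Team 5 vs Team 6 pairing that has played only once. It assumes the rows are already in chronological order, so "last" means the last row of each pairing.

```python
import numpy as np
import pandas as pd

# Illustrative data only: the last two rows are not from the question.
df = pd.DataFrame({
    "Home": ["Team 1", "Team 3", "Team 2", "Team 4", "Team 1", "Team 5"],
    "Away": ["Team 2", "Team 4", "Team 1", "Team 3", "Team 2", "Team 6"],
    "Home_goals": [2, 3, 5, 1, 4, 0],
    "Away_goals": [1, 5, 3, 5, 4, 2],
})

# Sort each Home/Away pair so both fixture orders share one group key.
a = np.sort(df[["Home", "Away"]], axis=1)

df["Mean"] = (pd.DataFrame(a, index=df.index)
                .assign(sum=df[["Home_goals", "Away_goals"]].sum(axis=1))
                .groupby([0, 1])["sum"]
                # Average all but the last match of each pairing.
                .transform(lambda x: x.iloc[:-1].mean()))
print(df)
```

Here Team 1 vs Team 2 gets 5.5 (the mean of 3 and 8), Team 3 vs Team 4 gets 8.0, and Team 5 vs Team 6 gets NaN, because nothing is left to average once its only match is dropped.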
The following produces one mean per two-team pairing while dropping the last match of each pairing.
I added a few matches to demonstrate this.
Home Away Home_goals Away_goals
Team-1 Team-2 2 1
Team-3 Team-4 3 5
Team-2 Team-1 5 3
Team-4 Team-3 1 5
Team-2 Team-1 10 10
Team-4 Team-3 10 10
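Code:
import numpy as np
import pandas as pd

# Sort each Home/Away pair so both fixture orders share the same key columns.
a = np.sort(df[["Home", "Away"]], axis=1)
df = pd.concat([df, pd.DataFrame(a, columns=["team1", "team2"], index=df.index)], axis="columns")
df["sum"] = df[["Home_goals", "Away_goals"]].sum(axis="columns")
# Drop the last match of each pairing. Selecting "sum" first avoids depending on
# whether apply passes the grouping columns to the lambda.
drop_last = df.groupby(["team1", "team2"])["sum"].apply(lambda x: x.iloc[:-1]).reset_index()
drop_last["mean"] = drop_last.groupby(["team1", "team2"])["sum"].transform("mean")
drop_last = drop_last[["team1", "team2", "mean"]]
drop_last = drop_last.drop_duplicates()
# Attach each pairing's mean to every match of that pairing.
res = pd.merge(df, drop_last, on=["team1", "team2"])
print(res)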
Result:
Home Away Home_goals Away_goals team1 team2 sum mean
0 Team-1 Team-2 2 1 Team-1 Team-2 3 5.5
1 Team-2 Team-1 5 3 Team-1 Team-2 8 5.5
2 Team-2 Team-1 10 10 Team-1 Team-2 20 5.5
3 Team-3 Team-4 3 5 Team-3 Team-4 8 7.0
4 Team-4 Team-3 1 5 Team-3 Team-4 6 7.0
5 Team-4 Team-3 10 10 Team-3 Team-4 20 7.0
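The printed result above lists the rows grouped by pairing rather than in their original order. If the original order matters, one possible variant (a sketch, not the method used above) writes the mean straight onto the original rows without a merge. It masks the last match of each pairing with DataFrame.duplicated(keep="last") and lets the group mean skip the masked values. The name not_last is my own.

```python
import numpy as np
import pandas as pd

# The six sample matches from above.
df = pd.DataFrame({
    "Home": ["Team-1", "Team-3", "Team-2", "Team-4", "Team-2", "Team-4"],
    "Away": ["Team-2", "Team-4", "Team-1", "Team-3", "Team-1", "Team-3"],
    "Home_goals": [2, 3, 5, 1, 10, 10],
    "Away_goals": [1, 5, 3, 5, 10, 10],
})

# Sort each Home/Away pair so both fixture orders share the same key columns.
a = np.sort(df[["Home", "Away"]], axis=1)
df["team1"], df["team2"] = a[:, 0], a[:, 1]
df["sum"] = df[["Home_goals", "Away_goals"]].sum(axis="columns")

# True for every match except the last occurrence of its pairing.
not_last = df.duplicated(["team1", "team2"], keep="last")

# Replace each pairing's last total with NaN; the group mean ignores NaN,
# so only the earlier matches are averaged.
df["mean"] = (df["sum"].where(not_last)
                .groupby([df["team1"], df["team2"]])
                .transform("mean"))
print(df)
```

The rows stay in their input order, and a pairing with a single match ends up with NaN instead of being dropped.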