我正在尝试对包含多个相同名称的数据框执行累加和。我想创建另一个df,该df具有每个玩家得分的累积总和,同时还要认识到名称有时不是唯一的。学校将是第二个标准。这是我正在查看的示例:
df = pd.DataFrame({'Player':['John Smith', 'John Smith', 'John Smith', 'John Smith', 'John Smith'],
'School':['Duke', 'Duke', 'Duke', 'Kentucky', 'Kentucky'],
'Date':['1-1-20', '1-3-20', '1-7-20', '1-3-20', '1-08-20'],
'Points Scored':['20', '30', '15', '8', '9']})
print(df)
Player School Date Points Scored
0 John Smith Duke 1-1-20 20
1 John Smith Duke 1-3-20 30
2 John Smith Duke 1-7-20 15
3 John Smith Kentucky 1-3-20 8
4 John Smith Kentucky 1-08-20 9
我尝试使用df.groupby(by = ['Player','School','Date'])。sum()。groupby(level = [0])。cumsum()...但这没有似乎没有区分第二个标准。我也尝试过按学校对值进行排序但在那里找不到任何运气。预期的输出将如下表所示;
Player School Date Points Scored Cumulative Sum Points Scored
0 John Smith Duke 1-1-20 20 20
1 John Smith Duke 1-3-20 30 50
2 John Smith Duke 1-7-20 15 65
3 John Smith Kentucky 1-3-20 8 8
4 John Smith Kentucky 1-08-20 9 17
提前感谢您的帮助!
import numpy as np
import pandas as pd
df = pd.DataFrame({'Player':['John Smith', 'John Smith', 'John Smith', 'John Smith', 'John Smith'],
'School':['Duke', 'Duke', 'Duke', 'Kentucky', 'Kentucky'],
'Date':['1-1-20', '1-3-20', '1-7-20', '1-3-20', '1-08-20'],
'Points Scored':[20, 30, 15, 8, 9]}) # change to integer here
df['Cumulative Sum Points Scored'] = df.groupby(['Player','School'])['Points Scored'].apply(np.cumsum)
输出:
Player School Date Points Scored Cumulative Sum Points Scored
0 John Smith Duke 1-1-20 20 20
1 John Smith Duke 1-3-20 30 50
2 John Smith Duke 1-7-20 15 65
3 John Smith Kentucky 1-3-20 8 8
4 John Smith Kentucky 1-08-20 9 17