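For the following df, I want to compute the cumulative sum of column Inst_Dist and save it as Cumu_Dist for as long as the value of WDir_Deg stays the same. When the value in WDir_Deg changes, the cumulative sum needs to restart.
So,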
index | WDir_Deg | Inst_Dist | Cumu_Dist
0 | 289 | 20 | NaN
1 | 285 | 17 | NaN
2 | 285 | 19 | NaN
3 | 287 | 19 | NaN
4 | 289 | 10 | NaN
becomes
index | WDir_Deg | Inst_Dist | Cumu_Dist
0 | 289 | 20 | 20
1 | 285 | 17 | 17
2 | 285 | 19 | 36
3 | 287 | 19 | 19
4 | 289 | 10 | 10
My non-idiomatic (and extremely slow) Python code is below. I would really appreciate any guidance on making it faster and more idiomatic.
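import pandas as pd

df = pd.DataFrame({'WDir_Deg': [289, 285, 285, 287, 289],
                   'Inst_Dist': [20, 17, 19, 19, 10]})

prev_angle = -1  # sentinel that never matches a real (non-negative) angle
curr_cumu_dist = 0
for curr_ind in df.index:
    curr_angle = df.loc[curr_ind, 'WDir_Deg']
    if prev_angle == curr_angle:
        # same direction as the previous row: keep accumulating
        curr_cumu_dist += df.loc[curr_ind, 'Inst_Dist']
    else:
        # direction changed: restart the running total
        prev_angle = curr_angle
        curr_cumu_dist = df.loc[curr_ind, 'Inst_Dist']
    df.loc[curr_ind, 'Cumu_Dist'] = curr_cumu_dist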
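One likely reason the loop above is slow is that every scalar df.loc read and write goes through pandas' indexing machinery. As an intermediate step (not a vectorized solution), here is a minimal sketch of the same restart-on-change logic run over plain Python lists; the helper name cumu_dist_loop is hypothetical, and pandas is assumed to be installed.

```python
import pandas as pd


def cumu_dist_loop(angles, dists):
    """Running sum of dists that restarts whenever the angle changes.

    Same logic as the df.loc loop, but iterating over plain Python
    sequences, which avoids per-row DataFrame indexing overhead.
    """
    out = []
    prev_angle = None  # sentinel: no previous row yet
    running = 0
    for angle, dist in zip(angles, dists):
        if angle == prev_angle:
            running += dist  # same direction: keep accumulating
        else:
            prev_angle = angle  # direction changed: restart
            running = dist
        out.append(running)
    return out


df = pd.DataFrame({'WDir_Deg': [289, 285, 285, 287, 289],
                   'Inst_Dist': [20, 17, 19, 19, 10]})
df['Cumu_Dist'] = cumu_dist_loop(df['WDir_Deg'].tolist(),
                                 df['Inst_Dist'].tolist())
print(df)
```

The whole column is assigned once at the end instead of row by row, which is where most of the savings would come from.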
Use a helper Series: compare the WDir_Deg column with its shifted copy using ne (not equal), then take the cumsum so that each run of consecutive equal values gets its own label, and pass that to DataFrameGroupBy.cumsum:
s = df['WDir_Deg'].ne(df['WDir_Deg'].shift()).cumsum()
df['Cumu_Dist'] = df.groupby(s)['Inst_Dist'].cumsum()
print(df)
   WDir_Deg  Inst_Dist  Cumu_Dist
0       289         20         20
1       285         17         17
2       285         19         36
3       287         19         19
4       289         10         10
Details:
print(s)
0    1
1    2
2    2
3    3
4    4
Name: WDir_Deg, dtype: int32

A bit tricky. Referencing this question/answer, Pandas groupby cumulative sum, I made this solution:
df['Cumu_Dist'] = df.groupby('WDir_Deg').Inst_Dist.cumsum()
which returns
   index  WDir_Deg  Inst_Dist  Cumu_Dist
0      0       285         17         17
1      1       285         19         36
2      2       287         19         19
3      3       289         20         20
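The two answers group differently, and the difference shows up on the question's own data because 289 appears twice without being adjacent. A minimal sketch comparing them, assuming pandas is installed (the variable names runs, by_run and by_value are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'WDir_Deg': [289, 285, 285, 287, 289],
                   'Inst_Dist': [20, 17, 19, 19, 10]})

# True where the value differs from the previous row (the first row compares
# against NaN, so it is True); cumsum turns that into run labels 1, 2, 2, 3, 4.
runs = df['WDir_Deg'].ne(df['WDir_Deg'].shift()).cumsum()

# Restarts on every change in WDir_Deg, which is what the question asks for.
by_run = df.groupby(runs)['Inst_Dist'].cumsum()

# Pools all rows with equal WDir_Deg, even when they are not adjacent.
by_value = df.groupby('WDir_Deg')['Inst_Dist'].cumsum()

print(pd.DataFrame({'WDir_Deg': df['WDir_Deg'],
                    'by_run': by_run,
                    'by_value': by_value}))
```

For the last row, by_run gives 10 (matching the desired output), while by_value gives 30 because it continues the running total from the first 289 row.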