Getting a new column in pandas using groupby and cumsum

Question (votes: 0, answers: 4)

I have the following dataframe:

Class  Received  Issued
FD           10       0
FD            0       2
RM            5       0
RM            0       3
FD            0       2
PM            5       0
PM            1       0
RM            1       0
FD            4       0

I need the following dataframe:

Class  Received  Issued  Remaining Quantity
FD           10       0                  10
FD            0       2                   8
RM            5       0                   5
RM            0       3                   2
FD            0       2                   6
PM            5       0                   5
PM            1       0                   6
RM            1       0                   3
FD            4       0                  10

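The Remaining Quantity column is the cumsum() of Received minus Issued, computed separately for each Class. I have tried different approaches, but I can't figure it out.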
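
To make the requirement concrete, here is a minimal sketch that rebuilds the sample data and computes the expected column with a plain Python loop, keeping one running balance per class. The English column names (Class, Received, Issued) and the variable names are assumptions for illustration; this is a reference calculation, not a pandas solution.

```python
import pandas as pd

# Sample data from the question; English column names are assumed.
df = pd.DataFrame({
    "Class": ["FD", "FD", "RM", "RM", "FD", "PM", "PM", "RM", "FD"],
    "Received": [10, 0, 5, 0, 0, 5, 1, 1, 4],
    "Issued": [0, 2, 0, 3, 2, 0, 0, 0, 0],
})

# Reference calculation: one running balance per class, updated row by row.
balance = {}
expected = []
for cls, received, issued in zip(df["Class"], df["Received"], df["Issued"]):
    balance[cls] = balance.get(cls, 0) + int(received) - int(issued)
    expected.append(balance[cls])

print(expected)
```

Any grouped cumsum solution should reproduce this list in the original row order.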

python pandas cumsum
4 Answers
3 votes
df['Remaining Quantity'] = df.groupby('Class').apply(lambda x: x['Received'].cumsum() - x['Issued'].cumsum()).reset_index(level = 0, drop=True)

Output:

  Class  Received  Issued  Remaining Quantity
0    FD        10       0                  10
1    FD         0       2                   8
2    RM         5       0                   5
3    RM         0       3                   2
4    FD         0       2                   6
5    PM         5       0                   5
6    PM         1       0                   6
7    RM         1       0                   3
8    FD         4       0                  10

3 votes

Another possible solution:

df["Remaining Quantity"] = (
    df.eval("tmp=Received-Issued").groupby("Class")["tmp"].cumsum()
)
print(df)

Output:

  Class  Received  Issued  Remaining Quantity
0    FD        10       0                  10
1    FD         0       2                   8
2    RM         5       0                   5
3    RM         0       3                   2
4    FD         0       2                   6
5    PM         5       0                   5
6    PM         1       0                   6
7    RM         1       0                   3
8    FD         4       0                  10
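
A point worth spelling out about the eval approach: with its default settings, df.eval with an assignment expression returns a new frame that includes the helper column, so the tmp column never appears in df itself. The sketch below assumes the sample frame from the question; with_tmp is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame({
    "Class": ["FD", "FD", "RM", "RM", "FD", "PM", "PM", "RM", "FD"],
    "Received": [10, 0, 5, 0, 0, 5, 1, 1, 4],
    "Issued": [0, 2, 0, 3, 2, 0, 0, 0, 0],
})

# eval returns a copy that has the extra helper column.
with_tmp = df.eval("tmp = Received - Issued")
print("tmp" in with_tmp.columns)  # helper column exists on the copy
print("tmp" in df.columns)        # original frame has no helper column

# Grouped running total of the per-row net change, aligned back on the index.
df["Remaining Quantity"] = with_tmp.groupby("Class")["tmp"].cumsum()
print(df)
```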

2 votes

Another solution:

df["Remaining Quantity"] = (g := df.groupby("Class").cumsum())["Received"] - g["Issued"]
print(df)

Prints:

  Class  Received  Issued  Remaining Quantity
0    FD        10       0                  10
1    FD         0       2                   8
2    RM         5       0                   5
3    RM         0       3                   2
4    FD         0       2                   6
5    PM         5       0                   5
6    PM         1       0                   6
7    RM         1       0                   3
8    FD         4       0                  10

1 vote

One approach is to use .stack to compute the difference, then assign the values back along the index.

df['Remaining Quantity'] = df.assign(
            Issued=df['Issued'] * -1).set_index('Class',append=True)\
           .stack().groupby(level=1).cumsum().unstack(-1).droplevel(1,0)['Issued']

print(df)

  Class  Received  Issued  Remaining Quantity
0    FD        10       0                  10
1    FD         0       2                   8
2    RM         5       0                   5
3    RM         0       3                   2
4    FD         0       2                   6
5    PM         5       0                   5
6    PM         1       0                   6
7    RM         1       0                   3
8    FD         4       0                  10
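
Since the chained expression is dense, here is a sketch of the same idea split into named steps, assuming the sample frame from the question. The intermediate names (signed, long, running, wide) are hypothetical and only there for readability.

```python
import pandas as pd

df = pd.DataFrame({
    "Class": ["FD", "FD", "RM", "RM", "FD", "PM", "PM", "RM", "FD"],
    "Received": [10, 0, 5, 0, 0, 5, 1, 1, 4],
    "Issued": [0, 2, 0, 3, 2, 0, 0, 0, 0],
})

# Negate Issued so a single cumulative sum yields the net balance,
# and move Class into the index next to the row number.
signed = df.assign(Issued=-df["Issued"]).set_index("Class", append=True)

# One long Series: for each row, the Received value followed by the Issued value.
long = signed.stack()

# Running total per class (index level 1 is Class).
running = long.groupby(level=1).cumsum()

# Back to wide form. The Issued column holds the balance after both
# values of a row have been added, which is the remaining quantity.
wide = running.unstack(-1)

# Drop the Class index level so the result aligns with df's index.
df["Remaining Quantity"] = wide.droplevel(1, axis=0)["Issued"]
print(df)
```

The Received column of the wide frame would instead give the balance before that row's issued amount is subtracted, which is why the Issued column is the one selected.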