Cumsum in Python/Pandas that resets to zero at the lower bound


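I am struggling to find the bug in my code and would appreciate your advice on how to fix it. Essentially, I am trying to compute the cumulative sum of a column in a Pandas DataFrame, with the condition that the running total resets to 0 whenever it would drop below zero. The DataFrame consists of product type / activity / quantity (buys are positive values, sells are negative). I have included the code that builds a mock DataFrame as well as the code I use to compute the cumulative sum. However, I am not getting the output I expect. The table also includes two additional columns, desired_output and py_output: the former is the result I expect, and the latter is the output I actually get when I run the code in Python.

I am using the snippet below to get the cumulative sum of the ['quantity'] column: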

neg = df['quantity'] < 0
df['py_output'] = df['quantity'].groupby([neg[::-1].cumsum(),df['product']]).cumsum().clip(0)

Any suggestions about where I went wrong, and what I can do to get the correct output, would be much appreciated :-)

import pandas as pd


data = [['Product-1', 'Time-1', '1. BUY', 1395, 1395]
        , ['Product-1', 'Time-2', '2. SELL', -9684, 0]
        , ['Product-1', 'Time-3', '1. BUY', 1352, 1352]
        , ['Product-1', 'Time-4', '2. SELL', -1348, 4]
        , ['Product-1', 'Time-5', '1. BUY', 1951, 1955]
        , ['Product-1', 'Time-6', '2. SELL', -1947, 8]
        , ['Product-1', 'Time-7', '1. BUY', 2554, 2562]
        , ['Product-1', 'Time-8', '1. BUY', 714, 3276]
        , ['Product-1', 'Time-9', '1. BUY', 445, 3721]
        , ['Product-1', 'Time-10', '1. BUY', 2948, 6669]
        , ['Product-1', 'Time-11', '1. BUY', 1995, 8664]
        , ['Product-1', 'Time-12', '2. SELL', -4161, 4503]
        , ['Product-1', 'Time-13', '2. SELL', -4161, 342]
        , ['Product-1', 'Time-14', '2. SELL', -2895, 0]
        , ['Product-1', 'Time-15', '1. BUY', 186, 186]
        , ['Product-1', 'Time-16', '1. BUY', 2646, 2832]
        , ['Product-1', 'Time-17', '1. BUY', 2594, 5426]
        , ['Product-1', 'Time-18', '2. SELL', -3202, 2224]
        , ['Product-1', 'Time-19', '1. BUY', 4170, 6394]
        , ['Product-1', 'Time-20', '1. BUY', 1766, 8160]
        , ['Product-1', 'Time-21', '2. SELL', -4403, 3757]
        , ['Product-1', 'Time-22', '2. SELL', -3523, 234]
        , ['Product-1', 'Time-23', '1. BUY', 1403, 1637]
        , ['Product-1', 'Time-24', '1. BUY', 1566, 3203]
        , ['Product-1', 'Time-25', '2. SELL', -1357, 1846]
        , ['Product-1', 'Time-26', '2. SELL', -1566, 280]
        , ['Product-1', 'Time-27', '1. BUY', 791, 1071]
        , ['Product-1', 'Time-28', '1. BUY', 2384, 3455]
        , ['Product-1', 'Time-29', '1. BUY', 1292, 4747]
        , ['Product-1', 'Time-30', '1. BUY', 1343, 6090]
        , ['Product-1', 'Time-31', '1. BUY', 322, 6412]
        , ['Product-2', 'Time-1', '1. BUY', 1248, 1248]
        , ['Product-2', 'Time-2', '1. BUY', 3276, 4524]
        , ['Product-2', 'Time-3', '1. BUY', 707, 5231]
        , ['Product-2', 'Time-4', '2. SELL', -3534, 1697]
        , ['Product-2', 'Time-5', '1. BUY', 1358, 3055]
        , ['Product-2', 'Time-6', '1. BUY', 253, 3308]
        , ['Product-2', 'Time-7', '2. SELL', -1082, 2226]
        , ['Product-2', 'Time-8', '1. BUY', 238, 2464]
        , ['Product-2', 'Time-9', '1. BUY', 371, 2835]]

cols = ['product', 'time', 'activity', 'quantity', 'desired_output']
 
df = pd.DataFrame(data, columns=cols)
 
neg = df['quantity'] < 0
df['py_output'] = df['quantity'].groupby([neg[::-1].cumsum(),df['product']]).cumsum().clip(0)

print(df)

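I have gone through a number of references, including the Stack Overflow threads below. Unfortunately, I have not been able to find a solution that gives me the correct result.

Python Pandas groupby limited cumulative sum

Cumsum on Pandas DF with reset to zero for negative cumulative values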

python pandas numpy conditional-statements cumsum
1 Answer

If performance/speed/efficiency is not very important to you, try a simple for loop:

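cumsum = 0   # running total, never allowed below zero
result = []
for i in df["quantity"]:
    if cumsum + i < 0:
        # the total would go negative, so reset it to zero
        cumsum = 0
    else:
        cumsum += i
    result.append(cumsum)
# note: this runs over the whole column and does not restart per product
df["result"] = result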
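The branch in the loop is the same as taking max(0, cumsum + i) at every step. As a minimal sketch of that idea without pandas, the running total can also be written with itertools.accumulate. The helper name zero_floor_step is made up for illustration, and the sample list is just the first six Product-1 quantities from the question:

```python
from itertools import accumulate


def zero_floor_step(running_total, quantity):
    """Hypothetical helper: add quantity, but never let the total drop below 0."""
    return max(0, running_total + quantity)


# First six Product-1 quantities from the question's data
quantities = [1395, -9684, 1352, -1348, 1951, -1947]

# initial=0 seeds the running total; [1:] drops that seed from the output
result = list(accumulate(quantities, zero_floor_step, initial=0))[1:]
print(result)  # [1395, 0, 1352, 4, 1955, 8]
```

These values match the desired_output column for the same rows. The initial keyword requires Python 3.8 or later.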

To compute the running total separately for each product, you can use groupby with transform:

def zero_bounded_cumsum(values):
    cumsum = 0
    result = []
    for i in values:
        if cumsum + i < 0:
            cumsum = 0
        else:
            cumsum += i
        result.append(cumsum)
    return result

df["result"] = df.groupby("product")["quantity"].transform(zero_bounded_cumsum)
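If the Python loop does become a bottleneck, one possible alternative (not part of the answer above, just a sketch) relies on the fact that a running total floored at zero equals the plain cumulative sum minus the lowest value that cumulative sum has reached so far, with that minimum capped at 0. The function name zero_bounded_cumsum_vectorized and the shortened sample frame are assumptions made for illustration:

```python
import pandas as pd


def zero_bounded_cumsum_vectorized(values: pd.Series) -> pd.Series:
    """Hypothetical vectorized variant of the loop-based zero_bounded_cumsum."""
    running = values.cumsum()
    # Lowest running total seen so far, capped at 0 so that it only
    # matters once the plain cumulative sum has dipped below zero.
    lowest = running.cummin().clip(upper=0)
    return running - lowest


# Shortened sample taken from the question's data (assumption: same column names)
df = pd.DataFrame({
    "product": ["Product-1"] * 6 + ["Product-2"] * 4,
    "quantity": [1395, -9684, 1352, -1348, 1951, -1947,
                 1248, 3276, 707, -3534],
})

df["result"] = (
    df.groupby("product")["quantity"]
      .transform(zero_bounded_cumsum_vectorized)
)
print(df)
```

On these rows the result column agrees with desired_output from the question. It is worth checking the two versions against each other on your full data before switching.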