Some groupby operations in Pandas

Question (votes: 0, answers: 1)

I have this DataFrame:

import pandas as pd
import math
from pandas import Timestamp

Date = [Timestamp('2024-03-16 23:59:42'), Timestamp('2024-03-16 23:59:42'), Timestamp('2024-03-16 23:59:44'), Timestamp('2024-03-16 23:59:44'), Timestamp('2024-03-16 23:59:44'), Timestamp('2024-03-16 23:59:47'), Timestamp('2024-03-16 23:59:48'), Timestamp('2024-03-16 23:59:48'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49')]
Price = [0.6729, 0.6728, 0.6728, 0.6728, 0.6728, 0.673, 0.6728, 0.6729, 0.6728, 0.6728, 0.6728, 0.6728, 0.6728, 0.6728, 0.6728, 0.6728, 0.6728, 0.6728, 0.6729, 0.6728]
Side = [-1, -1, -1, 1, -1, 1, -1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 1, -1]
Amount = [1579.2963000000002, 7.400799999999999, 6.728, 177.61919999999998, 797.2679999999999, 33650.0, 131.196, 48.448800000000006, 0.6728, 0.6728, 0.6728, 6.728, 0.6728, 1.3456, 0.6728, 0.6728, 0.6728, 0.6728, 0.6729, 0.6728]
buy = [math.nan, math.nan, math.nan, 177.61919999999998, math.nan, 33650.0, math.nan, 48.448800000000006, math.nan, math.nan, math.nan, math.nan, math.nan, math.nan, math.nan, math.nan, math.nan, math.nan, 49.121700000000004, math.nan]

df = pd.DataFrame({
    'Date':Date,
    'Price':Price,
    'Side':Side,
    'Amount':Amount,
    'buy':buy
})

print(df)

I computed the buy column with:

df['buy'] = df[df['Side'] == 1].groupby([df['Date'].dt.floor('H'), 'Price'])['Amount'].cumsum()

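But in the buy column I want 0 instead of NaN when that price has not yet occurred in the group, and otherwise the previous value of the cumulative sum.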

The resulting buy column should be: [0, 0, 0, 177.6192, 177.6192, 33650, 177.6192, 48.4488, 177.6192, ...]

How can I achieve this?

python pandas group-by
1 answer
0 votes

You can chain reindex, ffill, and fillna:

df['buy'] = (df[df['Side'] == 1].groupby([df['Date'].dt.floor('H'), 'Price'])['Amount'].cumsum()
             .reindex(df.index).ffill().fillna(0)
            )

Or in two steps:

df['buy'] = df[df['Side'] == 1].groupby([df['Date'].dt.floor('H'), 'Price'])['Amount'].cumsum()
df['buy'] = df['buy'].ffill().fillna(0)
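
To see why each call in the chain is needed, here is a minimal sketch on a hypothetical five-row frame (the toy data and the names toy, buys, and filled are illustrative and not part of the question). It leaves out the groupby so that only the effect of reindex, ffill, and fillna is visible:

```python
import pandas as pd

# Hypothetical toy data, not the asker's frame
toy = pd.DataFrame({
    "Side":   [-1, 1, -1, 1, -1],
    "Amount": [5.0, 2.0, 7.0, 3.0, 1.0],
})

# Running total over buy rows only; the result keeps just the
# index labels of those rows (1 and 3)
buys = toy.loc[toy["Side"] == 1, "Amount"].cumsum()

# reindex restores every row label (NaN where there was no buy),
# ffill carries the last running total forward,
# fillna(0) covers the rows before the first buy
filled = buys.reindex(toy.index).ffill().fillna(0)

print(filled.tolist())  # [0.0, 2.0, 2.0, 5.0, 5.0]
```

Assigning the filtered Series straight into a column performs the same index alignment as reindex, which is why the two-step version gives the same result.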

Output:

                  Date   Price  Side      Amount         buy
0  2024-03-16 23:59:42  0.6729    -1   1579.2963      0.0000
1  2024-03-16 23:59:42  0.6728    -1      7.4008      0.0000
2  2024-03-16 23:59:44  0.6728    -1      6.7280      0.0000
3  2024-03-16 23:59:44  0.6728     1    177.6192    177.6192
4  2024-03-16 23:59:44  0.6728    -1    797.2680    177.6192
5  2024-03-16 23:59:47  0.6730     1  33650.0000  33650.0000
6  2024-03-16 23:59:48  0.6728    -1    131.1960  33650.0000
7  2024-03-16 23:59:48  0.6729     1     48.4488     48.4488
8  2024-03-16 23:59:49  0.6728    -1      0.6728     48.4488
9  2024-03-16 23:59:49  0.6728    -1      0.6728     48.4488
10 2024-03-16 23:59:49  0.6728    -1      0.6728     48.4488
11 2024-03-16 23:59:49  0.6728    -1      6.7280     48.4488
12 2024-03-16 23:59:49  0.6728    -1      0.6728     48.4488
13 2024-03-16 23:59:49  0.6728    -1      1.3456     48.4488
14 2024-03-16 23:59:49  0.6728    -1      0.6728     48.4488
15 2024-03-16 23:59:49  0.6728    -1      0.6728     48.4488
16 2024-03-16 23:59:49  0.6728    -1      0.6728     48.4488
17 2024-03-16 23:59:49  0.6728    -1      0.6728     48.4488
18 2024-03-16 23:59:49  0.6729     1      0.6729     49.1217
19 2024-03-16 23:59:49  0.6728    -1      0.6728     49.1217
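
Note that this output forward-fills across all rows, so row 6 (price 0.6728) shows 33650.0, whereas the list in the question expects 177.6192 there, which is the last running total for that same price. If the intent is to carry the total forward within each (hour, price) group, one possible sketch is to zero out the non-buy amounts and take the cumulative sum per group. This is an assumption about what the asker wants, not part of the answer above; the helper names hour and buy_amount are illustrative, the frame holds only the first nine rows of the question's data with amounts rounded, and the lowercase 'h' alias is used because newer pandas releases prefer it over 'H'.

```python
import pandas as pd

# First nine rows of the question's data, amounts rounded for brevity
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2024-03-16 23:59:42", "2024-03-16 23:59:42", "2024-03-16 23:59:44",
        "2024-03-16 23:59:44", "2024-03-16 23:59:44", "2024-03-16 23:59:47",
        "2024-03-16 23:59:48", "2024-03-16 23:59:48", "2024-03-16 23:59:49",
    ]),
    "Price": [0.6729, 0.6728, 0.6728, 0.6728, 0.6728,
              0.673, 0.6728, 0.6729, 0.6728],
    "Side": [-1, -1, -1, 1, -1, 1, -1, 1, -1],
    "Amount": [1579.2963, 7.4008, 6.728, 177.6192, 797.268,
               33650.0, 131.196, 48.4488, 0.6728],
})

# Grouping key: timestamp floored to the hour
hour = df["Date"].dt.floor("h")

# Keep the amount on buy rows (Side == 1) and use 0 everywhere else,
# so sell rows take part in the grouping without adding to the total
buy_amount = df["Amount"].where(df["Side"] == 1, 0.0)

# Running total per (hour, price) group, aligned to the original index
df["buy"] = buy_amount.groupby([hour, df["Price"]]).cumsum()

print(df["buy"].round(4).tolist())
# [0.0, 0.0, 0.0, 177.6192, 177.6192, 33650.0, 177.6192, 48.4488, 177.6192]
```

Because every row stays in the frame, there are no NaN values to fill afterwards: rows before the first buy in a group get 0, and later rows repeat that group's latest total.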