Pandas groupby and transform


I have a DataFrame generated as follows:

df = pd.DataFrame({'date' : [*['2020-01-01']*3, *['2020-01-02']*3, *['2020-01-03']*3], 
                   'id' : ['A1', 'A2', 'A3']*3, 
                   'qty' : [50, 10, 20, 40, 10, 20, 40, 15, 25]
                  }).sort_values('date')

I want to add a new column, "variation", holding the difference in qty for each date/id. I thought

df['variation'] = df.groupby(['date', 'id'])['qty'].transform(lambda x: x.diff()).sort_index()
would work, but I get:

         date  id  qty  variation
0  2020-01-01  A1   50        NaN
1  2020-01-01  A2   10        NaN
2  2020-01-01  A3   20        NaN
3  2020-01-02  A1   40        NaN
4  2020-01-02  A2   10        NaN
5  2020-01-02  A3   20        NaN
6  2020-01-03  A1   40        NaN
7  2020-01-03  A2   15        NaN
8  2020-01-03  A3   25        NaN

whereas I expected:

         date  id  qty  variation
0  2020-01-01  A1   50        NaN
1  2020-01-01  A2   10        NaN
2  2020-01-01  A3   20        NaN
3  2020-01-02  A1   40        -10
4  2020-01-02  A2   10          0
5  2020-01-02  A3   20          0
6  2020-01-03  A1   40          0
7  2020-01-03  A2   15          5
8  2020-01-03  A3   25          5

Any suggestions?

python pandas dataframe transform
1 Answer

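The problem with your approach is that you group by both date and id. Every (date, id) pair occurs only once in your data, so each group contains a single row, and diff() on a single row has no previous value to subtract from and returns NaN. transform is applied to each group independently, so it never compares rows across dates. To get the desired result, group by id only and call diff() on qty; this computes the change from one date to the next within each id.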
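To see why the original attempt yields only NaN, it can help to compare the group sizes under the two groupings. The following is a minimal sketch on the question's data; the variable names sizes_both, sizes_id and all_nan are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'date': [*['2020-01-01']*3, *['2020-01-02']*3, *['2020-01-03']*3],
    'id': ['A1', 'A2', 'A3']*3,
    'qty': [50, 10, 20, 40, 10, 20, 40, 15, 25],
}).sort_values('date')

# Grouping by both keys: every (date, id) pair is unique, so each group has one row.
sizes_both = df.groupby(['date', 'id']).size()

# Grouping by id only: each group holds the three dates for that id.
sizes_id = df.groupby('id').size()

print(sizes_both.max())   # 1
print(sizes_id.tolist())  # [3, 3, 3]

# diff() within one-row groups has no previous row, so every value is NaN.
all_nan = df.groupby(['date', 'id'])['qty'].diff().isna().all()
print(all_nan)            # True
```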

Corrected code:

import pandas as pd

df = pd.DataFrame({
    'date': [*['2020-01-01']*3, *['2020-01-02']*3, *['2020-01-03']*3], 
    'id': ['A1', 'A2', 'A3']*3, 
    'qty': [50, 10, 20, 40, 10, 20, 40, 15, 25]
}).sort_values('date')

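# Group by id only so each group spans all dates, then take row-to-row differences.
# fillna(0) replaces the NaN on each id's first date; omit it to keep NaN there.
df['variation'] = df.groupby('id')['qty'].diff().fillna(0)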

Output:

         date  id  qty  variation
0  2020-01-01  A1   50        0.0
1  2020-01-01  A2   10        0.0
2  2020-01-01  A3   20        0.0
3  2020-01-02  A1   40      -10.0
4  2020-01-02  A2   10        0.0
5  2020-01-02  A3   20        0.0
6  2020-01-03  A1   40        0.0
7  2020-01-03  A2   15        5.0
8  2020-01-03  A3   25        5.0
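If the first date of each id should stay NaN, as in the expected output in the question, one option is to leave out fillna(0). The sketch below also sorts by id and date before calling diff(), which matters only if the rows might not already be in date order. The shuffle via sample() is an assumption added here to simulate unordered input, and the names shuffled and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'date': [*['2020-01-01']*3, *['2020-01-02']*3, *['2020-01-03']*3],
    'id': ['A1', 'A2', 'A3']*3,
    'qty': [50, 10, 20, 40, 10, 20, 40, 15, 25],
})

# Simulate rows arriving in arbitrary order (illustrative assumption).
shuffled = df.sample(frac=1, random_state=0)

# ISO-formatted date strings sort chronologically, so a plain sort is enough here.
shuffled = shuffled.sort_values(['id', 'date'])

# Per-id difference between consecutive dates; the first date of each id stays NaN.
shuffled['variation'] = shuffled.groupby('id')['qty'].diff()

# Restore the original row order.
result = shuffled.sort_index()
print(result)
```

Because the result is assigned back by index label, the final sort_index() is only needed to display the rows in their original order.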