How to add a new column based on group by and add a condition to the column?

Question

I have the following code:

import pandas as pd
import numpy as np

data = {
    'id': [1, 2, 3, 4, 5, 6, 7],
    'date': ['2019-02-01', '2019-02-10', '2019-02-25', '2019-03-05', '2019-03-16', '2019-04-05', '2019-05-15'],
    'date_difference': [None, 9, 15, 11, 10, 19, 40],
    'number': [1, 0, 1, 0, 0, 0, 0],
    'text': ['A', 'A', 'A', 'A', 'A', 'B', 'B']
}

df = pd.DataFrame(data)

id	date	date_difference	number	text
1	2019-02-01	None	1	A
2	2019-02-10	9	0	A
3	2019-02-25	15	1	A
4	2019-03-05	11	0	A
5	2019-03-16	10	0	A
6	2019-04-05	19	0	B
7	2019-05-15	40	0	B

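Based on the text and number columns, I want to generate a new column called test. Within each text group, the rows are ordered by date in descending order. When number == 0, the step size starts from 1. When a 1 is found in number within the group, the step size increases by 1. If there is no 1 in the number column within the group, the step size stays at 1 for the whole group.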

I have the following code, but it does not produce the desired result.

df['test'] = df.groupby(['text', 'number'])['number'].transform(lambda x, step_size=1: step_size if x.iloc[0] == 0 else None)

The final table should look like this:

id	date	date_difference	number	text	test
1	2019-02-01	None	1	A	2
2	2019-02-10	9	0	A	2
3	2019-02-25	15	1	A	1
4	2019-03-05	11	0	A	1
5	2019-03-16	10	0	A	1
6	2019-04-05	19	0	B	1
7	2019-05-15	40	0	B	1

Answer 1

My attempt:

import pandas as pd


data = {
    'id': [1, 2, 3, 4, 5, 6, 7],
    'date': ['2019-02-01', '2019-02-10', '2019-02-25', '2019-03-05', '2019-03-16', '2019-04-05', '2019-05-15'],
    'date_difference': [None, 9, 15, 11, 10, 19, 40],
    'number': [1, 0, 1, 0, 0, 0, 0],
    'text': ['A', 'A', 'A', 'A', 'A', 'B', 'B']
}

df = pd.DataFrame(data)

out = df.assign(
    # We assign the following values to the series name "test"
    test=df
    # Group on "text" -- if we grouped on ["text", "number"] we wouldn't see different numbers within the groups.
    .groupby("text")
    # Apply a chain of methods to the group (a pd.DataFrame).
    .apply(
        lambda g: (
            # We sort "date" in descending order as you mention this partially controls the step size.
            g.sort_values(by="date", ascending=False)
            # We shift "number" forward one period with a fill_value of 1 for any newly introduced nulls.
            .number.shift(periods=1, fill_value=1)
            # Cumulatively sum the shifted "number" values
            .cumsum()
        )
        # This will result in the new series, albeit sorted by descending "date".
    )
    # Drop the "text" level of the new multi-index.
    .droplevel("text")
    # The assign method acts as join, rearranging the newly created series to match the index of `df`.
)
print(out)

   id        date  date_difference  number text  test
0   1  2019-02-01              NaN       1    A     2
1   2  2019-02-10              9.0       0    A     2
2   3  2019-02-25             15.0       1    A     1
3   4  2019-03-05             11.0       0    A     1
4   5  2019-03-16             10.0       0    A     1
5   6  2019-04-05             19.0       0    B     1
6   7  2019-05-15             40.0       0    B     1
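
For comparison, here is a minimal sketch of the same shift-then-cumsum idea written with groupby(...).transform instead of apply, so there is no multi-index to drop afterwards. This is an alternative phrasing, not part of the original answer. The helper name step_counter and the variable ordered are made up for illustration, and the sketch assumes the date strings stay in ISO format (YYYY-MM-DD) so that sorting them as text matches chronological order.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5, 6, 7],
        "date": ["2019-02-01", "2019-02-10", "2019-02-25", "2019-03-05",
                 "2019-03-16", "2019-04-05", "2019-05-15"],
        "number": [1, 0, 1, 0, 0, 0, 0],
        "text": ["A", "A", "A", "A", "A", "B", "B"],
    }
)


def step_counter(number: pd.Series) -> pd.Series:
    """Hypothetical helper: running step size for one group.

    Shifting by one row (first row filled with 1) makes the step start at 1
    and increase only on the row *after* a 1 was seen.
    """
    return number.shift(periods=1, fill_value=1).cumsum()


# Sort newest first; the original index labels are preserved by sort_values.
ordered = df.sort_values("date", ascending=False)

# transform returns a Series indexed like `ordered`; assigning it to `df`
# realigns the values on those index labels, restoring the original row order.
df["test"] = ordered.groupby("text")["number"].transform(step_counter)

print(df)
```

If the dates might come in another format, converting the column with pd.to_datetime before sorting would be the safer choice.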
