我有以下形式的样本数据:
week year flag_1 flag_2
26 2022 0 0
27 2022 1 0
28 2022 0 0
2 2023 0 1
3 2023 1 0
4 2023 0 0
5 2023 1 1
6 2023 0 1
7 2023 0 0
8 2023 0 0
9 2023 0 0
10 2023 0 1
11 2023 0 1
我想创建两个新列span_flag_1、span_flag_2。
如果 flag_1 == 1,span_flag_1 将包含 1,对于所有其他行,span_flag_1 将是 max(自上一个 flag_1 = 1 起的周数,距下一个 flag_1 = 1 的周数)我怎样才能做同样的事情?
import pandas as pd
data = {
"week": [26, 27, 28, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
"year": [2022, 2022, 2022, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023],
"flag_1": [0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0],
"flag_2": [0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1]
}
df = pd.DataFrame(data)
df['overall_week'] = df['year'] * 52 + df['week']
df['since_last_flag_1'] = df[df['flag_1'] == 1]['overall_week']
df['since_last_flag_1'] = df['since_last_flag_1'].fillna(method='ffill')
df['since_last_flag_1'] = df['overall_week'] - df['since_last_flag_1']
df['till_next_flag_1'] = df[df['flag_1'] == 1]['overall_week']
df['till_next_flag_1'] = df['till_next_flag_1'].fillna(method='bfill')
df['till_next_flag_1'] = df['till_next_flag_1'] - df['overall_week']
df['span_flag_1'] = df[['since_last_flag_1', 'till_next_flag_1']].max(axis=1)
df.loc[df['flag_1'] == 1, 'span_flag_1'] = 1
df.drop(columns=['overall_week', 'since_last_flag_1', 'till_next_flag_1'], inplace=True)
print(df)
这给出了
week year flag_1 flag_2 span_flag_1
0 26 2022 0 0 1.0
1 27 2022 1 0 1.0
2 28 2022 0 0 27.0
3 2 2023 0 1 27.0
4 3 2023 1 0 1.0
5 4 2023 0 0 1.0
6 5 2023 1 1 1.0
7 6 2023 0 1 1.0
8 7 2023 0 0 2.0
9 8 2023 0 0 3.0
10 9 2023 0 0 4.0
11 10 2023 0 1 5.0
12 11 2023 0 1 6.0