Creating span columns across rows in pandas or PySpark

Problem description

I have sample data in the following form:

week    year    flag_1  flag_2
26      2022    0        0
27      2022    1        0
28      2022    0        0
2       2023    0        1
3       2023    1        0
4       2023    0        0
5       2023    1        1
6       2023    0        1
7       2023    0        0
8       2023    0        0
9       2023    0        0
10      2023    0        1
11      2023    0        1

I want to create two new columns, span_flag_1 and span_flag_2.

If flag_1 == 1, span_flag_1 should be 1. For every other row, span_flag_1 should be max(weeks since the previous flag_1 = 1, weeks until the next flag_1 = 1).

How can I do this?

python pandas pyspark dataset
1 Answer
You can do this with pandas:

import pandas as pd

data = {
    "week": [26, 27, 28, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    "year": [2022, 2022, 2022, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023],
    "flag_1": [0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0],
    "flag_2": [0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1],
}
df = pd.DataFrame(data)

# Running week index; treats every year as having 52 weeks.
df['overall_week'] = df['year'] * 52 + df['week']

# Weeks since the previous flagged row (NaN before the first flag).
df['since_last_flag_1'] = df['overall_week'].where(df['flag_1'] == 1)
# .ffill() replaces fillna(method='ffill'), which is deprecated in pandas 2.x.
df['since_last_flag_1'] = df['since_last_flag_1'].ffill()
df['since_last_flag_1'] = df['overall_week'] - df['since_last_flag_1']

# Weeks until the next flagged row (NaN after the last flag).
df['till_next_flag_1'] = df['overall_week'].where(df['flag_1'] == 1)
df['till_next_flag_1'] = df['till_next_flag_1'].bfill()
df['till_next_flag_1'] = df['till_next_flag_1'] - df['overall_week']

# Row-wise max skips NaN, so rows with a flag on only one side still get a value.
df['span_flag_1'] = df[['since_last_flag_1', 'till_next_flag_1']].max(axis=1)
df.loc[df['flag_1'] == 1, 'span_flag_1'] = 1

df = df.drop(columns=['overall_week', 'since_last_flag_1', 'till_next_flag_1'])
print(df)

This gives:

    week  year  flag_1  flag_2  span_flag_1
0     26  2022       0       0          1.0
1     27  2022       1       0          1.0
2     28  2022       0       0         27.0
3      2  2023       0       1         27.0
4      3  2023       1       0          1.0
5      4  2023       0       0          1.0
6      5  2023       1       1          1.0
7      6  2023       0       1          1.0
8      7  2023       0       0          2.0
9      8  2023       0       0          3.0
10     9  2023       0       0          4.0
11    10  2023       0       1          5.0
12    11  2023       0       1          6.0
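
The question also asks for span_flag_2, which the answer does not compute. One way to cover both flags is to wrap the same steps in a helper. The sketch below is only an illustration: the function name `add_span` is hypothetical, it assumes the rows are already in chronological order, and it keeps the answer's `year * 52 + week` approximation, which may be off by a week if the data uses a calendar where some years have 53 weeks.

```python
import pandas as pd


def add_span(df: pd.DataFrame, flag_col: str) -> pd.DataFrame:
    """Return a copy of df with a span_<flag_col> column (hypothetical helper)."""
    out = df.copy()
    # Approximate running week index, as in the answer (assumes 52 weeks per year).
    overall_week = out["year"] * 52 + out["week"]
    # Week index on flagged rows, NaN everywhere else.
    flagged = overall_week.where(out[flag_col] == 1)
    since_last = overall_week - flagged.ffill()  # NaN before the first flag
    till_next = flagged.bfill() - overall_week   # NaN after the last flag
    # Row-wise max; a NaN on one side is skipped.
    span = pd.concat([since_last, till_next], axis=1).max(axis=1)
    # Flagged rows are 1 by definition.
    out["span_" + flag_col] = span.mask(out[flag_col] == 1, 1)
    return out


df = pd.DataFrame({
    "week": [26, 27, 28, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    "year": [2022] * 3 + [2023] * 10,
    "flag_1": [0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0],
    "flag_2": [0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1],
})

for col in ["flag_1", "flag_2"]:
    df = add_span(df, col)

print(df)
```

With the sample data, the span_flag_1 column matches the values shown above. If a flag column contains no 1 at all, both distances are NaN and the span column stays NaN, so that case would need its own rule.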
    