Pandas: Filling nulls between dates


Consider the following dataframe:

date id value1 value2
2022-01-01 11 1 12
2022-01-02 11 2 13
2022-01-03 11 NaN NaN
2022-01-04 11 NaN NaN
2022-01-05 11 5 15
2022-01-01 22 11 3
2022-01-02 22 NaN NaN
2022-01-03 22 NaN NaN
2022-01-04 22 NaN NaN
2022-01-05 22 12 34
2022-01-01 33 2 4
2022-01-02 33 4 8
2022-01-03 33 NaN NaN
2022-01-04 33 NaN NaN
2022-01-05 33 NaN NaN
2022-01-01 44 NaN NaN
2022-01-02 44 34 89
2022-01-03 44 NaN NaN
2022-01-04 44 35 NaN
2022-01-05 44 NaN NaN

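The output I expect should follow these rules:

  1. Group by the id column.
  2. Fill nulls only when the entire row is null.
  3. Within each id group, fill nulls only when an upper-bound date exists. For example, for id 11 the nulls on 2022-01-03 and 2022-01-04 are imputed only because 2022-01-05 has values. For id 33, no imputation takes place because there is no value on or after 2022-01-03.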

Final output:

date id value1 value2
2022-01-01 11 1 12
2022-01-02 11 2 13
2022-01-03 11 2 13
2022-01-04 11 2 13
2022-01-05 11 5 15
2022-01-01 22 11 3
2022-01-02 22 11 3
2022-01-03 22 11 3
2022-01-04 22 11 3
2022-01-05 22 12 34
2022-01-01 33 2 4
2022-01-02 33 4 8
2022-01-03 33 NaN NaN
2022-01-04 33 NaN NaN
2022-01-05 33 NaN NaN
2022-01-01 44 NaN NaN
2022-01-02 44 34 89
2022-01-03 44 34 89
2022-01-04 44 35 NaN
2022-01-05 44 NaN NaN

I tried the ffill() function, but it does not take point 3 into account.

python-3.x function group-by null
1 Answer

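Your rule 3 is really a backward-fill condition (a later value must exist in the group), but you only applied a forward fill.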

Sample data generation:

import pandas as pd

data_dict = {
    "date": ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04", "2022-01-05",
             "2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04", "2022-01-05",
             "2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04", "2022-01-05",
             "2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04", "2022-01-05"],
    "id": [11, 11, 11, 11, 11,
           22, 22, 22, 22, 22,
           33, 33, 33, 33, 33,
           44, 44, 44, 44, 44],
    "value1": [1, 2, None, None, 5,
               11, None, None, None, 12,
               2, 4, None, None, None,
               None, 34, None, 35, None],
    "value2": [12, 13, None, None, 15,
               3, None, None, None, 34,
               4, 8, None, None, None,
               None, 89, None, None, None]
}

df = pd.DataFrame(data_dict)
df

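Create dummy fill indicators using a grouped backward fill (bfill). An indicator is True where the row has a value or a later value exists in the same id group: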

df[['dum1', 'dum2']] = df.groupby('id')[['value1', 'value2']].bfill().notna()
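
To see what such an indicator captures, here is a minimal sketch on a small made-up frame. The names `toy` and `has_upper_bound` are hypothetical and used only for illustration; the frame is assumed to be sorted by date within each group.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: group 1 has a gap in the middle,
# group 2 has a gap that runs to the end of the group.
toy = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "value": [1.0, np.nan, 3.0, 4.0, np.nan, np.nan],
})

# bfill pulls the next known value backwards within each group, so notna()
# is True where the row has a value or a later value exists in the group.
toy["has_upper_bound"] = toy.groupby("id")["value"].bfill().notna()
print(toy)
```

The trailing gap in group 2 stays False because there is nothing after it to pull back, which is the "upper-bound date" condition from rule 3.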

Then use groupby with ffill to forward fill the value columns (not the indicator columns) within each id:

df[['fvalue1', 'fvalue2']] = df.groupby('id')[['value1', 'value2']].ffill()

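Finally, combine the two: keep the forward-filled value only where the indicator is True, otherwise leave NaN.

import numpy as np

df['nvalue1'] = np.where(df['dum1'], df['fvalue1'], np.nan)
df['nvalue2'] = np.where(df['dum2'], df['fvalue2'], np.nan)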
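
Note that per-column indicators treat value1 and value2 independently. For id 44 on 2022-01-03, no later value2 exists, so value2 would stay NaN, while the expected output in the question shows 89 there. If rule 2 (fill only fully empty rows) should be applied at row level, one possible variant is sketched below. The helper name `fill_between_rows` is hypothetical, and the sketch assumes rows are already sorted by date within each id.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame({
    "date": list(pd.date_range("2022-01-01", periods=5).strftime("%Y-%m-%d")) * 4,
    "id": np.repeat([11, 22, 33, 44], 5),
    "value1": [1, 2, nan, nan, 5,  11, nan, nan, nan, 12,
               2, 4, nan, nan, nan,  nan, 34, nan, 35, nan],
    "value2": [12, 13, nan, nan, 15,  3, nan, nan, nan, 34,
               4, 8, nan, nan, nan,  nan, 89, nan, nan, nan],
})


def fill_between_rows(frame, key="id", cols=("value1", "value2")):
    """Forward fill fully empty rows that have a later non-empty row in their group."""
    cols = list(cols)
    out = frame.copy()
    # A row counts as having data when at least one value column is set.
    has_data = out[cols].notna().any(axis=1)
    # 1.0 where the row has data, NaN elsewhere; a grouped bfill then marks
    # every row that has data itself or is followed by a row with data.
    marker = has_data.astype(float).where(has_data)
    has_later = marker.groupby(out[key]).bfill().notna()
    # Candidate values from a plain grouped forward fill.
    filled = out.groupby(key)[cols].ffill()
    # Only touch rows that are fully empty and have an upper bound.
    mask = ~has_data & has_later
    out.loc[mask, cols] = filled.loc[mask, cols]
    return out


result = fill_between_rows(df)
print(result)
```

On the sample data this leaves the partially filled row for id 44 on 2022-01-04 untouched and does not fill the trailing gaps for ids 33 and 44.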