Consider the following DataFrame:
date | id | value1 | value2 |
---|---|---|---|
2022-01-01 | 11 | 1 | 12 |
2022-01-02 | 11 | 2 | 13 |
2022-01-03 | 11 | NaN | NaN |
2022-01-04 | 11 | NaN | NaN |
2022-01-05 | 11 | 5 | 15 |
2022-01-01 | 22 | 11 | 3 |
2022-01-02 | 22 | NaN | NaN |
2022-01-03 | 22 | NaN | NaN |
2022-01-04 | 22 | NaN | NaN |
2022-01-05 | 22 | 12 | 34 |
2022-01-01 | 33 | 2 | 4 |
2022-01-02 | 33 | 4 | 8 |
2022-01-03 | 33 | NaN | NaN |
2022-01-04 | 33 | NaN | NaN |
2022-01-05 | 33 | NaN | NaN |
2022-01-01 | 44 | NaN | NaN |
2022-01-02 | 44 | 34 | 89 |
2022-01-03 | 44 | NaN | NaN |
2022-01-04 | 44 | 35 | NaN |
2022-01-05 | 44 | NaN | NaN |
The final output I expect is:
date | id | value1 | value2 |
---|---|---|---|
2022-01-01 | 11 | 1 | 12 |
2022-01-02 | 11 | 2 | 13 |
2022-01-03 | 11 | 2 | 13 |
2022-01-04 | 11 | 2 | 13 |
2022-01-05 | 11 | 5 | 15 |
2022-01-01 | 22 | 11 | 3 |
2022-01-02 | 22 | 11 | 3 |
2022-01-03 | 22 | 11 | 3 |
2022-01-04 | 22 | 11 | 3 |
2022-01-05 | 22 | 12 | 34 |
2022-01-01 | 33 | 2 | 4 |
2022-01-02 | 33 | 4 | 8 |
2022-01-03 | 33 | NaN | NaN |
2022-01-04 | 33 | NaN | NaN |
2022-01-05 | 33 | NaN | NaN |
2022-01-01 | 44 | NaN | NaN |
2022-01-02 | 44 | 34 | 89 |
2022-01-03 | 44 | 34 | 89 |
2022-01-04 | 44 | 35 | NaN |
2022-01-05 | 44 | NaN | NaN |
I tried the ffill() function, but it does not account for the issue mentioned in point 3.
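The attempt itself is not shown, so the following is only a minimal sketch of what a plain per-group forward fill does, assuming it was called as `groupby('id').ffill()`. It uses just the id 33 rows from the sample; `demo` and the `ffilled` column are illustrative names.

```python
import pandas as pd

# Only id 33 from the sample: the last three rows have no later valid value.
demo = pd.DataFrame({
    "id": [33, 33, 33, 33, 33],
    "value1": [2, 4, None, None, None],
})

# A plain per-group forward fill copies the last valid value into every
# following NaN, including the trailing run that should stay empty.
demo["ffilled"] = demo.groupby("id")["value1"].ffill()
print(demo)
```

The trailing NaN values all become 4, whereas the expected output keeps them as NaN.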
Your rules are phrased as a backward fill, but what you actually implement is a forward fill.
Generating the sample data:
import pandas as pd
data_dict = {
"date": ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04", "2022-01-05",
"2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04", "2022-01-05",
"2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04", "2022-01-05",
"2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04", "2022-01-05"],
"id": [11, 11, 11, 11, 11,
22, 22, 22, 22, 22,
33, 33, 33, 33, 33,
44, 44, 44, 44, 44],
"value1": [1, 2, None, None, 5,
11, None, None, None, 12,
2, 4, None, None, None,
None, 34, None, 35, None],
"value2": [12, 13, None, None, 15,
3, None, None, None, 34,
4, 8, None, None, None,
None, 89, None, None, None]
}
df = pd.DataFrame(data_dict)
df
Create dummy fill indicators using a backward fill (bfill):
df[['dum1', 'dum2']] = df.groupby('id')[['value1', 'value2']].bfill().notna()
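To see what this indicator marks, here is a small sketch on a single Series holding the value1 entries of id 44. The name `has_later_value` is illustrative only.

```python
import pandas as pd

# value1 for id 44 in the sample data.
s = pd.Series([None, 34, None, 35, None], dtype="float64")

# After a backward fill, a position is non-null exactly when it holds a value
# itself or some later position does. Only trailing gaps remain NaN.
has_later_value = s.bfill().notna()
print(has_later_value.tolist())  # [True, True, True, True, False]
```

The final row is the only one flagged False, because nothing valid follows it.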
Then forward fill (ffill) the values within each groupby group:
df[['fvalue1', 'fvalue2']] = df.groupby('id')[['value1', 'value2']].ffill()
Finally, combine the two, keeping the forward-filled value only where the indicator is True:
import numpy as np
df['nvalue1'] = np.where(df['dum1'], df['fvalue1'], np.nan)
df['nvalue2'] = np.where(df['dum2'], df['fvalue2'], np.nan)