Find the date in one column closest to the date in another column and store it in a new column


I have this toy dataset:

import pandas as pd

df = pd.DataFrame({'user':[1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4],
                  'd1':['1995-09-01','1995-09-02','1995-10-03','1995-10-04','1995-10-05','1995-11-07','1995-11-08','1995-11-09','1995-11-10','1995-11-15','1995-12-18','1995-12-19','1995-12-20','1995-12-23','1995-12-26','1995-12-30'],
                  'd2':['1995-10-05','1995-10-05','1995-10-05',
                        '1995-11-08','1995-11-08','1995-11-08','1995-11-08',
                        '1995-12-10','1995-12-10','1995-12-10','1995-12-10',
                        '1995-12-27','1995-12-27','1995-12-27','1995-12-27','1995-12-27']})

When sorted by user and d1 with

df = df.sort_values(['user', 'd1'])

it gives:

  user      d1         d2
    1   1995-09-01  1995-10-05
    1   1995-09-02  1995-10-05
    1   1995-10-03  1995-10-05
    2   1995-10-04  1995-11-08
    2   1995-10-05  1995-11-08
    2   1995-11-07  1995-11-08
    2   1995-11-08  1995-11-08
    3   1995-11-09  1995-12-10
    3   1995-11-10  1995-12-10
    3   1995-11-15  1995-12-10
    3   1995-12-18  1995-12-10
    4   1995-12-19  1995-12-27
    4   1995-12-20  1995-12-27
    4   1995-12-23  1995-12-27
    4   1995-12-26  1995-12-27
    4   1995-12-30  1995-12-27

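I need to generate a new column, d3, that holds the d1 date closest to d2. For example, if the d2 date also appears in d1, d3 shows that date; otherwise it shows the d1 date nearest to d2.

Note that the result is computed per user, so only the d1 dates of the same user are considered.

The following dataframe is the desired result: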

  user      d1          d2         d3
    1   1995-09-01  1995-10-05  1995-10-03
    1   1995-09-02  1995-10-05  1995-10-03
    1   1995-10-03  1995-10-05  1995-10-03
    2   1995-10-04  1995-11-08  1995-11-08
    2   1995-10-05  1995-11-08  1995-11-08
    2   1995-11-07  1995-11-08  1995-11-08
    2   1995-11-08  1995-11-08  1995-11-08
    3   1995-11-09  1995-12-10  1995-12-18
    3   1995-11-10  1995-12-10  1995-12-18
    3   1995-11-15  1995-12-10  1995-12-18
    3   1995-12-18  1995-12-10  1995-12-18
    4   1995-12-19  1995-12-27  1995-12-26
    4   1995-12-20  1995-12-27  1995-12-26
    4   1995-12-23  1995-12-27  1995-12-26
    4   1995-12-26  1995-12-27  1995-12-26
    4   1995-12-30  1995-12-27  1995-12-26

I tried to adapt the solutions from this post and another one, but without success.

pandas date group-by merge
1 Answer

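You can compute the absolute difference between the two dates, get the index of the minimum per user with idxmin, and then map the d1 value found at that index back onto every row of the user: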

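# Parse both columns as datetimes so they can be subtracted
df[['d1', 'd2']] = df[['d1', 'd2']].apply(pd.to_datetime)

# For each user, the row label where |d2 - d1| is smallest
idx = df['d2'].sub(df['d1']).abs().groupby(df['user']).idxmin()

# Look up d1 at those rows, re-index by user, and map onto every row
df['d3'] = df['user'].map(df.loc[idx, 'd1'].set_axis(idx.index))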
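
To check the approach end to end, here is a minimal self-contained sketch that rebuilds the toy dataset from the question and applies the same idxmin-and-map steps. The helper name closest_d1 is hypothetical and exists only for illustration. The sketch also assumes the row index is unique, because idxmin returns row labels.

```python
import pandas as pd

# Toy data from the question, parsed as datetimes up front
df = pd.DataFrame({
    'user': [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4],
    'd1': pd.to_datetime([
        '1995-09-01', '1995-09-02', '1995-10-03',
        '1995-10-04', '1995-10-05', '1995-11-07', '1995-11-08',
        '1995-11-09', '1995-11-10', '1995-11-15', '1995-12-18',
        '1995-12-19', '1995-12-20', '1995-12-23', '1995-12-26', '1995-12-30',
    ]),
    'd2': pd.to_datetime(
        ['1995-10-05'] * 3 + ['1995-11-08'] * 4
        + ['1995-12-10'] * 4 + ['1995-12-27'] * 5
    ),
})


def closest_d1(frame):
    """Hypothetical helper: per user, the d1 value nearest to d2."""
    # Absolute gap between the two dates on every row
    gap = (frame['d2'] - frame['d1']).abs()
    # Row label of the smallest gap within each user
    idx = gap.groupby(frame['user']).idxmin()
    # d1 at those rows, re-indexed by user so it can serve as a lookup
    lookup = frame.loc[idx, 'd1'].set_axis(idx.index)
    # Broadcast the per-user value back onto every row
    return frame['user'].map(lookup)


df['d3'] = closest_d1(df)
print(df)
```

If two d1 dates are equally close to d2 for the same user, idxmin keeps the first one in row order, so sorting by user and d1 beforehand makes the earlier date win.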