Python: Merge two DataFrames based on creation time


I have two DataFrames that must be merged on class and joining date. Please see the DataFrames below.

df1


class      teacher  age instructor_joining_date
   A        mark    50  2024-01-20 07:18:29.599
   A        john    45  2024-05-08 05:31:21.379


df2


class   count   student_joining_date
A          1    2024-05-17 01:05:58.072
A         50    2024-04-10 10:39:06.608
A         75    2024-04-05 09:49:07.246


Final output df

class   count   student_joining_date      teacher   age
A         1    2024-05-17 01:05:58.072    john       45
A        50    2024-04-10 10:39:06.608    mark       50
A        75    2024-04-05 09:49:07.246    mark       50

Here df1 has been merged into df2 on class and joining date.

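Edit: student_joining_date and instructor_joining_date do not have to be equal. If student_joining_date is later than instructor_joining_date, that teacher is mapped to the row; as the expected output shows, when several teachers qualify, the most recent one is used.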

python pandas dataframe numpy time-series
1 Answer

You can use merge_asof, then reindex to restore the original order:

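import pandas as pd

# merge_asof needs real datetime keys, so parse both date columns first
df1['instructor_joining_date'] = pd.to_datetime(df1['instructor_joining_date'])
df2['student_joining_date'] = pd.to_datetime(df2['student_joining_date'])

# Both frames must be sorted on their keys; reset_index() keeps df2's original
# row labels in an 'index' column so the order can be restored afterwards
out = (pd.merge_asof(df2.sort_values(by='student_joining_date').reset_index(),
                     df1.sort_values(by='instructor_joining_date'),
                     left_on='student_joining_date', right_on='instructor_joining_date',
                     by='class')  # match only within the same class
         .set_index('index').reindex(df2.index)  # back to df2's original row order
         .drop(columns='instructor_joining_date')
      )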

Output:

  class  count    student_joining_date teacher  age
0     A      1 2024-05-17 01:05:58.072    john   45
1     A     50 2024-04-10 10:39:06.608    mark   50
2     A     75 2024-04-05 09:49:07.246    mark   50
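
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch that rebuilds the two frames from the question and wraps the same merge_asof call in a function. The function name attach_teacher and its strict parameter are hypothetical, introduced only for illustration. By default merge_asof also matches equal timestamps; because the question says "greater than", the sketch assumes a strict comparison is wanted and passes allow_exact_matches=False. It also assumes df2 has a default, unnamed index, so that reset_index() produces a column called 'index'.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "class": ["A", "A"],
    "teacher": ["mark", "john"],
    "age": [50, 45],
    "instructor_joining_date": pd.to_datetime(
        ["2024-01-20 07:18:29.599", "2024-05-08 05:31:21.379"]
    ),
})
df2 = pd.DataFrame({
    "class": ["A", "A", "A"],
    "count": [1, 50, 75],
    "student_joining_date": pd.to_datetime(
        ["2024-05-17 01:05:58.072",
         "2024-04-10 10:39:06.608",
         "2024-04-05 09:49:07.246"]
    ),
})


def attach_teacher(students, teachers, strict=True):
    """Hypothetical helper: for each student row, pick the latest teacher of
    the same class who joined before the student.

    strict=True excludes teachers whose timestamp equals the student's
    (an assumption based on the question's "greater than" wording).
    """
    merged = pd.merge_asof(
        # merge_asof requires both sides sorted on their key columns
        students.sort_values("student_joining_date").reset_index(),
        teachers.sort_values("instructor_joining_date"),
        left_on="student_joining_date",
        right_on="instructor_joining_date",
        by="class",                      # only match within the same class
        direction="backward",            # look for the last earlier teacher
        allow_exact_matches=not strict,  # False means strictly earlier
    )
    return (
        merged.set_index("index")        # original row labels of `students`
        .reindex(students.index)         # restore the original row order
        .drop(columns="instructor_joining_date")
    )


out = attach_teacher(df2, df1)
print(out)
```

A student row with no earlier teacher in its class is kept, with missing values in teacher and age, since merge_asof behaves like a left join.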