I have two DataFrames that need to be merged on class and joining date. Please see the DataFrames below.
df1
class teacher age instructor_joining_date
A mark 50 2024-01-20 07:18:29.599
A john 45 2024-05-08 05:31:21.379
df2
class count student_joining_date
A 1 2024-05-17 01:05:58.072
A 50 2024-04-10 10:39:06.608
A 75 2024-04-05 09:49:07.246
Final output df
class count student_joining_date teacher age
A 1 2024-05-17 01:05:58.072 john 45
A 50 2024-04-10 10:39:06.608 mark 50
A 75 2024-04-05 09:49:07.246 mark 50
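Each row of df2 should pick up the matching teacher from df1, based on class and joining date.
Edit: student_joining_date and instructor_joining_date do not have to be equal. If student_joining_date is later than instructor_joining_date, that teacher is mapped to the student row.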
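The mapping rule in the edit can be spelled out in plain Python as a rough sketch. It assumes that when several instructors joined before a student, the most recent one wins (which is what the expected output shows) and that an equal timestamp counts as a match. `latest_instructor` is a hypothetical helper name used only for illustration.

```python
from datetime import datetime

# Sample rows copied from df1: (class, teacher, age, instructor_joining_date)
instructors = [
    ("A", "mark", 50, datetime(2024, 1, 20, 7, 18, 29, 599000)),
    ("A", "john", 45, datetime(2024, 5, 8, 5, 31, 21, 379000)),
]

def latest_instructor(class_name, student_date):
    """Hypothetical helper: return the most recent instructor row of the same
    class who joined on or before student_date, or None if there is none."""
    candidates = [
        row for row in instructors
        if row[0] == class_name and row[3] <= student_date
    ]
    if not candidates:
        return None
    # Among the eligible instructors, keep the one with the latest joining date
    return max(candidates, key=lambda row: row[3])

# Two of the student rows from df2
first = latest_instructor("A", datetime(2024, 5, 17, 1, 5, 58, 72000))
second = latest_instructor("A", datetime(2024, 4, 10, 10, 39, 6, 608000))
print(first[1], second[1])  # john mark
```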
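Use merge_asof to match each student row to the most recent instructor of the same class who joined on or before that student, then use reindex to restore the original row order: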
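import pandas as pd

# merge_asof needs real datetime keys, so parse both date columns first
df1['instructor_joining_date'] = pd.to_datetime(df1['instructor_joining_date'])
df2['student_joining_date'] = pd.to_datetime(df2['student_joining_date'])

# Both frames must be sorted by their key; reset_index() keeps df2's original index as a column
out = (pd.merge_asof(df2.sort_values(by='student_joining_date').reset_index(),
                     df1.sort_values(by='instructor_joining_date'),
                     left_on='student_joining_date', right_on='instructor_joining_date',
                     by='class')
         .set_index('index').reindex(df2.index)  # restore df2's original row order
         .drop(columns='instructor_joining_date')
      )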
Output:
class count student_joining_date teacher age
0 A 1 2024-05-17 01:05:58.072 john 45
1 A 50 2024-04-10 10:39:06.608 mark 50
2 A 75 2024-04-05 09:49:07.246 mark 50
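For anyone who wants to reproduce this, here is a self-contained sketch that rebuilds the sample frames and runs the same merge. Two things in it are assumptions and not part of the original data or answer: the fourth df2 row (count 9, joined 2023-12-01) is a hypothetical student who joined before every instructor, and allow_exact_matches=False is added to read "greater than" strictly. Drop that argument to get the default behaviour, where equal timestamps also match.

```python
import pandas as pd

df1 = pd.DataFrame({
    "class": ["A", "A"],
    "teacher": ["mark", "john"],
    "age": [50, 45],
    "instructor_joining_date": pd.to_datetime(
        ["2024-01-20 07:18:29.599", "2024-05-08 05:31:21.379"]
    ),
})

df2 = pd.DataFrame({
    "class": ["A", "A", "A", "A"],
    "count": [1, 50, 75, 9],
    "student_joining_date": pd.to_datetime([
        "2024-05-17 01:05:58.072",
        "2024-04-10 10:39:06.608",
        "2024-04-05 09:49:07.246",
        "2023-12-01 00:00:00.000",  # hypothetical row: earlier than every instructor
    ]),
})

out = (
    pd.merge_asof(
        # Sort by the key and keep the original index as a column named 'index'
        df2.sort_values("student_joining_date").reset_index(),
        df1.sort_values("instructor_joining_date"),
        left_on="student_joining_date",
        right_on="instructor_joining_date",
        by="class",
        allow_exact_matches=False,  # assumption: require strictly later students
    )
    .set_index("index")
    .reindex(df2.index)  # back to df2's original row order
    .drop(columns="instructor_joining_date")
)
print(out)
```

The first three rows come out as john, mark, mark, as in the output above. The hypothetical fourth row has no earlier instructor, so teacher and age are NaN for it, and the age column becomes float as a side effect.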