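I want to do a lookup in pandas, similar to VLOOKUP in Excel. For each threshold value in Table 1, I want to find the latest date in Table 2 on which that value appears for the same name:
Table 1: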
Name Threshold1 Threshold2
A 9 8
B 14 13
Table 2:
Date Name Value
1/1 A 10
1/2 A 9
1/3 A 9
1/4 A 8
1/5 A 8
1/1 B 15
1/2 B 14
1/3 B 14
1/4 B 13
1/5 B 13
The desired table looks like this:
Name Threshold1 Threshold1_Date Threshold2 Threshold2_Date
A 9 1/3 8 1/5
B 14 1/3 13 1/5
Thanks in advance!
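To try the answers on the sample data, the two tables can be built as DataFrames roughly like this. This is only a sketch: the names `df1` and `df2` are the ones the answers use, and the dates are kept as plain `month/day` strings exactly as posted.

```python
import pandas as pd

# Table 1: one row per Name with its two threshold values
df1 = pd.DataFrame({
    'Name': ['A', 'B'],
    'Threshold1': [9, 14],
    'Threshold2': [8, 13],
})

# Table 2: dated values per Name, already in date order within each Name
df2 = pd.DataFrame({
    'Date': ['1/1', '1/2', '1/3', '1/4', '1/5'] * 2,
    'Name': ['A'] * 5 + ['B'] * 5,
    'Value': [10, 9, 9, 8, 8, 15, 14, 14, 13, 13],
})
```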
# assuming df2 is already sorted on `Date`
# drop the duplicates per Name and Value, keeping the latest date
cols = ['Name', 'Value']
s = df2.drop_duplicates(cols, keep='last').set_index(cols)['Date']
# for each threshold column, use MultiIndex.map to look up
# dates from df2 based on matching Name and threshold value
for c in df1.filter(like='Threshold'):
    df1[c + '_date'] = df1.set_index(['Name', c]).index.map(s)
  Name  Threshold1  Threshold2 Threshold1_date Threshold2_date
0    A           9           8             1/3             1/5
1    B          14          13             1/3             1/5
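The snippet above relies on `df2` already being in date order. If it is not, one possible way to sort `month/day` strings is sketched below. The data here is hypothetical (the `1/10` row is not in the question) and is chosen because a plain string sort would put `1/10` before `1/2`.

```python
import pandas as pd

# Hypothetical, deliberately unsorted data; '1/10' is not in the question
df2 = pd.DataFrame({
    'Date': ['1/10', '1/2', '1/3'],
    'Name': ['A', 'A', 'A'],
    'Value': [9, 9, 9],
})

# Split 'month/day' into two integer columns and use them as the sort key
month_day = df2['Date'].str.split('/', expand=True).astype(int)
df2_sorted = (df2.assign(_month=month_day[0], _day=month_day[1])
                 .sort_values(['_month', '_day'])
                 .drop(columns=['_month', '_day'])
                 .reset_index(drop=True))

# Same de-duplication step as above, now safe to use keep='last'
cols = ['Name', 'Value']
s = df2_sorted.drop_duplicates(cols, keep='last').set_index(cols)['Date']
```

If the real data carries full dates including a year, converting the column with `pd.to_datetime` and sorting on that would probably be the simpler choice.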
Does this help?
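# reshape df1 to long form, match on Name and Value, keep the latest date
df_out = (df1.melt('Name', value_name='Value')
             .merge(df2, on=['Name', 'Value'])
             .sort_values('Date')
             .drop_duplicates(['Name', 'variable'], keep='last')
             .set_index(['Name', 'variable'])
             .unstack().sort_index(level=1, axis=1))
# flatten the two-level column index into names like 'Date_Threshold1'
df_out = df_out.set_axis(df_out.columns.map('_'.join), axis=1).reset_index()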
Output:
  Name Date_Threshold1  Value_Threshold1 Date_Threshold2  Value_Threshold2
0    A             1/3                 9             1/5                 8
1    B             1/3                14             1/5                13
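For readers who find the chain hard to follow, here is the same melt-and-merge idea broken into named steps, ending with the column names from the question. It is a sketch rather than a drop-in replacement: the intermediate names (`long_form`, `matched`, `latest`, `dates`) are made up for illustration, and sorting the `Date` strings only works here because every date has a single-digit month and day.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['A', 'B'],
                    'Threshold1': [9, 14],
                    'Threshold2': [8, 13]})
df2 = pd.DataFrame({'Date': ['1/1', '1/2', '1/3', '1/4', '1/5'] * 2,
                    'Name': ['A'] * 5 + ['B'] * 5,
                    'Value': [10, 9, 9, 8, 8, 15, 14, 14, 13, 13]})

# Step 1: one row per (Name, threshold column), threshold stored in 'Value'
long_form = df1.melt('Name', value_name='Value')

# Step 2: attach every df2 row that has the same Name and Value
matched = long_form.merge(df2, on=['Name', 'Value'])

# Step 3: keep only the latest date per (Name, threshold column)
latest = (matched.sort_values('Date')
                 .drop_duplicates(['Name', 'variable'], keep='last'))

# Step 4: pivot the dates back to wide form and join them onto df1
dates = (latest.pivot(index='Name', columns='variable', values='Date')
               .add_suffix('_Date'))
out = df1.merge(dates.reset_index(), on='Name')
```

Here `out` keeps the original threshold columns and adds `Threshold1_Date` and `Threshold2_Date` after them.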
Here is one way to solve your problem:
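# for each threshold column, build an empty frame indexed by (Name, Value)
idxByThreshCol = (
    df1.set_index('Name').pipe(lambda d: {
        col: d[[col]]
            .rename(columns={col: 'Value'})
            .set_index('Value', append=True)
        for col in d.columns}))
# last Date per (Name, Value); assumes df2 is sorted by Date
latestDtByNameVal = df2.groupby(['Name', 'Value']).last()
# look up the latest Date for each (Name, threshold) pair
res = (
    df1.assign(**{
        f'{k}_Date': latestDtByNameVal.loc[v.index, 'Date'].to_numpy()
        for k, v in idxByThreshCol.items()}))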
If you want the result columns ordered as in your question, you can add the following:
from itertools import chain
res = res[['Name'] + list(chain.from_iterable([[col, f'{col}_Date'] for col in df1.drop(columns='Name').columns]))]
Output:
  Name  Threshold1 Threshold1_Date  Threshold2 Threshold2_Date
0    A           9             1/3           8             1/5
1    B          14             1/3          13             1/5
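One caveat with the `.loc` lookup: if a threshold never occurs in `df2` for a given name, `.loc` raises a `KeyError`. A possible variant that leaves a missing value instead is sketched below using `reindex`. The threshold `99` for `B` is a made-up value that does not appear in the question, and `latest` and `keys` are illustrative names.

```python
import pandas as pd

# Hypothetical df1: the value 99 never occurs for B in df2
df1 = pd.DataFrame({'Name': ['A', 'B'],
                    'Threshold1': [9, 14],
                    'Threshold2': [8, 99]})
df2 = pd.DataFrame({'Date': ['1/1', '1/2', '1/3', '1/4', '1/5'] * 2,
                    'Name': ['A'] * 5 + ['B'] * 5,
                    'Value': [10, 9, 9, 8, 8, 15, 14, 14, 13, 13]})

# Last Date per (Name, Value); assumes df2 is sorted by Date
latest = df2.groupby(['Name', 'Value'])['Date'].last()

res = df1.copy()
for col in ['Threshold1', 'Threshold2']:
    # Build the (Name, threshold) keys and look them up; misses become NaN
    keys = pd.MultiIndex.from_frame(df1[['Name', col]])
    res[f'{col}_Date'] = latest.reindex(keys).to_numpy()
```

With this data, the `Threshold2_Date` entry for `B` comes back as a missing value rather than stopping the whole lookup.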