Input:
import pandas as pd
df1 = pd.DataFrame({"A": [1, 1, 2, 2, 3], "B": [1, 2, 1, 3, 1]})
df2 = pd.DataFrame({"C": [1, 1, 2, 2], "D": [1, 2, 1, 4]})
Expected output:
A | B | C | D |
---|---|---|---|
1 | 1 | 1 | 1 |
1 | 2 | 1 | 2 |
2 | 1 | 2 | 1 |
2 | 3 | 2 | NaN |
3 | 1 | NaN | NaN |
I tried the following:
joined = pd.merge(df1, df2, left_on=['A', 'B'], right_on=['C', 'D'], how='left')
The output I got is:
A B C D
0 1 1 1.0 1.0
1 1 2 1.0 2.0
2 2 1 2.0 1.0
3 2 3 NaN NaN
4 3 1 NaN NaN
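A merge on several keys matches whole key combinations, so with a merge like that you can't pick and choose which columns match. It looks like what you actually want is to merge each column pair separately. To tell duplicate values apart, in this case you could use the index, but I'm assuming that only works here by a fluke, so I'll enumerate the occurrences of each value with groupby(...).cumcount() instead.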
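cols_merged = []
for col1, col2 in ('A', 'C'), ('B', 'D'):
    # Match on the value plus its occurrence number, so the n-th
    # duplicate on the left pairs with the n-th duplicate on the right.
    col_merged = pd.merge(
        df1[col1],
        df2[col2],
        left_on=[col1, df1.groupby(col1).cumcount()],
        right_on=[col2, df2.groupby(col2).cumcount()],
        how='left',
    )[col2]
    cols_merged.append(col_merged)
# Attach the independently merged columns to df1.
joined = pd.concat([df1, *cols_merged], axis=1)
joined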
A B C D
0 1 1 1.0 1.0
1 1 2 1.0 2.0
2 2 1 2.0 1.0
3 2 3 2.0 NaN
4 3 1 NaN NaN
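To make the enumeration step easier to follow, here is a minimal sketch that prints the occurrence numbers cumcount() produces and wraps the loop above in a function. The name merge_columns_independently and its parameters are hypothetical, chosen only for illustration, and the sketch assumes both frames have the default RangeIndex so that pd.concat lines the rows up by position.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 1, 2, 2, 3], "B": [1, 2, 1, 3, 1]})
df2 = pd.DataFrame({"C": [1, 1, 2, 2], "D": [1, 2, 1, 4]})

# cumcount() numbers each occurrence of a value within its group: 0, 1, 2, ...
b_occurrence = df1.groupby("B").cumcount().tolist()
d_occurrence = df2.groupby("D").cumcount().tolist()
print(b_occurrence)  # [0, 0, 1, 0, 2]
print(d_occurrence)  # [0, 0, 1, 0]


def merge_columns_independently(left, right, column_pairs):
    """Hypothetical helper: left-merge each (left_col, right_col) pair on its own.

    Assumes `left` has the default RangeIndex, because the merged columns
    come back with a fresh 0..n-1 index and are aligned by pd.concat.
    """
    merged_columns = []
    for left_col, right_col in column_pairs:
        merged = pd.merge(
            left[left_col],
            right[right_col],
            # Key = (value, occurrence number), unique on each side.
            left_on=[left_col, left.groupby(left_col).cumcount()],
            right_on=[right_col, right.groupby(right_col).cumcount()],
            how="left",
        )[right_col]
        merged_columns.append(merged)
    return pd.concat([left, *merged_columns], axis=1)


joined = merge_columns_independently(df1, df2, [("A", "C"), ("B", "D")])
print(joined)
```

The third 1 in B has occurrence number 2, while D only has occurrences 0 and 1 of the value 1, which is why the last row ends up with NaN in D.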