Select rows from a DataFrame based on another DataFrame's columns, regardless of column order

Question · votes: 0 · answers: 1
import pandas as pd

df_a = pd.DataFrame({'Number':[1,2,3,4,5,6,7,8],
                     'Column_A': ['C','E','G','L','E','N','P','R'],
                     'Column_B': ['D','F','H','M','Z','O','Q','S']})

df_b = pd.DataFrame({'Number':[1,2,3,4,5,6],
                     'Column_C': ['A','E','L','H','C','Q'],
                     'Column_D': ['B','F','M','G','F','P']})

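# A row is kept if (Column_A is in Column_C and Column_B is in Column_D)
# or (Column_A is in Column_D and Column_B is in Column_C)
mask = ((df_a['Column_A'].isin(df_b['Column_C']) & df_a['Column_B'].isin(df_b['Column_D']))
        | (df_a['Column_A'].isin(df_b['Column_D']) & df_a['Column_B'].isin(df_b['Column_C'])))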

df_a[mask]

df_a

   Number Column_A Column_B
0       1        C        D
1       2        E        F
2       3        G        H
3       4        L        M
4       5        E        Z
5       6        N        O
6       7        P        Q
7       8        R        S

df_b

   Number Column_C Column_D
0       1        A        B
1       2        E        F
2       3        L        M
3       4        H        G
4       5        C        F
5       6        Q        P

df_a[mask]

   Number Column_A Column_B
1       2        E        F
2       3        G        H
3       4        L        M
6       7        P        Q
  • Column_C and Column_D of df_b should be found in Column_A and Column_B of df_a
  • Column_C can match either Column_A or Column_B, and Column_D must then match the other one (AND condition)
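One property of the mask above is worth checking before generalizing it: each value is looked up in an entire column of df_b, independently of the other value in the same row, so a pair can pass even though it never appears together in a single df_b row. The sketch below illustrates this with a hypothetical probe row ('E', 'M'), which is not part of the original data.

```python
import pandas as pd

df_b = pd.DataFrame({'Number': [1, 2, 3, 4, 5, 6],
                     'Column_C': ['A', 'E', 'L', 'H', 'C', 'Q'],
                     'Column_D': ['B', 'F', 'M', 'G', 'F', 'P']})

# Hypothetical probe row: 'E' occurs in Column_C and 'M' in Column_D,
# but never together in one df_b row.
probe = pd.DataFrame({'Number': [9], 'Column_A': ['E'], 'Column_B': ['M']})

# The question's mask, applied to the probe row
mask = ((probe['Column_A'].isin(df_b['Column_C']) & probe['Column_B'].isin(df_b['Column_D']))
        | (probe['Column_A'].isin(df_b['Column_D']) & probe['Column_B'].isin(df_b['Column_C'])))
print(mask.tolist())  # [True]

# Does any single df_b row hold the pair ('E', 'M') in either order?
same_row = (((df_b['Column_C'] == 'E') & (df_b['Column_D'] == 'M'))
            | ((df_b['Column_C'] == 'M') & (df_b['Column_D'] == 'E')))
print(same_row.any())  # False
```

Whether this looser behavior is acceptable depends on the intent; the row-wise approaches compare whole pairs instead.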

If there are more columns to "mask", the condition becomes very long. Is there a better merge/join or some other solution?
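One merge-based option, offered as a sketch rather than a definitive answer: sort the values inside each row so that column order no longer matters, then merge on the sorted values. The helper name select_by_unordered_rows and the key_0, key_1, ... column names are hypothetical; np.sort, DataFrame.drop_duplicates and DataFrame.merge with indicator=True are standard NumPy/pandas calls. Unlike the column-wise mask, this compares whole rows.

```python
import numpy as np
import pandas as pd

def select_by_unordered_rows(a, a_cols, b, b_cols):
    """Hypothetical helper: keep the rows of `a` whose values in `a_cols`,
    taken in any order, equal the values in `b_cols` of some row of `b`."""
    keys = [f'key_{i}' for i in range(len(a_cols))]
    # Sort the values inside each row so ('G', 'H') and ('H', 'G') give the same key
    a_keys = pd.DataFrame(np.sort(a[a_cols].to_numpy(), axis=1), columns=keys)
    b_keys = (pd.DataFrame(np.sort(b[b_cols].to_numpy(), axis=1), columns=keys)
              .drop_duplicates())  # unique keys, so the left merge cannot add rows
    # A left merge keeps a's row order; indicator=True adds a '_merge' column
    merged = a_keys.merge(b_keys, on=keys, how='left', indicator=True)
    return a[(merged['_merge'] == 'both').to_numpy()]

df_a = pd.DataFrame({'Number': [1, 2, 3, 4, 5, 6, 7, 8],
                     'Column_A': ['C', 'E', 'G', 'L', 'E', 'N', 'P', 'R'],
                     'Column_B': ['D', 'F', 'H', 'M', 'Z', 'O', 'Q', 'S']})
df_b = pd.DataFrame({'Number': [1, 2, 3, 4, 5, 6],
                     'Column_C': ['A', 'E', 'L', 'H', 'C', 'Q'],
                     'Column_D': ['B', 'F', 'M', 'G', 'F', 'P']})

out = select_by_unordered_rows(df_a, ['Column_A', 'Column_B'],
                               df_b, ['Column_C', 'Column_D'])
print(out['Number'].tolist())  # [2, 3, 4, 7]
```

Adding more columns only means extending the two column lists. Sorting within a row assumes the values are mutually comparable (for example, all strings).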

python-3.x pandas
1 answer
0 votes

You can aggregate the columns of each row into a set and then use the same isin logic:

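# agg(set, axis=1) turns each row into a set of its values, so column order no longer matters
mask = (df_a[['Column_A', 'Column_B']].agg(set, axis=1)
        # keep the rows whose set equals the set of some df_b row
        .isin(df_b[['Column_C', 'Column_D']].agg(set, axis=1))
       )

out = df_a[mask]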

Output:

   Number Column_A Column_B
1       2        E        F
2       3        G        H
3       4        L        M
6       7        P        Q
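The same set idea extends to any number of columns. The sketch below is a variant under a few assumptions: it uses frozenset instead of set because frozenset is hashable and can be stored in a plain Python set for fast lookups, and the helper name unordered_row_mask is hypothetical.

```python
import pandas as pd

def unordered_row_mask(a, a_cols, b, b_cols):
    """Hypothetical helper: True where the values of a row of a[a_cols], ignoring
    their order, equal the values of some row of b[b_cols]."""
    # frozenset (unlike set) is hashable, so b's rows fit in a Python set
    b_rows = {frozenset(row) for row in b[b_cols].itertuples(index=False)}
    return pd.Series([frozenset(row) in b_rows for row in a[a_cols].itertuples(index=False)],
                     index=a.index)

df_a = pd.DataFrame({'Number': [1, 2, 3, 4, 5, 6, 7, 8],
                     'Column_A': ['C', 'E', 'G', 'L', 'E', 'N', 'P', 'R'],
                     'Column_B': ['D', 'F', 'H', 'M', 'Z', 'O', 'Q', 'S']})
df_b = pd.DataFrame({'Number': [1, 2, 3, 4, 5, 6],
                     'Column_C': ['A', 'E', 'L', 'H', 'C', 'Q'],
                     'Column_D': ['B', 'F', 'M', 'G', 'F', 'P']})

mask = unordered_row_mask(df_a, ['Column_A', 'Column_B'], df_b, ['Column_C', 'Column_D'])
out = df_a[mask]
print(out['Number'].tolist())  # [2, 3, 4, 7]
```

Sets ignore how often a value occurs, so with three or more columns a row ('A', 'A', 'B') would match ('A', 'B', 'B'). Replacing frozenset(row) with tuple(sorted(row)) in both places keeps the counts, assuming the values are mutually comparable.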