'Vlookup' in Python and return 'match' or 'not match' based on the corresponding row


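I have two tables, A and B, each with three columns: Name, Date, and Quantity. I added a column D that concatenates Name, Date, and Quantity. I want to look up column D of table A in table B. If there is a match, column E should be "Yes"; if there is no match, it should be "No".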

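If column E is "No", then given column D (the concatenation) I want to determine whether Name, Date, and/or Quantity caused the mismatch. For example, if the name does not match, column F should return "Not match name"; otherwise it should return "Match name".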

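The problem I'm facing is that the output for Name (match or not match) is correct, but the output for Date and Quantity is not. I think the main cause is the one-to-many and many-to-many relationships, because values in Name, Date, and Quantity repeat many times.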

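My code is inconsistent because I kept modifying it whenever the output was wrong, especially for Date and Quantity. This is what I have tried so far: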

# Concatenate the 3 columns into one lookup key per table

df2_A = df1_A.copy()
df2_A.loc[:, 'A_Concate'] = df2_A['Name'].astype(str) + df2_A['Date'].astype(str) + df2_A['Quantity'].astype(str)

df2_B = df1_B.copy()
df2_B.loc[:, 'B_Concate'] = df2_B['Name'].astype(str) + df2_B['Date'].astype(str) + df2_B['Quantity'].astype(int).astype(str)

# "Vlookup" the concatenated column of table A in table B
df2_A['Match with B?'] = df2_A['A_Concate'].isin(df2_B['B_Concate']).map({True: 'Yes', False: 'No'})

# Find the reason for a mismatch
df2_A['Match name?'] = df2_A.apply(lambda row: 'Not match name' if row['Match with B?'] == 'No' and row['Name'] not in df2_B['Name'].unique() else 'Match name', axis=1)

df2_A['Match date?'] = df2_A.apply(lambda row: 'Match date' if row['Match with B?'] == 'Yes' else ('Not match date' if row['Date'] not in df2_B.loc[df2_B['B_Concate'] == row['A_Concate'], 'Date'].values else 'Match date'), axis=1)

df2_A['Match quantity?'] = df2_A.apply(lambda row: 'Not match quantity' if row['Match with B?'] == 'No' and row['Match name?'] == 'Not match name' else ('Not match quantity' if row['Match with B?'] == 'No' and row['Quantity'] not in df2_B['Quantity'].unique() else 'Match quantity'), axis=1)
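
One thing that may be worth ruling out with the concatenated key built above: joining the values with no separator lets different rows produce the same string, and casting Quantity differently in the two tables (for example "5.0" in one and "5" in the other) makes identical rows look different. The sketch below only illustrates the separator issue; the tiny frame `demo` and the "|" separator are made up for the example and are not part of the original data.

```python
import pandas as pd

# Hypothetical rows: unpadded dates make the plain concatenation ambiguous.
demo = pd.DataFrame({
    "Name": ["X", "X"],
    "Date": ["2023-01-1", "2023-01-11"],
    "Quantity": [12, 2],
})

# Without a separator both rows collapse to "X2023-01-112".
no_sep = demo["Name"].astype(str) + demo["Date"].astype(str) + demo["Quantity"].astype(str)

# With a separator that cannot occur in the data, the keys stay distinct.
with_sep = (demo["Name"].astype(str) + "|" + demo["Date"].astype(str)
            + "|" + demo["Quantity"].astype(str))

print(no_sep.nunique(), with_sep.nunique())
```

Matching on the original columns directly (for example with a merge) avoids building a string key at all.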

Which part can be improved so that the output is returned based on the concatenated row?

python match matching
1 Answer

IIUC, you can use merge like this:

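import numpy as np
import pandas as pd

# Left-merge on every column; the indicator says whether the whole row exists in df2_B.
# drop_duplicates() on df2_B keeps the merge from multiplying rows of df2_A.
out = (pd.merge(df2_A, df2_B.drop_duplicates(), on=list(df2_A.columns), how="left", indicator="Match with B?")
         .astype({"Match with B?": str})
         .replace({"Match with B?": {"both": "Yes", "left_only": "No"}}))

# One left merge per column: True where the value from df2_A occurs anywhere in that column of df2_B.
# Only the right side is de-duplicated, so every result keeps one row per row of df2_A.
checks = (pd.concat([pd.merge(df2_A[[col]], df2_B[[col]].drop_duplicates(),
                              on=col, how="left", indicator=f"check_{i}")
                     for i, col in enumerate(df2_A.columns)], axis=1).filter(like="check")
            .set_axis(df2_A.columns, axis=1).eq("both"))

# Collect, per row, the names of the columns whose value was not found; fully matched rows stay NaN.
out["Why ?"] = (checks.apply(lambda x: np.where(x.eq(False), x.name, None))
                      .stack().dropna().groupby(level=0).agg(list))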

Output:

print(out)

  Name        Date  Quantity Match with B?                   Why ?
0  foo  2023-02-11         1            No        [Date, Quantity]
1  bar  2023-03-22         2           Yes                     NaN
2  baz  2023-01-05         3            No  [Name, Date, Quantity]
3  qux  2023-04-18         4            No            [Name, Date]
4  bar  2023-05-01         5            No                  [Date]

If you want a separate match check for each column, use:

# Whole-row check, same merge as in the first snippet.
tmp = (pd.merge(df2_A, df2_B.drop_duplicates(), on=list(df2_A.columns), how="left", indicator="Match with B?")
         .astype({"Match with B?": str})
         .replace({"Match with B?": {"both": "Yes", "left_only": "No"}}))

# Per-column checks, joined back as "Match <column> ?" columns holding "Yes"/"No".
# Only the right side is de-duplicated, so each check keeps one row per row of df2_A.
out = tmp.join(pd.concat([pd.merge(df2_A[[col]], df2_B[[col]].drop_duplicates(),
                                    on=col, how="left", indicator=f"check_{i}")
                           for i, col in enumerate(df2_A.columns)], axis=1).filter(like="check")
                   .set_axis(df2_A.columns, axis=1).eq("both")
                   .add_prefix("Match ").add_suffix(" ?").replace({True: "Yes", False: "No"})
      )

Output:

print(out)

  Name        Date  Quantity Match with B? Match Name ? Match Date ? Match Quantity ?
0  foo  2023-02-11         1            No          Yes           No               No
1  bar  2023-03-22         2           Yes          Yes          Yes              Yes
2  baz  2023-01-05         3            No           No           No               No
3  qux  2023-04-18         4            No           No           No              Yes
4  bar  2023-05-01         5            No          Yes           No              Yes
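
As a cross-check, the same per-column flags can also be produced without any merge by using `Series.isin`. This is only a minimal sketch of an alternative, not the merge-based method above; the `cols` list and the tuple-based whole-row test are my own choices, and the data is the sample input from this answer.

```python
import pandas as pd

df2_A = pd.DataFrame({
    "Name": ["foo", "bar", "baz", "qux", "bar"],
    "Date": ["2023-02-11", "2023-03-22", "2023-01-05", "2023-04-18", "2023-05-01"],
    "Quantity": [1, 2, 3, 4, 5],
})
df2_B = pd.DataFrame({
    "Name": ["foo", "xyz", "foo", "bar"],
    "Date": ["2023-02-30", "2023-02-25", "2023-03-10", "2023-03-22"],
    "Quantity": [5, 4, 6, 2],
})

cols = ["Name", "Date", "Quantity"]  # columns compared in both tables (assumed)

# Whole-row match: does the (Name, Date, Quantity) combination exist in df2_B?
rows_in_B = set(df2_B[cols].itertuples(index=False, name=None))
out = df2_A.copy()
out["Match with B?"] = [
    "Yes" if key in rows_in_B else "No"
    for key in df2_A[cols].itertuples(index=False, name=None)
]

# Column-wise match: does the value occur anywhere in the same column of df2_B?
for col in cols:
    out[f"Match {col} ?"] = df2_A[col].isin(df2_B[col]).map({True: "Yes", False: "No"})

print(out)
```

Because `isin` works row by row on `df2_A`, repeated values on either side do not change the number or order of the rows.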


Input used:

df2_A = pd.DataFrame({
    "Name": ["foo", "bar", "baz", "qux", "bar"],
    "Date": ["2023-02-11", "2023-03-22", "2023-01-05", "2023-04-18", "2023-05-01"],
    "Quantity": [1, 2, 3, 4, 5]
})

df2_B = pd.DataFrame({
    "Name": ["foo", "xyz", "foo", "bar"],
    "Date": ["2023-02-30", "2023-02-25", "2023-03-10", "2023-03-22"],
    "Quantity": [5, 4, 6, 2]
})
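
Note that the per-column checks only ask whether a value appears anywhere in the corresponding column of `df2_B`. With repeated names, dates, and quantities (the one-to-many case mentioned in the question), it may be more useful to test progressively wider keys: first Name, then Name plus Date, then all three. The following is a rough sketch under that assumption; the helper `exists_in_B` and the column name `First mismatch` are hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd

df2_A = pd.DataFrame({
    "Name": ["foo", "bar", "baz", "qux", "bar"],
    "Date": ["2023-02-11", "2023-03-22", "2023-01-05", "2023-04-18", "2023-05-01"],
    "Quantity": [1, 2, 3, 4, 5],
})
df2_B = pd.DataFrame({
    "Name": ["foo", "xyz", "foo", "bar"],
    "Date": ["2023-02-30", "2023-02-25", "2023-03-10", "2023-03-22"],
    "Quantity": [5, 4, 6, 2],
})

def exists_in_B(left, right, keys):
    """Boolean Series aligned to `left`: True where the key combination occurs in `right`."""
    merged = left[keys].merge(right[keys].drop_duplicates(), on=keys,
                              how="left", indicator=True)
    # The right side is unique on `keys`, so the merge keeps one row per row of `left`.
    return pd.Series((merged["_merge"] == "both").to_numpy(), index=left.index)

name_ok = exists_in_B(df2_A, df2_B, ["Name"])
date_ok = exists_in_B(df2_A, df2_B, ["Name", "Date"])
qty_ok = exists_in_B(df2_A, df2_B, ["Name", "Date", "Quantity"])

# Report the first level at which the row stops matching.
reason = np.select(
    [~name_ok, ~date_ok, ~qty_ok],
    ["Name not found", "Date not found for this Name",
     "Quantity not found for this Name and Date"],
    default="Full match",
)
out = df2_A.assign(**{"First mismatch": reason})
print(out)
```

With the sample input, rows 0 and 4 are reported as "Date not found for this Name", row 1 as "Full match", and rows 2 and 3 as "Name not found".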