Compare two DataFrames on more than one column and report the changes


Suppose I have two DataFrames:

Old info:

    Name       Id      Club     Number
0   Ronaldo    12414   Al-Nassr 7
1   Messi      4344134 Miami    30
2   Shevchenko 1234435 Milan    7
3   Maradona   37346   Retired  None

old = {'Name': {0: 'Ronaldo', 1: 'Messi', 2: 'Shevchenko', 3: 'Maradona'},
 'Id': {0: 12414, 1: 4344134, 2: 1234435, 3: 37346},
 'Club': {0: 'Al-Nassr', 1: 'Miami', 2: 'Milan', 3: 'Retired'},
 'Number': {0: 7, 1: 30, 2: 7, 3: None}}

New info:

    Name       Id       Club        Number
0   Ronaldo    12414    Al-Nassr    7
1   Messi      4344134  Miami       10
2   Shevchenko 1234435  Retired     None
3   Neymar     423552   Al Hilal    10

new = {'Name': {0: 'Ronaldo', 1: 'Messi', 2: 'Shevchenko', 3: 'Neymar'},
 'Id': {0: 12414, 1: 4344134, 2: 1234435, 3: 423552},
 'Club': {0: 'Al-Nassr', 1: 'Miami', 2: 'Retired', 3: 'Al Hilal'},
 'Number': {0: 7, 1: 10, 2: None, 3: 10}}

I want to compare the new df with the old df, matching rows on Name + Id, and return a new df with a column that indicates what changed:

      Name       Id      Club       Number  Changes
0     Ronaldo    12414   Al-Nassr   7       No change
1     Messi      4344134 Miami      10      Number
2     Shevchenko 1234435 Retired    None    Club, Number
3     Neymar     423552  Al Hilal   10      New entry

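I mainly care about new entries and changes, but it would be great to include removed entries as well if that is not too hard. If it requires major changes to the approach, it is not necessary. The result would then look like this: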

      Name       Id      Club       Number  Changes
0     Ronaldo    12414   Al-Nassr   7       No change
1     Messi      4344134 Miami      10      Number
2     Shevchenko 1234435 Retired    None    Club, Number
3     Neymar     423552  Al Hilal   10      New entry
4     Maradona   37346   Retired    None    Removed
pandas dataframe comparison
1 answer

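You can outer-merge the two DataFrames on the key columns and write a function that handles each case (new, removed, changed, unchanged):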

import pandas as pd

old = {'Name': {0: 'Ronaldo', 1: 'Messi', 2: 'Shevchenko', 3: 'Maradona'},
       'Id': {0: 12414, 1: 4344134, 2: 1234435, 3: 37346},
       'Club': {0: 'Al-Nassr', 1: 'Miami', 2: 'Milan', 3: 'Retired'},
       'Number': {0: 7, 1: 30, 2: 7, 3: None}}

new = {'Name': {0: 'Ronaldo', 1: 'Messi', 2: 'Shevchenko', 3: 'Neymar'},
       'Id': {0: 12414, 1: 4344134, 2: 1234435, 3: 423552},
       'Club': {0: 'Al-Nassr', 1: 'Miami', 2: 'Retired', 3: 'Al Hilal'},
       'Number': {0: 7, 1: 10, 2: None, 3: 10}}

old_df = pd.DataFrame(old)
new_df = pd.DataFrame(new)

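# Outer merge on the key columns. indicator=True adds a '_merge' column that
# records whether a row exists in the old frame, the new frame, or both.
merged_df = pd.merge(old_df, new_df, on=['Name', 'Id'], how='outer',
                     suffixes=('_old', '_new'), indicator=True)

def differs(a, b):
    # NaN != NaN evaluates to True, so treat two missing values as equal.
    if pd.isna(a) and pd.isna(b):
        return False
    return a != b

def determine_changes(row):
    # Rows found only in the new frame are additions.
    if row['_merge'] == 'right_only':
        return 'New entry'
    # Rows found only in the old frame are removals.
    if row['_merge'] == 'left_only':
        return 'Removed'
    # Otherwise list every compared column whose value differs.
    changes = [col for col in ('Club', 'Number')
               if differs(row[f'{col}_old'], row[f'{col}_new'])]
    return ', '.join(changes) if changes else 'No change'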

merged_df['Changes'] = merged_df.apply(determine_changes, axis=1)

result_df = merged_df[['Name', 'Id', 'Club_new', 'Number_new', 'Changes']].rename(
    columns={'Club_new': 'Club', 'Number_new': 'Number'})


print(result_df)


This gives you:

         Name       Id      Club  Number       Changes
0     Ronaldo    12414  Al-Nassr     7.0     No change
1       Messi  4344134     Miami    10.0        Number
2  Shevchenko  1234435   Retired     NaN  Club, Number
3    Maradona    37346       NaN     NaN       Removed
4      Neymar   423552  Al Hilal    10.0     New entry
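
In the output above the removed row (Maradona) shows NaN for Club and Number, whereas the question's desired output keeps the old values ("Retired"). One way to get that, as a rough sketch rather than a definitive solution: take the new value for every row except the removed ones, and fall back to the old value there. The helper name `compare_frames` and its parameters `keys` and `value_cols` are my own, not part of pandas. It also treats two missing values as equal, so a Number that is empty in both frames is not reported as a change.

```python
import pandas as pd

def compare_frames(old_df, new_df, keys, value_cols):
    """Hypothetical helper: label rows as new, removed, changed, or unchanged."""
    merged = old_df.merge(new_df, on=keys, how="outer",
                          suffixes=("_old", "_new"), indicator=True)
    out = merged[keys].copy()
    changed = {}
    for col in value_cols:
        old_vals = merged[f"{col}_old"]
        new_vals = merged[f"{col}_new"]
        # A value counts as changed unless both sides are equal or both missing.
        changed[col] = (old_vals != new_vals) & ~(old_vals.isna() & new_vals.isna())
        # Report the new value, but keep the old one for removed rows.
        out[col] = new_vals.where(merged["_merge"] != "left_only", old_vals)

    labels = []
    for i, status in enumerate(merged["_merge"]):
        if status == "right_only":
            labels.append("New entry")
        elif status == "left_only":
            labels.append("Removed")
        else:
            cols = [c for c in value_cols if changed[c].iloc[i]]
            labels.append(", ".join(cols) if cols else "No change")
    out["Changes"] = labels
    return out

# Same sample data as in the question.
old_df = pd.DataFrame({
    "Name": ["Ronaldo", "Messi", "Shevchenko", "Maradona"],
    "Id": [12414, 4344134, 1234435, 37346],
    "Club": ["Al-Nassr", "Miami", "Milan", "Retired"],
    "Number": [7, 30, 7, None],
})
new_df = pd.DataFrame({
    "Name": ["Ronaldo", "Messi", "Shevchenko", "Neymar"],
    "Id": [12414, 4344134, 1234435, 423552],
    "Club": ["Al-Nassr", "Miami", "Retired", "Al Hilal"],
    "Number": [7, 10, None, 10],
})

result = compare_frames(old_df, new_df, keys=["Name", "Id"],
                        value_cols=["Club", "Number"])
print(result)
```

With the sample data, the Maradona row now carries Club "Retired" and is labelled "Removed". The row order of an outer merge can differ between pandas versions, so sort the result explicitly if the order matters to you.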