Suppose I have two DataFrames.
Old info:
         Name       Id      Club  Number
0     Ronaldo    12414  Al-Nassr       7
1       Messi  4344134     Miami      30
2  Shevchenko  1234435     Milan       7
3    Maradona    37346   Retired    None
old = {'Name': {0: 'Ronaldo', 1: 'Messi', 2: 'Shevchenko', 3: 'Maradona'},
'Id': {0: 12414, 1: 4344134, 2: 1234435, 3: 37346},
'Club': {0: 'Al-Nassr', 1: 'Miami', 2: 'Milan', 3: 'Retired'},
'Number': {0: 7, 1: 30, 2: 7, 3: None}}
New info:
         Name       Id      Club  Number
0     Ronaldo    12414  Al-Nassr       7
1       Messi  4344134     Miami      10
2  Shevchenko  1234435   Retired    None
3      Neymar   423552  Al Hilal      10
new = {'Name': {0: 'Ronaldo', 1: 'Messi', 2: 'Shevchenko', 3: 'Neymar'},
'Id': {0: 12414, 1: 4344134, 2: 1234435, 3: 423552},
'Club': {0: 'Al-Nassr', 1: 'Miami', 2: 'Retired', 3: 'Al Hilal'},
'Number': {0: 7, 1: 10, 2: None, 3: 10}}
I want to compare the new df against the old df (matching rows on Name + Id) and return a new df with an extra column indicating what changed:
         Name       Id      Club  Number       Changes
0     Ronaldo    12414  Al-Nassr       7     No change
1       Messi  4344134     Miami      10        Number
2  Shevchenko  1234435   Retired    None  Club, Number
3      Neymar   423552  Al Hilal      10     New entry
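I mostly care about additions and changes, but if it is not too hard to include removed entries as well, that would be great. If it requires big changes to the approach, it is not necessary. The result would then look like this: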
         Name       Id      Club  Number       Changes
0     Ronaldo    12414  Al-Nassr       7     No change
1       Messi  4344134     Miami      10        Number
2  Shevchenko  1234435   Retired    None  Club, Number
3      Neymar   423552  Al Hilal      10     New entry
4    Maradona    37346   Retired    None       Removed
You can outer-merge the two DataFrames on Name and Id, then write a function that checks each case row by row:
import pandas as pd
old = {'Name': {0: 'Ronaldo', 1: 'Messi', 2: 'Shevchenko', 3: 'Maradona'},
'Id': {0: 12414, 1: 4344134, 2: 1234435, 3: 37346},
'Club': {0: 'Al-Nassr', 1: 'Miami', 2: 'Milan', 3: 'Retired'},
'Number': {0: 7, 1: 30, 2: 7, 3: None}}
new = {'Name': {0: 'Ronaldo', 1: 'Messi', 2: 'Shevchenko', 3: 'Neymar'},
'Id': {0: 12414, 1: 4344134, 2: 1234435, 3: 423552},
'Club': {0: 'Al-Nassr', 1: 'Miami', 2: 'Retired', 3: 'Al Hilal'},
'Number': {0: 7, 1: 10, 2: None, 3: 10}}
old_df = pd.DataFrame(old)
new_df = pd.DataFrame(new)
merged_df = pd.merge(old_df, new_df, on=['Name', 'Id'], how='outer', suffixes=('_old', '_new'))
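def determine_changes(row):
    changes = []
    # After the outer merge, a row missing from one side has NaN in that side's columns.
    if pd.isna(row['Club_old']):
        return 'New entry'
    if pd.isna(row['Club_new']):
        return 'Removed'
    if row['Club_old'] != row['Club_new']:
        changes.append('Club')
    # Treat two missing numbers as equal; NaN != NaN would otherwise flag a change.
    both_missing = pd.isna(row['Number_old']) and pd.isna(row['Number_new'])
    if row['Number_old'] != row['Number_new'] and not both_missing:
        changes.append('Number')
    return ', '.join(changes) if changes else 'No change'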
merged_df['Changes'] = merged_df.apply(determine_changes, axis=1)
result_df = merged_df[['Name', 'Id', 'Club_new', 'Number_new', 'Changes']].rename(
columns={'Club_new': 'Club', 'Number_new': 'Number'})
print(result_df)
This gives you:
         Name       Id      Club  Number       Changes
0     Ronaldo    12414  Al-Nassr     7.0     No change
1       Messi  4344134     Miami    10.0        Number
2  Shevchenko  1234435   Retired     NaN  Club, Number
3    Maradona    37346       NaN     NaN       Removed
4      Neymar   423552  Al Hilal    10.0     New entry
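
If you want the removed row to keep its old Club and Number (as in the desired output in the question) and the numbers to print as integers, one possible variation is sketched below. It relies on the `indicator=True` option of `merge`, which adds a `_merge` column saying which side each row came from, so a genuinely missing Club is not mistaken for a new or removed entry. The names `KEYS`, `VALUE_COLS` and `describe_changes` are my own and only illustrative; treat this as a sketch under those assumptions rather than the only way to do it.

```python
import pandas as pd

old_df = pd.DataFrame({
    'Name': ['Ronaldo', 'Messi', 'Shevchenko', 'Maradona'],
    'Id': [12414, 4344134, 1234435, 37346],
    'Club': ['Al-Nassr', 'Miami', 'Milan', 'Retired'],
    'Number': [7, 30, 7, None],
})
new_df = pd.DataFrame({
    'Name': ['Ronaldo', 'Messi', 'Shevchenko', 'Neymar'],
    'Id': [12414, 4344134, 1234435, 423552],
    'Club': ['Al-Nassr', 'Miami', 'Retired', 'Al Hilal'],
    'Number': [7, 10, None, 10],
})

KEYS = ['Name', 'Id']            # columns that identify a player (illustrative name)
VALUE_COLS = ['Club', 'Number']  # columns to compare (illustrative name)

# indicator=True adds a '_merge' column: 'left_only', 'right_only' or 'both'.
merged = old_df.merge(new_df, on=KEYS, how='outer',
                      suffixes=('_old', '_new'), indicator=True)

def describe_changes(row):
    if row['_merge'] == 'right_only':
        return 'New entry'
    if row['_merge'] == 'left_only':
        return 'Removed'
    changed = []
    for col in VALUE_COLS:
        old_val, new_val = row[f'{col}_old'], row[f'{col}_new']
        if pd.isna(old_val) and pd.isna(new_val):
            continue  # missing on both sides counts as unchanged
        if pd.isna(old_val) or pd.isna(new_val) or old_val != new_val:
            changed.append(col)
    return ', '.join(changed) if changed else 'No change'

merged['Changes'] = merged.apply(describe_changes, axis=1)

# Show the new values, except for removed rows, which fall back to the old values.
removed = merged['_merge'] == 'left_only'
for col in VALUE_COLS:
    merged[col] = merged[f'{col}_new'].where(~removed, merged[f'{col}_old'])

# Nullable integer dtype so numbers print as 7 / 10 / <NA> instead of 7.0 / NaN.
merged['Number'] = merged['Number'].astype('Int64')

result = merged[KEYS + VALUE_COLS + ['Changes']]
print(result)
```

Row order after an outer merge can differ between pandas versions, so sort the result explicitly (for example with `sort_values`) if the order matters to you.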