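I have several financial statements, stored as DataFrames, under each category in a list. I want to merge all the financial statements for each category while keeping all the information and without repeating the values for a year that appears in more than one statement. To illustrate, I made an example in Excel of what I want to achieve: [image]
Honestly, I don't know how to approach this.
I'm not sure about a general approach, but I tried implementing some code based on the screenshot in the question: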
import pandas as pd

data1 = {
    "Keys": ["Apple", "Grapes", "Banana"],
    "2023": [4, 3, 1],
    "2022": [2, 5, 2],
    "2021": [8, 7, 3],
}
data2 = {
    "Keys": ["Apple", "Orange", "Grapes", "Mandarine"],
    "2022": [2, 3, 5, 7],
    "2021": [8, 2, 7, 3],
    "2020": [5, 2, 4, 8],
}
data3 = {
    "Keys": ["Apple", "Orange", "Grapes", "Mandarine"],
    "2021": [8, 2, 7, 3],
    "2020": [5, 2, 4, 8],
    "2019": [3, 6, 4, 9],
}
data4 = {
    "Keys": [None, None],
    "": [None, None],
}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df3 = pd.DataFrame(data3)
df4 = pd.DataFrame(data4)

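def try_merge(ls_dfs, key_col):
    final_df = ls_dfs[0]
    for i in ls_dfs[1:]:
        # Outer merge keeps every key; overlapping year columns from the
        # right-hand frame get the '_dup' suffix.
        final_df = pd.merge(final_df, i, on=key_col, how='outer', suffixes=('', '_dup'))
        # Fold each '_dup' column back into its original, then drop it
        # before the next merge so the suffix never collides.
        for j in final_df.columns:
            if '_dup' in j:
                mycol = j.replace('_dup', '')
                final_df[mycol] = final_df[mycol].combine_first(final_df[j])
                final_df.drop(columns=[j], inplace=True)
    return final_df
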
refined_data=try_merge([df1,df2,df3,df4],"Keys")
refined_data=refined_data.dropna(axis=0,how="all").dropna(axis=1,how="all")
refined_data
This gives me almost the expected result in the final DataFrame.
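As a possible alternative, here is a minimal sketch that skips the suffix handling by indexing each frame on the key column and chaining `DataFrame.combine_first` with `functools.reduce`. The function name `merge_statements` is my own, and the sketch assumes that each key appears at most once per frame and that overlapping years hold the same value wherever they overlap (the first non-null value wins otherwise). The sample frames mirror the ones above.

```python
from functools import reduce

import pandas as pd


def merge_statements(frames, key_col="Keys"):
    """Combine frames sharing a key column, keeping the first non-null value per cell."""
    indexed = []
    for frame in frames:
        # Rows without a key cannot be aligned, so drop them first.
        frame = frame.dropna(subset=[key_col])
        if not frame.empty:
            indexed.append(frame.set_index(key_col))
    if not indexed:
        return pd.DataFrame(columns=[key_col])
    # combine_first aligns on both the index (keys) and the columns (years),
    # so a year present in several frames is filled in, not duplicated.
    combined = reduce(lambda left, right: left.combine_first(right), indexed)
    # Remove rows and columns that carry no data at all.
    combined = combined.dropna(axis=0, how="all").dropna(axis=1, how="all")
    # Show the most recent year first (column labels are year strings here).
    combined = combined[sorted(combined.columns, reverse=True)]
    return combined.reset_index()


df1 = pd.DataFrame({"Keys": ["Apple", "Grapes", "Banana"],
                    "2023": [4, 3, 1], "2022": [2, 5, 2], "2021": [8, 7, 3]})
df2 = pd.DataFrame({"Keys": ["Apple", "Orange", "Grapes", "Mandarine"],
                    "2022": [2, 3, 5, 7], "2021": [8, 2, 7, 3], "2020": [5, 2, 4, 8]})
df3 = pd.DataFrame({"Keys": ["Apple", "Orange", "Grapes", "Mandarine"],
                    "2021": [8, 2, 7, 3], "2020": [5, 2, 4, 8], "2019": [3, 6, 4, 9]})
df4 = pd.DataFrame({"Keys": [None, None], "": [None, None]})

result = merge_statements([df1, df2, df3, df4])
print(result)
```

Cells with no source value (for example Banana in 2020) come out as NaN, so the year columns become floats. If the statements are grouped per category, the same function could be called once per group.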
Reference links I used:
https://www.w3schools.com/python/pandas/ref_df_merge.asp
https://pandas.pydata.org/docs/reference/api/pandas.merge.html