Loop through and merge DataFrames with the same index and the same columns (but a few columns unique to each DataFrame)

Problem description Votes: 1 Answers: 1

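Description of the desired task

I am using the code below to merge df and df1 (example data shown), and it works well for what I need. However, I need to loop through a large number of DataFrames (for example df2, but there will also be df3, df4, and so on) and I am not sure how to modify the code. My DataFrames have the same index and the same columns, but each DataFrame also has a few columns of its own. I would like to modify the code so that it loops through df and df1, merges them together to create requireddata, and then repeats this step with requireddata being merged with df2. df3 would then be merged with requireddata, continuing the same logic, and so on. Any help would be great!! :)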

Example

df

        ID   AA  TA  TL
Date
2001  AAPL  1.0  44  50
2002  AAPL  3.0  33  51
2003  AAPL  2.0  22  53
2004  AAPL  5.0  11  76
2005  AAPL  2.0  33  44
2006  AAPL  3.0  22  12

df1

        ID   AA  TA  ML
Date
2001  MSFT  3.5  44  12
2002  MSFT  6.7  33  15
2003  MSFT  2.3  22  19
2004  MSFT  5.5  11  20
2005  MSFT  2.2  33  43
2006  MSFT  3.2  22  23

df2

        ID   AA  TA  PP
Date
2001  TSLA  3.3  48  18
2002  TSLA  6.3  38  18
2003  TSLA  2.6  28  18
2004  TSLA  5.3  18  28
2005  TSLA  2.3  38  48
2006  TSLA  3.3  28  28

Code used:

import pandas as pd

# dfdates['Date'] holds the dates required for the index
# df and df1 are the DataFrames shown above, both indexed by Date

cols_to_use = df1.columns.difference(df.columns)   # columns in df1 but not in df
cols_to_use1 = df.columns.difference(df1.columns)  # columns in df but not in df1

dataframe = pd.DataFrame(columns=cols_to_use, index=df.index)     # empty frame with the columns df is missing
dataframe1 = pd.DataFrame(columns=cols_to_use1, index=df1.index)  # empty frame with the columns df1 is missing

datatesting = pd.concat([dataframe, df], axis=1) #merge missing columns into df
datatesting1 = pd.concat([dataframe1, df1], axis=1) #merge missing columns into df1

diff = datatesting1.columns.difference(datatesting.columns) #check difference (is 0)
print (diff)
frames = [datatesting, datatesting1] #list of dataframes 
requireddata = pd.concat(frames) #merge dataframes

This creates:

        ID   AA  TA   TL   ML
Date
2001  AAPL  1.0  44   50  NaN
2002  AAPL  3.0  33   51  NaN
2003  AAPL  2.0  22   53  NaN
2004  AAPL  5.0  11   76  NaN
2005  AAPL  2.0  33   44  NaN
2006  AAPL  3.0  22   12  NaN
2001  MSFT  3.5  44  NaN   12
2002  MSFT  6.7  33  NaN   15
2003  MSFT  2.3  22  NaN   19
2004  MSFT  5.5  11  NaN   20
2005  MSFT  2.2  33  NaN   43
2006  MSFT  3.2  22  NaN   23

With looped code, I would like something like this:

        ID   AA  TA   TL   ML   PP
Date
2001  AAPL  1.0  44   50  NaN  NaN
2002  AAPL  3.0  33   51  NaN  NaN
2003  AAPL  2.0  22   53  NaN  NaN
2004  AAPL  5.0  11   76  NaN  NaN
2005  AAPL  2.0  33   44  NaN  NaN
2006  AAPL  3.0  22   12  NaN  NaN
2001  MSFT  3.5  44  NaN   12  NaN
2002  MSFT  6.7  33  NaN   15  NaN
2003  MSFT  2.3  22  NaN   19  NaN
2004  MSFT  5.5  11  NaN   20  NaN
2005  MSFT  2.2  33  NaN   43  NaN
2006  MSFT  3.2  22  NaN   23  NaN
2001  TSLA  3.3  48  NaN  NaN   18
2002  TSLA  6.3  38  NaN  NaN   18
2003  TSLA  2.6  28  NaN  NaN   18
2004  TSLA  5.3  18  NaN  NaN   28
2005  TSLA  2.3  38  NaN  NaN   48
2006  TSLA  3.3  28  NaN  NaN   28

pandas loops dataframe merge concat
1 Answer
0
votes

I believe there is no need to work out the column differences here; just use concat, and the columns are aligned correctly.
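The answer's own code did not survive on this page, so the following is only a minimal sketch of the concat approach it describes, not the original snippet. It rebuilds the three example frames from the question; the names years, frames and stepwise are assumptions introduced for illustration.

```python
import pandas as pd

# Rebuild the example frames from the question, indexed by Date
years = pd.Index(range(2001, 2007), name="Date")

df = pd.DataFrame(
    {"ID": "AAPL", "AA": [1.0, 3.0, 2.0, 5.0, 2.0, 3.0],
     "TA": [44, 33, 22, 11, 33, 22], "TL": [50, 51, 53, 76, 44, 12]},
    index=years,
)
df1 = pd.DataFrame(
    {"ID": "MSFT", "AA": [3.5, 6.7, 2.3, 5.5, 2.2, 3.2],
     "TA": [44, 33, 22, 11, 33, 22], "ML": [12, 15, 19, 20, 43, 23]},
    index=years,
)
df2 = pd.DataFrame(
    {"ID": "TSLA", "AA": [3.3, 6.3, 2.6, 5.3, 2.3, 3.3],
     "TA": [48, 38, 28, 18, 38, 28], "PP": [18, 18, 18, 28, 48, 28]},
    index=years,
)

# Collect every frame in one list; with many frames, append to it in a loop
frames = [df, df1, df2]

# A single concat stacks the rows and aligns columns by name;
# a column missing from a frame is filled with NaN for that frame's rows.
# sort=False keeps the columns in order of first appearance.
requireddata = pd.concat(frames, sort=False)
print(requireddata)

# Step-by-step version matching the loop described in the question
stepwise = frames[0]
for frame in frames[1:]:
    stepwise = pd.concat([stepwise, frame], sort=False)
```

Passing the whole list to a single concat call is usually preferable to growing requireddata inside a loop, because each intermediate concat copies all the rows accumulated so far.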
© www.soinside.com 2019 - 2024. All rights reserved.