Copy data from one df into multiple columns of another df based on a key

Question  Votes: 1  Answers: 2

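I have two dataframes, df1 and df2. The unique identifier is "ID" in df1 and "Prop_Number" in df2. I need to copy the Num1, Num2 and Num3 columns of df1 into the corresponding columns of df2 (1_Num, 2_Num, 3_Num), but I'm not sure how to do a merge across multiple columns. I'd like df2 to stay df2 rather than create a new df, because my actual data has many more columns in df2 and they should be left as they are.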

import numpy as np
import pandas as pd

cols1 = ['ID', 'Num1', 'Num2', 'Num3']
data1 = [['33', '.853', '9834', '234'],
        ['87', '.372', '2345', '843'],
        ['15', '1.234','742', '821'],
        ['92', '1.957', '1234', '123'],
        ['13', '.943', '8427', '493'],
        ['67', '.852', '3421', '439']
       ]
df1 = pd.DataFrame(data=data1, columns=cols1)

cols2 = ['Prop_Number', '1_Num', '2_Num', '3_Num']
data2 = [['87', '', '', ''],
        ['33', '', '', ''],
        ['67', '','', ''],
        ['13', '', '', ''],
        ['92', '', '', ''],
        ['15', '', '', '']
       ]
df2 = pd.DataFrame(data=data2, columns=cols2)

What I have tried is:

df2['1_Num'] = np.where(df1['ID'] == df2['Prop_Number'], df1['Num1'],np.nan)

python pandas dataframe merge
2 Answers
1
vote

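Scott gave a great answer, but I was interested in the way you match columns by their number and thought this might help with your problem.

The idea is to use a regex to pick out every column in each dataframe whose name contains a number, then pair the columns up by that number, which lets us match the columns of df1 to those of df2.

Also, because the two indexes have different names, the index name comes back blank; you can set it again manually.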

import re

def match_numeric_columns(dataframe1, dataframe2):
    """
    the first argument will be the dataframe you want to rename
    takes in two dataframes and returns their alphanumeric
    values as matches. e.g col1a = 1cola or Data_225 = 225_Info
    """
    # columns whose names contain at least one digit
    cola = dataframe1.filter(regex=r"\d").columns
    colb = dataframe2.filter(regex=r"\d").columns

    # pair the columns positionally and keep a pair only when
    # the first number in both names is the same
    matching_cols = {
        k: v
        for (k, v) in zip(cola, colb)
        if int(re.findall(r"\d+", k)[0]) == int(re.findall(r"\d+", v)[0])
    }

    return matching_cols

print(match_numeric_columns(df2, df1))
{'1_Num': 'Num1', '2_Num': 'Num2', '3_Num': 'Num3'}
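
Note that the function above pairs the columns positionally with zip, so it relies on both dataframes listing their numbered columns in the same order. As a rough sketch of an order-independent variation (the helper names `number_in` and `match_by_number` are made up for this illustration and are not part of the answer), you could key the lookup on the number itself:

```python
import re

def number_in(name):
    """Return the first run of digits in a column name as an int, or None."""
    found = re.search(r"\d+", name)
    return int(found.group()) if found else None

def match_by_number(cols_to_rename, reference_cols):
    """Map each column to rename onto the reference column with the same number."""
    # Look-up table: number -> reference column name (names without digits are skipped)
    by_number = {
        number_in(c): c for c in reference_cols if number_in(c) is not None
    }
    # Keep only the columns whose number also appears among the reference columns
    return {
        c: by_number[number_in(c)]
        for c in cols_to_rename
        if number_in(c) in by_number
    }

# Column order differs on purpose; the names are the ones from the question.
mapping = match_by_number(
    ["3_Num", "1_Num", "2_Num", "Prop_Number"],
    ["ID", "Num1", "Num2", "Num3"],
)
print(mapping)
```

This assumes each number appears at most once per dataframe; if two reference columns share a number, the later one wins.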

df2_v2 = (
    df2.set_index("Prop_Number")
    .rename(columns=match_numeric_columns(df2, df1))
    .replace("", np.nan)
    .combine_first(df1.set_index("ID"))
)

print(df2_v2)
     Num1    Num2   Num3
13  0.943  8427.0  493.0
15  1.234   742.0  821.0
33  0.853  9834.0  234.0
67  0.852  3421.0  439.0
87  0.372  2345.0  843.0
92  1.957  1234.0  123.0
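
If you want the result to look like df2 again, the blank index name mentioned above can be set back by hand. The following is only a sketch on cut-down stand-ins for the question's frames (one Num column, two rows); it also puts the rows back in df2's original order, since combine_first returns the union of both indexes in sorted order:

```python
import numpy as np
import pandas as pd

# Cut-down stand-ins for the frames in the question.
df1 = pd.DataFrame({"ID": ["33", "87"], "Num1": [".853", ".372"]})
df2 = pd.DataFrame({"Prop_Number": ["87", "33"], "1_Num": ["", ""]})

combined = (
    df2.set_index("Prop_Number")
    .rename(columns={"1_Num": "Num1"})
    .replace("", np.nan)
    .combine_first(df1.set_index("ID"))
)

restored = (
    combined.reindex(df2["Prop_Number"].tolist())  # back to df2's row order
    .rename_axis("Prop_Number")                    # give the index its name back
    .rename(columns={"Num1": "1_Num"})             # back to df2's column names
    .reset_index()
)
print(restored)
```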

0
votes

You can try the following:

cols1 = ['ID', 'Num1', 'Num2', 'Num3']
data1 = [['33', '.853', '9834', '234'],
        ['87', '.372', '2345', '843'],
        ['15', '1.234','742', '821'],
        ['92', '1.957', '1234', '123'],
        ['13', '.943', '8427', '493'],
        ['67', '.852', '3421', '439']
       ]
df1 = pd.DataFrame(data=data1, columns=cols1)

cols2 = ['Prop_Number', '1_Num', '2_Num', '3_Num']
data2 = [['87', '', '', ''],
        ['33', '', '', ''],
        ['67', '','', ''],
        ['13', '', '', ''],
        ['92', '', '', ''],
        ['15', '', '', '']
       ]
df2 = pd.DataFrame(data=data2, columns=cols2)

df2 = df2.set_index('Prop_Number')
df2.update(df1.rename(columns=dict(zip(df1.columns[1:],
                                       ['1_Num','2_Num','3_Num'])))
              .set_index('ID'))
df2 = df2.reset_index()
print(df2)

Output:

  Prop_Number  1_Num 2_Num 3_Num
0          87   .372  2345   843
1          33   .853  9834   234
2          67   .852  3421   439
3          13   .943  8427   493
4          92  1.957  1234   123
5          15  1.234   742   821

Details: rename the df1 columns to match the df2 columns, use set_index on both frames so the keys line up, and modify df2 in place with update.
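
To illustrate why this keeps the rest of df2 intact, here is a small sketch on made-up data. The extra column "Other" and the keys "99" and "15" are assumptions added for the example and are not in the question. update aligns on index labels and column names, so columns that df1 does not have are left alone, rows of df2 with no match keep their values, and keys that exist only in df1 are ignored:

```python
import pandas as pd

# Illustrative frames: "Other", key "99" and key "15" are invented for this demo.
df1 = pd.DataFrame({"ID": ["33", "87", "15"], "Num1": [".853", ".372", "1.234"]})
df2 = pd.DataFrame(
    {
        "Prop_Number": ["87", "33", "99"],
        "1_Num": ["", "", ""],
        "Other": ["a", "b", "c"],
    }
)

source = df1.rename(columns={"Num1": "1_Num"}).set_index("ID")
target = df2.set_index("Prop_Number")

# In-place update: only cells with a matching index label and column name change.
target.update(source)
result = target.reset_index()
print(result)
```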
