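I have two dataframes, df1 and df2. Their unique identifiers are "ID" and "Prop_Number", respectively. I need to copy columns Num1, Num2 and Num3 from df1 into the corresponding columns of df2 (1_Num, 2_Num, 3_Num), but I'm not sure how to do a merge across multiple columns. I want df2 to stay df2 rather than creating a new dataframe, because my real df2 has many more columns that must be kept as they are.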
import numpy as np
import pandas as pd

cols1 = ['ID', 'Num1', 'Num2', 'Num3']
data1 = [['33', '.853', '9834', '234'],
         ['87', '.372', '2345', '843'],
         ['15', '1.234', '742', '821'],
         ['92', '1.957', '1234', '123'],
         ['13', '.943', '8427', '493'],
         ['67', '.852', '3421', '439']
]
df1 = pd.DataFrame(data=data1, columns=cols1)
cols2 = ['Prop_Number', '1_Num', '2_Num', '3_Num']
data2 = [['87', '', '', ''],
         ['33', '', '', ''],
         ['67', '', '', ''],
         ['13', '', '', ''],
         ['92', '', '', ''],
         ['15', '', '', '']
]
df2 = pd.DataFrame(data=data2, columns=cols2)
What I tried:
df2['1_Num'] = np.where(df1['ID'] == df2['Prop_Number'], df1['Num1'],np.nan)
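The attempt above compares the two key columns element by element by row position; it does not look each Prop_Number up in df1. A row only gets a value when both frames happen to hold the same ID at the same position, and with the sample data no position lines up, so every result is NaN. The sketch below (trimmed to the Num1 column) shows this, then shows one value-based alternative, Series.map against df1 indexed by ID. The map approach is a swap-in for illustration, not something proposed in the answers below.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"ID": ["33", "87", "15", "92", "13", "67"],
                    "Num1": [".853", ".372", "1.234", "1.957", ".943", ".852"]})
df2 = pd.DataFrame({"Prop_Number": ["87", "33", "67", "13", "92", "15"]})

# == on two Series pairs element i with element i (aligned on the shared
# RangeIndex); it never searches df1 for a matching ID.
same_position = df1["ID"] == df2["Prop_Number"]
print(same_position.tolist())
# → [False, False, False, False, False, False]

# Because no position matches, np.where picks NaN for every row.
attempt = np.where(same_position, df1["Num1"], np.nan)
print(attempt)

# Value-based lookup instead: index the source by its key, then map each
# Prop_Number to the Num1 value stored under that ID.
df2["1_Num"] = df2["Prop_Number"].map(df1.set_index("ID")["Num1"])
print(df2["1_Num"].tolist())
# → ['.372', '.853', '.852', '.943', '1.957', '1.234']
```

This assigns into the existing df2 rather than building a new frame, but it handles one column per call; the answers below cover several columns at once.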
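Scott gave a good answer, but I was interested in your matching columns by their numbers and thought this might help solve your problem.
The idea is to use a regex to pick out every column whose name contains a digit, then match the columns up by that number, which lets us pair df1's columns with df2's:
Also, because your two key columns have different names, the index name comes back blank; you can set it manually.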
import re

def match_numeric_columns(dataframe1, dataframe2):
    """
    The first argument is the dataframe whose columns you want to rename.
    Takes two dataframes and pairs up the column names that contain the
    same number, e.g. col1a = 1cola or Data_225 = 225_Info.
    """
    cola = dataframe1.filter(regex=r"\d").columns
    colb = dataframe2.filter(regex=r"\d").columns
    # Pair the numbered columns in the order they appear and keep a pair
    # only when both names contain the same integer.
    all_matches = {
        (k if int(re.findall(r"\d+", k)[0]) == int(re.findall(r"\d+", v)[0]) else None):
        (v if int(re.findall(r"\d+", v)[0]) == int(re.findall(r"\d+", k)[0]) else None)
        for (k, v) in zip(cola, colb)
    }
    matching_cols = {k: v for k, v in all_matches.items() if v is not None}
    return matching_cols

print(match_numeric_columns(df2, df1))
{'1_Num': 'Num1', '2_Num': 'Num2', '3_Num': 'Num3'}
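Because match_numeric_columns pairs columns with zip, it only finds a match when both frames list their numbered columns in the same order. A minimal sketch of looking the number up instead, so column order does not matter; match_by_number is a hypothetical helper name, and it assumes the first integer in each name is the one to match on.

```python
import re


def match_by_number(rename_cols, reference_cols):
    """Hypothetical helper: map each name in rename_cols to the name in
    reference_cols that contains the same first integer."""

    def first_int(name):
        # Return the first run of digits as an int, or None if there is none.
        m = re.search(r"\d+", name)
        return int(m.group()) if m else None

    # Index the reference columns by the number they contain
    # (the first column wins if two share a number).
    by_number = {}
    for col in reference_cols:
        n = first_int(col)
        if n is not None:
            by_number.setdefault(n, col)

    # Keep only names whose number also appears among the reference columns.
    mapping = {}
    for col in rename_cols:
        n = first_int(col)
        if n is not None and n in by_number:
            mapping[col] = by_number[n]
    return mapping


print(match_by_number(["Prop_Number", "3_Num", "1_Num", "2_Num"],
                      ["ID", "Num1", "Num2", "Num3"]))
# → {'3_Num': 'Num3', '1_Num': 'Num1', '2_Num': 'Num2'}
```

The result can be passed to rename(columns=...) the same way as the output of match_numeric_columns.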
df2_v2 = (
    df2.set_index("Prop_Number")
    .rename(columns=match_numeric_columns(df2, df1))
    .replace("", np.nan)
    .combine_first(df1.set_index("ID"))
)
print(df2_v2)
Num1 Num2 Num3
13 0.943 8427.0 493.0
15 1.234 742.0 821.0
33 0.853 9834.0 234.0
67 0.852 3421.0 439.0
87 0.372 2345.0 843.0
92 1.957 1234.0 123.0
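The chain above relies on combine_first: values already present in df2 are kept and its NaNs (the former empty strings) are filled from df1 wherever the index labels match. A toy sketch with made-up numbers shows that behavior, and also that combine_first aligns on the union of both indexes, so an ID present only in df1 would be added as a new row; that matters if df2 must keep exactly its own rows. The last step uses rename_axis as one way to set the blank index name by hand before turning it back into a column.

```python
import numpy as np
import pandas as pd

# Toy stand-ins: "target" plays df2 (one empty cell, one filled),
# "source" plays df1; both are indexed by their own key column.
target = pd.DataFrame({"Prop_Number": ["87", "33"],
                       "Num1": [np.nan, 5.0]}).set_index("Prop_Number")
source = pd.DataFrame({"ID": ["33", "87", "15"],
                       "Num1": [0.853, 0.372, 1.234]}).set_index("ID")

# Keep target's non-null values, fill its NaNs from source, and take the
# union of the two indexes ("15" exists only in source, so it is added).
combined = target.combine_first(source)

# The index names differ, so name the index explicitly, then restore it
# as a regular column.
combined = combined.rename_axis("Prop_Number").reset_index()
print(combined)
```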
You can try the following:
import pandas as pd

cols1 = ['ID', 'Num1', 'Num2', 'Num3']
data1 = [['33', '.853', '9834', '234'],
         ['87', '.372', '2345', '843'],
         ['15', '1.234', '742', '821'],
         ['92', '1.957', '1234', '123'],
         ['13', '.943', '8427', '493'],
         ['67', '.852', '3421', '439']
]
df1 = pd.DataFrame(data=data1, columns=cols1)
cols2 = ['Prop_Number', '1_Num', '2_Num', '3_Num']
data2 = [['87', '', '', ''],
         ['33', '', '', ''],
         ['67', '', '', ''],
         ['13', '', '', ''],
         ['92', '', '', ''],
         ['15', '', '', '']
]
df2 = pd.DataFrame(data=data2, columns=cols2)
df2 = df2.set_index('Prop_Number')
df2.update(df1.rename(columns=dict(zip(df1.columns[1:],
                                       ['1_Num', '2_Num', '3_Num'])))
              .set_index('ID'))
df2 = df2.reset_index()
print(df2)
Output:
Prop_Number 1_Num 2_Num 3_Num
0 87 .372 2345 843
1 33 .853 9834 234
2 67 .852 3421 439
3 13 .943 8427 493
4 92 1.957 1234 123
5 15 1.234 742 821
Details: rename the df1 columns to match the df2 columns, use set_index on both frames so rows line up by ID, and modify df2 with update.
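Since the question's real df2 has many more columns, the update approach above can be wrapped so that only the mapped columns change and every other column, plus the original column order, is left alone. This is a sketch under assumptions: fill_from is a hypothetical helper name, the keys are assumed unique in both frames, and the "Notes" column and the extra "99" row in the usage example are made up to show what stays untouched.

```python
import pandas as pd


def fill_from(target, source, target_key, source_key, column_map):
    """Hypothetical helper: copy source columns into target columns, matching
    rows by key. column_map maps source column name -> target column name.
    Target columns not named in column_map are left untouched."""
    # Rename the source columns to the target names and index by the key.
    src = (source.rename(columns=column_map)
                 .set_index(source_key)[list(column_map.values())])
    tgt = target.set_index(target_key)
    # update works in place on tgt, aligning on index and column labels and
    # copying only non-NA values, so unmatched rows keep their old values.
    tgt.update(src)
    # Bring the key back as a column and restore the original column order.
    return tgt.reset_index()[list(target.columns)]


df1 = pd.DataFrame({"ID": ["33", "87"],
                    "Num1": [".853", ".372"],
                    "Num2": ["9834", "2345"]})
df2 = pd.DataFrame({"Prop_Number": ["87", "33", "99"],
                    "1_Num": ["", "", ""],
                    "2_Num": ["", "", ""],
                    "Notes": ["a", "b", "c"]})

df2 = fill_from(df2, df1, "Prop_Number", "ID", {"Num1": "1_Num", "Num2": "2_Num"})
print(df2)
```

Row "99" has no match in df1, so update leaves its empty strings as they were, and "Notes" is never touched because it is not in column_map.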