我想对数据框中的列进行表面加权平均。我有两个表面柱和两个U值柱。我想创建一个额外的列“ U_av”(表面加权平均U值),并且U_av =(A1 * U1 + A2 * U2)/(A1 + A2)。如果在其中一列中出现NaN,则应返回NaN。
初始df:
ID A1 A2 U1 U2
0 14 2 1.0 10.0 11
1 16 2 2.0 12.0 12
2 18 2 3.0 24.0 13
3 20 2 NaN 8.0 14
4 22 4 5.0 84.0 15
5 24 4 6.0 84.0 16
所需的输出:
ID A1 A2 U1 U2 U_av
0 14 2 1.0 10.0 11 10.33
1 16 2 2.0 12.0 12 12
2 18 2 3.0 24.0 13 17.4
3 20 2 NaN 8.0 14 NaN
4 22 4 5.0 84.0 15 45.66
5 24 4 6.0 84.0 16 43.2
代码:
import numpy as np
import pandas as pd
df = pd.DataFrame({"ID": [14,16,18,20,22,24],
"A1": [2,2,2,2,4,4],
"U1": [10,12,24,8,84,84],
"A2": [1,2,3,np.nan,5,6],
"U2": [11,12,13,14,15,16]})
print(df)
#the mean of two columns U1 and U2 and dropping NaN is easy (U1+U2/2 in this case)
#but what to do for the surface-weighted mean (U_av = (A1*U1 + A2*U2) / (A1+A2))?
df.loc[:,'Umean'] = df[['U1','U2']].dropna().mean(axis=1)
希望我正确了:
df['U_av'] = df['A1']*df['U1'] + df['A2']*df['U2'] / (df['A1']+df['A2'])
df
ID A1 U1 A2 U2 U_av
0 14 2 10 1.0 11 23.666667
1 16 2 12 2.0 12 30.000000
2 18 2 24 3.0 13 55.800000
3 20 2 8 NaN 14 NaN
4 22 4 84 5.0 15 344.333333
5 24 4 84 6.0 16 345.600000
IIUC,
numerator = df.A1.mul(df.U1) + (df.A2.mul(df.U2))
denominator = df.A1.add(df.A2)
df["U_av"] = numerator.div(denominator)
df
ID A1 A2 U1 U2 U_av
0 14 2 1.0 10.0 11 10.333333
1 16 2 2.0 12.0 12 12.000000
2 18 2 3.0 24.0 13 17.400000
3 20 2 NaN 8.0 14 NaN
4 22 4 5.0 84.0 15 45.666667
5 24 4 6.0 84.0 16 43.200000