获取熊猫数据框中多列的表面加权平均值

问题描述 投票:0回答:2

我想对数据框中的列进行表面加权平均。我有两个表面柱和两个U值柱。我想创建一个额外的列“ U_av”(表面加权平均U值),并且U_av =(A1 * U1 + A2 * U2)/(A1 + A2)。如果在其中一列中出现NaN,则应返回NaN。

初始df:

   ID  A1   A2    U1  U2
0  14   2  1.0  10.0  11
1  16   2  2.0  12.0  12
2  18   2  3.0  24.0  13
3  20   2  NaN   8.0  14
4  22   4  5.0  84.0  15
5  24   4  6.0  84.0  16

所需的输出:

   ID  A1   A2    U1  U2   U_av
0  14   2  1.0  10.0  11  10.33
1  16   2  2.0  12.0  12     12
2  18   2  3.0  24.0  13   17.4
3  20   2  NaN   8.0  14    NaN
4  22   4  5.0  84.0  15  45.66
5  24   4  6.0  84.0  16   43.2

代码:

import numpy as np

import pandas as pd



df = pd.DataFrame({"ID": [14,16,18,20,22,24],
                   "A1": [2,2,2,2,4,4],
  
                   "U1": [10,12,24,8,84,84],
   
                   "A2": [1,2,3,np.nan,5,6],
   
                   "U2": [11,12,13,14,15,16]})



print(df)

#the mean of two columns U1 and U2 and dropping NaN is easy (U1+U2/2 in this case) 
#but what to do for the surface-weighted mean (U_av = (A1*U1 + A2*U2) / (A1+A2))?
df.loc[:,'Umean'] = df[['U1','U2']].dropna().mean(axis=1)





python pandas average weighted-average
2个回答
0
投票

希望我正确了:

df['U_av'] = df['A1']*df['U1'] + df['A2']*df['U2'] / (df['A1']+df['A2'])
df


    ID  A1  U1  A2  U2  U_av
0   14  2   10  1.0 11  23.666667
1   16  2   12  2.0 12  30.000000
2   18  2   24  3.0 13  55.800000
3   20  2   8   NaN 14  NaN
4   22  4   84  5.0 15  344.333333
5   24  4   84  6.0 16  345.600000

0
投票

IIUC,

numerator = df.A1.mul(df.U1) + (df.A2.mul(df.U2))
denominator = df.A1.add(df.A2)
df["U_av"] = numerator.div(denominator)

df

    ID  A1  A2  U1  U2  U_av
0   14  2   1.0 10.0    11  10.333333
1   16  2   2.0 12.0    12  12.000000
2   18  2   3.0 24.0    13  17.400000
3   20  2   NaN 8.0     14  NaN
4   22  4   5.0 84.0    15  45.666667
5   24  4   6.0 84.0    16  43.200000
© www.soinside.com 2019 - 2024. All rights reserved.