Percentage column with a condition

Question (votes: 0, answers: 2)
I have this df:

import pandas as pd

data = {'A':[102, 102, 102, 102, 312, 312, 312], 
        'B':[1001,1001,1001,1001,1001,1001,1001],
        'C':[3005,3005,3005,3005,3005,3005,3005],
        'D':[2004,2004,2004,2004,2002,2002,2002],
        'E':[1,3,5,999,1,5,999],
        'F':[300,1,192,837,19,1,1037]} 

df = pd.DataFrame(data, columns = ['A','B','C','D','E','F'])

df.head(7)

This line of code computes a percentage, but not the one I want: the rows where column E equals 999 should be excluded from the sum:

df['Percentage'] = 100 * df['F'] / df.groupby('A')['F'].transform('sum')

The percentage should show:

Percentage
60.85193
0.20284
38.94523
(Blank)
95
5
(Blank)

Any help would be greatly appreciated.

python pandas dataframe pandas-groupby percentage
2 Answers
0
votes
Apply a mask and set the selected rows to NaN. Do everything as in the example you provided, then add the following lines:

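import numpy as np

mask = df['E'] == 999  # E holds integers, so compare against 999, not '999'
df.loc[mask, 'Percentage'] = np.nan  # .loc avoids chained assignment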
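Note that blanking the 999 rows afterwards does not change the denominator: the group sum from the question's line still includes their F values, so the remaining percentages would not match the expected output. A minimal sketch of one way around that, assuming it is acceptable to hide the excluded F values with `Series.where` before summing (the names `keep` and `f_kept` are illustrative, not from the original post):

```python
import pandas as pd

df = pd.DataFrame({
    'A': [102, 102, 102, 102, 312, 312, 312],
    'E': [1, 3, 5, 999, 1, 5, 999],
    'F': [300, 1, 192, 837, 19, 1, 1037],
})

# True for rows that should count towards the percentage.
keep = df['E'] != 999

# F with NaN on the excluded rows; groupby sums skip NaN by default,
# so the excluded rows drop out of each group's total.
f_kept = df['F'].where(keep)

# Excluded rows stay NaN because NaN divided by anything is NaN.
df['Percentage'] = 100 * f_kept / f_kept.groupby(df['A']).transform('sum')
print(df)
```

This keeps the computation on the full frame, so no re-alignment step is needed.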


0
votes
You can subset the frame, apply transform to that specific subset, and then assign the result back:

>>> grp = df[df['E'].ne(999)]
>>> grp['F'].mul(100).div(grp.groupby('A')['F'].transform('sum'))
0    60.851927
1     0.202840
2    38.945233
4    95.000000
5     5.000000
Name: F, dtype: float64
>>> df['Percentage'] = grp['F'].mul(100).div(grp.groupby('A')['F'].transform('sum'))
>>> df
     A     B     C     D    E     F  Percentage
0  102  1001  3005  2004    1   300   60.851927
1  102  1001  3005  2004    3     1    0.202840
2  102  1001  3005  2004    5   192   38.945233
3  102  1001  3005  2004  999   837         NaN
4  312  1001  3005  2002    1    19   95.000000
5  312  1001  3005  2002    5     1    5.000000
6  312  1001  3005  2002  999  1037         NaN
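Assigning the shorter result back works because pandas aligns on the index: rows 3 and 6 are missing from the subset, so they come out as NaN. As a rough sketch, the same steps could be wrapped in a small helper; `percentage_excluding` and its parameters are hypothetical names chosen for illustration, and the explicit `reindex` only spells out the alignment that column assignment would do anyway:

```python
import pandas as pd

def percentage_excluding(df, group_col, value_col, flag_col, flag_value):
    """Hypothetical helper: percentage of value_col within each group_col
    group, ignoring rows where flag_col equals flag_value."""
    # Keep only the rows that should count towards the group totals.
    kept = df[df[flag_col].ne(flag_value)]
    totals = kept.groupby(group_col)[value_col].transform('sum')
    pct = kept[value_col].mul(100).div(totals)
    # Re-expand to the full index; the excluded rows become NaN.
    return pct.reindex(df.index)

df = pd.DataFrame({
    'A': [102, 102, 102, 102, 312, 312, 312],
    'E': [1, 3, 5, 999, 1, 5, 999],
    'F': [300, 1, 192, 837, 19, 1, 1037],
})

df['Percentage'] = percentage_excluding(df, 'A', 'F', 'E', 999)
print(df)
```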
