I have this df:
import pandas as pd

data = {'A': [102, 102, 102, 102, 312, 312, 312],
        'B': [1001, 1001, 1001, 1001, 1001, 1001, 1001],
        'C': [3005, 3005, 3005, 3005, 3005, 3005, 3005],
        'D': [2004, 2004, 2004, 2004, 2002, 2002, 2002],
        'E': [1, 3, 5, 999, 1, 5, 999],
        'F': [300, 1, 192, 837, 19, 1, 1037]}
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D', 'E', 'F'])
df.head(7)
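This one line of code calculates the percentage, and it works, except that I want it to leave out the rows where column E has the value 999, both from the group totals and from the result.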
df['Percentage'] = 100 * df['F'] / df.groupby('A')['F'].transform('sum')
The percentages should look like this:
Percentage
60.85193
0.20284
38.94523
(Blank)
95
5
(Blank)
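I would be very grateful for any help.
You can filter your frame, apply transform to the filtered subset,
and then assign the result back. The assignment aligns on the index, so the rows that were filtered out end up as NaN.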
# Get the sub group
>>> grp = df[df['E'].ne(999)]
# Not required: this shows the intermediate state of the transformed percentage
>>> grp['F'].mul(100).div(grp.groupby('A')['F'].transform('sum'))
0 60.851927
1 0.202840
2 38.945233
4 95.000000
5 5.000000
Name: F, dtype: float64
# Apply the result to your main frame
>>> df['Percentage'] = grp['F'].mul(100).div(grp.groupby('A')['F'].transform('sum'))
Result:
>>> df
A B C D E F Percentage
0 102 1001 3005 2004 1 300 60.851927
1 102 1001 3005 2004 3 1 0.202840
2 102 1001 3005 2004 5 192 38.945233
3 102 1001 3005 2004 999 837 NaN
4 312 1001 3005 2002 1 19 95.000000
5 312 1001 3005 2002 5 1 5.000000
6 312 1001 3005 2002 999 1037 NaN
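As a variation on the same idea, here is a minimal sketch that keeps the full frame and blanks out the excluded values with Series.where instead of filtering first. The names valid_f and group_total are only illustrative, and the frame is cut down to the three columns that matter here.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [102, 102, 102, 102, 312, 312, 312],
    'E': [1, 3, 5, 999, 1, 5, 999],
    'F': [300, 1, 192, 837, 19, 1, 1037],
})

# Keep F where E is not 999; every other row becomes NaN.
valid_f = df['F'].where(df['E'].ne(999))

# sum skips NaN by default, so each group's total only covers the kept rows.
group_total = valid_f.groupby(df['A']).transform('sum')

# NaN divided by the group total stays NaN, so excluded rows remain blank.
df['Percentage'] = 100 * valid_f / group_total
print(df)
```

This should produce the same Percentage column as the filtered version, since the NaN values neither contribute to the sums nor yield a percentage of their own.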
Use a mask to ignore the rows you want to exclude.
import pandas as pd
data = {'A': [102, 102, 102, 102, 312, 312, 312],
'B': [1001, 1001, 1001, 1001, 1001, 1001, 1001],
'C': [3005, 3005, 3005, 3005, 3005, 3005, 3005],
'D': [2004, 2004, 2004, 2004, 2002, 2002, 2002],
'E': [1, 3, 5, 999, 1, 5, 999],
'F': [300, 1, 192, 837, 19, 1, 1037]}
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D', 'E', 'F'])
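# True for the rows to keep (E is not 999)
mask = df['E'] != 999
# Rows outside the mask are missing from the right-hand side, so they become NaN on assignment
df['Percentage'] = 100 * df.loc[mask, 'F'] / df.loc[mask].groupby('A')['F'].transform('sum')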
Output:
A B C D E F Percentage
0 102 1001 3005 2004 1 300 60.851927
1 102 1001 3005 2004 3 1 0.202840
2 102 1001 3005 2004 5 192 38.945233
3 102 1001 3005 2004 999 837 NaN
4 312 1001 3005 2002 1 19 95.000000
5 312 1001 3005 2002 5 1 5.000000
6 312 1001 3005 2002 999 1037 NaN
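If you want to confirm the mask did what you expect, a quick sanity check along these lines may help. It is only a sketch: the names totals and excluded_match are made up for illustration, and the frame is reduced to the columns involved.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [102, 102, 102, 102, 312, 312, 312],
    'E': [1, 3, 5, 999, 1, 5, 999],
    'F': [300, 1, 192, 837, 19, 1, 1037],
})

mask = df['E'].ne(999)
df['Percentage'] = (
    100 * df.loc[mask, 'F'] / df.loc[mask].groupby('A')['F'].transform('sum')
)

# The percentages in each group should add up to about 100 (sum skips NaN).
totals = df.groupby('A')['Percentage'].sum()
print(totals)

# The rows without a percentage should be exactly the excluded rows.
excluded_match = bool((df['Percentage'].isna() == ~mask).all())
print(excluded_match)
```

Because of floating point rounding, compare the totals against 100 with a small tolerance rather than with strict equality.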