I have a DataFrame train, and I have filtered a subset of its rows to form the promoted DataFrame:
print(train.department.value_counts(),'\n')
promoted=train[train.is_promoted==1]
print(promoted.department.value_counts())
The output of the code above is:
Sales & Marketing 16840
Operations 11348
Technology 7138
Procurement 7138
Analytics 5352
Finance 2536
HR 2418
Legal 1039
R&D 999
Name: department, dtype: int64
Sales & Marketing 1213
Operations 1023
Technology 768
Procurement 688
Analytics 512
Finance 206
HR 136
R&D 69
Legal 53
Name: department, dtype: int64
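For each department category, I want to display the percentage of its rows in train that also appear in promoted. That is, instead of the counts 1213, 1023, 768, 688, etc., I should get a percentage such as 1213/16840 * 100 = 7.2. Note that I do not want normalized values.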
Try:
promoted.department.value_counts()/train.department.value_counts()*100
It should give you the desired output:
Sales & Marketing 7.2030
Operations 9.0148
Technology 10.7593
..... ...
Name: department, dtype: float64
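As a minimal sketch of why this division works: pandas aligns the two Series on their index labels before dividing, so it does not matter that the two value_counts() results are sorted differently. The small frame below is made up for illustration and is not the asker's data.

```python
import pandas as pd

# Hypothetical toy data standing in for the asker's `train` frame.
train = pd.DataFrame({
    "department": ["HR", "HR", "HR", "HR", "Legal", "Legal",
                   "R&D", "R&D", "R&D", "R&D"],
    "is_promoted": [1, 0, 0, 0, 1, 1, 0, 0, 1, 0],
})
promoted = train[train.is_promoted == 1]

# The division matches rows by department label, not by position.
pct = promoted.department.value_counts() / train.department.value_counts() * 100
pct = pct.round(2)
print(pct)
```

One caveat: a department with no promoted rows is missing from the numerator and comes out as NaN. Using Series.div with fill_value=0 instead of the / operator is one way to get 0 for such departments.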
How about this? The example uses a toy dataset, but the key idea is simply to divide one set of value counts by the other.
import pandas as pd
import numpy as np
data = pd.DataFrame({
'department': list(range(10)) * 100,
'is_promoted': np.random.randint(0, 2, size = 1000)
})
# Slice out promoted data.
data_promoted = data[data['is_promoted'] == 1]
# Calculate share of each department that is present in data_promoted.
data_promoted['department'].value_counts().sort_index() / data['department'].value_counts().sort_index()
This gives:
0 0.50
1 0.52
2 0.45
3 0.54
4 0.41
5 0.50
6 0.45
7 0.52
8 0.60
9 0.52
Name: department, dtype: float64
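Because is_promoted is a 0/1 flag, the same share can probably be obtained without building a second frame: the mean of the flag within each department is the fraction promoted. This is an alternative technique, not one taken from the answers above, and the data is a small hypothetical example.

```python
import pandas as pd

# Hypothetical data: department labels and a 0/1 promotion flag.
data = pd.DataFrame({
    "department": ["a", "a", "a", "a", "b", "b", "c", "c", "c", "c", "c"],
    "is_promoted": [1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1],
})

# Mean of a 0/1 column within each group is the share of ones in that group.
share_by_mean = data.groupby("department")["is_promoted"].mean() * 100

# The value_counts ratio from the answers, for comparison.
promoted_counts = data[data["is_promoted"] == 1]["department"].value_counts()
share_by_counts = promoted_counts / data["department"].value_counts() * 100

print(share_by_mean)
print(share_by_counts.sort_index())
```

The groupby version also reports 0 for a department with no promotions, where the ratio of value counts would give NaN.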
import pandas as pd
df = pd.read_csv("/home/spaceman/my_work/Most-Recent-Cohorts-Scorecard-Elements.csv")
df = df[['STABBR']]  # each value appears in the DataFrame multiple times
# after that, df['STABBR'].value_counts() gave:
CA 717
TX 454
NY 454
FL 417
PA 382
OH 320
IL 280
MI 189
NC 189
.........
.........
# normalize=True returns the relative frequency: each count divided by the sum of all counts
print(df['STABBR'].value_counts(normalize=True))
CA 0.099930
TX 0.063275
NY 0.063275
FL 0.058118
PA 0.053240
OH 0.044599
IL 0.039024
MI 0.026341
NC 0.026341
..............
..............
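To make the distinction from the question concrete: value_counts(normalize=True) divides each count by the total number of rows in that same Series, so the results sum to 1. That differs from dividing a department's promoted count by the department's own size. A small sketch with hypothetical data (the column names follow the question, the values are invented):

```python
import pandas as pd

# Hypothetical data: 4 HR rows (2 promoted) and 6 Legal rows (1 promoted).
df = pd.DataFrame({
    "department": ["HR"] * 4 + ["Legal"] * 6,
    "is_promoted": [1, 1, 0, 0, 1, 0, 0, 0, 0, 0],
})
promoted = df[df["is_promoted"] == 1]

# Share of all promoted rows that belong to each department (sums to 1).
share_of_promoted = promoted["department"].value_counts(normalize=True)

# Share of each department that was promoted (need not sum to 1).
promotion_rate = promoted["department"].value_counts() / df["department"].value_counts()

print(share_of_promoted)
print(promotion_rate)
```

Here HR accounts for two thirds of all promotions, but only half of HR was promoted. The second quantity is what the question asks for.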