How can I quickly label int ranges with strings?


I want to automatically label the quantile ranges of "Fare", as shown below.

My data looks like this:

df.head()


PassengerId Survived    Pclass  Name    Sex Age SibSp   Parch   Ticket  Fare    Cabin   Embarked
0   1   0   3   Braund, Mr. Owen Harris male    22.0    1   0   A/5 21171   7.2500  NaN S
1   2   1   1   Cumings, Mrs. John Bradley (Florence Briggs Th...   female  38.0    1   0   PC 17599    71.2833 C85 C
2   3   1   3   Heikkinen, Miss. Laina  female  26.0    0   0   STON/O2. 3101282    7.9250  NaN S
3   4   1   1   Futrelle, Mrs. Jacques Heath (Lily May Peel)    female  35.0    1   0   113803  53.1000 C123    S
4   5   0   3   Allen, Mr. William Henry    male    35.0    0   0   373450  8.0500  NaN S

I did:

df['FareBin'] = pd.qcut(df['Fare'], 4)
df[['FareBin', 'Survived']].groupby(['FareBin'], as_index=False).mean().sort_values(by='FareBin', ascending=True)


FareBin Survived
0   (-0.001, 7.896] 0.197309
1   (7.896, 14.454] 0.303571
2   (14.454, 31.275]    0.441048
3   (31.275, 512.329]   0.600000

Now I would like to replace bands such as (-0.001, 7.896] with string labels in some smart way.

I tried:

df.loc[ df['Fare'] <= 7.91, 'Fare'] = 'Low'
df.loc[(df['Fare'] > 7.91) & (df['Fare'] <= 14.454), 'Fare'] = 'Mid low'
...

Is there a way to do this so that I don't have to list every condition like this? Thanks.

python pandas numpy
1 Answer

You can use the labels parameter of the qcut() function:

pd.qcut(range(5), 3, labels=["good", "medium", "bad"])

Output:

[good, good, medium, bad, bad]
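
Applied to the question's 'Fare' column, the same idea might look like the sketch below. The small DataFrame is only a stand-in for the real data: the first five fares come from df.head() above, and the last three are made up so that every quartile has members. The label names 'Mid high' and 'High' are also assumptions, since the question only names 'Low' and 'Mid low'.

```python
import pandas as pd

# Stand-in for the Titanic data: the first five fares appear in df.head(),
# the last three (13.0, 26.0, 30.0) are made-up illustrative values.
df = pd.DataFrame({"Fare": [7.25, 71.2833, 7.925, 53.1, 8.05, 13.0, 26.0, 30.0]})

# One label per quantile bin, ordered from lowest to highest fare.
# 'Mid high' and 'High' are assumed names, not taken from the question.
labels = ["Low", "Mid low", "Mid high", "High"]

# qcut computes the quartile edges itself and assigns the matching label,
# so no manual df.loc[...] conditions are needed.
df["FareBin"] = pd.qcut(df["Fare"], 4, labels=labels)

print(df)
```

Writing the result to a new column such as 'FareBin' keeps the numeric 'Fare' values intact, whereas the df.loc approach in the question overwrites them with strings. The number of labels has to match the number of bins.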