Grouping values into custom bins


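I have a DataFrame with an 'education' attribute whose values are discrete integers from 1 to 16. For cross-tabulation purposes I want to bin this 'education' variable, but with custom bins (1:8, 9:11, 12, 13:15, 16).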

I've been experimenting with pd.cut(), but I get an invalid syntax error:

adult_df_educrace['education_bins'] = pd.cut(x=adult_df_educrace['education'], bins=[1:8, 9, 10:11, 12, 13:15, 16], labels = ['Middle School or less', 'Some High School', 'High School Grad', 'Some College', 'College Grad'])
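The invalid syntax error comes from entries like 1:8: colon slice notation is only valid inside square-bracket indexing (e.g. df.iloc[1:8]), not as an element of a list literal. The list above also has six groups for five labels. A range object is the closest valid way to spell such a group. As a sketch of a different technique from pd.cut (not the answer's approach), the groups can be inverted into a value-to-label dict and applied with Series.map; df and its values are hypothetical sample data.

```python
import pandas as pd

# Slices like 1:8 are not allowed in a list literal; range objects are.
# range(1, 9) covers 1-8 because the stop value is excluded.
groups = {
    'Middle School or less': range(1, 9),
    'Some High School': range(9, 12),
    'High School Grad': range(12, 13),
    'Some College': range(13, 16),
    'College Grad': range(16, 17),
}

# Invert into a value -> label lookup: {1: 'Middle School or less', ...}
value_to_label = {v: label for label, values in groups.items() for v in values}

# Hypothetical sample data for illustration
df = pd.DataFrame({'education': [1, 8, 9, 12, 13, 16]})
df['education_bins'] = df['education'].map(value_to_label)
print(df)
```

Unlike pd.cut, this produces a plain object column rather than an ordered categorical, and any value outside 1-16 becomes NaN.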
Answer

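Try placing the bin edges between the integer thresholds. pd.cut builds right-closed intervals (a, b] by default, so with edges at half-integers every integer value falls unambiguously into the intended bin: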

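import pandas as pd

# Edges sit halfway between integers, so each right-closed bin
# (0.5, 8.5], (8.5, 11.5], (11.5, 12.5], (12.5, 15.5], (15.5, 16.5]
# holds exactly 1-8, 9-11, 12, 13-15 and 16 respectively.
bins = [0.5, 8.5, 11.5, 12.5, 15.5, 16.5]
labels = ['Middle School or less', 'Some High School',
          'High School Grad', 'Some College', 'College Grad']

adult_df_educrace['education_bins'] = pd.cut(x=adult_df_educrace['education'],
                                             bins=bins,
                                             labels=labels)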
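Because pd.cut intervals are right-closed by default (right=True), integer edges also work for integer data: (0, 8] holds 1-8, (8, 11] holds 9-11, and so on. The sketch below, using made-up data covering 1 to 16, checks that this variant assigns the same labels as the half-integer edges; which form to prefer is a matter of taste.

```python
import numpy as np
import pandas as pd

labels = ['Middle School or less', 'Some High School',
          'High School Grad', 'Some College', 'College Grad']
df = pd.DataFrame({'education': np.arange(1, 17)})  # sample data: 1..16

# Half-integer edges, as in the answer
half = pd.cut(df['education'], bins=[0.5, 8.5, 11.5, 12.5, 15.5, 16.5],
              labels=labels)

# Integer edges; with right=True each bin is (left, right],
# so 0 is excluded and 16 is included.
whole = pd.cut(df['education'], bins=[0, 8, 11, 12, 15, 16],
               labels=labels, right=True)

print(half.tolist() == whole.tolist())  # True
```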

Test data, covering every value from 1 to 16:

import numpy as np
adult_df_educrace = pd.DataFrame({'education': np.arange(1, 17)})

Output:

    education         education_bins
0           1  Middle School or less
1           2  Middle School or less
2           3  Middle School or less
3           4  Middle School or less
4           5  Middle School or less
5           6  Middle School or less
6           7  Middle School or less
7           8  Middle School or less
8           9       Some High School
9          10       Some High School
10         11       Some High School
11         12       High School Grad
12         13           Some College
13         14           Some College
14         15           Some College
15         16           College Grad
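Since the binned column was wanted for cross-tabulation, it can go straight into pd.crosstab. The race column and all values below are hypothetical stand-ins, since the question does not show the other columns of adult_df_educrace.

```python
import pandas as pd

labels = ['Middle School or less', 'Some High School',
          'High School Grad', 'Some College', 'College Grad']

# Hypothetical data: 'race' and its values are assumed for illustration only
df = pd.DataFrame({
    'education': [3, 10, 12, 12, 14, 16, 9, 16],
    'race': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'A'],
})
df['education_bins'] = pd.cut(df['education'],
                              bins=[0.5, 8.5, 11.5, 12.5, 15.5, 16.5],
                              labels=labels)

# Rows follow the category order of the bins; columns are the race values
table = pd.crosstab(df['education_bins'], df['race'])
print(table)
```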