Expanding a categorical column into multiple count columns


Suppose we have a DataFrame:

import pandas as pd

data = {'person_id': ['person_a', 'person_a', 'person_b', 'person_b', 'person_c', 'person_c'],
        'categorical_data': ['new', 'new', 'ok', 'bad', 'new', 'bad']}
df = pd.DataFrame(data)

    person_id   categorical_data
0   person_a    new
1   person_a    new
2   person_b    ok
3   person_b    bad
4   person_c    new
5   person_c    bad

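I want to expand the categorical data into multiple columns, one per category, each holding that category's count for the person.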

We can group by person ID to get the counts:

count_categories = df.groupby('person_id')['categorical_data'].value_counts().reset_index(name='count')

    person_id   categorical_data    count
0   person_a    new 2
1   person_b    bad 1
2   person_b    ok  1
3   person_c    bad 1
4   person_c    new 1

Then I tried this to create the new columns:

pivoted = count_categories.set_index(['person_id','categorical_data']).unstack('categorical_data')


count
categorical_data    bad new ok
person_id           
person_a    NaN 2.0 NaN
person_b    1.0 NaN 1.0
person_c    1.0 1.0 NaN

This is the shape I want, but I'm confused by the MultiIndex.

How can I get rid of the extra index level, or is there a better way to do this? Resetting the index gives:

pivoted.reset_index() 

    person_id   count
categorical_data        bad new ok
0   person_a    NaN 2.0 NaN
1   person_b    1.0 NaN 1.0
2   person_c    1.0 1.0 NaN
pandas data-science pivot-table analytics multi-index
1 Answer

Use pd.crosstab:

out = pd.crosstab(df['person_id'], df['categorical_data'])

categorical_data  bad  new  ok
person_id                     
person_a            0    2   0
person_b            1    0   1
person_c            1    1   0
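
As a rough cross-check of what crosstab does here, the same table can be built by counting rows per (person_id, category) pair and unstacking the categories. This is only a sketch on the question's sample data; the names `counts` and `expected` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'person_id': ['person_a', 'person_a', 'person_b', 'person_b', 'person_c', 'person_c'],
    'categorical_data': ['new', 'new', 'ok', 'bad', 'new', 'bad'],
})

# Count rows per (person_id, category) pair, then move the categories into columns.
# fill_value=0 fills the combinations that never occur, so the counts stay integers.
counts = df.groupby(['person_id', 'categorical_data']).size().unstack(fill_value=0)

# crosstab result for comparison.
expected = pd.crosstab(df['person_id'], df['categorical_data'])
```

Unlike the unstack in the question, combinations that never occur show up as 0 rather than NaN, so the columns are not upcast to float.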

Or, to get person_id back as a regular column and drop the column-axis name:

out1 = (pd.crosstab(df['person_id'], df['categorical_data'])
          .reset_index()
          .rename_axis(None, axis=1)
)

out1:

    person_id   bad new ok
0   person_a    0   2   0
1   person_b    1   0   1
2   person_c    1   1   0
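
If you would rather keep the groupby approach from the question, the extra 'count' level appears because unstack was called on a DataFrame that still has a 'count' column. A minimal sketch of one way around that, assuming the question's `count_categories` frame (the name `flat` is hypothetical): select the 'count' column first so that a Series is unstacked.

```python
import pandas as pd

df = pd.DataFrame({
    'person_id': ['person_a', 'person_a', 'person_b', 'person_b', 'person_c', 'person_c'],
    'categorical_data': ['new', 'new', 'ok', 'bad', 'new', 'bad'],
})

# Same intermediate frame as in the question.
count_categories = (df.groupby('person_id')['categorical_data']
                      .value_counts()
                      .reset_index(name='count'))

# Selecting ['count'] yields a Series, so unstack produces single-level columns
# instead of a ('count', category) MultiIndex. fill_value=0 avoids NaN.
flat = (count_categories
        .set_index(['person_id', 'categorical_data'])['count']
        .unstack(fill_value=0)
        .reset_index()
        .rename_axis(None, axis=1))
```

The result has the plain columns person_id, bad, new and ok, the same layout as out1.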