Unique ID for each category in a column

Question (votes: 1, answers: 1)

I want to split one DataFrame into two, because it holds two kinds of data (brand owners and products).

Original DataFrame:

>>> products
        product_id     brand_owner     product_name
0       344606         Cargill         A
1       344607         Red Gold        B
2       344608         FooBar          C
3       344609         Red Gold        D
4       344610         Cargill         E

I want to extract brand_owner into a separate DataFrame, as you would when normalizing a database:

>>> brand_owners = pd.DataFrame(branded_foods['brand_owner'].unique())
>>> brand_owners
                        0
0                 Cargill
1      Kellogg Company Us
2                Kashi Us
3                Red Gold
4          Conagra Brands
...                   ...

I give each of its rows an ID (again, like a database primary key):

>>> brand_owners.index += 1
>>> brand_owners['id'] = brand_owners.index
>>> brand_owners
                        0     id
1                 Cargill      1
2      Kellogg Company Us      2
3                Kashi Us      3
4                Red Gold      4
5          Conagra Brands      5
...                   ...    ...

[25202 rows x 2 columns]
>>> brand_owners.columns = ['name', 'id']
>>> brand_owners
                     name     id
1                 Cargill      1
2      Kellogg Company Us      2
3                Kashi Us      3
4                Red Gold      4
5          Conagra Brands      5
...                   ...    ...

Now I want to put this ID back into the original DataFrame, so that it looks like:

        product_id     brand_owner     product_name
0       344606         1               A
1       344607         4               B
2       344608         45              C
3       344609         4               D
4       344610         1               E

How do I do this update in pandas: UPDATE products p SET p.brand_owner = (SELECT id FROM brand_owners b WHERE b.name = p.brand_owner)

python pandas dataframe
1 answer
2 votes

You can encode the categories in brand_owner directly with pd.factorize:

df['brand_owner'] = pd.factorize(df.brand_owner)[0]
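Note that pd.factorize returns 0-based codes, while the question uses 1-based IDs. The sketch below is one way to get both the lookup table and the 1-based IDs in a single pass. It rebuilds only the five sample rows from the question, so the IDs follow order of first appearance in that sample (Red Gold becomes 2, not 4 as in the full dataset). The `original_names` and `mapped` variables are illustrative names, not part of the original post.

```python
import pandas as pd

# Sample rows copied from the question.
products = pd.DataFrame({
    "product_id": [344606, 344607, 344608, 344609, 344610],
    "brand_owner": ["Cargill", "Red Gold", "FooBar", "Red Gold", "Cargill"],
    "product_name": ["A", "B", "C", "D", "E"],
})
original_names = products["brand_owner"].copy()

# factorize returns (codes, uniques): codes are 0-based integers in order of
# first appearance, uniques holds the distinct values in that same order.
codes, uniques = pd.factorize(products["brand_owner"])

# Lookup table with 1-based ids, mirroring brand_owners in the question.
brand_owners = pd.DataFrame({"name": uniques, "id": range(1, len(uniques) + 1)})

# Shift the codes by one so they line up with the 1-based ids.
products["brand_owner"] = codes + 1

# Alternative when a lookup table already exists: map names to ids,
# which plays the role of the SQL UPDATE in the question.
mapped = original_names.map(brand_owners.set_index("name")["id"])
```

Missing values in brand_owner get the code -1 from pd.factorize by default, so after the shift they would show up as 0. With the map approach, names that are absent from the lookup table come back as NaN instead.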