Unique ID for each category in a column

Question

I want to split one DataFrame into two DataFrames, one for each of the two kinds of data it holds (brand owners and products).

Original DataFrame:

>>> products
        product_id     brand_owner     product_name
0       344606         Cargill         A
1       344607         Red Gold        B
2       344608         FooBar          C
3       344609         Red Gold        D
4       344610         Cargill         E

I want to extract brand_owner into a separate DataFrame, as you would when normalizing a database:

>>> brand_owners = pd.DataFrame(branded_foods['brand_owner'].unique())
>>> brand_owners
                        0
0                 Cargill
1      Kellogg Company Us
2                Kashi Us
3                Red Gold
4          Conagra Brands
...                   ...

I give each of its rows an ID (again, like a database primary key):

>>> brand_owners.index += 1
>>> brand_owners['id'] = brand_owners.index
>>> brand_owners
                        0     id
1                 Cargill      1
2      Kellogg Company Us      2
3                Kashi Us      3
4                Red Gold      4
5          Conagra Brands      5
...                   ...    ...

[25202 rows x 2 columns]
>>> brand_owners.columns = ['name', 'id']
>>> brand_owners
                     name     id
1                 Cargill      1
2      Kellogg Company Us      2
3                Kashi Us      3
4                Red Gold      4
5          Conagra Brands      5
...                   ...    ...

Now I want to write this ID back into the original DataFrame, so that it looks like this:

        product_id     brand_owner     product_name
0       344606         1               A
1       344607         4               B
2       344608         45              C
3       344609         4               D
4       344610         1               E

How can I do this update in pandas: UPDATE products p SET p.brand_owner = (SELECT id FROM brand_owners b WHERE b.name = p.brand_owner)

Answer 1

You can encode the categories in brand_owner directly with pd.factorize:

df['brand_owner'] = pd.factorize(df.brand_owner)[0]
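
Note that pd.factorize returns 0-based codes in order of first appearance, while the question asks for IDs starting at 1. The following is a minimal sketch, assuming the five sample rows from the question, of how the codes could be shifted to 1-based IDs and how the lookup table could be built from the same call. The name id_by_name is hypothetical and only used for illustration. Because the sample is tiny, the resulting IDs differ from the ones shown in the question (for example, 4 and 45).

```python
import pandas as pd

# Sample data copied from the question (only the five rows shown there).
products = pd.DataFrame({
    "product_id": [344606, 344607, 344608, 344609, 344610],
    "brand_owner": ["Cargill", "Red Gold", "FooBar", "Red Gold", "Cargill"],
    "product_name": ["A", "B", "C", "D", "E"],
})
original_names = products["brand_owner"].copy()

# factorize returns (codes, uniques): codes are 0-based integers assigned
# in order of first appearance, uniques holds the distinct values.
codes, uniques = pd.factorize(products["brand_owner"])

# Lookup table with 1-based ids, playing the role of a primary key.
brand_owners = pd.DataFrame({
    "id": range(1, len(uniques) + 1),
    "name": list(uniques),
})

# Replace the names in the original frame with the 1-based ids.
products["brand_owner"] = codes + 1

# Alternative when the lookup table already exists: map each name to its id,
# which mirrors the SQL UPDATE ... SET ... = (SELECT ...) from the question.
id_by_name = brand_owners.set_index("name")["id"]  # hypothetical helper name
mapped = original_names.map(id_by_name)
```

Both routes give the same IDs here. The map variant is the closer match to the SQL statement in the question, since it reuses an existing brand_owners table instead of recomputing the codes.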
