Python: error when creating a categorical variable

Problem description

I need to create a categorical variable for RAM categories.

Basic: RAM [0-4]

Intermediate: RAM [5-8]

Advanced: RAM [8-12]

Command:

df['Memory']=pd.cut(df['RAM '], [0,4,8,12], include_lowest=True, labels=['Basic','Intermediate', 'Advaced'])

Error:

TypeError                                 Traceback (most recent call last)
<ipython-input-58-5c93d7c00ba2> in <cell line: 1>()
----> 1 df['Memory']=pd.cut(df['RAM '], [0,4,8,12], include_lowest=True, labels=['Basic', 'Intermediate', 'Advaced'])

1 frames
/usr/local/lib/python3.9/dist-packages/pandas/core/reshape/tile.py in _bins_to_cuts(x, bins, right, labels, precision, include_lowest, dtype, duplicates, ordered)
    425 
    426     side: Literal["left", "right"] = "left" if right else "right"
--> 427     ids = ensure_platform_int(bins.searchsorted(x, side=side))
    428 
    429     if include_lowest:

TypeError: '<' not supported between instances of 'int' and 'str'

Can you help me solve this? I am new to Python.

python dataset categorical-data
1 Answer

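It looks like your RAM column holds numeric-looking values stored as strings. pd.cut then tries to compare the integer bin edges with those strings, which raises the TypeError. Convert the column with pd.to_numeric before binning: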

df['Memory'] = pd.cut(pd.to_numeric(df['RAM '], errors="coerce"), bins=[0, 4, 8, 12],
                      include_lowest=True, labels=['Basic', 'Intermediate', 'Advanced'])
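To check what is actually in the column before binning, something like the following sketch may help. The frame and its values (including the "n/a" entry) are made up for illustration; only the 'RAM ' column name, with its trailing space, comes from the question. With errors="coerce", anything that cannot be parsed becomes NaN, and pd.cut leaves those rows without a category.

```python
import pandas as pd

# Hypothetical data: RAM values stored as strings, one of them unparseable.
df = pd.DataFrame({"RAM ": ["2", "4", "8", "12", "n/a"]})

# A non-numeric dtype here means the values are text, not numbers.
print(df["RAM "].dtype)

# Convert to numbers; entries that cannot be parsed become NaN.
ram = pd.to_numeric(df["RAM "], errors="coerce")

# List the raw values that failed to convert, so they can be cleaned up.
bad = df.loc[ram.isna() & df["RAM "].notna(), "RAM "]
print(bad.tolist())

# Bin the numeric values; rows with NaN stay NaN in the new column.
df["Memory"] = pd.cut(ram, bins=[0, 4, 8, 12], include_lowest=True,
                      labels=["Basic", "Intermediate", "Advanced"])
print(df)
```

If the list of bad values is not empty, it is probably worth cleaning those entries rather than silently leaving them uncategorized.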

Here is an example:

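import numpy as np
import pandas as pd

# Random RAM sizes from 1 to 11, stored as strings to mimic the question's data
df = pd.DataFrame({"RAM": np.random.randint(low=1, high=12, size=100).astype(str)})

# Convert the strings to numbers first, then bin them
df["Memory"] = pd.cut(pd.to_numeric(df["RAM"], errors="coerce"),
                      bins=[0, 4, 8, 12], labels=["Basic", "Intermediate", "Advanced"])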

Output:

   RAM        Memory
0    2         Basic
1    2         Basic
2    6  Intermediate
..  ..           ...
97   6  Intermediate
98   1         Basic
99   7  Intermediate

[100 rows x 2 columns]
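One thing to be aware of: the ranges in the question overlap at 8, but pd.cut bins never overlap. With the default right=True each bin is closed on the right, so 4 lands in the first bin and 8 in the second. A small sketch with hand-picked edge values (chosen here for illustration) shows where each one falls, and what include_lowest changes:

```python
import pandas as pd

# Edge values chosen to probe the bin boundaries.
values = pd.Series([0, 4, 5, 8, 9, 12])
bins = [0, 4, 8, 12]

# Without labels, pd.cut returns the interval each value falls into.
intervals = pd.cut(values, bins=bins, include_lowest=True)

labels = pd.cut(values, bins=bins, include_lowest=True,
                labels=["Basic", "Intermediate", "Advanced"])

# Without include_lowest the first bin is (0, 4], so a RAM value of 0 gets NaN.
no_lowest = pd.cut(values, bins=bins,
                   labels=["Basic", "Intermediate", "Advanced"])

print(pd.DataFrame({"RAM": values, "interval": intervals,
                    "Memory": labels, "Memory (no include_lowest)": no_lowest}))
```

The example above without include_lowest works because its random values start at 1. If a 0 can occur in your data, keep include_lowest=True as in your original command.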