I need to create a categorical variable for RAM categories.
Basic: RAM [0-4]
Intermediate: RAM [5-8]
Advanced: RAM [8-12]
Command:
df['Memory']=pd.cut(df['RAM '], [0,4,8,12], include_lowest=True, labels=['Basic','Intermediate', 'Advaced'])
Error:
TypeError Traceback (most recent call last)
<ipython-input-58-5c93d7c00ba2> in <cell line: 1>()
----> 1 df['Memory']=pd.cut(df['RAM '], [0,4,8,12], include_lowest=True, labels=['Basic', 'Intermediate', 'Advaced'])
1 frames
/usr/local/lib/python3.9/dist-packages/pandas/core/reshape/tile.py in _bins_to_cuts(x, bins, right, labels, precision, include_lowest, dtype, duplicates, ordered)
425
426 side: Literal["left", "right"] = "left" if right else "right"
--> 427 ids = ensure_platform_int(bins.searchsorted(x, side=side))
428
429 if include_lowest:
TypeError: '<' not supported between instances of 'int' and 'str'
Can you help me solve this problem? I am new to Python.
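It looks like your 'RAM ' column holds number-like values stored as strings, which is why pd.cut fails when it compares them with the integer bin edges. Convert the column with pd.to_numeric first: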
df['Memory'] = pd.cut(pd.to_numeric(df['RAM '], errors="coerce"), bins=[0, 4, 8, 12],
                      include_lowest=True, labels=['Basic', 'Intermediate', 'Advanced'])
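To see what the conversion does before binning, here is a minimal sketch. The sample values, including the unparseable "n/a" entry, are made up for illustration and are not taken from the question's data. It checks that the column is not numeric, then shows that errors="coerce" turns anything that cannot be parsed into NaN, which pd.cut leaves unbinned.

```python
import pandas as pd

# Hypothetical sample data: RAM sizes stored as strings, plus one unparseable entry.
# The column name keeps the trailing space used in the question.
df = pd.DataFrame({"RAM ": ["2", "4", "6", "8", "12", "n/a"]})

# A string/object dtype here (rather than int or float) explains the TypeError.
print(df["RAM "].dtype)

# errors="coerce" converts values that cannot be parsed ("n/a") to NaN.
ram = pd.to_numeric(df["RAM "], errors="coerce")

df["Memory"] = pd.cut(
    ram,
    bins=[0, 4, 8, 12],
    include_lowest=True,
    labels=["Basic", "Intermediate", "Advanced"],
)
print(df)
```

Rows that end up as NaN in Memory are worth inspecting, since they usually point to stray text or missing values in the original column.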
Here is an example:
import numpy as np
import pandas as pd
# RAM sizes stored as strings, as in the question
df = pd.DataFrame({"RAM": np.random.randint(low=1, high=12, size=100).astype(str)})
df["Memory"] = pd.cut(pd.to_numeric(df["RAM"], errors="coerce"),
                      bins=[0, 4, 8, 12], labels=["Basic", "Intermediate", "Advanced"])
Output:
RAM Memory
0 2 Basic
1 2 Basic
2 6 Intermediate
.. .. ...
97 6 Intermediate
98 1 Basic
99 7 Intermediate
[100 rows x 2 columns]
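The ranges in the question overlap at 8, so it may help to check where the boundary values land. The sketch below uses a handful of hand-picked edge values (an assumption for illustration, not the question's data). With the default right=True, each bin includes its right edge, so 4 falls in the first bin and 8 in the second, and a value of 0 is only binned when include_lowest=True.

```python
import pandas as pd

bins = [0, 4, 8, 12]
labels = ["Basic", "Intermediate", "Advanced"]

# Hand-picked boundary values, for illustration only.
edges = pd.Series([0, 4, 5, 8, 9, 12])

# Bins are (0, 4], (4, 8], (8, 12]; include_lowest=True also admits 0 into the first bin.
with_lowest = pd.cut(edges, bins=bins, include_lowest=True, labels=labels)

# Without include_lowest, a value equal to the lowest edge (0) falls outside every bin.
without_lowest = pd.cut(edges, bins=bins, labels=labels)

print(pd.DataFrame({"RAM": edges, "with_lowest": with_lowest, "without_lowest": without_lowest}))
```

In the random example above, include_lowest can be omitted because the generated values start at 1.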