import pandas as pd
import numpy as np
df = pd.read_csv("dirtydata.csv")
dfn = df.convert_dtypes()
bike_sales_ds = dfn.copy()
# Create new age column with general age range groups
age_conditions = [
    (bike_sales_ds['Age'] <= 30),
    (bike_sales_ds['Age'] >= 31) & (bike_sales_ds['Age'] <= 40),
    (bike_sales_ds['Age'] >= 41) & (bike_sales_ds['Age'] <= 55),
    (bike_sales_ds['Age'] >= 56) & (bike_sales_ds['Age'] <= 69),
    (bike_sales_ds['Age'] >= 70)
]
age_choices = ['30 or Less', '31 to 40', '41 to 55', '56 to 69', '70 or Older']
bike_sales_ds['Age_Range'] = np.select(age_conditions, age_choices, default='error')
I did not create this dataset. I got it a while ago from a YouTube video, and the video had nothing to do with pandas.
Error:
Traceback (most recent call last):
  File "C:\Users\dmcfa\PycharmProjects\Bike Sales Data Cleaning 01\main.py", line 43, in <module>
    bike_sales_ds['Age_Range'] = np.select(age_conditions, age_choices, default=0)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<__array_function__ internals>", line 200, in select
  File "C:\Users\dmcfa\PycharmProjects\Bike Sales Data Cleaning 01\venv\Lib\site-packages\numpy\lib\function_base.py", line 845, in select
    raise TypeError(
TypeError: invalid entry 0 in condlist: should be boolean ndarray
This fixed my error:
df.convert_dtypes(convert_integer=False)
But what caused it in the first place? df.info() reports the column as Int64 whether or not I use df.convert_dtypes().
Your code works with my input dataframe. However, you can use pd.cut to check whether the problem still occurs:
age_conditions = [0, 30, 40, 55, 69, np.inf]
age_choices = ['30 or Less', '31 to 40', '41 to 55', '56 to 69', '70 or Older']
bike_sales_ds['Age_Range'] = pd.cut(bike_sales_ds['Age'],
                                    bins=age_conditions,
                                    labels=age_choices)
Output:
>>> bike_sales_ds
    Age    Age_Range
0    87  70 or Older
1    25   30 or Less
2    70  70 or Older
3    55     41 to 55
4    33     31 to 40
..  ...          ...
95   89  70 or Older
96   79  70 or Older
97   67     56 to 69
98   71  70 or Older
99   78  70 or Older

[100 rows x 2 columns]
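One detail about the bins used with pd.cut: intervals are right-closed by default, so the first bin is (0, 30] and an age of exactly 0 gets no label. The sketch below uses a few made-up ages (the `ages` Series is illustrative, not from the dataset) to show that, and how include_lowest=True changes it.

```python
import numpy as np
import pandas as pd

bins = [0, 30, 40, 55, 69, np.inf]
labels = ['30 or Less', '31 to 40', '41 to 55', '56 to 69', '70 or Older']

# Illustrative ages only: 0 sits exactly on the lowest bin edge
ages = pd.Series([0, 30, 31, 70])

# Default: bins are right-closed, so the first bin is (0, 30] and 0 is left out
default_cut = pd.cut(ages, bins=bins, labels=labels)
print(default_cut.isna().tolist())

# include_lowest=True makes the first bin include its left edge
inclusive_cut = pd.cut(ages, bins=bins, labels=labels, include_lowest=True)
print(inclusive_cut.astype(str).tolist())
```

If an age of 0 cannot occur in the real data, the default behaviour is fine as it is.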
Input:
import pandas as pd
import numpy as np
np.random.seed(2023)
bike_sales_ds = pd.DataFrame({'Age': np.random.randint(0, 100, 100)})
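As for what caused the original error, a likely explanation (an assumption based on the traceback, not checked against the asker's CSV) is that convert_dtypes() turns Age into the nullable Int64 dtype. Comparisons on such a column return pandas' nullable boolean dtype rather than NumPy bool, and the np.select in the traceback rejects conditions that are not NumPy boolean arrays. The sketch below uses a small made-up frame (`demo`) and a hypothetical helper (`as_numpy_bool`) that converts each condition to a plain NumPy boolean array before calling np.select.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the CSV data, converted to pandas' nullable dtypes
demo = pd.DataFrame({"Age": [25, 33, 55, 67, 87]}).convert_dtypes()
print(demo["Age"].dtype)   # nullable integer dtype after convert_dtypes()

mask = demo["Age"] <= 30
print(mask.dtype)          # nullable "boolean" dtype, not NumPy bool


def as_numpy_bool(cond):
    """Hypothetical helper: nullable boolean Series -> NumPy bool array.

    Missing values (pd.NA) are treated as False.
    """
    return cond.to_numpy(dtype=bool, na_value=False)


age = demo["Age"]
conditions = [
    age <= 30,
    (age >= 31) & (age <= 40),
    (age >= 41) & (age <= 55),
    (age >= 56) & (age <= 69),
    age >= 70,
]
choices = ['30 or Less', '31 to 40', '41 to 55', '56 to 69', '70 or Older']

# Convert every condition before handing it to np.select
np_conditions = [as_numpy_bool(c) for c in conditions]
demo["Age_Range"] = np.select(np_conditions, choices, default="error")
print(demo)
```

Passing convert_integer=False works from the other side: Age stays a NumPy int64 column, so the comparisons already produce NumPy bool. Whether the unconverted conditions raise at all may depend on the installed pandas and NumPy versions.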