I'm using the KNN method to impute the missing values in my dataset. Here is my code:
from numpy import isnan
from pandas import read_csv
from sklearn.impute import SimpleImputer
from sklearn.impute import KNNImputer
# split into input and output elements
dataframe = data.values
ix = [i for i in range(dataframe.shape[1]) if i != 23]
X, y = dataframe[:, ix], dataframe[:, 12]
# print total missing
#print('Missing: %d' % sum(isnan(X).flatten()))
# define imputer
imputer = KNNImputer()
# fit on the dataset
imputer.fit(X)
# transform the dataset
Xtrans = imputer.transform(X)
# print total missing
print('Missing: %d' % sum(isnan(Xtrans).flatten()))
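For comparison, `KNNImputer` runs without error when every column is numeric. The sketch below uses a small toy array (illustrative data, not my dataset) and `n_neighbors=2`. Each missing entry is replaced by the mean of that feature over the two nearest rows, where distance is the `nan_euclidean` metric computed on the features both rows have observed.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy numeric matrix with two missing entries (np.nan)
X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0, 8.0, 7.0],
])

# Each NaN becomes the mean of that column over the 2 nearest rows
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)

print('Missing: %d' % np.isnan(X_filled).sum())  # → Missing: 0
print(X_filled)
```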
I get the following error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-31-c1a2d7b9e1a6> in <cell line: 15>()
     13 imputer = KNNImputer()
     14 # fit on the dataset
---> 15 imputer.fit(X)
     16 # transform the dataset
     17 Xtrans = imputer.transform(X)

3 frames
/usr/local/lib/python3.9/dist-packages/sklearn/utils/_array_api.py in _asarray_with_order(array, dtype, order, copy, xp)
    183     if xp.__name__ in {"numpy", "numpy.array_api"}:
    184         # Use NumPy API to support order
--> 185         array = numpy.asarray(array, order=order, dtype=dtype)
    186         return xp.asarray(array, copy=copy)
    187     else:

ValueError: could not convert string to float: 'LP001002'
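The value 'LP001002' looks like an entry from an ID column. `KNNImputer.fit` converts its input to a float array, so any column holding strings makes it fail. A minimal sketch, assuming a hypothetical small DataFrame whose column names (`Loan_ID`, `ApplicantIncome`, `LoanAmount`) stand in for the real data, reproduces the error and shows one possible workaround: keep only the numeric columns before imputing. Text categorical columns would need to be encoded as numbers first; that step is not shown here.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical stand-in for the real data: a string ID column plus numeric columns
data = pd.DataFrame({
    "Loan_ID": ["LP001002", "LP001003", "LP001005", "LP001006"],
    "ApplicantIncome": [5849.0, 4583.0, 3000.0, 2583.0],
    "LoanAmount": [np.nan, 128.0, 66.0, 120.0],
})

# Passing the raw values fails: the ID strings cannot be cast to float
try:
    KNNImputer().fit(data.values)
except ValueError as err:
    print("Error:", err)

# Keep only numeric columns; this drops Loan_ID (and any other text column)
numeric = data.select_dtypes(include="number")
X = numeric.to_numpy()

imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
print("Missing: %d" % np.isnan(X_filled).sum())  # → Missing: 0
```

Selecting by dtype rather than by column position avoids hard-coding indices, which is easy to get wrong when the column layout differs from the one a script was written for.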
My dataset is shown in the image below: