One Hot Encoding:ValueError:无法将字符串转换为float:'是'

问题描述 投票:0回答:2

我在类别值上尝试oneHotEncoder

然而,它失败了以下错误。什么可能是错的?请帮助,任何评论都欢迎。

以下是代码段

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
print(X.shape)
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
X[:, 1] = labelencoder_X.fit_transform(X[:, 1])
print(X)
print(X.shape)
print(y)
#X = X.reshape(len(X[:, 0]), 7)
print(X.shape)
onehotencoder = OneHotEncoder(categorical_features = [0])
X = onehotencoder.fit_transform(X).toarray()
print(X.shape)
print(X)

================================================== =================代码的输出如下所示看起来问题是数组格式

 I am a getting following ouput 
(17, 7)
[[2 0 0 'Offline' 'Low' 'Cold' 'No']
 [0 0 0 'Offline' 'High' 'Cold' 'No']
 [3 0 1 'Online' 'High' 'Cold' 'Yes']
 [2 0 1 'Offline' 'Low' 'Hot' 'Yes']
 [2 0 1 'Offline' 'High' 'Hot' 'Yes']
 [2 0 0 'Online' 'High' 'Cold' 'Yes']
 [2 1 1 'Offline' 'Low' 'Hot' 'No']
 [2 1 0 'Offline' 'Low' 'Cold' 'No']
 [0 1 0 'Online' 'Low' 'Cold' 'Yes']
 [3 1 1 'Online' 'Low' 'Hot' 'Yes']
 [1 1 0 'Offline' 'Low' 'Hot' 'No']
 [2 1 1 'Offline' 'Low' 'Hot' 'Yes']
 [3 1 1 'Online' 'High' 'Hot' 'Yes']
 [2 1 0 'Online' 'High' 'Hot' 'No']
 [2 2 2 'Offline' 'Low' 'Hot' 'Yes']
 [2 2 1 'Offline' 'Low' 'Cold' 'No']
 [1 2 0 'Offline' 'High' 'Cold' 'Yes']]
(17, 7)
['Low' 'Low' 'High' 'High' 'High' 'Low' 'Low' 'Low' 'Low' 'High' 'Low'
 'High' 'High' 'High' 'High' 'Low' 'Low']
(17, 7)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-42-84bec98371d4> in <module>()
     28 print(X.shape)
     29 onehotencoder = OneHotEncoder(categorical_features = [0])
---> 30 X = onehotencoder.fit_transform(X).toarray()
     31 print(X.shape)
     32 print(X)

C:\Users\patilsi\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\sklearn\preprocessing\data.py in fit_transform(self, X, y)
   2017         """
   2018         return _transform_selected(X, self._fit_transform,
-> 2019                                    self.categorical_features, copy=True)
   2020 
   2021     def _transform(self, X):

C:\Users\patilsi\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\sklearn\preprocessing\data.py in _transform_selected(X, transform, selected, copy)
   1807     X : array or sparse matrix, shape=(n_samples, n_features_new)
   1808     """
-> 1809     X = check_array(X, accept_sparse='csc', copy=copy, dtype=FLOAT_DTYPES)
   1810 
   1811     if isinstance(selected, six.string_types) and selected == "all":

C:\Users\patilsi\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    431                                       force_all_finite)
    432     else:
--> 433         array = np.array(array, dtype=dtype, order=order, copy=copy)
    434 
    435         if ensure_2d:

(17, 7)
[[2 0 0 'Offline' 'Low' 'Cold' 'No']
 [0 0 0 'Offline' 'High' 'Cold' 'No']
 [3 0 1 'Online' 'High' 'Cold' 'Yes']
 [2 0 1 'Offline' 'Low' 'Hot' 'Yes']
 [2 0 1 'Offline' 'High' 'Hot' 'Yes']
 [2 0 0 'Online' 'High' 'Cold' 'Yes']
 [2 1 1 'Offline' 'Low' 'Hot' 'No']
 [2 1 0 'Offline' 'Low' 'Cold' 'No']
 [0 1 0 'Online' 'Low' 'Cold' 'Yes']
 [3 1 1 'Online' 'Low' 'Hot' 'Yes']
 [1 1 0 'Offline' 'Low' 'Hot' 'No']
 [2 1 1 'Offline' 'Low' 'Hot' 'Yes']
 [3 1 1 'Online' 'High' 'Hot' 'Yes']
 [2 1 0 'Online' 'High' 'Hot' 'No']
 [2 2 2 'Offline' 'Low' 'Hot' 'Yes']
 [2 2 1 'Offline' 'Low' 'Cold' 'No']
 [1 2 0 'Offline' 'High' 'Cold' 'Yes']]
(17, 7)
['Low' 'Low' 'High' 'High' 'High' 'Low' 'Low' 'Low' 'Low' 'High' 'Low'
 'High' 'High' 'High' 'High' 'Low' 'Low']
(17, 7)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-42-84bec98371d4> in <module>()
     28 print(X.shape)
     29 onehotencoder = OneHotEncoder(categorical_features = [0])
---> 30 X = onehotencoder.fit_transform(X).toarray()
     31 print(X.shape)
     32 print(X)

C:\Users\patilsi\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\sklearn\preprocessing\data.py in fit_transform(self, X, y)
   2017         """
   2018         return _transform_selected(X, self._fit_transform,
-> 2019                                    self.categorical_features, copy=True)
   2020 
   2021     def _transform(self, X):

C:\Users\patilsi\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\sklearn\preprocessing\data.py in _transform_selected(X, transform, selected, copy)
   1807     X : array or sparse matrix, shape=(n_samples, n_features_new)
   1808     """
-> 1809     X = check_array(X, accept_sparse='csc', copy=copy, dtype=FLOAT_DTYPES)
   1810 
   1811     if isinstance(selected, six.string_types) and selected == "all":

C:\Users\patilsi\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    431                                       force_all_finite)
    432     else:
--> 433         array = np.array(array, dtype=dtype, order=order, copy=copy)
    434 
    435         if ensure_2d:

ValueError: could not convert string to float: 'Yes'
python-3.x machine-learning scikit-learn
2个回答
0
投票

您应该在您想要的列上应用OneHotEncoder:

from sklearn.preprocessing import LabelEncoder, OneHotEncoder

onehotencoder = OneHotEncoder()
X_0 = onehotencoder.fit_transform(X[:, 0]).toarray()
X_1 = onehotencoder.fit_transform(X[:, 1]).toarray()

这将返回2个与X相同行数的矩阵和基于X[:, 0]X[:, 1]中不同值的数量的列数

在您自由合并矩阵或其他任何内容之后。如果您想知道列或特定类别,可以访问onehotencoder.feature_indices_ 但是当您使用相同的OHE时,您将丢失功能X0的信息。

我希望它有所帮助,


0
投票

即使您指定categorical_features = [0],OneHotEncoder仍会检查所有数据(所有列)与scikit-learn兼容,因此当其他列包含字符串数据时会抛出错误。

那么你在这里可以做的是只发送你想要虚拟编码的数据: -

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
print(X.shape)
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
X[:, 1] = labelencoder_X.fit_transform(X[:, 1])
print(X)
print(X.shape)
print(y)
#X = X.reshape(len(X[:, 0]), 7)
print(X.shape)

onehotencoder = OneHotEncoder()

categorical_features = [0]
# Send only the first column to onehotencoder
X_oneHotEncoded = onehotencoder.fit_transform(X[:, categorical_features]).toarray()

# Combine the two arrays back together
X_final = np.hstack((X_oneHotEncoded, X[:,1:]))
© www.soinside.com 2019 - 2024. All rights reserved.