dtype='numeric' is not compatible with arrays of bytes/strings


I am trying to run a linear regression on the following data:

   Country   Age   Salary Purchased
0   France  44.0  72000.0        No
1    Spain  27.0  48000.0       Yes
2  Germany  30.0  54000.0        No
3    Spain  38.0  61000.0        No
4  Germany  40.0      NaN       Yes
5   France  35.0  58000.0       Yes
6    Spain   NaN  52000.0        No
7   France  48.0  79000.0       Yes
8  Germany  50.0  83000.0        No
9   France  37.0  67000.0       Yes
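Before any encoding, it can help to check what the table actually contains. A minimal sketch, with the values typed in by hand from the table above: Country and Purchased hold text, Age and Salary are floats, and each numeric column has exactly one missing value. Both points matter, because LinearRegression only accepts numeric input without NaN.

```python
import numpy as np
import pandas as pd

# The table from the question, typed in by hand.
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Age": [44.0, 27.0, 30.0, 38.0, 40.0, 35.0, np.nan, 48.0, 50.0, 37.0],
    "Salary": [72000.0, 48000.0, 54000.0, 61000.0, np.nan,
               58000.0, 52000.0, 79000.0, 83000.0, 67000.0],
    "Purchased": ["No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"],
})

# Country and Purchased are text columns; Age and Salary are float64.
print(df.dtypes)
# Age and Salary each contain one NaN.
print(df.isna().sum())
```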

This is what I tried:

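import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
# X and y are built from the DataFrame above; that step is not shown
oHe=OneHotEncoder()
ct=ColumnTransformer(transformers=[('encoder',oHe,[0])],remainder='passthrough')
# dtype=np.str (an alias for the builtin str, removed in NumPy 1.24) converts
# every value, including the numeric Age/Salary columns, into a string
X=np.array(ct.fit_transform(X),dtype=np.str)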

from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=1)


from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train,y_train)

I get the following error message. I even tried reshaping the arrays, but it didn't help:

ValueError: dtype='numeric' is not compatible with arrays of bytes/strings.Convert your data to numeric values explicitly instead.
1 answer

Here is an example based on your code:

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

data = {'Country': ['France', 'Spain', 'Germany', 'Spain', 'Germany', 'France', 'Spain', 'France', 'Germany', 'France'],
        'Age': [44.0, 27.0, 30.0, 38.0, 40.0, 35.0, None, 48.0, 50.0, 37.0],
        'Salary': [72000.0, 48000.0, 54000.0, 61000.0, None, 58000.0, 52000.0, 79000.0, 83000.0, 67000.0],
        'Purchased': ['No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes']}
df = pd.DataFrame(data)
df = df.dropna()

X = df[["Country","Age"]]
y = df["Salary"]

ohe=OneHotEncoder()
ct=ColumnTransformer(transformers=[('encoder',ohe,[0])],remainder='passthrough')
# no dtype=np.str here, so the encoded features stay numeric
X=np.array(ct.fit_transform(X))

from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=1)
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train,y_train)
regressor.score(X_test, y_test) # R^2 on the test set: 0.705

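This runs without errors for me. The key difference from your code is that the output of ct.fit_transform is not converted with dtype=np.str, so the features stay numeric. Rows with missing values are also dropped with dropna() first, because LinearRegression does not accept NaN.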

© www.soinside.com 2019 - 2024. All rights reserved.