I'm working with the Titanic dataset and trying to apply an SVM to several individual features using the following code:
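    import numpy as np
    from sklearn.impute import SimpleImputer
    from sklearn.metrics import accuracy_score
    from sklearn.svm import SVC

    quanti_vars = ['Age', 'Pclass', 'Fare', 'Parch']

    # Fill missing ages with the median age
    imp_med = SimpleImputer(missing_values=np.nan, strategy='median')
    imp_med.fit(titanic[['Age']])
    for i in (X_train, X_test):
        i[['Age']] = imp_med.transform(i[['Age']])

    # Fit on all four quantitative features at once
    svm_clf = SVC()
    svm_clf.fit(X_train[quanti_vars], y_train)
    y_pred = svm_clf.predict(X_test[quanti_vars])
    svm_accuracy = accuracy_score(y_test, y_pred)
    svm_accuracy

    # Then fit on each feature separately (this loop raises the error)
    for i in quanti_vars:
        svm_clf.fit(X_train[i], y_train)
        y_pred = svm_clf.predict(X_test[i])
        svm_accuracy = accuracy_score(y_test, y_pred)
        print(i, ': ', svm_accuracy)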
That last `for` loop throws `ValueError: Expected 2D array, got 1D array instead`, and I don't understand why. Can't an SVM run on a single feature?
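The error comes from pandas indexing rather than from the SVM: `X_train[i]` with a single string label returns a 1-D `Series`, while scikit-learn estimators expect 2-D input of shape `(n_samples, n_features)`. A minimal sketch on a tiny made-up DataFrame (the values are invented for illustration, not Titanic data):

```python
import pandas as pd
from sklearn.svm import SVC

# Toy data standing in for X_train / y_train (values are made up)
X = pd.DataFrame({"Age": [22.0, 38.0, 26.0, 35.0, 54.0, 2.0],
                  "Fare": [7.25, 71.28, 7.92, 53.10, 51.86, 21.08]})
y = pd.Series([0, 1, 1, 1, 0, 0])

print(type(X["Age"]).__name__, X["Age"].shape)      # single brackets: 1-D Series
print(type(X[["Age"]]).__name__, X[["Age"]].shape)  # double brackets: 2-D DataFrame

clf = SVC()
try:
    clf.fit(X["Age"], y)  # 1-D input is rejected by scikit-learn's input validation
except ValueError as err:
    print("ValueError:", str(err).splitlines()[0])

clf.fit(X[["Age"]], y)    # 2-D input with a single column works
print(clf.predict(X[["Age"]]).shape)
```

So an SVM can be trained on one feature; it just has to be passed as a one-column 2-D structure.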
I realized that, quite simply, I needed to put `i` in brackets (`X_train[[i]]`) to subset correctly. So:
    for i in quanti_vars:
        # X_train[[i]] is a one-column DataFrame, i.e. 2-D with shape (n_samples, 1)
        svm_clf.fit(X_train[[i]], y_train)
        y_pred = svm_clf.predict(X_test[[i]])
        svm_accuracy = accuracy_score(y_test, y_pred)
        print(i, ': ', svm_accuracy)
produces:
    SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
        decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
        kernel='rbf', max_iter=-1, probability=False, random_state=None,
        shrinking=True, tol=0.001, verbose=False)
    Age : 0.5874125874125874
    SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
        decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
        kernel='rbf', max_iter=-1, probability=False, random_state=None,
        shrinking=True, tol=0.001, verbose=False)
    Pclass : 0.5874125874125874
    SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
        decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
        kernel='rbf', max_iter=-1, probability=False, random_state=None,
        shrinking=True, tol=0.001, verbose=False)
    Fare : 0.42657342657342656
    SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
        decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
        kernel='rbf', max_iter=-1, probability=False, random_state=None,
        shrinking=True, tol=0.001, verbose=False)
    Parch : 0.6153846153846154
(I won't pretend it's good, but at least it works.)
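Since the Titanic DataFrames aren't available here, the following is a self-contained sketch of the same per-feature loop on randomly generated stand-in columns (the data, sample size, and `random_state` values are assumptions for illustration; the printed accuracies will not match the ones above):

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the Titanic columns (random values, not real data)
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "Age": rng.uniform(1, 80, n),
    "Pclass": rng.integers(1, 4, n),
    "Fare": rng.uniform(5, 500, n),
    "Parch": rng.integers(0, 6, n),
})
y = rng.integers(0, 2, n)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

quanti_vars = ["Age", "Pclass", "Fare", "Parch"]
scores = {}
for col in quanti_vars:
    clf = SVC()                                   # fresh model for each feature
    clf.fit(X_train[[col]], y_train)              # [[col]] keeps a 2-D (n, 1) DataFrame
    y_pred = clf.predict(X_test[[col]])
    scores[col] = accuracy_score(y_test, y_pred)  # y_true first, then y_pred
    print(col, ":", scores[col])
```

Creating a new `SVC()` per feature is only a readability choice: calling `fit` again on the same estimator also discards the previous fit.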
Quite simply, just write it like this:

    y_pred = svm_clf.predict([X_test[i]])

Adding `[]` converts it into a 2D array (of shape `(1, n)`, i.e. a single row).
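Wrapping the Series in a list does make the input 2-D, but the orientation matters: `[X_test[i]]` becomes one row of shape `(1, n_samples)`, whereas a model fitted on one feature expects one column of shape `(n_samples, 1)`. A small sketch comparing the shapes on a made-up DataFrame:

```python
import numpy as np
import pandas as pd

# Toy stand-in for X_test (values are made up)
X_test = pd.DataFrame({"Age": [22.0, 38.0, 26.0], "Fare": [7.25, 71.28, 7.92]})
col = "Age"

wrapped = np.asarray([X_test[col]])                # list around a Series: one ROW
print(wrapped.shape)                               # 1 sample with 3 features

as_frame = X_test[[col]]                           # double brackets: one COLUMN
print(as_frame.shape)                              # 3 samples with 1 feature

reshaped = X_test[col].to_numpy().reshape(-1, 1)   # equivalent NumPy route
print(reshaped.shape)
```

With a model fitted on `X_train[[i]]`, the row-shaped input would typically be rejected with a feature-count mismatch (unless the test set has exactly one sample), so `X_test[[i]]` or `.reshape(-1, 1)` is the safer choice.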