Sklearn SVC with the MNIST dataset: consistently wrong on the digit 5?

Question (votes: 0, answers: 1)

I have set up a very simple SVC to classify MNIST digits. For some reason the classifier very consistently predicts the digit 5 incorrectly, yet when I try all the other digits it does not miss a single one. Does anyone know whether I have set this up wrong, or whether it really is just bad at predicting the digit 5?

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn import datasets
    from sklearn.svm import SVC
    from sklearn.metrics import confusion_matrix

    # scikit-learn's built-in 8x8 digits dataset
    data = datasets.load_digits()
    images = data.images
    targets = data.target

    # Split into train and test sets
    images_train, images_test, imlabels_train, imlabels_test = train_test_split(
        images, targets, test_size=.2, shuffle=False)

    # Re-shape data so that it's 2D
    images_train = np.reshape(images_train, (np.shape(images_train)[0], 64))
    images_test = np.reshape(images_test, (np.shape(images_test)[0], 64))

    svm_classifier = SVC(gamma='auto').fit(images_train, imlabels_train)

    number_correct_svc = 0
    preds = []
    for label_index in range(len(imlabels_test)):
        pred = svm_classifier.predict(images_test[label_index].reshape(1,-1))
        if pred[0] == imlabels_test[label_index]:
            number_correct_svc += 1
        preds.append(pred[0])

    print("Support Vector Classifier...")
    print(f"\tPercent correct for all test data: {100*number_correct_svc/len(imlabels_test)}%")
    # Predictions are passed first, so rows are predicted labels and columns are true labels
    confusion_matrix(preds,imlabels_test)

Here is the resulting confusion matrix:

    array([[22,  0,  0,  0,  0,  0,  0,  0,  0,  0],
           [ 0, 15,  0,  0,  0,  0,  0,  0,  0,  0],
           [ 0,  0, 15,  0,  0,  0,  0,  0,  0,  0],
           [ 0,  0,  0, 21,  0,  0,  0,  0,  0,  0],
           [ 0,  0,  0,  0, 21,  0,  0,  0,  0,  0],
           [13, 21, 20, 16, 16, 37, 23, 20, 31, 16],
           [ 0,  0,  0,  0,  0,  0, 14,  0,  0,  0],
           [ 0,  0,  0,  0,  0,  0,  0, 16,  0,  0],
           [ 0,  0,  0,  0,  0,  0,  0,  0,  2,  0],
           [ 0,  0,  0,  0,  0,  0,  0,  0,  0, 21]], dtype=int64)
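To make the pattern in this matrix explicit, here is a small sketch that summarizes it with NumPy. The matrix values are copied from the output above; the variable names are only illustrative. Because the question's code passes `preds` as the first argument to `confusion_matrix`, each row corresponds to a predicted label and each column to a true label.

```python
import numpy as np

# Confusion matrix copied from the question (rows: predicted, columns: true)
cm = np.array([
    [22,  0,  0,  0,  0,  0,  0,  0,  0,  0],
    [ 0, 15,  0,  0,  0,  0,  0,  0,  0,  0],
    [ 0,  0, 15,  0,  0,  0,  0,  0,  0,  0],
    [ 0,  0,  0, 21,  0,  0,  0,  0,  0,  0],
    [ 0,  0,  0,  0, 21,  0,  0,  0,  0,  0],
    [13, 21, 20, 16, 16, 37, 23, 20, 31, 16],
    [ 0,  0,  0,  0,  0,  0, 14,  0,  0,  0],
    [ 0,  0,  0,  0,  0,  0,  0, 16,  0,  0],
    [ 0,  0,  0,  0,  0,  0,  0,  0,  2,  0],
    [ 0,  0,  0,  0,  0,  0,  0,  0,  0, 21],
])

total = int(cm.sum())         # number of test samples
correct = int(np.trace(cm))   # diagonal entries are correct predictions

# Zero out the diagonal to keep only the misclassifications
errors = cm - np.diag(np.diag(cm))
rows_with_errors = np.nonzero(errors.sum(axis=1))[0]

print(f"accuracy: {correct}/{total}")
print(f"predicted labels that account for errors: {rows_with_errors.tolist()}")
```

Read this way, the classifier is not mislabeling true 5s. Every misclassified sample of another digit is being predicted as 5, which suggests the model falls back to a single class for inputs it cannot separate.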

I have been reading the sklearn page for SVC, but I can't figure out what I am doing wrong.

Update:

I tried SVC(gamma='scale') and the result looks much more reasonable. It would still be nice to know why 'auto' doesn't work. With 'scale':

    array([[34,  0,  0,  0,  0,  0,  0,  0,  0,  0],
           [ 0, 36,  0,  0,  0,  0,  0,  0,  1,  0],
           [ 0,  0, 35,  0,  0,  0,  0,  0,  0,  0],
           [ 0,  0,  0, 27,  0,  0,  0,  0,  0,  1],
           [ 1,  0,  0,  0, 34,  0,  0,  0,  0,  0],
           [ 0,  0,  0,  2,  0, 37,  0,  0,  0,  1],
           [ 0,  0,  0,  0,  0,  0, 37,  0,  0,  0],
           [ 0,  0,  0,  2,  0,  0,  0, 35,  0,  1],
           [ 0,  0,  0,  6,  1,  0,  0,  1, 31,  1],
           [ 0,  0,  0,  0,  2,  0,  0,  0,  1, 33]], dtype=int64)

python scikit-learn svm confusion-matrix
1 Answer

0 votes
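The second question is the easier one to address. In the RBF kernel, gamma controls how "wiggly" the decision boundary is. What does "wiggly" mean? The higher the gamma, the more tightly the decision boundary follows the individual training points. See: decision boundaries of the SVM.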
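To illustrate the role of gamma, recall that the RBF kernel is K(x, x') = exp(-gamma * ||x - x'||^2). The sketch below is not from the original answer; the helper name `rbf_kernel_value` and the two toy vectors are assumptions chosen only to show how quickly the similarity between two fixed points shrinks as gamma grows.

```python
import numpy as np

def rbf_kernel_value(x, y, gamma):
    """Return exp(-gamma * ||x - y||^2) for two 1-D vectors."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

# Two hypothetical 64-pixel "images" that differ by 2 in every pixel,
# so their squared distance is 64 * 2**2 = 256.
a = np.zeros(64)
b = np.full(64, 2.0)

for gamma in (0.0005, 1.0 / 64, 1.0):
    print(f"gamma={gamma:.6f}  K(a, b)={rbf_kernel_value(a, b, gamma):.3e}")
```

With a small gamma the two points still look similar to the kernel, so each training point influences a wide region and the boundary is smooth. With a large gamma the similarity drops to almost zero, so each training point only influences its immediate neighborhood and the boundary wraps tightly around individual points.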

If gamma='scale' (the default) is passed, then 1 / (n_features * X.var()) is used as the value of gamma.
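For comparison, the scikit-learn documentation states that gamma='auto' uses 1 / n_features, with no variance term. The following sketch computes both values for the digits data and fits both classifiers on the same unshuffled split as in the question. The variable names are illustrative, and the explanation in the closing note is a plausible reading rather than something the original answer confirms.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# return_X_y=True gives the already flattened (n_samples, 64) feature matrix
X, y = datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False)

n_features = X_train.shape[1]

# The two formulas, evaluated by hand on the training data
gamma_auto = 1.0 / n_features
gamma_scale = 1.0 / (n_features * X_train.var())

print(f"gamma='auto'  -> {gamma_auto:.6f}")
print(f"gamma='scale' -> {gamma_scale:.6f}")

# Fit one classifier per setting and compare test accuracy
scores = {}
for setting in ("auto", "scale"):
    clf = SVC(gamma=setting).fit(X_train, y_train)
    scores[setting] = clf.score(X_test, y_test)
    print(f"gamma={setting!r}: test accuracy = {scores[setting]:.3f}")
```

The pixel values in this dataset range from 0 to 16, so X.var() is well above 1 and 'scale' yields a much smaller gamma than 'auto'. With the larger 'auto' value on unscaled pixels, the kernel value between most pairs of different images is close to zero, which would explain why unseen test images tend to collapse onto a single predicted class.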