KNN does not find classes after balancing data

Question

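I have a strange problem. I have a model with 4 clusters (classes), and the data is imbalanced in the following proportions: 75%, 15%, 7% and 3%. I split it into train and test sets at an 80/20 ratio and trained a KNN with 5 neighbors, which gives me an accuracy of 1.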

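from sklearn import metrics
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.neighbors import KNeighborsClassifier

# X and y are assumed to be NumPy arrays of features and labels
sss = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)

# Only the first of the 5 stratified splits is used
train_index, test_index = next(sss.split(X, y))

x_train, y_train = X[train_index], y[train_index]
x_test, y_test = X[test_index], y[test_index]

# n_neighbors defaults to 5
KNN_final = KNeighborsClassifier()
KNN_final.fit(x_train, y_train)

y_pred = KNN_final.predict(x_test)

print('Avg. accuracy for all classes:', metrics.accuracy_score(y_test, y_pred))
print('Classification report: \n', metrics.classification_report(y_test, y_pred, digits=2))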

Avg. accuracy for all classes: 1.0
Classification report: 
               precision    recall  f1-score   support

           0       1.00      1.00      1.00       140
           1       1.00      1.00      1.00        60
           2       1.00      1.00      1.00       300
           3       1.00      1.00      1.00      1500

    accuracy                           1.00      2000
   macro avg       1.00      1.00      1.00      2000
weighted avg       1.00      1.00      1.00      2000
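
Odd as this looked, I carried on: I obtained new data and tried to classify it with this model. It never predicts the class with the smallest share and instead always assigns those samples to the second-smallest class. So I tried to balance the data with the SMOTEENN algorithm from the imbalanced-learn library: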

Original dataset shape Counter({3: 7500, 2: 1500, 0: 700, 1: 300})

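from collections import Counter
from imblearn.combine import SMOTEENN

# Resample every class, then print the new class counts
sme = SMOTEENN(sampling_strategy='all', random_state=42)
X_res, y_res = sme.fit_resample(X, y)
print('Resampled dataset shape %s' % Counter(y_res))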

Resampled dataset shape Counter({0: 7500, 1: 7500, 2: 7500, 3: 7500})
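
To see where the smallest class ends up on new data, it may help to look at predicted label counts and a confusion matrix rather than overall accuracy alone. The following is a minimal sketch, assuming at least some of the new samples can be labeled by hand. The blob data, the helper `make_blobs_for`, and all sizes are hypothetical stand-ins, not the data from the question:

```python
from collections import Counter

import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(42)

# Hypothetical stand-in data: four well-separated 2-D blobs that follow
# the 75/15/7/3 imbalance from the question (labels 3, 2, 0, 1).
sizes = {3: 750, 2: 150, 0: 70, 1: 30}
centers = {0: (0.0, 0.0), 1: (10.0, 0.0), 2: (0.0, 10.0), 3: (10.0, 10.0)}


def make_blobs_for(class_sizes, scale=0.5):
    """Draw class_sizes[label] Gaussian points around each class center."""
    X_parts, y_parts = [], []
    for label, n in class_sizes.items():
        X_parts.append(rng.normal(loc=centers[label], scale=scale, size=(n, 2)))
        y_parts.append(np.full(n, label))
    return np.vstack(X_parts), np.concatenate(y_parts)


X_old, y_old = make_blobs_for(sizes)
# "New" data with 20 hand-labeled samples per class (an assumption).
X_new, y_new = make_blobs_for({label: 20 for label in sizes})

knn = KNeighborsClassifier(n_neighbors=5).fit(X_old, y_old)
y_new_pred = knn.predict(X_new)

# How often each label is predicted on the new data.
pred_counts = Counter(y_new_pred.tolist())
print("Predicted label counts:", dict(sorted(pred_counts.items())))

# Rows are true labels, columns are predicted labels, both in order 0..3.
cm = confusion_matrix(y_new, y_new_pred, labels=[0, 1, 2, 3])
print(cm)
```

A row of the matrix whose counts sit in the wrong column shows which class absorbs the misclassified samples, which the single accuracy figure hides.
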
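
Then I do the same as before: I split it into train and test sets at the same 80/20 ratio and train a new KNN classifier with 5 neighbors. But the classification report now looks even worse: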

Avg. accuracy for all classes: 1.0
Classification report: 
               precision    recall  f1-score   support

           0       1.00      1.00      1.00      1500
           1       1.00      1.00      1.00       500

    accuracy                           1.00      2000
   macro avg       1.00      1.00      1.00      2000
weighted avg       1.00      1.00      1.00      2000

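I can't see what I'm doing wrong. Before training the new classifier, is there any processing I need to do after resampling the data, other than splitting and shuffling? Why does my KNN no longer see all 4 classes?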
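
One way to narrow this down is to count the labels in every array that feeds the split and the report. A test set drawn with test_size=0.2 from 30000 resampled rows would hold 6000 rows, so a total support of 2000 over only two classes may mean the report was computed on different arrays than the resampled ones. The sketch below uses only the standard library. The helpers `label_report` and `missing_labels` are hypothetical names, and the label lists are made up to mirror the counts shown in the question:

```python
from collections import Counter


def label_report(**named_labels):
    """Return {name: Counter of labels} for each label sequence passed in."""
    return {name: Counter(labels) for name, labels in named_labels.items()}


def missing_labels(expected, labels):
    """Return the labels in `expected` that never occur in `labels`."""
    return sorted(set(expected) - set(labels))


# Made-up label vectors that mirror the counts from the question:
# a balanced resampled set, and a test set holding only classes 0 and 1.
y_res = [0] * 7500 + [1] * 7500 + [2] * 7500 + [3] * 7500
y_test = [0] * 1500 + [1] * 500

report = label_report(y_res=y_res, y_test=y_test)
for name, counts in report.items():
    print(name, dict(sorted(counts.items())), "total:", sum(counts.values()))

print("Missing from y_test:", missing_labels(y_res, y_test))
```

With NumPy arrays, passing `y_test.tolist()` keeps the printed keys as plain Python integers. Running the same check on y_train and y_pred shows at which step the two classes disappear.
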


Answer 1

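Although a full investigation would require your data, which you have not provided, this behavior is (at least partly) consistent with the following scenario: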
