I am trying to train several different classifiers to predict the probability of default for credit-card holders, but Gaussian Naive Bayes gives very low accuracy.
The Python code is as follows:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import model_selection
from sklearn import linear_model
from sklearn import svm
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
data = pd.read_csv('D:\\Data.csv')
ccard = data.copy()

# Drop the target and the ID column from the features
X = ccard.drop('default payment next month', axis=1)
X = X.drop('ID', axis=1)
y = ccard['default payment next month']

tX, tsX, ty, tsy = model_selection.train_test_split(X, y, test_size=0.2, random_state=57)

clf_lgr = linear_model.LogisticRegression(fit_intercept=True, max_iter=8000)
clf_lgr.fit(tX, ty)
print("The train accuracy of the Logistic Regression Model is:", clf_lgr.score(tX, ty))
print("The test accuracy of the Logistic Regression Model is:", clf_lgr.score(tsX, tsy))

clf_knn = KNeighborsClassifier(n_neighbors=10)
clf_knn.fit(tX, ty)
print("The train accuracy of the Kth Nearest Neighbour Model is:", clf_knn.score(tX, ty))
print("The test accuracy of the Kth Nearest Neighbour Model is:", clf_knn.score(tsX, tsy))

clf_lda = LinearDiscriminantAnalysis()
clf_lda.fit(tX, ty)
print("The train accuracy of the Discriminant Analysis Model is:", clf_lda.score(tX, ty))
print("The test accuracy of the Discriminant Analysis Model is:", clf_lda.score(tsX, tsy))

clf_nbc = GaussianNB()
clf_nbc.fit(tX, ty)
print("The train accuracy of the Naive Bayes Model is:", clf_nbc.score(tX, ty))
print("The test accuracy of the Naive Bayes Model is:", clf_nbc.score(tsX, tsy))
Gaussian Naive Bayes gets only about 35% accuracy on both the training and the test set. Can anyone suggest how to improve it?
Gaussian Naive Bayes rests on the assumption that the features are conditionally independent given the class, and that assumption may not always hold in practice.
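One quick way to gauge how badly that assumption is violated is to inspect pairwise feature correlations and drop near-duplicate columns. A minimal sketch on made-up data (the original Data.csv is not available here, and the column names are only illustrative):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the credit-card features
rng = np.random.default_rng(0)
base = rng.normal(size=1000)
X = pd.DataFrame({
    'LIMIT_BAL': base,
    'BILL_AMT1': base * 0.9 + rng.normal(scale=0.3, size=1000),  # strongly correlated
    'AGE': rng.normal(size=1000),                                # roughly independent
})

# Absolute pairwise correlations; values near 1 flag redundant features
corr = X.corr().abs()
print(corr.round(2))

# Columns with |corr| > 0.8 to some other column are candidates for dropping
high = (corr.where(~np.eye(len(corr), dtype=bool)) > 0.8).any()
print(high)
```

Removing one column of each highly correlated pair brings the data closer to the independence assumption that naive Bayes relies on.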
There are a few things you can try to improve the performance of Gaussian Naive Bayes on this problem:
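For example, since GaussianNB models each feature as a Gaussian within each class, reshaping heavily skewed features (as billing amounts often are) toward a normal distribution with sklearn's PowerTransformer can sometimes help. A sketch on synthetic log-normal data, since the original Data.csv is not available:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer

# Synthetic stand-in: three skewed (log-normal) features whose log-means differ by class
rng = np.random.default_rng(57)
n = 4000
y = rng.integers(0, 2, size=n)
X = np.exp(rng.normal(loc=y[:, None] * 1.0, scale=1.0, size=(n, 3)))

tX, tsX, ty, tsy = train_test_split(X, y, test_size=0.2, random_state=57)

plain = GaussianNB().fit(tX, ty)
# Yeo-Johnson transform pulls each feature toward a Gaussian before fitting NB
boxed = make_pipeline(PowerTransformer(), GaussianNB()).fit(tX, ty)

print("plain GaussianNB test accuracy:", plain.score(tsX, tsy))
print("with PowerTransformer:", boxed.score(tsX, tsy))
```

On real data the gain depends on how non-Gaussian the features actually are, so cross-validate both variants before choosing.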
It is also worth noting that accuracy is not always the best metric for evaluating a classifier's performance. Depending on the specifics of your problem, you may want to consider other metrics such as precision, recall, or the F1 score.
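This matters here because defaults are a minority class: a model that always predicts "no default" already gets a high accuracy while being useless for the class you care about. A minimal sketch with made-up labels (the ~20% positive rate is only illustrative):

```python
import numpy as np
from sklearn.metrics import classification_report, f1_score

# Illustrative imbalanced labels: roughly 20% positives (defaults)
rng = np.random.default_rng(1)
y_true = (rng.random(1000) < 0.2).astype(int)

# A lazy "model" that always predicts the majority class
y_pred = np.zeros_like(y_true)

# Accuracy looks fine (~80%), but precision/recall/F1 for the default class are 0
print(classification_report(y_true, y_pred, zero_division=0))
print("F1 (default class):", f1_score(y_true, y_pred, zero_division=0))
```

Looking at the per-class rows of classification_report (or at a confusion matrix) makes this failure mode visible immediately, whereas accuracy alone hides it.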