Problem implementing Naive Bayes on the Amazon Fine Food Reviews dataset


[Figures: CV accuracy; CV accuracy graph; test accuracy]

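I am trying to implement Naive Bayes on the Amazon Fine Food Reviews dataset. Could you look at the code and explain why there is such a large gap between the cross-validation accuracy and the test accuracy?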

Is there anything conceptually wrong with the code below?

# Bag-of-words (BoW) features built from bigrams and trigrams
from sklearn.feature_extraction.text import CountVectorizer

bow = CountVectorizer(ngram_range=(2, 3))
# Learn the vocabulary on the training reviews, then vectorize them
bow_vect = bow.fit(X_train["F_review"].values)
X_bow = bow_vect.transform(X_train["F_review"].values)
y_bow = y_train



from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import BernoulliNB

roc = []
accuracy = []
f1 = []
k_value = []  # alpha values tried, in the same order as the score lists
for i in range(1, 50, 2):
    BNB = BernoulliNB(alpha=i)

    print("************* for alpha = ", i, "*************")
    # 10-fold cross-validation on the training set
    x = cross_validate(BNB, X_bow, y_bow, scoring=["accuracy", "f1", "roc_auc"],
                       return_train_score=False, cv=10)
    print(x["test_roc_auc"].mean())
    print("-----c------break------c-------break-------c-----------")
    roc.append(x["test_roc_auc"].mean())  # mean ROC AUC across folds
    accuracy.append(x["test_accuracy"].mean())  # mean accuracy across folds
    f1.append(x["test_f1"].mean())  # mean F1 score across folds

    k_value.append(i)


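# BoW test prediction
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

BNB = BernoulliNB(alpha=1)
BNB.fit(X_bow, y_bow)
y_pred = BNB.predict(bow_vect.transform(X_test["F_review"]))
print("Accuracy Score: ", accuracy_score(y_test, y_pred))
# Note: this AUC is computed from hard 0/1 predictions, whereas the
# cross-validation "roc_auc" scorer ranks samples by predicted scores
print("ROC: ", roc_auc_score(y_test, y_pred))
print("Confusion Matrix: ", confusion_matrix(y_test, y_pred))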
Tags: machine-learning, nlp, data-science, bayesian, naivebayes
1 Answer

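Use one of these metrics to find the best alpha value. Then train BernoulliNB with that alpha on the training data and evaluate it on the test data.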

Also, do not use accuracy as the performance measure, because it is misleading on imbalanced datasets.
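To illustrate why accuracy can mislead here, the following is a minimal sketch with made-up labels (90 positive, 10 negative; these counts are an assumption, not the actual class ratio of the review data). A "classifier" that always predicts the majority class scores high accuracy while never detecting a negative review.

```python
# Hypothetical labels: 90 positive reviews, 10 negative ones.
y_true = [1] * 90 + [0] * 10
# A degenerate "classifier" that always predicts the majority class.
y_pred = [1] * 100

# Plain accuracy: fraction of predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(label):
    """Fraction of samples with the given true label that were predicted as that label."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    return hits / y_true.count(label)


# Balanced accuracy averages the per-class recalls, so the minority class counts equally.
balanced_accuracy = (recall(1) + recall(0)) / 2

print("accuracy:", accuracy)
print("recall on negative class:", recall(0))
print("balanced accuracy:", balanced_accuracy)
```

Metrics that account for both classes, such as ROC AUC or F1, expose this failure, which is why they are the safer choice for picking alpha.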

Before anything else, change the values used in the loop, as Kalsi mentioned in the comments.

  • Keep the alpha values in a list, as described above.
  • Find the maximum AUC value and its index.
  • Use that index to look up the best alpha.
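The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the toy corpus, the alpha grid, and the names `alphas`, `mean_auc`, and `final_model` are hypothetical and do not come from the question, and the vectorizer is wrapped in a pipeline so that its vocabulary is refit inside each cross-validation fold.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Toy stand-in corpus (hypothetical data, not the Amazon reviews).
pos = ["great tasty snack", "love this tea", "delicious and fresh", "great flavor love it"]
neg = ["stale and bland", "awful taste", "bad smell stale", "terrible bland product"]
texts = pos * 15 + neg * 5            # imbalanced: 60 positive, 20 negative
labels = np.array([1] * 60 + [0] * 20)

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

# Step 1: keep the candidate alpha values in a list (this grid is an assumption).
alphas = [0.001, 0.01, 0.1, 1, 10]

# Step 2: record the mean cross-validated AUC for each alpha.
mean_auc = []
for alpha in alphas:
    model = make_pipeline(CountVectorizer(binary=True), BernoulliNB(alpha=alpha))
    scores = cross_val_score(model, X_train, y_train, scoring="roc_auc", cv=5)
    mean_auc.append(scores.mean())

# Step 3: the index of the maximum AUC gives the best alpha.
best_index = int(np.argmax(mean_auc))
best_alpha = alphas[best_index]

# Refit on the full training set with the chosen alpha, then score the test set
# using predicted probabilities rather than hard labels.
final_model = make_pipeline(CountVectorizer(binary=True), BernoulliNB(alpha=best_alpha))
final_model.fit(X_train, y_train)
test_auc = roc_auc_score(y_test, final_model.predict_proba(X_test)[:, 1])

print("best alpha:", best_alpha)
print("test AUC:", test_auc)
```

Computing the test AUC from `predict_proba` keeps it comparable with the cross-validated `roc_auc` values, which are also based on predicted scores.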