Identical metric results across different machine learning models


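While experimenting with machine learning on a dataset, I get identical metric values, such as accuracy and F-score, from different machine learning algorithms.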

I have a dataset that I use to train the algorithms I selected. I found it on Kaggle: source

Below are the code snippets from the Jupyter notebook and their output:

List of imported libraries

In:

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from nltk.corpus import stopwords
from sklearn.metrics import accuracy_score, f1_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report
import joblib
import tensorflow as tf
import numpy as np
from tensorflow.keras import models, layers
import warnings

warnings.filterwarnings('ignore')

Loading the dataset

In:

df = pd.read_csv("payload_mini.csv",encoding='utf-16')
df.head(10)

Loading, processing, and splitting the data for training the classification models

In:

df = pd.read_csv("payload_mini.csv",encoding='utf-16')

df = df[(df['attack_type'] == 'sqli') | (df['attack_type'] == 'norm')]

X = df['payload']
y = df['label']

vectorizer = CountVectorizer(min_df = 2, max_df = 0.8, stop_words = stopwords.words('english'))
X = vectorizer.fit_transform(X.values.astype('U')).toarray()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

Out:

(8040, 1585)
(8040,)
(2011, 1585)
(2011,)

Naive Bayes classifier

In:

nb_clf = GaussianNB()
nb_clf.fit(X_train, y_train)
y_pred = nb_clf.predict(X_test)
print(f"Accuracy of Naive Bayes on test set : {accuracy_score(y_pred, y_test)}")
print(f"F1 Score of Naive Bayes on test set : {f1_score(y_pred, y_test, pos_label='anom')}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

Out:

Accuracy of Naive Bayes on test set : 0.9806066633515664
F1 Score of Naive Bayes on test set : 0.9735234215885948

Classification Report:
              precision    recall  f1-score   support

        anom       0.97      0.98      0.97       732
        norm       0.99      0.98      0.98      1279

    accuracy                           0.98      2011
   macro avg       0.98      0.98      0.98      2011
weighted avg       0.98      0.98      0.98      2011

Random Forest:

In:

rf_clf = RandomForestClassifier()
rf_clf.fit(X_train, y_train)
y_pred_rf = rf_clf.predict(X_test)
print(f"Accuracy of Random Forest on test set : {accuracy_score(y_pred, y_test)}")
print(f"F1 Score of Random Forest on test set : {f1_score(y_pred, y_test, pos_label='anom')}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred_rf))

Out:

Accuracy of Random Forest on test set : 0.9806066633515664
F1 Score of Random Forest on test set : 0.9735234215885948

Classification Report:
              precision    recall  f1-score   support

        anom       1.00      0.96      0.98       732
        norm       0.98      1.00      0.99      1279

    accuracy                           0.99      2011
   macro avg       0.99      0.98      0.99      2011
weighted avg       0.99      0.99      0.99      2011

Support vector machine

In:

svm_clf = SVC(gamma = 'auto')
svm_clf.fit(X_train, y_train)
y_pred = svm_clf.predict(X_test)
print(f"Accuracy of SVM on test set : {accuracy_score(y_pred, y_test)}")
print(f"F1 Score of SVM on test set: {f1_score(y_pred, y_test, pos_label='anom')}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

Out:

Accuracy of SVM on test set : 0.9189457981103928
F1 Score of SVM on test set: 0.8658436213991769

Classification Report:
              precision    recall  f1-score   support

        anom       1.00      0.76      0.87       689
        norm       0.89      1.00      0.94      1322

    accuracy                           0.92      2011
   macro avg       0.95      0.88      0.90      2011
weighted avg       0.93      0.92      0.92      2011

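As you can see, Random Forest and the Naive Bayes classifier report identical accuracy and F1 scores even though they are different algorithms. I hope you can help me fix any errors in the code or improve it in some way.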

python tensorflow machine-learning scikit-learn sql-injection
1 Answer

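In the Random Forest code you store the predictions in y_pred_rf, but you compute the metrics on y_pred. At that point y_pred still holds the Naive Bayes predictions, so the accuracy and F1 score printed for Random Forest are the Naive Bayes values. Only classification_report receives y_pred_rf, which is why the report shows different numbers. Pass y_pred_rf to accuracy_score and f1_score as well.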
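One way to avoid scoring a stale prediction variable is to fit, predict, and score inside a single helper, so each model is always evaluated on its own predictions. The following is only a sketch under stated assumptions: the data comes from make_classification as a stand-in for the vectorized payloads (it is not the Kaggle dataset), and the evaluate helper and the fixed random_state values are hypothetical names and choices added for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Assumption: synthetic stand-in data, not the payload dataset from the question.
X, y_int = make_classification(n_samples=500, n_features=20, random_state=0)
y = np.where(y_int == 1, "anom", "norm")  # string labels like those in the question

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)


def evaluate(name, clf):
    """Fit clf, predict on the test set, and score those same predictions."""
    clf.fit(X_train, y_train)
    preds = clf.predict(X_test)  # local variable, so it cannot leak between models
    acc = accuracy_score(y_test, preds)
    f1 = f1_score(y_test, preds, pos_label="anom")
    print(f"Accuracy of {name} on test set : {acc}")
    print(f"F1 Score of {name} on test set : {f1}")
    return acc, f1


nb_scores = evaluate("Naive Bayes", GaussianNB())
rf_scores = evaluate("Random Forest", RandomForestClassifier(random_state=0))
```

Because the predictions never leave the helper, the metrics printed for one model cannot silently come from another. The metric calls here also use the documented argument order (true labels first, predictions second).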