GridSearchCV (machine learning)


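I am using GridSearch to find the relatively best hyperparameters for this decision tree (along with K-Fold CV to evaluate the model's performance). Please see the "The best results" line in the code and in the output.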

Why doesn't it tell me anything about the criterion (for example, whether to use entropy or gini)?
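A minimal sketch of how the printed result can be read, assuming synthetic data from `make_classification` in place of the CSV. In recent scikit-learn versions an estimator's printed repr lists only the parameters that differ from their defaults. Because `criterion='gini'` and `min_samples_split=2` are the defaults of `DecisionTreeClassifier`, they do not appear in `best_estimator_` even when they win. `best_params_` always contains every key from the grid.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the CSV data (assumption: binary target, 11 features)
X, y = make_classification(n_samples=500, n_features=11, random_state=100)

param_dist = {
    "criterion": ["gini", "entropy"],
    "max_depth": [1, 2, 3, 4, 5, 6, 7, None],
    "min_samples_split": [2, 3, 4, 5],
}
grid = GridSearchCV(DecisionTreeClassifier(random_state=100),
                    param_grid=param_dist, cv=10, scoring="accuracy")
grid.fit(X, y)

# repr() shows only non-default parameters: "gini" and 2 are defaults,
# so they are omitted even when they are the winning values.
print(grid.best_estimator_)

# best_params_ holds every searched key, including criterion.
print(grid.best_params_)

# Mean 10-fold CV accuracy of the best candidate
print(grid.best_score_)

# Direct illustration of the repr rule
print(DecisionTreeClassifier(criterion="gini", max_depth=7))     # criterion hidden
print(DecisionTreeClassifier(criterion="entropy", max_depth=7))  # criterion shown
```

Read this way, `DecisionTreeClassifier(max_depth=7, random_state=100)` corresponds to `criterion='gini'`, `max_depth=7`, `min_samples_split=2`. Calling `sklearn.set_config(print_changed_only=False)` makes the repr list every parameter instead.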

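When I ran the search with another version of my code, it did report a criterion, but the result looked wrong: according to GridSearch, entropy suited the model better, whereas in my manual tests gini gave better accuracy and recall. (Entropy was better for precision, but the selection should be based on accuracy, since that is the scoring specified in the code.) For the maximum depth, GridSearch suggests 7, whereas in my tests a depth of 9 or more gives better results.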
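Two points may help when comparing GridSearch with manual tests. First, `GridSearchCV` can only return values that are in the grid: the `max_depth` list in the code below covers 1 to 7 and `None`, so 9 cannot be chosen. Second, it ranks candidates by mean 10-fold CV accuracy on the training data, which is a different estimate from a single score on the test set. A sketch, again assuming synthetic data, that widens the depth range and compares the best mean CV accuracy per criterion through `cv_results_`:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the CSV data (assumption)
X, y = make_classification(n_samples=600, n_features=11, random_state=100)

param_grid = {
    "criterion": ["gini", "entropy"],
    # Depths such as 9 must be listed here for GridSearchCV to be able to pick them
    "max_depth": list(range(1, 16)) + [None],
    "min_samples_split": [2, 3, 4, 5],
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=100)
grid = GridSearchCV(DecisionTreeClassifier(random_state=100), param_grid,
                    cv=cv, scoring="accuracy")
grid.fit(X, y)

results = pd.DataFrame(grid.cv_results_)

# Best mean CV accuracy reached by each criterion
best_per_criterion = results.groupby("param_criterion")["mean_test_score"].max()
print(best_per_criterion)

# Mean and spread of CV accuracy by depth (gini, min_samples_split=2)
mask = (results["param_criterion"] == "gini") & (results["param_min_samples_split"] == 2)
print(results.loc[mask, ["param_max_depth", "mean_test_score", "std_test_score"]])
```

If the gap between two candidates is much smaller than their `std_test_score`, the ranking may well change with a different split or seed, so a small advantage for one criterion is not necessarily meaningful.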

```python
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score, classification_report
from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold, KFold, cross_val_score

column_names = ['file_path', '50', '100', '250', '500', '1000', 'r50', 'r100', 'r250', 'r500', 'r1000', 'rfile', 'class2']
df = pd.read_csv("C:/Folder/deftxt - copy.csv", sep=';', header=0, names=column_names)

# Features: every column except the label and the file path
x = df.drop(['class2', 'file_path'], axis=1)
df['class2'] = df['class2'].astype(int)
y = df['class2'].values

# 70/30 train/test split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, shuffle=True, random_state=100)
print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)

# Baseline tree with default hyperparameters
model = DecisionTreeClassifier(random_state=100)
model.fit(x_train, y_train)
model.get_params()  # no visible effect in a script; wrap in print() to see the parameters

# 10-fold CV accuracy of the baseline on the training set
# (for a classifier, an integer cv means StratifiedKFold without shuffling)
k_fold_acc = cross_val_score(model, x_train, y_train, cv=10)
k_fold_mean = k_fold_acc.mean()
for i in k_fold_acc:
    print(i)
print("accuracy K Fold CV:" + str(k_fold_mean))

# Grid over criterion, depth and min_samples_split: 2 x 8 x 4 = 64 candidates
param_dist = {
    "criterion": ["gini", "entropy"],
    "max_depth": [1, 2, 3, 4, 5, 6, 7, None],
    "min_samples_split": [2, 3, 4, 5],
}
grid = GridSearchCV(model, param_grid=param_dist, cv=10, n_jobs=-1, scoring='accuracy', verbose=1)
grid.fit(x_train, y_train)

print("The best results:" + str(grid.best_estimator_))

# Feature and class names (not used further in this snippet)
fn = ['50', '100', '250', '500', '1000', '-50', '-100', '-250', '-500', '-1000', 'total']
cn = ['ClassA', 'ClassB']

# Evaluate the refitted best estimator on the held-out test set
grid_predictions = grid.predict(x_test)
print(classification_report(y_test, grid_predictions))
```
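Since `cv=10` is an integer and the model is a classifier, `cross_val_score` and `GridSearchCV` above both split `x_train` with the same 10 unshuffled stratified folds. The default tree (`criterion='gini'`, `max_depth=None`, `min_samples_split=2`) is itself one of the 64 grid candidates, so the grid's best mean CV accuracy cannot be lower than the baseline's. A minimal sketch of that check, assuming synthetic data and an explicit shuffled `StratifiedKFold` shared by both calls:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the CSV data (assumption)
X, y = make_classification(n_samples=500, n_features=11, random_state=100)

# One splitter object reused by both calls, so both see identical folds
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=100)

model = DecisionTreeClassifier(random_state=100)
baseline = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

grid = GridSearchCV(model, {"criterion": ["gini", "entropy"], "max_depth": [3, 5, 7, None]},
                    cv=cv, scoring="accuracy")
grid.fit(X, y)

# Locate the candidate that matches the default tree
idx = grid.cv_results_["params"].index({"criterion": "gini", "max_depth": None})
print("baseline mean CV accuracy:", baseline.mean())
print("same candidate in the grid:", grid.cv_results_["mean_test_score"][idx])
print("best grid score:", grid.best_score_)
```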

Output:

```
(1369, 11) (587, 11) (1369,) (587,)
0.9927007299270073
0.9927007299270073
0.9781021897810219
0.9927007299270073
0.9927007299270073
0.9854014598540146
0.9854014598540146
0.9927007299270073
0.9781021897810219
0.9779411764705882
accuracy K Fold CV:0.9868452125375698
Fitting 10 folds for each of 64 candidates, totalling 640 fits
The best results:DecisionTreeClassifier(max_depth=7, random_state=100)
              precision    recall  f1-score   support

           0       0.98      0.97      0.97       174
           1       0.99      0.99      0.99       413

    accuracy                           0.98       587
   macro avg       0.98      0.98      0.98       587
weighted avg       0.98      0.98      0.98       587

Process finished with exit code 0
```
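If precision and recall matter as well as accuracy, `GridSearchCV` accepts several scorers at once. With `refit='accuracy'`, accuracy stays the selection criterion, while the other metrics are recorded in `cv_results_` for comparison. A sketch on synthetic, imbalanced data (assumption: `weights=[0.3, 0.7]` loosely mimics the 174/413 class split in the test set):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the CSV data (assumption)
X, y = make_classification(n_samples=600, n_features=11, weights=[0.3, 0.7], random_state=100)

param_grid = {"criterion": ["gini", "entropy"], "max_depth": [5, 7, 9, 11, None]}
scoring = {"accuracy": "accuracy", "precision": "precision", "recall": "recall"}

# Every scorer is computed on every fold; refit="accuracy" means that
# best_params_, best_score_ and best_estimator_ are chosen by accuracy alone.
grid = GridSearchCV(DecisionTreeClassifier(random_state=100), param_grid,
                    cv=10, scoring=scoring, refit="accuracy")
grid.fit(X, y)

results = pd.DataFrame(grid.cv_results_)
cols = ["param_criterion", "param_max_depth",
        "mean_test_accuracy", "mean_test_precision", "mean_test_recall"]
# Candidates sorted by mean CV accuracy, with precision and recall alongside
print(results[cols].sort_values("mean_test_accuracy", ascending=False))
print(grid.best_params_, grid.best_score_)
```

This keeps the accuracy-based choice explicit while showing whether the winning candidate trades away precision or recall.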
© www.soinside.com 2019 - 2024. All rights reserved.