How to evaluate the stability of an xgboost classification model

Question

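I have:

a) a Python xgboost classification model;

b) weekly datasets (the basis for classification) since the beginning of 2018. Each dataset has about 100 thousand rows and 70 columns (features);

c) weekly prediction results on the datasets from the xgboost model (using a logistic regression objective), in the format:

- modeling date;

- item;

- test_auc_mean for each item (as a percentage).

In total there are about 100 datasets and 100 prediction_results since January 2018.

To assess the model I use metrics such as:

- AUC

- confusion matrix

- accuracy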

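import xgboost as xgb
from sklearn.metrics import confusion_matrix

# Hyperparameter values (num_parallel_tree, subsample, ...) are defined elsewhere.
param = {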
    'num_parallel_tree':num_parallel_tree,
    'subsample':subsample,
    'colsample_bytree':colsample_bytree,
    'objective':objective, 
    'learning_rate':learning_rate, 
    'eval_metric':eval_metric, 
    'max_depth':max_depth,
    'scale_pos_weight':scale_pos_weight,
    'min_child_weight':min_child_weight,
    'nthread':nthread,
    'seed':seed
}

bst_cv = xgb.cv(
    param, 
    dtrain,  
    num_boost_round=n_estimators, 
    nfold = nfold,
    early_stopping_rounds=early_stopping_rounds,
    verbose_eval=verbose,
    stratified = stratified
)

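test_auc_mean = bst_cv['test-auc-mean']
# 0-based row index of the boosting round with the highest mean test AUC
best_iteration = int(test_auc_mean.idxmax())

# The index is 0-based, so best_iteration + 1 rounds are needed to reach it
bst = xgb.train(param, 
                dtrain, 
                num_boost_round = best_iteration + 1)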

best_train_auc_mean = bst_cv['train-auc-mean'][best_iteration]
best_train_auc_mean_std = bst_cv['train-auc-std'][best_iteration]

best_test_auc_mean = bst_cv['test-auc-mean'][best_iteration]
best_test_auc_mean_std = bst_cv['test-auc-std'][best_iteration]

print('''XGB CV model report
Best train-auc-mean {}% (std: {}%) 
Best test-auc-mean {}% (std: {}%)'''.format(round(best_train_auc_mean * 100, 2), 
                                          round(best_train_auc_mean_std * 100, 2), 
                                          round(best_test_auc_mean * 100, 2), 
                                          round(best_test_auc_mean_std * 100, 2)))

# bst.predict returns probabilities; an item is labelled positive above 0.9
y_pred = bst.predict(dtest)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred>0.9).ravel()


print('''
     | neg | pos |
__________________
true_| {}  | {}  |
false| {}  | {}  |
__________________

'''.format(tn, tp, fn, fp))

predict_accuracy_on_test_set = (tn + tp)/(tn + fp + fn + tp)
print('Test Accuracy: {}%'.format(round(predict_accuracy_on_test_set * 100, 2)))

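The model gives me the overall picture (AUC is usually between .94 and .96). The problem is that the prediction variability for some specific items is very high (today an item is positive, tomorrow it is negative, the day after tomorrow it is positive again).

I want to evaluate the model's stability. In other words, I want to know how many items with variable results it generates. Ultimately, I want to be sure that the model produces stable results with minimal fluctuation. Do you have any ideas on how to do this?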

Answer 1

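That is precisely the goal of cross-validation. Since you have already done it, all you can do is evaluate the standard deviation of your evaluation metrics, and you have already done that as well...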

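You could try some new metrics, such as precision, recall, F1 score or Fn (F-beta) score, to weight successes and failures in different ways, but it looks like you are almost out of options. You are dependent on your input data here :s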
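As a rough sketch of what I mean by extra metrics, something like the following could be run per week. It assumes scikit-learn is available; the helper name `threshold_metrics`, the toy data, and `beta=2.0` are my own illustrative choices, and the 0.9 threshold is simply copied from your confusion matrix call.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score, fbeta_score

def threshold_metrics(y_true, y_proba, threshold=0.9, beta=2.0):
    """Precision, recall, F1 and F-beta for labels cut at `threshold`.

    `threshold_metrics` is a hypothetical helper, not part of xgboost or sklearn.
    """
    # Turn predicted probabilities into hard 0/1 labels
    y_hat = (np.asarray(y_proba) > threshold).astype(int)
    return {
        "precision": precision_score(y_true, y_hat, zero_division=0),
        "recall": recall_score(y_true, y_hat, zero_division=0),
        "f1": f1_score(y_true, y_hat, zero_division=0),
        "fbeta": fbeta_score(y_true, y_hat, beta=beta, zero_division=0),
    }

# Toy data for illustration only (not taken from your datasets)
y_true = [1, 1, 1, 0, 0, 0]
y_proba = [0.95, 0.92, 0.60, 0.91, 0.10, 0.20]
metrics = threshold_metrics(y_true, y_proba)
print(metrics)
```

Tracking these values week by week shows whether the instability comes mostly from false positives or from false negatives, which AUC alone does not reveal.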

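You could spend some time on the distribution of the training population, and try to identify which part of the population fluctuates over time.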
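One possible way to look at the population distribution is to compare each feature between a reference week and a later week. The sketch below uses the population stability index (PSI), which is a technique I am suggesting here rather than something built into xgboost; the function name, the bin count, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI of one numeric feature between a reference and a current sample."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Bin edges come from quantiles of the reference sample
    edges = np.unique(np.quantile(reference, np.linspace(0, 1, n_bins + 1)))
    # Keep interior edges only, so out-of-range values fall in the outer bins
    interior = edges[1:-1]
    n = len(interior) + 1
    ref_idx = np.searchsorted(interior, reference, side="right")
    cur_idx = np.searchsorted(interior, current, side="right")
    ref_frac = np.bincount(ref_idx, minlength=n) / len(reference)
    cur_frac = np.bincount(cur_idx, minlength=n) / len(current)
    # Avoid log(0) and division by zero for empty bins
    ref_frac = np.clip(ref_frac, eps, None)
    cur_frac = np.clip(cur_frac, eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

# Synthetic example: one week drawn from the same distribution, one shifted
rng = np.random.default_rng(0)
base_week = rng.normal(size=5000)
similar_week = rng.normal(size=5000)
shifted_week = rng.normal(loc=1.0, size=5000)
psi_similar = population_stability_index(base_week, similar_week)
psi_shifted = population_stability_index(base_week, shifted_week)
print(psi_similar, psi_shifted)
```

Running this for each of the 70 features across consecutive weeks would point to the features whose distribution moves the most, which are reasonable suspects for the flipping predictions.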

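You could also try predicting probabilities rather than class labels, to evaluate whether the model is far above its threshold or not.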
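To make the probability idea concrete, here is a minimal sketch. It assumes you can arrange the weekly predicted probabilities into an array with one row per item and one column per week; the function name `stability_report`, the `margin` of 0.05, and the toy numbers are my own assumptions, and the 0.9 threshold is taken from your code.

```python
import numpy as np

def stability_report(proba, threshold=0.9, margin=0.05):
    """Per-item flip count and share of weeks spent near the threshold.

    proba: array of shape (n_items, n_weeks) with predicted probabilities.
    """
    proba = np.asarray(proba, dtype=float)
    labels = proba > threshold
    # Count label changes between consecutive weeks for each item
    flips = np.sum(labels[:, 1:] != labels[:, :-1], axis=1)
    # Fraction of weeks in which the probability sits within `margin` of the threshold
    near = np.mean(np.abs(proba - threshold) < margin, axis=1)
    return flips, near

# Toy probabilities for three items over four weeks (illustration only)
proba = np.array([
    [0.97, 0.98, 0.96, 0.99],  # stays clearly positive
    [0.91, 0.88, 0.92, 0.89],  # hovers around the threshold and flips
    [0.10, 0.20, 0.15, 0.05],  # stays clearly negative
])
flips, near = stability_report(proba)
unstable_share = np.mean(flips > 0)  # share of items that changed label at least once
print(flips, near, unstable_share)
```

Items with many flips and a high share of weeks near the threshold are borderline cases rather than model failures, and the share of items with at least one flip gives a single number to track over time.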

The last two suggestions are more like workarounds. :(

Answer 2

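Gwendal, thank you. Could you elaborate on the 2 approaches you mentioned? 1) How do I train on the population distribution? With K-means clustering or another unsupervised learning method? 2) For example, I have predict_proba (a chart for 1 specific item is attached). How can I evaluate whether the model is far above its threshold? By comparing each item's predict_proba with its true label (for example, predict_proba = 0.5 and label = 1)?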
