Getting XGBoost predictions from the individual trees

Question · votes: 0 · answers: 1

This may be a duplicate of "How to get predictions from each tree in xgboost?", but that solution no longer works (probably due to changes in the XGBoost library). My idea is to dump the model in its raw format with

model.get_booster().get_dump()
and reimplement it (prediction only) on a different platform. First, though, I tried to implement it in Python. Running the code below, which predicts with each individual booster and combines the results, does not return the same result as the
model.predict()
function. Is there any way to match
model.predict()
with a combination of the boosters? What am I missing?

import numpy as np
import xgboost as xgb
from sklearn import datasets
from scipy.special import expit as sigmoid, logit as inverse_sigmoid

# Load data
iris = datasets.load_iris()
X, y = iris.data, (iris.target == 1).astype(int)

# Fit a model
model = xgb.XGBClassifier(
    n_estimators=10,
    max_depth=10,
    use_label_encoder=False,
    objective='binary:logistic'
)
model.fit(X, y)
booster_ = model.get_booster()

# Extract individual predictions
individual_preds = []
for tree_ in booster_:
    individual_preds.append(
        tree_.predict(xgb.DMatrix(X))
    )
individual_preds = np.vstack(individual_preds)

# Aggregate individual predictions into final predictions
individual_logits = inverse_sigmoid(individual_preds)
final_logits = individual_logits.sum(axis=0)
final_preds = sigmoid(final_logits)

# Verify correctness
xgb_preds = booster_.predict(xgb.DMatrix(X))
np.testing.assert_almost_equal(final_preds, xgb_preds)

AssertionError:
Arrays are not almost equal to 7 decimals
Mismatched elements: 150 / 150 (100%)
Max absolute difference: 0.90511334
Max relative difference: 0.99744916
 x: array([7.4847587e-05, 7.4847587e-05, 7.4847587e-05, 7.4847587e-05,
       7.4847587e-05, 7.4847587e-05, 7.4847587e-05, 7.4847587e-05,
       7.4847587e-05, 7.4847587e-05, 7.4847587e-05, 7.4847587e-05, ...
 y: array([0.0293127, 0.0293127, 0.0293127, 0.0293127, 0.0293127,
       0.0293127, 0.0293127, 0.0293127, 0.0293127, 0.0293127,
       0.0293127, 0.0293127, 0.0293127, 0.0293127, 0.0293127, ...

python machine-learning xgboost
1 Answer

0 votes

The problem seems to be related to how you aggregate the predictions of the individual trees. Rather than summing the logits directly, consider averaging them. Since you are working on binary classification, averaging the logits makes more sense. The code below may help you:

import numpy as np
import xgboost as xgb
from sklearn import datasets
from scipy.special import expit as sigmoid, logit as inverse_sigmoid

# Load data
iris = datasets.load_iris()
X, y = iris.data, (iris.target == 1).astype(int)

# Fit a model
model = xgb.XGBClassifier(
    n_estimators=10,
    max_depth=10,
    use_label_encoder=False,
    objective='binary:logistic'
)
model.fit(X, y)
booster_ = model.get_booster()

# Extract individual predictions
individual_preds = []
for tree_ in booster_:
    individual_preds.append(
        tree_.predict(xgb.DMatrix(X))
    )
individual_preds = np.vstack(individual_preds)

# Aggregate individual predictions into final predictions
individual_logits = inverse_sigmoid(individual_preds)
final_logits = np.mean(individual_logits, axis=0)  # Use mean instead of sum
final_preds = sigmoid(final_logits)

# Verify correctness
xgb_preds = booster_.predict(xgb.DMatrix(X))
np.testing.assert_almost_equal(final_preds, xgb_preds)

By taking the mean of the logits, the predictions should align more closely with the model's predict function.

© www.soinside.com 2019 - 2024. All rights reserved.