I am trying to convert XGBoost SHAP values into a SHAP Explanation object. Using the example from [here][1] with the built-in SHAP library takes days (even on a subsampled dataset), whereas the XGBoost library takes minutes. However, I want to produce a beeswarm plot like the one shown in the example [here][2].

My idea is that I can use the XGBoost library to recover the SHAP values and then plot them with the SHAP library, but the beeswarm plot requires an explainer object. How can I convert my XGBoost booster object into an explainer object?
Here is what I have tried:

```python
import shap
import xgboost

booster = model.get_booster()
d_test = xgboost.DMatrix(X_test[0:100], y_test[0:100])
shap_values = booster.predict(d_test, pred_contribs=True)
shap.plots.beeswarm(shap_values)
```
This returns:

```
TypeError: The beeswarm plot requires an `Explanation` object as the `shap_values` argument.
```
To clarify, I would like to create the Explanation object from the values produced by XGBoost's built-in method, if possible. Avoiding the `shap.Explainer` or `shap.TreeExplainer` calls is a priority, since they take days rather than minutes to return.

[1]: https://shap.readthedocs.io/en/latest/example_notebooks/tabular_examples/tree_based_models/Python%20Version%20of%20Tree%20SHAP.html
[2]: https://shap.readthedocs.io/en/latest/example_notebooks/api_examples/plots/beeswarm.html#A-simple-beeswarm-summary-plot
You need to convert the SHAP values obtained from your XGBoost model into a SHAP `Explanation` object. The `Explanation` object is the standard format in the SHAP library; it holds not only the SHAP values but also additional information such as feature names and base values, like so:
```python
import numpy as np
import shap

# Assuming booster, shap_values, X_test are defined as in your code.
# Note: with pred_contribs=True the last column is the bias term,
# so slice it off to match the number of features.
shap_values = shap_values[:, :-1]

# Create an explainer with your model
explainer = shap.Explainer(booster, X_test[0:100])
# Alternatively, use TreeExplainer if the line above gives trouble
# explainer = shap.TreeExplainer(booster)

# Get the expected value (base value), typically the average model
# output over the background dataset
expected_value = explainer.expected_value
# A multi-class model has one expected value per class
if isinstance(expected_value, np.ndarray):
    expected_value = expected_value[0]

# Create the SHAP Explanation object
shap_explanation = shap.Explanation(shap_values,
                                    base_values=expected_value,
                                    data=X_test.iloc[0:100],  # assuming X_test is a DataFrame
                                    feature_names=X_test.columns.tolist())
```
Now that you have the `Explanation` object, you can use it to create the beeswarm plot:

```python
shap.plots.beeswarm(shap_explanation)
```
If you are constructing an `Explanation` object (rather than an `Explainer`, as stated in your question), then you can do the following:
```python
import xgboost as xgb
import shap
from sklearn.model_selection import train_test_split

X, y = shap.datasets.california()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

d_train = xgb.DMatrix(X_train, y_train)
d_test = xgb.DMatrix(X_test, y_test)

# "device": "cuda" assumes a GPU is available; drop it to train on CPU
params = {"objective": "reg:squarederror", "tree_method": "hist", "device": "cuda"}
model = xgb.train(params, d_train, 100)

# pred_contribs returns one column per feature plus a final bias column
shap_values = model.predict(d_test, pred_contribs=True)

# Drop the bias column so the shape matches the feature names
exp = shap.Explanation(shap_values[:, :-1], data=X_test, feature_names=X.columns)
shap.summary_plot(exp)
```