I am using the following MultiOutputRegressor:
from xgboost import XGBRegressor
from sklearn.multioutput import MultiOutputRegressor
# Define the estimator
estimator = XGBRegressor(
    objective='reg:squarederror'
)
# Define the model
my_model = MultiOutputRegressor(estimator = estimator, n_jobs = -1).fit(X_train, y_train)
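I would like to use a validation set to evaluate the performance of my XGBRegressor, but I believe MultiOutputRegressor does not support passing eval_set to the fit function.
How can I use a validation set in this case? Is there a workaround that lets XGBRegressor handle multiple outputs?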
You can try overriding the fit method of MultiOutputRegressor like this:
# Note: several of these are private scikit-learn helpers whose location
# has changed across releases.
from sklearn.utils.validation import _check_fit_params, has_fit_parameter
from sklearn.utils.multiclass import check_classification_targets
from sklearn.base import is_classifier
from sklearn.utils.fixes import delayed
from joblib import Parallel
from sklearn.multioutput import MultiOutputRegressor, _fit_estimator


class MyMultiOutputRegressor(MultiOutputRegressor):

    def fit(self, X, y, sample_weight=None, **fit_params):
        """ Fit the model to data.
        Fit a separate model for each output variable.

        Parameters
        ----------
        X : {array-like, sparse matrix} of shape (n_samples, n_features)
            Data.
        y : {array-like, sparse matrix} of shape (n_samples, n_outputs)
            Multi-output targets. An indicator matrix turns on multilabel
            estimation.
        sample_weight : array-like of shape (n_samples,), default=None
            Sample weights. If None, then samples are equally weighted.
            Only supported if the underlying regressor supports sample
            weights.
        **fit_params : dict of string -> object
            Parameters passed to the ``estimator.fit`` method of each step.
            .. versionadded:: 0.23

        Returns
        -------
        self : object
        """
        if not hasattr(self.estimator, "fit"):
            raise ValueError("The base estimator should implement"
                             " a fit method")

        X, y = self._validate_data(X, y,
                                   force_all_finite=False,
                                   multi_output=True, accept_sparse=True)

        if is_classifier(self):
            check_classification_targets(y)

        if y.ndim == 1:
            raise ValueError("y must have at least two dimensions for "
                             "multi-output regression but has only one.")

        if (sample_weight is not None and
                not has_fit_parameter(self.estimator, 'sample_weight')):
            raise ValueError("Underlying estimator does not support"
                             " sample weights.")

        fit_params_validated = _check_fit_params(X, fit_params)
        # eval_set is expected as [(X_test, Y_test)]: a list holding one (X, y) pair
        [(X_test, Y_test)] = fit_params_validated.pop('eval_set')

        # Each per-output estimator gets the matching column of Y_test
        self.estimators_ = Parallel(n_jobs=self.n_jobs)(
            delayed(_fit_estimator)(
                self.estimator, X, y[:, i], sample_weight,
                **fit_params_validated, eval_set=[(X_test, Y_test[:, i])])
            for i in range(y.shape[1]))
        return self
Then pass eval_set to the fit method:
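model = MyMultiOutputRegressor(estimator=estimator, n_jobs=-1)
# Note: recent XGBoost releases take early_stopping_rounds in the
# XGBRegressor constructor rather than in fit().
fit_params = dict(
    eval_set=[(X_test, Y_test)],
    early_stopping_rounds=10
)
model.fit(X_train, Y_train, **fit_params)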
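If overriding private scikit-learn helpers feels too fragile, the same idea (one model per target column, each validated against its own column of the validation targets) can be sketched as a plain loop. This is only an illustration under stated assumptions: StubRegressor, fit_per_target and predict_all are hypothetical names, and the stub merely stands in for any estimator whose fit accepts eval_set in the XGBoost-style form [(X_val, y_val)].

```python
import numpy as np


class StubRegressor:
    """Hypothetical stand-in for XGBRegressor: predicts the training mean and
    records the validation MSE for whatever eval_set it is given."""

    def fit(self, X, y, eval_set=None):
        self.mean_ = float(np.mean(y))
        self.val_mse_ = None
        if eval_set is not None:
            _, y_val = eval_set[0]
            self.val_mse_ = float(np.mean((y_val - self.mean_) ** 2))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


def fit_per_target(make_estimator, X, Y, X_val, Y_val):
    """Fit one estimator per target column, each with its own eval_set."""
    models = []
    for i in range(Y.shape[1]):
        model = make_estimator()
        # Slice the validation targets so they match the single output
        model.fit(X, Y[:, i], eval_set=[(X_val, Y_val[:, i])])
        models.append(model)
    return models


def predict_all(models, X):
    """Stack per-target predictions into shape (n_samples, n_outputs)."""
    return np.column_stack([m.predict(X) for m in models])


X = np.zeros((4, 2))
Y = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
X_val = np.zeros((2, 2))
Y_val = np.array([[2.5, 20.0], [2.5, 30.0]])

models = fit_per_target(StubRegressor, X, Y, X_val, Y_val)
preds = predict_all(models, X_val)
```

With the real library, make_estimator would presumably be something like lambda: XGBRegressor(objective='reg:squarederror'). Note that this loop runs sequentially, unlike MultiOutputRegressor with n_jobs=-1.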
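With a few small edits, @itamar-kanter's solution worked for me. My notes are a bit long for a comment, so I am posting them as an answer instead.
Note that @itamar-kanter's solution may have been inspired by the fit() function of darts.utils.multioutput.MultiOutputRegressor: unit8co.github.io/darts/_modules/darts/utils/multioutput.html
This line has a typo, [(X_test, Y_test)] = fit_params_validated.pop('eval_set'); it should be: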
[X_test, Y_test] = fit_params_validated.pop('eval_set')
That is, use [X_test, Y_test] or (X_test, Y_test) to unpack the validation features and targets.
Alternatively, you can use the same syntax as fit() in darts.utils.multioutput.MultiOutputRegressor:
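Which unpacking pattern works depends on how eval_set is passed in. A small sketch with placeholder values (the strings below only stand in for the real arrays) shows what each form accepts:

```python
# Placeholder objects standing in for the validation features and targets.
X_test, Y_test = "X_val", "Y_val"

# XGBoost-style eval_set: a list holding one (X, y) tuple.
nested = [(X_test, Y_test)]
[(x_a, y_a)] = nested          # nested pattern matches a list of one pair

# A bare pair, passed as a tuple or a two-element list.
flat = (X_test, Y_test)
[x_b, y_b] = flat              # flat pattern matches a bare pair


def unpack_flat(value):
    """Try the flat pattern and report whether it yields the original pair."""
    try:
        [x, y] = value
    except ValueError:
        return False
    return (x, y) == (X_test, Y_test)


flat_on_nested = unpack_flat(nested)   # one element cannot fill two names
flat_on_flat = unpack_flat(flat)
```

So the flat form fits a call such as eval_set=(X_test, Y_test), while the nested form fits eval_set=[(X_test, Y_test)].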
eval_set = fit_params_validated.pop("eval_set")
self.estimators_ = Parallel(n_jobs=self.n_jobs)(
    delayed(_fit_estimator)(
        self.estimator,
        X,
        y[:, i],
        sample_weight,
        # eval set may be a list (for XGBRegressor), in which case we have to keep it as a list
        eval_set=[(eval_set[0][0], eval_set[0][1][:, i])]
        if isinstance(eval_set, list)
        else (eval_set[0], eval_set[1][:, i]),
        **fit_params_validated
    )
    for i in range(y.shape[1])
)
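The inline conditional above can be easier to read when pulled out into a small helper. The sketch below is one way to write it; eval_set_for_output is a hypothetical name, not part of darts or scikit-learn, and it assumes the validation targets are a 2-D NumPy array.

```python
import numpy as np


def eval_set_for_output(eval_set, i):
    """Return eval_set restricted to target column i (hypothetical helper).

    A list such as [(X_val, Y_val)] stays a list, as XGBRegressor expects;
    a bare (X_val, Y_val) pair stays a tuple.
    """
    if isinstance(eval_set, list):
        X_val, Y_val = eval_set[0]
        return [(X_val, Y_val[:, i])]
    X_val, Y_val = eval_set
    return (X_val, Y_val[:, i])


X_val = np.arange(6).reshape(3, 2)
Y_val = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

as_list = eval_set_for_output([(X_val, Y_val)], 1)
as_tuple = eval_set_for_output((X_val, Y_val), 0)
```

Inside the Parallel call, the keyword argument would then read eval_set=eval_set_for_output(eval_set, i).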
The correct way to import Parallel and delayed should be the same as in darts.utils.multioutput.MultiOutputRegressor:
try:
    # delayed was moved from sklearn.utils.fixes to sklearn.utils.parallel in v1.3
    from sklearn.utils.parallel import Parallel, delayed
except ImportError:
    from joblib import Parallel
    from sklearn.utils.fixes import delayed
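Here, importing Parallel and delayed with try ... except ... avoids an unnecessary UserWarning from parallel.py: "`sklearn.utils.parallel.delayed` should be used with `sklearn.utils.parallel.Parallel` to make it possible to propagate the scikit-learn configuration of the current thread to the joblib workers."
Also, using sklearn's own versions ("from sklearn.utils.parallel import Parallel, delayed") instead of "from joblib import Parallel, delayed" seemed to make training faster. I have not verified why; my guess is that sklearn's wrapper around joblib has some features that work better with parallel sklearn models?
Finally, the fit() override in @itamar-kanter's code calls check_classification_targets, so the import section needs "from sklearn.utils.multiclass import check_classification_targets".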