scikit-learn custom estimator: "Invalid parameter for estimator" error


For my university project I am implementing a custom classifier: an ensemble of SVMs with different voting schemes. My estimator code:

import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

svm_possible_args = {"C", "kernel", "degree", "gamma", "coef0", "shrinking", "probability", "tol", "cache_size",
                     "class_weight", "max_iter", "decision_function_shape", "break_ties"}

bagging_possible_args = {"n_estimators", "max_samples", "max_features", "bootstrap", "bootstrap_features",
                         "oob_score", "warm_start", "n_jobs"}

common_possible_args = {"random_state", "verbose"}


class SVMEnsemble(BaggingClassifier):
    def __init__(self, kernel="linear", voting_method=None, **kwargs):
        if voting_method not in {None, "hard", "soft"}:
            raise ValueError(f"voting_method {voting_method} is not recognized.")

        svm_args = dict()
        bagging_args = dict()
        for arg_name, arg_val in kwargs.items():
            if arg_name in svm_possible_args:
                svm_args[arg_name] = arg_val
            elif arg_name in bagging_possible_args:
                bagging_args[arg_name] = arg_val
            elif arg_name in common_possible_args:
                svm_args[arg_name] = arg_val
                bagging_args[arg_name] = arg_val
            else:
                # Report the offending keyword, not voting_method.
                raise ValueError(f"argument {arg_name} is not recognized.")

        # Soft voting needs class probabilities from every SVC.
        probability = voting_method == "soft"
        base_estimator = SVC(kernel=kernel, probability=probability, **svm_args)

        super().__init__(base_estimator=base_estimator, **bagging_args)
        self.voting_method = voting_method

    def predict(self, X):
        if self.voting_method in {None, "hard"}:
            return super().predict(X)
        elif self.voting_method == "soft":
            probabilities = np.zeros((X.shape[0], self.classes_.shape[0]))
            for estimator in self.estimators_:
                estimator_probabilities = estimator.predict_proba(X)
                probabilities += estimator_probabilities
            return self.classes_[probabilities.argmax(axis=1)]
        else:
            raise ValueError(f"voting_method {self.voting_method} is not recognized.")

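I want to inherit most of the functionality of BaggingClassifier and plug in SVC. The user should be able to specify both SVM and bagging hyperparameters, so I filter the arguments passed to SVC and BaggingClassifier with a loop over svm_possible_args and the other sets. The parameter sets are almost disjoint (they only share random_state and verbose, which is not a problem).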

I am trying to find the best hyperparameters with GridSearchCV:

from sklearn.model_selection import GridSearchCV


def get_best_ensemble(X_train, y_train):
    parameters = {
        "voting_method": ["hard", "soft"],

        "max_samples": np.linspace(0.5, 1, 6, endpoint=True).round(1),
        "max_features": [0.7, 0.8, 0.9, 1],
        "n_estimators": [5, 10, 15],

        "kernel": ["linear", "poly", "rbf", "sigmoid"],
        "C": [0.01, 0.1, 0.5, 1, 10],
        "gamma": [0.01, 0.1, 0.3, 0.6, 1]
    }

    model = SVMEnsemble()
    grid = GridSearchCV(model, parameters, verbose=2, cv=5, n_jobs=-1)
    grid.fit(X_train, y_train)

    print("Best hyperparameters:")
    print(grid.best_params_)

    return grid.best_estimator_

I get the following error:

ValueError: Invalid parameter C for estimator SVMEnsemble(kernel=None, voting_method=None). Check the list of available parameters with `estimator.get_params().keys()`.

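With print(model.get_params().keys()) I get dict_keys(['kernel', 'voting_method']). Does this mean I have to explicitly list all parameters of SVC and BaggingClassifier in SVMEnsemble.__init__ so that GridSearchCV can "see" them and work properly? Or is there a cleaner solution?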

python scikit-learn
1 Answer

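You can override the get_params and set_params methods, or take an actual SVC object as an init parameter. Either way, you need to make sure that when the grid search calls set_params, the estimator held inside the instance is actually updated (not just the parameters stored on the instance; note that __init__ is not re-run).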
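A minimal sketch of the second option, assuming you can do without the subclass entirely: pass an SVC instance straight to BaggingClassifier and address its hyperparameters with the nested `<name>__<param>` syntax. The toy data from make_classification and the small grid are placeholders, not taken from the question. The keyword for the inner estimator is `estimator` in recent scikit-learn releases and `base_estimator` in older ones, so the sketch looks it up instead of hard-coding it.

```python
import inspect

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Pick whichever keyword this scikit-learn version uses for the inner estimator.
init_params = inspect.signature(BaggingClassifier.__init__).parameters
base_name = "estimator" if "estimator" in init_params else "base_estimator"

# The SVC is a real init parameter, so get_params()/set_params() can reach it.
model = BaggingClassifier(**{base_name: SVC()}, random_state=0)

# Bagging parameters are addressed directly, SVC parameters via "<base_name>__".
param_grid = {
    "n_estimators": [5, 10],
    f"{base_name}__C": [0.1, 1],
    f"{base_name}__kernel": ["linear", "rbf"],
}

# Placeholder data, only here to make the sketch runnable.
X, y = make_classification(n_samples=120, n_features=8, random_state=0)

grid = GridSearchCV(model, param_grid, cv=3)
grid.fit(X, y)
print(grid.best_params_)
```

Because set_params forwards the nested keys to the SVC object itself, the inner estimator stays in sync during the search. As far as I can tell from the BaggingClassifier documentation, predict already averages predict_proba when the base estimator provides it and falls back to voting otherwise, so adding `probability` to the grid under the same prefix may be enough to cover the hard/soft distinction.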

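There has been some discussion about simplifying parameter discovery for inherited classes, but it is tricky, and it would not solve the second problem (keeping the inner estimator in sync): https://github.com/scikit-learn/scikit-learn/issues/13555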
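For context on why get_params() reports only kernel and voting_method: to my understanding, BaseEstimator discovers parameters by inspecting the signature of __init__, and a catch-all **kwargs contributes no names. The sketch below is a simplified, standard-library-only imitation of that idea, not scikit-learn's actual code. discover_param_names and KwargsEnsemble are hypothetical names made up for the illustration.

```python
import inspect


def discover_param_names(cls):
    """Simplified imitation of signature-based parameter discovery.

    Returns the explicitly named __init__ arguments, skipping `self` and
    any catch-all **kwargs, which carries no parameter names.
    """
    signature = inspect.signature(cls.__init__)
    return sorted(
        p.name
        for p in signature.parameters.values()
        if p.name != "self" and p.kind is not inspect.Parameter.VAR_KEYWORD
    )


class KwargsEnsemble:
    # Hypothetical stand-in that mirrors the signature of SVMEnsemble.__init__.
    def __init__(self, kernel="linear", voting_method=None, **kwargs):
        self.kernel = kernel
        self.voting_method = voting_method


print(discover_param_names(KwargsEnsemble))  # ['kernel', 'voting_method']
```

Anything passed only through **kwargs (C, gamma, n_estimators, ...) is invisible to this kind of lookup, which matches the "Invalid parameter C" error in the question.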
