For a university project I am implementing a custom classifier: an ensemble of SVMs with different voting schemes. My estimator code:
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

svm_possible_args = {"C", "kernel", "degree", "gamma", "coef0", "shrinking", "probability", "tol", "cache_size",
                     "class_weight", "max_iter", "decision_function_shape", "break_ties"}
bagging_possible_args = {"n_estimators", "max_samples", "max_features", "bootstrap", "bootstrap_features",
                         "oob_score", "warm_start", "n_jobs"}
common_possible_args = {"random_state", "verbose"}


class SVMEnsemble(BaggingClassifier):
    def __init__(self, kernel="linear", voting_method=None, **kwargs):
        if voting_method not in {None, "hard", "soft"}:
            raise ValueError(f"voting_method {voting_method} is not recognized.")

        # Route each keyword argument to SVC, to BaggingClassifier, or to both.
        svm_args = dict()
        bagging_args = dict()
        for arg_name, arg_val in kwargs.items():
            if arg_name in svm_possible_args:
                svm_args[arg_name] = arg_val
            elif arg_name in bagging_possible_args:
                bagging_args[arg_name] = arg_val
            elif arg_name in common_possible_args:
                svm_args[arg_name] = arg_val
                bagging_args[arg_name] = arg_val
            else:
                # Report the offending argument name, not voting_method.
                raise ValueError(f"argument {arg_name} is not recognized.")

        # Soft voting needs predict_proba, so SVC must be fitted with probability=True.
        probability = True if voting_method == "soft" else False
        svm_args = dict() if not svm_args else svm_args
        base_estimator = SVC(kernel=kernel, probability=probability, **svm_args)

        super().__init__(base_estimator=base_estimator, **bagging_args)
        self.voting_method = voting_method

    def predict(self, X):
        if self.voting_method in {None, "hard"}:
            return super().predict(X)
        elif self.voting_method == "soft":
            probabilities = np.zeros((X.shape[0], self.classes_.shape[0]))
            # Each estimator only saw its own feature subset (max_features < 1),
            # so select the same columns before asking it for probabilities.
            for estimator, features in zip(self.estimators_, self.estimators_features_):
                estimator_probabilities = estimator.predict_proba(X[:, features])
                probabilities += estimator_probabilities
            return self.classes_[probabilities.argmax(axis=1)]
        else:
            raise ValueError(f"voting_method {self.voting_method} is not recognized.")
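I want to inherit most of the functionality of BaggingClassifier and plug in SVC. Users should be able to specify both SVM and bagging hyperparameters, so I use a loop together with svm_possible_args and the other sets to filter the arguments passed to SVC and to BaggingClassifier. The two argument sets are almost disjoint (they share only random_state and verbose, which is not a problem).
I am trying to find the best hyperparameters with GridSearchCV: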
from sklearn.model_selection import GridSearchCV


def get_best_ensemble(X_train, y_train):
    parameters = {
        "voting_method": ["hard", "soft"],
        "max_samples": np.linspace(0.5, 1, 6, endpoint=True).round(1),
        "max_features": [0.7, 0.8, 0.9, 1],
        "n_estimators": [5, 10, 15],
        "kernel": ["linear", "poly", "rbf", "sigmoid"],
        "C": [0.01, 0.1, 0.5, 1, 10],
        "gamma": [0.01, 0.1, 0.3, 0.6, 1]
    }
    model = SVMEnsemble()
    grid = GridSearchCV(model, parameters, verbose=2, cv=5, n_jobs=-1)
    grid.fit(X_train, y_train)
    print("Best hyperparameters:")
    print(grid.best_params_)
    return grid.best_estimator_
I get the following error:
ValueError: Invalid parameter C for estimator SVMEnsemble(kernel=None, voting_method=None). Check the list of available parameters with `estimator.get_params().keys()`.
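With print(model.get_params().keys()) I get dict_keys(['kernel', 'voting_method']). Does this mean I have to explicitly list all the arguments of both SVC and BaggingClassifier in the __init__ of SVMEnsemble so that GridSearchCV can "see" them and do its work? Or is there a cleaner solution?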
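You can override the get_params and set_params methods, or take an actual SVC object as an __init__ parameter. Either way, you need to make sure that when the grid search calls set_params, the estimator held inside the instance is updated correctly (not just the parameter stored on the instance; note that __init__ is not re-run).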
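A minimal sketch of the second option, taken to its simplest form: pass the SVC object to a plain BaggingClassifier and let the grid search reach the inner estimator through nested parameter names of the form `<name>__<param>`. As far as I understand, BaggingClassifier already averages predict_proba when the base estimator provides it and falls back to majority voting otherwise, so toggling `probability` stands in for the soft/hard switch here. The toy data set and the reduced grid are assumptions chosen only to keep the example quick; the `prefix` lookup is there because the first constructor argument is called `estimator` in newer scikit-learn releases and `base_estimator` in older ones.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Pass the SVC positionally so the call works regardless of whether the
# argument is named `estimator` or `base_estimator` in this version.
model = BaggingClassifier(SVC(), random_state=0)

# Work out the nested-parameter prefix for the installed version.
prefix = "estimator" if "estimator" in model.get_params() else "base_estimator"

parameters = {
    "n_estimators": [5, 10],
    f"{prefix}__kernel": ["linear", "rbf"],
    f"{prefix}__C": [0.1, 1],
    # True gives each SVC a predict_proba method (probabilities are averaged);
    # False leaves the ensemble to majority voting.
    f"{prefix}__probability": [False, True],
}

# Toy data, only for illustration.
X, y = make_classification(n_samples=120, n_features=8, random_state=0)

grid = GridSearchCV(model, parameters, cv=3)
grid.fit(X, y)
print(grid.best_params_)
```

Because set_params handles the `__` names itself, the inner SVC is updated on every grid point without any custom get_params/set_params code.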
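There has been some discussion about simplifying parameter discovery for inherited classes, but it is tricky, and it would not solve the second problem (keeping the inner estimator in sync when set_params is called): https://github.com/scikit-learn/scikit-learn/issues/13555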
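For context on why get_params() reports only kernel and voting_method: to my knowledge, BaseEstimator derives parameter names from the __init__ signature and skips a `**kwargs` catch-all. The snippet below is a rough stdlib-only imitation of that lookup, not scikit-learn's actual code; `discover_param_names`, `KwargsEnsemble` and `ExplicitEnsemble` are made-up names for illustration.

```python
import inspect


def discover_param_names(cls):
    """Roughly imitate parameter discovery: read the __init__ signature
    and skip `self` and any **kwargs catch-all."""
    signature = inspect.signature(cls.__init__)
    return sorted(
        p.name
        for p in signature.parameters.values()
        if p.name != "self" and p.kind != p.VAR_KEYWORD
    )


class KwargsEnsemble:
    # Same constructor shape as SVMEnsemble in the question.
    def __init__(self, kernel="linear", voting_method=None, **kwargs):
        pass


class ExplicitEnsemble:
    # Hypothetical alternative: tunable parameters are named arguments.
    def __init__(self, kernel="linear", voting_method=None, C=1.0, n_estimators=10):
        pass


print(discover_param_names(KwargsEnsemble))    # ['kernel', 'voting_method']
print(discover_param_names(ExplicitEnsemble))  # ['C', 'kernel', 'n_estimators', 'voting_method']
```

Anything that arrives through `**kwargs` never shows up in the discovered names, which is why the grid search rejects C.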