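I have a bunch of tabular data, and I managed to train a random forest, a gradient boosting classifier, and a deep learning model (the tabular learner from fastai) on it. Looking at the results, I noticed that each model does better than the others on certain labels, and those labels differ from model to model. I'm wondering whether I can put all of the models into a voting classifier (the one from sklearn). I have no problem with the random forest and gradient boosting, but I haven't found any information about putting a tabular learner into a voting classifier. Is that possible?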
It should be possible with a wrapper that exposes the learner through the scikit-learn estimator interface:
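import numpy as np
from fastai.tabular.all import *
from sklearn.base import BaseEstimator, ClassifierMixin

# ClassifierMixin goes first so that sklearn recognizes the wrapper as a classifier
class FastAITabularClassifier(ClassifierMixin, BaseEstimator):
    def __init__(self, dls, layers, metrics):
        self.dls = dls  # used only as a template for the column names
        self.layers = layers
        self.metrics = metrics
        self.learn = None

    def fit(self, X, y):
        # Build fresh DataLoaders from X, y (assumes X is a pandas DataFrame of raw features)
        df = X.copy()
        df['_target'] = np.asarray(y)  # '_target' is a placeholder column name
        self.classes_ = np.unique(y)   # sklearn expects classes_ on a fitted classifier
        cat_names = [c for c in self.dls.cat_names if c in X.columns]
        cont_names = [c for c in self.dls.cont_names if c in X.columns]
        # The procs below are an assumption; use the ones your original dls was built with
        dls = TabularDataLoaders.from_df(df, procs=[Categorify, FillMissing, Normalize],
                                         cat_names=cat_names, cont_names=cont_names,
                                         y_names='_target', y_block=CategoryBlock())
        self.learn = tabular_learner(dls, layers=self.layers, metrics=self.metrics)
        self.learn.fit_one_cycle(5)  # You can adjust the training method and epochs
        return self

    def predict_proba(self, X):
        # Required for voting='soft'; columns should follow the sorted class order
        dl = self.learn.dls.test_dl(X)
        preds, _ = self.learn.get_preds(dl=dl)
        return preds.numpy()

    def predict(self, X):
        return self.classes_[self.predict_proba(X).argmax(axis=1)]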
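Before training the real thing, it can help to check the estimator contract with something cheap. Below is a minimal sketch, not the fastai wrapper itself: PriorClassifier is a hypothetical stand-in that just predicts the training class frequencies, and the data is random. It only illustrates that VotingClassifier with voting='soft' is happy as long as the estimator provides fit, predict_proba, and classes_.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.ensemble import RandomForestClassifier, VotingClassifier


class PriorClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical stand-in: predicts the training class frequencies for every row."""

    def fit(self, X, y):
        # classes_ must be set during fit; columns of predict_proba follow this order
        self.classes_, counts = np.unique(y, return_counts=True)
        self.prior_ = counts / counts.sum()
        return self

    def predict_proba(self, X):
        # One row of class probabilities per sample
        return np.tile(self.prior_, (len(X), 1))

    def predict(self, X):
        return self.classes_[self.predict_proba(X).argmax(axis=1)]


# Synthetic data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = (X[:, 0] > 0).astype(int)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=10, random_state=0)),
        ("prior", PriorClassifier()),
    ],
    voting="soft",
)
ensemble.fit(X, y)
proba = ensemble.predict_proba(X)  # averaged probabilities, one row per sample
```

If this runs with your stand-in, the remaining work is making the fastai wrapper return probabilities in the same shape and class order.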
You should then be able to use the wrapper with VotingClassifier:
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier

# Define other models
rf = RandomForestClassifier(n_estimators=100)
gbm = GradientBoostingClassifier(n_estimators=100)

# Define FastAI model (assuming you already have a 'dls' and you set 'layers', 'metrics')
fastai_model = FastAITabularClassifier(dls, layers=[200, 100], metrics=accuracy)

# Create voting classifier (soft voting needs predict_proba on every estimator)
voting_clf = VotingClassifier(estimators=[
    ('rf', rf),
    ('gbm', gbm),
    ('fastai', fastai_model)
], voting='soft')

# Fit the voting classifier (X_train is assumed to be a pandas DataFrame of numeric
# features, since the sklearn models receive the same data)
voting_clf.fit(X_train, y_train)
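Since the motivation was that each model is stronger on different labels, it may be worth seeing what soft voting actually computes. The sketch below is plain Python with made-up probabilities for a single sample (hard_vote and soft_vote are illustrative helpers, not sklearn functions): hard voting counts each model's top pick, while soft voting averages the class probabilities, so one confident model can outweigh two lukewarm ones.

```python
from collections import Counter


def hard_vote(probas):
    """Each model votes for its most probable class; the majority wins."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in probas]
    return Counter(votes).most_common(1)[0][0]


def soft_vote(probas):
    """Average the class probabilities across models and pick the largest."""
    n_classes = len(probas[0])
    averaged = [sum(p[k] for p in probas) / len(probas) for k in range(n_classes)]
    winner = max(range(n_classes), key=averaged.__getitem__)
    return winner, averaged


# Made-up probabilities for one sample and three classes
probas = [
    [0.55, 0.45, 0.00],  # e.g. random forest: slightly prefers class 0
    [0.55, 0.45, 0.00],  # e.g. gradient boosting: slightly prefers class 0
    [0.05, 0.95, 0.00],  # e.g. tabular learner: very confident in class 1
]

hard_winner = hard_vote(probas)
soft_winner, averaged = soft_vote(probas)
print(hard_winner, soft_winner)  # prints "0 1"
```

Whether this helps depends on how well calibrated each model's probabilities are; VotingClassifier also accepts a weights argument if one model should count for more.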