Target transformation and feature selection in scikit-learn

Question (votes: 3, answers: 2)

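I am using RFECV for feature selection in scikit-learn. I would like to compare the result of a simple linear model (X, y) with that of a log-transformed model (X, log(y)).

Simple model: RFECV and cross_val_score give the same result. Comparing the mean cross-validated score with all features against the RFECV score with all features gives 0.66 = 0.66, so there is no problem and the results are reliable.

Log model: this is where the problem is. RFECV does not seem to provide a way to transform y, and the two scores come out as 0.55 vs 0.53. That is entirely expected, because I had to apply np.log manually to fit the data: log_selector = log_selector.fit(X, np.log(y)). The r2 score is then computed on y = log(y), with no inverse_func, whereas what we need is a way to fit the model on log(y_train) and compute the score on exp(y_test), i.e. on the original scale of y. Alternatively, if I try to use TransformedTargetRegressor, I get the error shown in the code: The classifier does not expose "coef_" or "feature_importances_" attributes

How can I fix this and make sure the feature selection process is reliable?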

from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn import linear_model
from sklearn.model_selection import cross_val_score
from sklearn.compose import TransformedTargetRegressor
import numpy as np

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
estimator = linear_model.LinearRegression()
# Fits on log(y) and maps predictions back with exp, so scores are on the scale of y
log_estimator = TransformedTargetRegressor(regressor=linear_model.LinearRegression(),
                                           func=np.log,
                                           inverse_func=np.exp)
selector = RFECV(estimator, step=1, cv=5, scoring='r2')
selector = selector.fit(X, y)
###
# log_selector = RFECV(log_estimator, step=1, cv=5, scoring='r2')
# log_selector = log_selector.fit(X, y)
# #RuntimeError: The classifier does not expose "coef_" or "feature_importances_" attributes
###
# Workaround: fit the plain estimator on log(y); r2 is then computed on the log scale
log_selector = RFECV(estimator, step=1, cv=5, scoring='r2')
log_selector = log_selector.fit(X, np.log(y))

# Note: grid_scores_ was removed in scikit-learn 1.2; newer versions expose
# cv_results_["mean_test_score"] instead.
print("**Simple Model**")
print("RFECV, r2 scores: ", np.round(selector.grid_scores_, 2))
scores = cross_val_score(estimator, X, y, cv=5)
print("cross_val, mean r2 score: ", round(np.mean(scores), 2), ", same as RFECV score with all features")
print("no of feat: ", selector.n_features_)

print("**Log Model**")
log_scores = cross_val_score(log_estimator, X, y, cv=5)
print("RFECV, r2 scores: ", np.round(log_selector.grid_scores_, 2))
print("cross_val, mean r2 score: ", round(np.mean(log_scores), 2))
print("no of feat: ", log_selector.n_features_)

Output:

**Simple Model**
RFECV, r2 scores:  [0.45 0.6  0.63 0.68 0.68 0.69 0.68 0.67 0.66 0.66]
cross_val, mean r2 score:  0.66 , same as RFECV score with all features
no of feat:  6

**Log Model**
RFECV, r2 scores:  [0.39 0.5  0.59 0.56 0.55 0.54 0.53 0.53 0.53 0.53]
cross_val, mean r2 score:  0.55
no of feat:  3
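
The scoring scheme the question asks for (fit on log(y), score on the original scale) could be sketched with a custom scorer passed to RFECV. This is only an illustration under assumptions: the scorer name `r2_on_original_scale` is hypothetical and not part of scikit-learn, and it relies on RFECV accepting a callable with the signature `scorer(estimator, X, y)` for `scoring`.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score


def r2_on_original_scale(estimator, X, y_log):
    """Hypothetical scorer: the model predicts log(y); compare after exp()."""
    y_true = np.exp(y_log)                 # undo the manual log on the targets
    y_pred = np.exp(estimator.predict(X))  # undo it on the predictions
    return r2_score(y_true, y_pred)


X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

# The plain LinearRegression still exposes coef_, so RFECV can rank features,
# while every CV score is computed on the scale of the untransformed y.
log_selector = RFECV(LinearRegression(), step=1, cv=5,
                     scoring=r2_on_original_scale)
log_selector.fit(X, np.log(y))

print("no of feat: ", log_selector.n_features_)
print("selected mask: ", log_selector.support_)
```

With this arrangement the RFECV scores and the cross_val_score result for the TransformedTargetRegressor are at least measured in the same units, which is what the comparison needs.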
python scikit-learn cross-validation feature-selection rfe
2 answers

0 votes

Perhaps this article will help you with the error: The classifier does not expose "coef_" or "feature_importances_" attributes
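
For context on that error: RFECV looks for `coef_` or `feature_importances_` on the fitted estimator itself, while TransformedTargetRegressor keeps its fitted inner model under `regressor_`. A minimal sketch, assuming scikit-learn 0.24 or later (where RFE and RFECV accept an `importance_getter` argument), points the selector at the inner model's coefficients:

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

log_estimator = TransformedTargetRegressor(
    regressor=LinearRegression(), func=np.log, inverse_func=np.exp
)

# The coefficients live on the fitted inner regressor, not on the wrapper.
fitted = log_estimator.fit(X, y)
print("wrapper has coef_: ", hasattr(fitted, "coef_"))
print("inner model has coef_: ", hasattr(fitted.regressor_, "coef_"))

# Assumption: scikit-learn >= 0.24, which added importance_getter.
log_selector = RFECV(
    log_estimator,
    step=1,
    cv=5,
    scoring="r2",
    importance_getter="regressor_.coef_",
)
log_selector.fit(X, y)

print("no of feat: ", log_selector.n_features_)
```

Because TransformedTargetRegressor applies `inverse_func` inside `predict`, the r2 scores here are computed against the untransformed y.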


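0 votes

One workaround is to create your own coef_ attribute and expose it. Essentially, you modify the TransformedTargetRegressor class and add a coef_ attribute to it. The modified _target.py can be found here, and you can use the same code. I ran your code against it; sample output is shown below.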

# Requires the modified TransformedTargetRegressor (with coef_) described above;
# the stock class raises the error quoted in the question.
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.model_selection import cross_val_score
from sklearn.compose import TransformedTargetRegressor
import numpy as np

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
estimator = LinearRegression()
log_estimator = TransformedTargetRegressor(regressor=LinearRegression(),
                                           func=np.log,
                                           inverse_func=np.exp)
selector = RFECV(estimator, step=1, cv=5, scoring='r2')
selector = selector.fit(X, y)
# RFECV now wraps the log model directly, so scores are on the scale of y
log_selector = RFECV(log_estimator, step=1, cv=5, scoring='r2')
log_selector = log_selector.fit(X, y)

# Note: grid_scores_ was removed in scikit-learn 1.2; newer versions expose
# cv_results_["mean_test_score"] instead.
print("**Simple Model**")
print("RFECV, r2 scores: ", np.round(selector.grid_scores_, 2))
scores = cross_val_score(estimator, X, y, cv=5)
print("cross_val, mean r2 score: ", round(np.mean(scores), 2), ", same as RFECV score with all features")
print("no of feat: ", selector.n_features_)

print("**Log Model**")
log_scores = cross_val_score(log_estimator, X, y, cv=5)
print("RFECV, r2 scores: ", np.round(log_selector.grid_scores_, 2))
print("cross_val, mean r2 score: ", round(np.mean(log_scores), 2))
print("no of feat: ", log_selector.n_features_)

Sample output:

**Simple Model**
RFECV, r2 scores:  [0.45 0.6  0.63 0.68 0.68 0.69 0.68 0.67 0.66 0.66]
cross_val, mean r2 score:  0.66 , same as RFECV score with all features
no of feat:  6
**Log Model**
RFECV, r2 scores:  [0.41 0.51 0.59 0.59 0.58 0.56 0.54 0.53 0.55 0.55]
cross_val, mean r2 score:  0.55
no of feat:  4
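
A similar effect may be achievable without editing the library's _target.py, by subclassing instead. The sketch below is an assumption-laden illustration, not the modified file referred to above: the class name `CoefTransformedTargetRegressor` is hypothetical, and it simply forwards `coef_` from the fitted inner regressor.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression


class CoefTransformedTargetRegressor(TransformedTargetRegressor):
    """Hypothetical helper: forwards coef_ from the fitted inner regressor."""

    @property
    def coef_(self):
        # regressor_ exists only after fit(); before that this raises
        # AttributeError, so hasattr(model, "coef_") is False until fitted.
        return self.regressor_.coef_


X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

log_estimator = CoefTransformedTargetRegressor(
    regressor=LinearRegression(), func=np.log, inverse_func=np.exp
)

# RFECV clones the estimator; the subclass inherits __init__, so its
# parameters are cloned like those of the parent class.
log_selector = RFECV(log_estimator, step=1, cv=5, scoring="r2")
log_selector.fit(X, y)

print("no of feat: ", log_selector.n_features_)
```

Note that the forwarded coefficients describe the fit to log(y), so features are ranked by their weight in the log-scale model even though the r2 scores are computed on the original scale.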

Hope this helps!
