Predicting new data with a fitted regularized model


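I am programming a multiple regression model in Python that requires regularization. I am using the sklearn estimators Ridge, Lasso, ElasticNet and HuberRegressor together with GridSearchCV to find the best-fitting parameters, and then I extract them from the best estimator (the last 3 lines of the code). But now, with the fitted model, I have two main doubts: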

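  1. How can I predict new data using only the fitted parameters? Is it the same as in OLS linear regression, where I multiply the coefficients by the data? Note: I need to store the fitted model's information in SQL and then use it in another notebook.
  2. Do I have to normalize the data before doing this? I normalize the data before fitting the model (normalize=True). To do so, do I need to store the mean and standard deviation of the data used to fit the model, and then use them to normalize the new data?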
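To make both doubts concrete, here is a minimal sketch, not the asker's actual setup. It assumes synthetic data, a fixed `alpha`, and an explicit `StandardScaler` step in place of `normalize=True`. It checks that a prediction built by hand from the stored training mean, scale, coefficients and intercept matches `predict`, and that the same model can be rewritten as coefficients on the raw feature scale.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, purely illustrative (not the asker's data).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
y_train = X_train @ np.array([1.5, -2.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=200)
X_new = rng.normal(loc=5.0, scale=2.0, size=(10, 3))

# Scale explicitly, then fit the penalized model (alpha=1.0 is an arbitrary choice).
pipe = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
pipe.fit(X_train, y_train)

scaler = pipe.named_steps["standardscaler"]
ridge = pipe.named_steps["ridge"]

# These four quantities fully describe the fitted pipeline.
mean, scale = scaler.mean_, scaler.scale_
coef, intercept = ridge.coef_, ridge.intercept_

# Manual prediction: normalize with the TRAINING statistics, then apply the linear formula.
manual_pred = ((X_new - mean) / scale) @ coef + intercept

# Equivalent coefficients on the raw (unscaled) feature scale.
raw_coef = coef / scale
raw_intercept = intercept - np.sum(mean * raw_coef)
raw_pred = X_new @ raw_coef + raw_intercept

print(np.allclose(manual_pred, pipe.predict(X_new)))
print(np.allclose(raw_pred, pipe.predict(X_new)))
```

Under these assumptions the penalty only affects how the coefficients are estimated. Once fitted, prediction is the usual linear formula, provided new data is scaled with the statistics of the training data rather than its own.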

Python script

import numpy as np
from sklearn.linear_model import ElasticNet, HuberRegressor, Lasso, Ridge
from sklearn.model_selection import GridSearchCV

# X_train, y_train and time_split (the CV splitter) are defined elsewhere.
# Note: the normalize parameter was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this script needs an older release to run as written.
for Reg_model in ['Ridge', 'Lasso', 'ElasticNet', 'HuberRegressor']:  # , 'SCAD'
    if Reg_model == 'Ridge':
        # Ridge needs normalized data, so set normalize=True
        model = Ridge(normalize=True)
        param_grid = {'alpha': np.logspace(-5, 3, 10)}

    elif Reg_model == 'Lasso':
        # Lasso needs normalized data, so set normalize=True
        model = Lasso(normalize=True)
        param_grid = {'alpha': np.logspace(-5, 3, 10)}

    elif Reg_model == 'ElasticNet':
        # ElasticNet needs normalized data, so set normalize=True
        model = ElasticNet(normalize=True)
        # l1_ratio must lie in [0, 1]
        param_grid = {'alpha': np.logspace(-5, 3, 10), 'l1_ratio': np.linspace(0.1, 1.0, 10)}

    elif Reg_model == 'HuberRegressor':
        # HuberRegressor has no normalize parameter, so no normalization is applied here
        model = HuberRegressor()
        # epsilon must be >= 1.0
        param_grid = {'alpha': np.logspace(-5, 3, 10), 'epsilon': np.logspace(0, 3, 10)}
        # print(model.outliers_)  # this attribute belongs to the fitted HuberRegressor

    # We can also pass n_jobs=multiprocessing.cpu_count() - 1 and return_train_score=True
    model = GridSearchCV(estimator=model, param_grid=param_grid,
                         scoring='neg_root_mean_squared_error',
                         cv=time_split, verbose=0, refit=True)

    model.fit(X=X_train, y=y_train)
    print(model.best_estimator_)
    print(model.best_estimator_.coef_)
    print(model.best_estimator_.intercept_)
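For the note about storing the fitted model in SQL, one possible approach is sketched below. It assumes SQLite via the standard-library `sqlite3` module and a JSON-encoded parameter column. The table name `fitted_models`, the model name `ridge_v1`, the `predict` helper and the numeric values are all hypothetical, chosen only for illustration. In practice the values would come from the fitted scaler and estimator.

```python
import json
import sqlite3

import numpy as np

# Hypothetical fitted values; in practice take them from the fitted objects
# (for example the scaler's mean_ and scale_, and the model's coef_ and intercept_).
params = {
    "mean": [5.0, 4.8, 5.1],
    "scale": [2.0, 1.9, 2.1],
    "coef": [3.0, -4.0, 1.0],
    "intercept": 7.5,
}

# In-memory database for the sketch; use a file path or a real SQL server
# so that the data survives and can be read from another notebook.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fitted_models (name TEXT PRIMARY KEY, params TEXT)")
conn.execute(
    "INSERT INTO fitted_models (name, params) VALUES (?, ?)",
    ("ridge_v1", json.dumps(params)),
)
conn.commit()

# --- later, for example in another notebook ---
row = conn.execute(
    "SELECT params FROM fitted_models WHERE name = ?", ("ridge_v1",)
).fetchone()
loaded = json.loads(row[0])


def predict(X, p):
    """Apply the stored normalization statistics, then the linear formula."""
    X = np.asarray(X, dtype=float)
    X_scaled = (X - np.array(p["mean"])) / np.array(p["scale"])
    return X_scaled @ np.array(p["coef"]) + p["intercept"]


X_new = np.array([[5.0, 4.8, 5.1], [7.0, 4.8, 5.1]])
predictions = predict(X_new, loaded)
print(predictions)
```

Storing the numbers rather than a pickled estimator keeps the second notebook independent of the scikit-learn version, at the cost of re-implementing the prediction formula by hand.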
python scikit-learn regression normalization regularized