ElasticNetCV in Python vs. cv.glmnet in R


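Has anyone managed to get the same results from ElasticNetCV in Python and cv.glmnet in R? I have worked out how to match ElasticNet in Python with glmnet in R, but I cannot reproduce the match with the cross-validated versions.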

Steps to reproduce in Python.

Preprocessing:

from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, ElasticNetCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import pandas as pd

data = make_regression(
    n_samples=100000,
    random_state=0
)
X, y = data[0], data[1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.25)

pd.DataFrame(X_train).to_csv('X_train.csv', index=None)
pd.DataFrame(X_test).to_csv('X_test.csv', index=None)
pd.DataFrame(y_train).to_csv('y_train.csv', index=None)
pd.DataFrame(y_test).to_csv('y_test.csv', index=None)

Model:

model = ElasticNet(
    alpha=1.0,
    l1_ratio=0.5,
    fit_intercept=True,
    normalize=True,
    precompute=False,
    max_iter=100000,
    copy_X=True,
    tol=0.0000001,
    warm_start=False,
    positive=False,
    random_state=0,
    selection='cyclic'
)

model.fit(
    X=X_train,
    y=y_train
)

y_pred = model.predict(
    X=X_test
)

print(
    mean_squared_error(
        y_true=y_test,
        y_pred=y_pred
    )
)

Output: 42399.49815189786

model = ElasticNetCV(
    l1_ratio=0.5,
    eps=0.001,
    n_alphas=100,
    alphas=None,
    fit_intercept=True,
    normalize=True,
    precompute=False,
    max_iter=100000,
    tol=0.0000001,
    cv=10,
    copy_X=True,
    verbose=0,
    n_jobs=-1,
    positive=False,
    random_state=0,
    selection='cyclic'
)

model.fit(
    X=X_train,
    y=y_train
)

y_pred = model.predict(
    X=X_test
)

print(
    mean_squared_error(
        y_true=y_test,
        y_pred=y_pred
    )
)

Output: 39354.729173913176

Steps to reproduce in R.

Preprocessing:

library(glmnet)
X_train <- read.csv(path)
X_test <- read.csv(path)
y_train <- read.csv(path)
y_test <- read.csv(path)
fit <- glmnet(x=as.matrix(X_train), y=as.matrix(y_train))
y_pred <- predict(fit, newx = as.matrix(X_test))
y_error = y_test - y_pred
mean(as.matrix(y_error)^2)

Output: 42399.5

fit <- cv.glmnet(x=as.matrix(X_train), y=as.matrix(y_train))
y_pred <- predict(fit, newx = as.matrix(X_test))
y_error <- y_test - y_pred
mean(as.matrix(y_error)^2)

Output: 42399.37.00207

Tags: python, r, machine-learning, regression, glmnet

1 Answer

Thanks a lot for providing an example. I am on a laptop, so I had to reduce the number of samples to 100:

from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, ElasticNetCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import pandas as pd

data = make_regression(
    n_samples=100,
    random_state=0
)
X, y = data[0], data[1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.25)

When you predict with glmnet, you need to specify lambda (the s argument of predict); otherwise it returns predictions for every lambda on the path. So in R:

fit <- glmnet(x=as.matrix(X_train), y=as.matrix(y_train))
y_pred <- predict(fit, newx = as.matrix(X_test))
dim(y_pred)
[1] 25 89

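When you run cv.glmnet, predict uses by default the lambda selected by cross-validation, namely lambda.1se, so it returns a single set of predictions, and that gives you the MSE you are after: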

fit <- cv.glmnet(x=as.matrix(X_train), y=as.matrix(y_train))
y_pred <- predict(fit, newx = as.matrix(X_test))
y_error <- y_test - y_pred
mean(as.matrix(y_error)^2)
[1] 22.03504

dim(y_error)
[1] 25  1
fit$lambda.1se
[1] 1.278699

If we pick, from the glmnet path, the lambda closest to the one chosen by cv.glmnet, we get something in the right range:

fit <- glmnet(x=as.matrix(X_train), y=as.matrix(y_train))
# abs() is needed to pick the nearest lambda; without it which.min()
# returns the index of the smallest lambda on the path
sel <- which.min(abs(fit$lambda - 1.278699))
y_pred <- predict(fit, newx = as.matrix(X_test))[,sel]
mean(as.matrix((y_test - y_pred)^2))
[1] 20.0775
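
For readers following along in Python, the "closest lambda" lookup can be sketched with NumPy. This is only an illustration: the lambda path below is made up (it is not the path glmnet produced here), and nearest_index is a hypothetical helper name. Note that Python indices are 0-based, whereas R's which.min is 1-based.

```python
import numpy as np


def nearest_index(values, target):
    """Return the 0-based index of the entry of `values` closest to `target`."""
    values = np.asarray(values, dtype=float)
    return int(np.argmin(np.abs(values - target)))


# Hypothetical decreasing lambda path (illustrative values only).
lambdas = np.array([8.0, 4.0, 2.0, 1.0, 0.5, 0.25])
target = 1.278699  # the lambda.1se value reported by cv.glmnet above

closest = nearest_index(lambdas, target)

# Without the absolute value, argmin simply finds the smallest lambda,
# because subtracting a constant does not change the ordering.
smallest = int(np.argmin(lambdas - target))

print(closest, lambdas[closest])
print(smallest, lambdas[smallest])
```

The absolute value matters: subtracting a constant and taking the minimum always lands on the smallest lambda, regardless of the target.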

Before we compare with sklearn, we need to make sure we test the same range of lambdas:

L = c(0.01,0.05,0.1,0.2,0.5,1,2)
fit <- cv.glmnet(x=as.matrix(X_train), y=as.matrix(y_train),lambda=L)
y_pred <- predict(fit, newx = as.matrix(X_test))
y_error <- y_test - y_pred
mean(as.matrix(y_error)^2)
[1] 0.003065869

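So we expect to get something in the region of 0.003065869. We run it with the same lambdas; lambda is called alpha in ElasticNet, and alpha in glmnet is actually your l1_ratio (see the glmnet vignette). The normalize option should be set to False, because: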

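"If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False."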
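
The StandardScaler route mentioned in that quote can be sketched as a pipeline. This is a minimal sketch under assumptions, not the setup used for the numbers in this answer: the fixed random_state in the split, cv=5 and max_iter=10000 are choices made here for illustration. It also sidesteps the normalize argument, which was removed in scikit-learn 1.2.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, random_state=0)
# random_state on the split is an assumption added for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

alphas = [0.01, 0.05, 0.1, 0.2, 0.5, 1, 2]

# Standardize the columns explicitly, then cross-validate over the fixed grid.
pipe = make_pipeline(
    StandardScaler(),
    ElasticNetCV(l1_ratio=1, alphas=alphas, cv=5, max_iter=10000),
)
pipe.fit(X_train, y_train)

# Inspect the scaled training data and the alpha picked by cross-validation.
scaled_train = pipe.named_steps["standardscaler"].transform(X_train)
chosen_alpha = pipe.named_steps["elasticnetcv"].alpha_
y_pred = pipe.predict(X_test)
print(chosen_alpha, y_pred.shape)
```

Because the scaler is fitted on the training split only, the test data are transformed with the training means and standard deviations when predict is called.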

So we just run it with CV (l1_ratio=1 matches glmnet's default alpha=1, i.e. the lasso):

model = ElasticNetCV(l1_ratio=1,fit_intercept=True,alphas=[0.01,0.05,0.1,0.2,0.5,1,2])
model.fit(X=X_train,y=y_train)
y_pred = model.predict(X=X_test)
mean_squared_error(y_true=y_test,y_pred=y_pred)

0.0018007824874741929

This is about the same as the R result.
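
The naming correspondence used in this comparison (glmnet's lambda is sklearn's alpha, glmnet's alpha is sklearn's l1_ratio) can be checked on the penalty terms as documented by both libraries. The function names below are hypothetical helpers written for this illustration, and the check covers only the penalty, not any differences in how each library standardizes the data.

```python
def glmnet_penalty(beta, lambda_, alpha):
    """Penalty in glmnet's notation: lambda * ((1 - alpha)/2 * ||b||_2^2 + alpha * ||b||_1)."""
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return lambda_ * ((1 - alpha) / 2 * l2 + alpha * l1)


def sklearn_penalty(w, alpha, l1_ratio):
    """Penalty in sklearn's notation: alpha*l1_ratio*||w||_1 + 0.5*alpha*(1 - l1_ratio)*||w||_2^2."""
    l1 = sum(abs(v) for v in w)
    l2 = sum(v * v for v in w)
    return alpha * l1_ratio * l1 + 0.5 * alpha * (1 - l1_ratio) * l2


def glmnet_to_sklearn(lambda_, alpha):
    """Translate glmnet penalty arguments into ElasticNet keyword arguments."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("glmnet alpha must lie in [0, 1]")
    return {"alpha": lambda_, "l1_ratio": alpha}


coefs = [1.5, -2.0, 0.0]  # arbitrary coefficients for the check
params = glmnet_to_sklearn(lambda_=0.5, alpha=0.3)
difference = abs(glmnet_penalty(coefs, 0.5, 0.3) - sklearn_penalty(coefs, **params))
print(params, difference)
```

Both libraries scale the squared-error term by 1/(2n), so with matching penalties the objectives agree as long as the inputs are preprocessed the same way.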

And if you do the same with ElasticNet, you get the same result, provided you specify alpha.
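
A minimal sketch of that last point, assuming the same synthetic data as above: fit ElasticNetCV on the fixed grid, then fit a plain ElasticNet with the alpha it selected and compare the test predictions. The fixed random_state in the split, cv=5 and max_iter=10000 are assumptions made here, so the MSE will not necessarily match the figures quoted earlier.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, ElasticNetCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, random_state=0)
# random_state on the split is an assumption added for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

alphas = [0.01, 0.05, 0.1, 0.2, 0.5, 1, 2]

# Cross-validated model: chooses alpha from the grid, then refits on all training data.
cv_model = ElasticNetCV(l1_ratio=1, alphas=alphas, cv=5, max_iter=10000)
cv_model.fit(X_train, y_train)

# Plain model with the alpha that cross-validation selected.
plain_model = ElasticNet(alpha=cv_model.alpha_, l1_ratio=1, max_iter=10000)
plain_model.fit(X_train, y_train)

pred_cv = cv_model.predict(X_test)
pred_plain = plain_model.predict(X_test)

print(cv_model.alpha_)
print(mean_squared_error(y_test, pred_cv), mean_squared_error(y_test, pred_plain))
```

The two models should agree because ElasticNetCV refits on the full training set with the chosen alpha, which is the same problem the plain ElasticNet solves.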
