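I recently started using the LongituRF package. I am fitting it to some data and want to use the iml package to assess variable importance. I have used iml before and like its features. However, when I use LongituRF, I cannot assess variable importance.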
In the code below, I create some data and fit REEMforest and MERF from the LongituRF package to it. Then I try to assess variable importance, but I get this error message:
Error in initialize(...) : Please call Predictor$new() with the y target vector.
Apparently, Predictor$new() is not set up correctly in my code.
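The error hints at why the target is needed. Permutation importance, which iml's FeatureImp computes, compares the model's loss on the original data with its loss after shuffling one feature, and a loss cannot be computed without the observed target. The sketch below shows that idea in base R only. `perm_importance` is a hypothetical helper written for illustration (it is not part of iml), and `lm()` stands in for the forest so that the sketch runs without extra packages.

```r
# Hypothetical helper (not an iml function): permutation feature importance
# with an MSE loss, reported as permuted loss minus original loss, i.e. the
# idea behind iml's compare = "difference". It cannot work without y.
perm_importance <- function(predict_fun, X, y, n_rep = 5) {
  mse <- function(pred) mean((y - pred)^2)
  base_loss <- mse(predict_fun(X))
  sapply(names(X), function(feature) {
    losses <- replicate(n_rep, {
      X_perm <- X
      # shuffling breaks the link between this feature and the target
      X_perm[[feature]] <- sample(X_perm[[feature]])
      mse(predict_fun(X_perm))
    })
    mean(losses) - base_loss
  })
}

# Usage on simulated data; lm() is only a stand-in for the forest.
set.seed(1)
X <- data.frame(studying = sample(0:5, 200, replace = TRUE),
                motivation = sample(0:5, 200, replace = TRUE))
y <- 2 * X$studying + rnorm(200)
fit <- lm(y ~ studying + motivation, data = cbind(X, y = y))
imp <- perm_importance(function(newdata) predict(fit, newdata = newdata), X, y)
print(imp)
```

In this simulation only `studying` affects `y`, so its importance should be clearly larger than that of `motivation`.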
At the end of the example code, I also fit a randomForest to the data and assess its variable importance. As you can see, that works fine.
Do you know how I can fix this?
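For reference, `dgp_math_s` in the example code simulates a two-level random-intercept model, with observations i nested in clusters j. Reading the formula directly off the code gives:

```latex
\text{math\_score}_{ij} = \gamma_{00} + \gamma_{10}\,\text{studying}_{ij} + \gamma_{20}\,\text{motivation}_{ij} + \gamma_{01}\,\text{atmosphere}_{j} + U_{0j} + R_{ij},
\qquad U_{0j} \sim N(3, \text{RI\_sd}^2), \quad R_{ij} \sim N(3, \sigma^2)
```

In the call `dgp_math_s(ni = 20, nj = 20, RI_sd = 2, gamma10 = 0, gamma01 = 0)`, every γ is 0 (γ00 and γ20 keep their default of 0). The outcome therefore reduces to U_{0j} + R_{ij}, and none of the three predictors carries any signal, so importances near zero would be expected from any method.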
Example code:
# libraries ---------------------------------------------------------------
install.packages("LongituRF")
# (S)REEMforest is an adaptation of the random forest regression method to longitudinal data introduced by Capitaine et al. (2020) <doi:10.1177/0962280220946080>
library(LongituRF)
install.packages("iml")
# for assessing variable importance
library(iml)
# -------------------------------------------------------------------------
# a function that creates some data for me
dgp_math_s <- function(ni, nj, RI_sd, sigma2 = 1,
                       gamma00 = 0, gamma01 = 0, gamma10 = 0, gamma02 = 0, gamma20 = 0) {
  dgp_grid <- expand.grid(
    ni = 1:ni,
    nj = 1:nj,
    studying = NA,
    atmosphere = NA,
    motivation = NA,
    math_score = NA,
    Rij = NA,
    U0j = NA
  )
  # binary level 2 predictor, same value for the whole cluster
  dgp_grid$atmosphere <- rep(rbinom(nj, 1, 0.5), each = ni)
  # level 2 residual (random intercept)
  dgp_grid$U0j <- rep(rnorm(nj, mean = 3, sd = RI_sd), each = ni)
  # level 1 residual with variance sigma2 (default 1)
  dgp_grid$Rij <- rnorm(ni * nj, mean = 3, sd = sqrt(sigma2))
  # level 1 predictors: integers 0-5 drawn uniformly
  dgp_grid$studying <- sample(0:5, ni * nj, replace = TRUE)
  dgp_grid$motivation <- sample(0:5, ni * nj, replace = TRUE)
  # outcome
  dgp_grid$math_score <- gamma00 + gamma10 * dgp_grid$studying + gamma20 * dgp_grid$motivation +
    gamma01 * dgp_grid$atmosphere + dgp_grid$U0j + dgp_grid$Rij
  return(dgp_grid)
}
# -------------------------------------------------------------------------
# create data
dgp_math <- dgp_math_s(ni = 20, nj = 20, RI_sd = 2, gamma10 = 0, gamma01 = 0)
# Fitting REEMforest ------------------------------------------------------
predictors <- dgp_math[, c("studying", "atmosphere","motivation")]
outcome <- dgp_math$math_score
outcome <- as.vector(outcome)
SREEMF <- LongituRF::REEMforest(X = predictors, Y = dgp_math$math_score,
                                Z = matrix(rep(1, nrow(dgp_math)), ncol = 1),
                                id = dgp_math$nj, time = dgp_math$ni,
                                ntree = 100, sto = "none", mtry = 2)
# Fitting MERF ------------------------------------------------------------
MERF <- LongituRF::MERF(X = predictors, Y = dgp_math$math_score,
                        Z = matrix(rep(1, nrow(dgp_math)), ncol = 1),
                        id = dgp_math$nj, time = dgp_math$ni,
                        ntree = 100, sto = "none", mtry = 2)
# Assessing variable importance using "iml" -------------------------------
# Variable importance of REEMforest
pred <- Predictor$new(SREEMF$forest, data = cbind(predictors, dgp_math$math_score))
imp <- iml::FeatureImp$new(pred, loss = "mse", compare = "difference")$results
# Variable importance of MERF
pred <- Predictor$new(MERF$forest, data = cbind(predictors, dgp_math$math_score))
imp <- iml::FeatureImp$new(pred, loss = "mse", compare = "difference")$results
# example using CARTforest ------------------------------------------------
install.packages("randomForest")
library(randomForest)
mybreimanforest <- randomForest::randomForest(math_score ~ studying + motivation + atmosphere, data = dgp_math, ntree= 500)
## Variable importance using iml -------------------------------------------
pred_breimanforest <- Predictor$new(mybreimanforest, data = dgp_math)
imp_breimanforest <- FeatureImp$new(pred_breimanforest, loss = "mse", compare = "difference")$results
#this works for the randomforest
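The following is a minimal sketch of the pattern that avoids the error. It rests on an assumption I have not confirmed in the LongituRF source: that the `$forest` element returned by REEMforest and MERF behaves like a randomForest object fitted through the x/y interface. A fit of that kind has no formula, so iml cannot infer the target from the model. This is presumably also why the formula-based randomForest example works. The fix is to pass the target explicitly through the `y` argument of `Predictor$new()`. The data are simulated, and a plain x/y randomForest stands in for `SREEMF$forest`.

```r
# Sketch: iml permutation importance for a forest fitted via the x/y
# interface (no formula), so the target must be given to Predictor$new().
library(randomForest)
library(iml)

set.seed(42)
n <- 400
X <- data.frame(
  studying   = sample(0:5, n, replace = TRUE),
  atmosphere = rbinom(n, 1, 0.5),
  motivation = sample(0:5, n, replace = TRUE)
)
y <- 1.5 * X$studying + rnorm(n)  # only `studying` carries signal here

# x/y interface, standing in (by assumption) for SREEMF$forest / MERF$forest
rf <- randomForest(x = X, y = y, ntree = 100, mtry = 2)

# Passing the target explicitly is what FeatureImp needs to compute the loss
pred <- Predictor$new(rf, data = X, y = y)
imp <- FeatureImp$new(pred, loss = "mse", compare = "difference")$results
print(imp)
```

With the question's objects, the analogous call would be `Predictor$new(SREEMF$forest, data = predictors, y = dgp_math$math_score)`, again assuming the forest accepts the three-column `predictors` data frame in `predict()`. As I understand the MERF/REEM approach, the forest is trained on the outcome after the estimated random effects are removed. Scoring it against the raw `math_score` therefore also counts the random-intercept variance as error.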
Was this ever solved? I also have longitudinal data and am trying to apply feature selection to it.