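I am searching for the optimal hyperparameter settings, and I realise I can do that in two ways in mlr: with the benchmark() function and with the resample() function. What is the difference between the two?
If I do it via benchmark(), I can compare multiple models and extract the tuned parameters, which seems to be an advantage over resample(). If I use resample() instead, I can only tune one model at a time, and I also noticed that my CPU usage skyrocketed.
How and when should I use one over the other?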
data(BostonHousing, package = "mlbench")
BostonHousing$chas <- as.integer(levels(BostonHousing$chas))[BostonHousing$chas]
library('mlr')
library('parallel')
library("parallelMap")
# ---- define learning tasks -------
regr.task = makeRegrTask(id = "bh", data = BostonHousing, target = "medv")
# ---- tune Hyperparameters --------
set.seed(1234)
# Define a search space for each learner's parameters
ps_xgb = makeParamSet(
makeIntegerParam("nrounds",lower=5,upper=50),
makeIntegerParam("max_depth",lower=3,upper=15),
# makeNumericParam("lambda",lower=0.55,upper=0.60),
# makeNumericParam("gamma",lower=0,upper=5),
makeNumericParam("eta", lower = 0.01, upper = 1),
makeNumericParam("subsample", lower = 0.1, upper = 1), # xgboost expects subsample in (0, 1], so avoid 0
makeNumericParam("min_child_weight",lower=1,upper=10),
makeNumericParam("colsample_bytree",lower = 0.1,upper = 1)
)
# Choose a resampling strategy
rdesc = makeResampleDesc("CV", iters = 5L)
# Choose a performance measure
meas = rmse
# Choose a tuning method
ctrl = makeTuneControlRandom(maxit = 30L)
# Make tuning wrappers
tuned.lm = makeLearner("regr.lm")
tuned.xgb = makeTuneWrapper(learner = "regr.xgboost", resampling = rdesc, measures = meas,
par.set = ps_xgb, control = ctrl, show.info = FALSE)
# -------- Benchmark experiments -----------
# Two learners to be compared
lrns = list(tuned.lm, tuned.xgb)
# Set up parallelization
parallelStart(mode = "socket", #multicore #socket
cpus = detectCores(),
# level = "mlr.tuneParams",
mc.set.seed = TRUE)
# Conduct the benchmark experiment
bmr = benchmark(learners = lrns,
tasks = regr.task,
resamplings = rdesc,
measures = rmse,
keep.extract = TRUE,
models = FALSE,
show.info = FALSE)
parallelStop()
# ------ Extract HyperParameters -----
bmr_hp <- getBMRTuneResults(bmr)
bmr_hp$bh$regr.xgboost.tuned[[1]]
res <-
resample(
tuned.xgb,
regr.task,
resampling = rdesc,
extract = getTuneResult, #getFeatSelResult, getTuneResult
show.info = TRUE,
measures = meas
)
res$extract
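"If I do it via benchmark(), I can compare multiple models and extract the tuned parameters, which seems to be an advantage over resample()."
You can do this with resample() as well.
benchmark() is just a wrapper around resample() that makes it easier to run experiments over multiple tasks, learners, and resamplings.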
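The relationship between the two functions can be illustrated with a minimal sketch. It assumes mlr's built-in bh.task (Boston Housing) and two untuned, deterministic learners, regr.lm and regr.rpart, chosen only to keep the example fast; they are not the learners from the question. The resampling splits are fixed once with makeResampleInstance() so both approaches see the same folds. A tuning wrapper such as tuned.xgb could be passed in the same way, with extract = getTuneResult in the resample() call.

# Sketch only: learners and task are illustrative assumptions
library(mlr)

set.seed(1234)
# Fix the CV splits once so both approaches use identical train/test sets
rin = makeResampleInstance(makeResampleDesc("CV", iters = 5L), task = bh.task)
lrns = list(makeLearner("regr.lm"), makeLearner("regr.rpart"))

# Approach 1: call resample() once per learner
res_list = lapply(lrns, function(lrn) {
  resample(lrn, bh.task, resampling = rin, measures = rmse, show.info = FALSE)
})
rmse_resample = sapply(res_list, function(r) r$aggr[["rmse.test.rmse"]])

# Approach 2: let benchmark() loop over the learners
bmr = benchmark(lrns, bh.task, resamplings = rin, measures = rmse, show.info = FALSE)
rmse_benchmark = getBMRAggrPerformances(bmr, as.df = TRUE)$rmse.test.rmse

# Both approaches should report the same aggregated RMSE per learner
print(cbind(rmse_resample, rmse_benchmark))

So the choice is mostly one of convenience: resample() is enough for a single learner on a single task, while benchmark() saves the bookkeeping once several learners or tasks are involved.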