Speeding up the randomForest package with parallel processing

Question

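How can I get the following code (alternative code would also be great) to use multiple cores in parallel, so that a randomForest analysis of a regression formula runs faster?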

# Parallelized random forest model
library(randomForest)
library(doParallel)  # also attaches foreach and parallel

RFcores <- detectCores()/3 + 4
RFcores
RFtrees <- 1000/RFcores
RFtrees
cl <- makeCluster(RFcores)
registerDoParallel(cl)
timer <- proc.time()
form <- as.formula(paste(a, "~", b))
fit <- foreach(ntree = rep(RFtrees, RFcores), .combine = gtable_combine,
               .packages = 'randomForest') %dopar% {
  randomForest(form, data = maindf, mtry = 4,
               keep.forest = FALSE, nodesize = 10000, do.trace = TRUE, maxnodes = 5,
               improve = 0.01, doBest = TRUE, importance = TRUE, ntree = ntree)
}
proc.time() - timer
stopCluster(cl)

I keep getting the following error, which comes from the function passed to the .combine argument of foreach.

error calling combine function:
<simpleError in align_2(x, y, along = along, join = join): Both gtables must have names along dimension to be aligned>

I look forward to any thoughts on this problem.

Answer 1

Take a look at Parallel Statistical Computing with R: An Illustration on Two Architectures, which presents two ways to parallelize random forest computation: mclapply and pbdMPI.
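
As for the error itself: judging by the message, gtable_combine works on gtable layout objects, not on forests, so foreach fails when it tries to merge the results. The randomForest package ships its own combine() for merging forests grown separately. Below is a minimal sketch of that approach, not a definitive fix. The assumptions are mine: mtcars and mpg ~ . stand in for maindf and form (the original data is not shown), two workers are used, and keep.forest = FALSE is dropped because it discards the trees, so the merged object could not be used for prediction. improve and doBest are also left out, since they are arguments of tuneRF rather than randomForest. The second half shows the mclapply route mentioned above.

```r
library(randomForest)
library(parallel)
library(foreach)
library(doParallel)

# Stand-in data and formula: the original `maindf`, `a` and `b` are not shown,
# so mtcars is an assumption made purely for illustration.
maindf <- mtcars
form <- mpg ~ .

# Assumed worker count; on a real machine something like detectCores() - 1
# may be more appropriate. Both values must be whole numbers.
n_workers <- 2L
trees_per_worker <- ceiling(1000 / n_workers)

# --- foreach + doParallel ---------------------------------------------------
cl <- makeCluster(n_workers)
registerDoParallel(cl)

fit <- foreach(ntree = rep(trees_per_worker, n_workers),
               .combine = randomForest::combine,  # merges randomForest objects
               .multicombine = TRUE,              # combine() accepts many forests at once
               .packages = "randomForest") %dopar% {
  randomForest(form, data = maindf, mtry = 4,
               importance = TRUE, ntree = ntree)
}

stopCluster(cl)

# --- mclapply (forking) -----------------------------------------------------
# mclapply cannot fork on Windows, so fall back to a single core there.
mc_cores <- if (.Platform$OS.type == "windows") 1L else n_workers

forests <- mclapply(rep(trees_per_worker, n_workers),
                    function(nt) randomForest(form, data = maindf,
                                              mtry = 4, ntree = nt),
                    mc.cores = mc_cores)
fit_mc <- do.call(randomForest::combine, forests)

fit$ntree     # total number of trees across all workers
fit_mc$ntree
```

Writing randomForest::combine with the namespace prefix avoids picking up a same-named function from another attached package. Note that a forest merged this way no longer carries the out-of-bag error components (for regression, mse and rsq), so accuracy has to be checked on held-out data.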
