How to use sfInit and makeCluster with type "MPI" / message passing in R / parallelization on a cluster

Question (votes: 0, answers: 1)

I am trying to adapt this R script for a speed test so that it runs on a cluster.

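When I use the sfInit and makeCluster functions with type "SOCK", the script runs successfully on the cluster, but without any speed improvement. This is unlike my own computer, where changing detectCores() to 1 makes the script run much slower than with 4 cores.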

I am fairly sure, though, that I need to change the type to "MPI" so that the nodes can communicate with each other memory-wise.

However, if I do that, the script stops with the following error:

Loading required package: Rmpi
Error: package or namespace load failed for ‘Rmpi’:
 .onLoad failed in loadNamespace() for 'Rmpi', details:
  call: dyn.load(file, DLLpath = DLLpath, ...)
  error: unable to load shared object '/cluster/sfw/R/3.5.1-gcc73-base/lib64/R/library/Rmpi/libs/Rmpi.so':
  libmpi.so.20: cannot open shared object file: No such file or directory
Failed to load required library: Rmpi for parallel mode MPI
Fallback to sequential execution
snowfall 1.84-6.1 initialized: sequential execution, one CPU.

I thought, "piece of cake", and added the following lines:

install.packages('Rmpi', repos = "http://cran.us.r-project.org",
                 dependencies = TRUE, lib = '/personalpath')
install.packages('doMPI', repos = "http://cran.us.r-project.org",
                 dependencies = TRUE, lib = '/personalpath')
library(topicmodels, lib.loc = '/personalpath')
library(Rmpi, lib.loc = '/personalpath')

The installation runs successfully, but then:

Error in library(Rmpi, lib.loc = "/personalpath") :
there is no package called ‘Rmpi’
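A minimal sketch of a safer check, assuming the personal library is the placeholder /personalpath from the question: prepend it to .libPaths() and ask requireNamespace() whether Rmpi loads at all, instead of letting library() stop the script. This does not repair the first error; that message says Rmpi's compiled code (Rmpi.so) could not find the MPI shared library libmpi.so.20 when it was loaded.

```r
# "/personalpath" is the placeholder path from the question; replace it with yours.
personal_lib <- "/personalpath"

# Put the personal library first on the search path, so that library() and
# requireNamespace() look there before the system-wide library.
if (dir.exists(personal_lib)) {
  .libPaths(c(personal_lib, .libPaths()))
}

# requireNamespace() returns FALSE instead of stopping when the package is
# missing or when its compiled code fails to load (e.g. a missing libmpi.so.*).
rmpi_ok <- requireNamespace("Rmpi", quietly = TRUE)

if (rmpi_ok) {
  message("Rmpi loaded; a type = \"MPI\" cluster can be tried.")
} else {
  message("Rmpi cannot be loaded here; check .libPaths() and the MPI runtime.")
}
print(.libPaths())   # shows where R actually looks for packages
```

If rmpi_ok is FALSE, the script can fall back to a socket cluster rather than silently running sequentially, as snowfall did above.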

1. How do I install these packages?

2. Do I really need to install them, or is this a completely wrong approach?

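Any help is much appreciated! I know there are several questions about this here (see this, this and this). But I am not familiar with Linux commands, and more importantly I do not have any such rights on the cluster. So I need to come up with a solution in R...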

So... here is my code:

sfInit(parallel=TRUE, cpus=detectCores(), type="MPI")

cl <- makeCluster(detectCores(), type = "MPI")
registerDoSNOW(cl) 

sfExport('dtm_stripped', 'control_LDA_Gibbs')
sfLibrary(topicmodels)

clusterEvalQ(cl, library(topicmodels))
clusterExport(cl, c("dtm_stripped", "control_LDA_Gibbs"))

BASE <- system.time(best.model.BASE <<- lapply(seq, function(d){LDA(dtm_stripped, control = control_LDA_Gibbs, method ='Gibbs', d)}))
PLYR_S <- system.time(best.model.PLYR_S <<- llply(seq, function(d){LDA(dtm_stripped, control = control_LDA_Gibbs, method ='Gibbs', d)}, .progress = "text"))

wrapper <- function (d) topicmodels:::LDA(dtm_stripped, control = control_LDA_Gibbs, method ='Gibbs', d)
PARLAP <- system.time(best.model.PARLAP <<- parLapply(cl, seq, wrapper))
DOPAR <- system.time(best.model.DOPAR <<- foreach(i = seq, .export = c("dtm_stripped", "control_LDA_Gibbs"), .packages = "topicmodels", .verbose = TRUE) %dopar% (LDA(dtm_stripped, control = control_LDA_Gibbs, method ='Gibbs', k=i)))
SFLAPP <- system.time(best.model.SFLAPP <<- sfLapply(seq, function(d){topicmodels:::LDA(dtm_stripped, control = control_LDA_Gibbs, method ='Gibbs', d)})) 
SFCLU <- system.time(best.model.SFCLU <<- sfClusterApplyLB(seq, function(d){topicmodels:::LDA(dtm_stripped, control = control_LDA_Gibbs, method ='Gibbs', d)})) 
PLYRP <- system.time(best.model.PLYRP <<- llply(seq, function(d){topicmodels:::LDA(dtm_stripped, control = control_LDA_Gibbs, method ='Gibbs', d)}, .parallel = TRUE))

results_speedtest <- rbind(BASE, PLYR_S, PARLAP, DOPAR, SFLAPP, SFCLU, PLYRP)
print(results_speedtest)
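To see whether the workers give any speedup at all, independent of MPI, a stripped-down version of the same comparison can be run with a socket cluster from the base parallel package. This is a sketch only: fake_fit is a hypothetical CPU-bound stand-in for the LDA call, and ks stands in for seq. One possible reason for no speedup on a cluster is that the job has fewer cores allocated than detectCores() reports, so comparing the elapsed column is a quick sanity check.

```r
library(parallel)

# Hypothetical stand-in for one LDA fit: a CPU-bound function of k.
fake_fit <- function(k) {
  x <- 0
  for (i in seq_len(1e6)) x <- x + sqrt(i) / k
  x
}

ks <- 2:9   # plays the role of `seq` in the question

# detectCores() can return NA, and on a shared node it may report more cores
# than your job is allowed to use, so cap the worker count.
cores <- detectCores()
n_workers <- if (is.na(cores)) 1L else max(1L, min(4L, cores - 1L))

cl <- makeCluster(n_workers, type = "PSOCK")   # socket workers, no MPI needed

# fake_fit is passed as the function argument, so it is shipped to the workers;
# any data it referenced (like dtm_stripped) would need clusterExport().
t_serial   <- system.time(res_serial   <- lapply(ks, fake_fit))
t_parallel <- system.time(res_parallel <- parLapply(cl, ks, fake_fit))

stopCluster(cl)

# Compare wall-clock time only
print(rbind(serial = t_serial, parallel = t_parallel)[, "elapsed", drop = FALSE])
```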
1 answer (0 votes)

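There are other ways to parallelize in R. Maybe this link will help, since its second page explains what the cluster types such as socket, MPI and fork do: https://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf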
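A small sketch of the practical difference between the socket and fork types in the base parallel package, assuming a hypothetical object big_vector that the workers need. Socket (PSOCK) workers start as fresh R sessions and must be sent data explicitly; fork workers are copies of the current session and see existing objects, but fork is not available on Windows. Type "MPI" is also accepted by makeCluster but additionally requires the snow and Rmpi packages, which is where the question's error comes from.

```r
library(parallel)

big_vector <- 1:10   # hypothetical data the workers need

# FORK is unavailable on Windows, so fall back to PSOCK there.
cl_type <- if (.Platform$OS.type == "windows") "PSOCK" else "FORK"
cl <- makeCluster(2, type = cl_type)

# PSOCK workers start empty: ship the data to them explicitly.
# FORK workers already inherited big_vector from this session.
if (cl_type == "PSOCK") clusterExport(cl, "big_vector", envir = environment())

res <- parLapply(cl, 1:2, function(i) sum(big_vector) * i)
stopCluster(cl)

print(unlist(res))   # 55 110
```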

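Otherwise, I can also recommend looking into the foreach package, because its syntax is a lot like an ordinary for loop. Note that some parallelization packages do not work on all operating systems.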
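A minimal foreach sketch, assuming the foreach and doParallel packages are installed (the question itself registers a doSNOW backend; doParallel is swapped in here only because it works with a plain parallel cluster). The loop body reads like a for loop, and .combine = c collects the per-iteration results into one vector.

```r
library(foreach)
library(doParallel)   # provides a parallel backend for %dopar%

cl <- parallel::makeCluster(2)   # PSOCK workers by default
registerDoParallel(cl)

# Each iteration returns a value; .combine = c joins them in iteration order.
squares <- foreach(i = 1:4, .combine = c) %dopar% {
  i^2
}

parallel::stopCluster(cl)
print(squares)   # 1 4 9 16
```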
