Understanding parallelization with doSNOW and foreach: do I have to export objects from the main environment to each R core/session?


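I have a question about parallelization with doSNOW and foreach. I have read many parallelization tutorials (some using other packages) that use the clusterExport() function to pass objects from the main environment to each R core/session. I would like to know whether the same has to be done with doSNOW and foreach. I don't think it is necessary, but I would like to double-check with someone more confident with parallel processing than I am.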

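For example, in the code below I split the world by continent and intersect each continent with a spatial points object. I compared the system time against a plain for loop, and the parallel code is considerably faster even though I export neither the list of continents (Listsplit) nor the spatial points (points).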

Could you give me some feedback?

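I would also like to take this opportunity to ask how to check that the parallelization is actually working on Windows and Linux, since Task Manager and the top command only give general information about CPU and memory usage. Ideally, I would like to see which cores are actually being used.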

Thank you.

library("doSNOW") # Parallelization 
#> Warning: package 'doSNOW' was built under R version 4.3.2
#> Loading required package: foreach
#> Warning: package 'foreach' was built under R version 4.3.2
#> Loading required package: iterators
#> Warning: package 'iterators' was built under R version 4.3.2
#> Loading required package: snow
#> Warning: package 'snow' was built under R version 4.3.2
library("foreach") # Parallelization 
library("tmap") # World shapefile
#> Warning: package 'tmap' was built under R version 4.3.2
#> The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
#> which was just loaded, will retire in October 2023.
#> Please refer to R-spatial evolution reports for details, especially
#> https://r-spatial.org/r/2023/05/15/evolution4.html.
#> It may be desirable to make the sf package available;
#> package maintainers should consider adding sf to Suggests:.
#> The sp package is now running under evolution status 2
#>      (status 2 uses the sf package in place of rgdal)
#> Breaking News: tmap 3.x is retiring. Please test v4, e.g. with
#> remotes::install_github('r-tmap/tmap')
library("ggplot2") # Plot
library("sf") # Work with spatial data
#> Warning: package 'sf' was built under R version 4.3.2
#> Linking to GEOS 3.11.2, GDAL 3.7.2, PROJ 9.3.0; sf_use_s2() is TRUE
library("dplyr") # Data manipulation
#> Warning: package 'dplyr' was built under R version 4.3.2
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union


# Data 
data(World) # World Shapefile 
points<-st_sample(World, size=1000)


# Plot 
ggplot() +
  geom_sf(data=World)+
  geom_sf(data=points, colour = "red", size = 0.5)+
  coord_sf(xlim=c(st_bbox(World)[1],st_bbox(World)[3]),
           ylim=c(st_bbox(World)[2],st_bbox(World)[4]))




Listsplit<-World |> group_split(continent)

cl <- makeSOCKcluster(8)
registerDoSNOW(cl)

pb <- txtProgressBar(max = length(Listsplit), style = 3)
#>   |                                                                      |   0%
progress <- function(n) setTxtProgressBar(pb, n)
opts <- list(progress = progress)


# Parallel function 
system.time({
  
  parallelfunction<-foreach(i=seq_along(Listsplit), .packages = c("sf"),
                            .options.snow = opts) %dopar% {
    
    st_intersection(points,Listsplit[[i]])
  }
  
}
)
#>   |======================================================================| 100%
#>    user  system elapsed 
#>    0.05    0.03    4.64

stopCluster(cl)


# Loop 

results<-list()

system.time({
  
  for(i in seq_along(Listsplit)){
    
    results[[i]]<-st_intersection(points,Listsplit[[i]])
  }
})
#>    user  system elapsed 
#>    6.19    0.14    7.22

Created on 2023-12-17 with reprex v2.0.2

r foreach parallel-processing
1 Answer

安杰琳。

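foreach uses the .export argument to export variables that are not defined in the current environment. In your case, points and Listsplit are both in the current environment, so there is no need to export them.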
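A minimal sketch of a case where .export does matter, assuming a two-worker SOCK cluster: foreach is called inside a function, while the variable the loop body needs lives in the global environment rather than in the function's own environment. The names threshold and flag_above are hypothetical and exist only for this illustration.

```r
library(doSNOW)  # also attaches foreach and snow

cl <- makeSOCKcluster(2)  # assumption: two local workers are enough for a demo
registerDoSNOW(cl)

threshold <- 5  # hypothetical variable, defined in the global environment

# Hypothetical helper: foreach is called from inside a function, so the
# "current environment" is this function's environment, which contains
# `values` but not `threshold`.
flag_above <- function(values) {
  foreach(v = values, .combine = c, .export = "threshold") %dopar% {
    v > threshold  # `threshold` reaches the workers only via .export
  }
}

res <- flag_above(c(1, 7, 10))
print(res)

stopCluster(cl)
```

Without .export = "threshold", I would expect this call to stop with an "object not found" style error on a SOCK cluster, since each worker is a fresh R session that never saw the global variable. In the question's code, by contrast, foreach is called at top level, so points and Listsplit are picked up automatically.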

As a rule of thumb, as long as your foreach call runs without errors, you do not need to export any variables.
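On the second part of the question (checking that the work is really spread across workers), one option is to have every task report the process ID of the R session that ran it. This is only a sketch, assuming a four-worker SOCK cluster and a dummy workload (Sys.sleep) in place of the real computation. It shows which worker processes ran the tasks, not which physical cores the operating system scheduled them on.

```r
library(doSNOW)  # also attaches foreach and snow

cl <- makeSOCKcluster(4)  # assumption: four local workers
registerDoSNOW(cl)

# Each task does a little dummy work, then returns the PID of the
# R process that executed it.
pids <- foreach(i = 1:8, .combine = c) %dopar% {
  Sys.sleep(0.2)  # stand-in for real work
  Sys.getpid()
}

stopCluster(cl)

main_pid <- Sys.getpid()

print(table(pids))           # how many tasks each worker process handled
print(length(unique(pids)))  # number of distinct worker processes used
print(main_pid %in% pids)    # should be FALSE: the master ran no tasks
```

The PIDs listed by table(pids) can then be looked up in Task Manager (Details tab) on Windows or in top/htop on Linux to watch the CPU usage of each worker process while a longer job is running.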
