R: variable scope problem with parallel computation (%dofuture%) inside nested functions


When I try to use %dofuture% for parallel computation inside a nested function, I run into a variable scope problem.

This is the error message I get:

Error in eval(quote({ : object 'opt' not found

Here is my code (the object opt has been renamed to local_var1):

myFunction1 = function(x){
  
  y = x + 1
  return(y)
}
myFunction2 = function(x, y){
  
  z = x + y
  return(z)
}
myFunction3 = function(function_var4, function_var5, function_var6){
  
  # Claim some local variables
  local_var1 = vector("list", length = ncol(function_var4))
  local_var2 = vector("list", length = ncol(function_var4))
  local_var3 = function_var5 %>% pull(function_var6)
  local_var4 = data.frame(Var = seq(min(local_var3), max(local_var3), length.out = 10000))
  
  # Do some parallel calculation
  plan(multisession, workers = parallel::detectCores() - 2)
  
  foreach (i = 1:ncol(function_var4)) %dofuture% {
    
    data_glm = data.frame(Var = local_var3,
                          PreAbs = function_var4[,i])
    
    mod_glm = glm(PreAbs ~ poly(Var, 3), family = binomial, data = data_glm)
    
    # Result of the calculation
    local_local_var1 = predict(mod_glm, newdata = local_var4, se = F, type = "response")
    
    # Some simple calculation using local_local_var1
    # Save the result
    local_var1[[i]] = mean(local_local_var1)  # <---- I guess this causes the error
    local_var2[[i]] = myFunction1(local_local_var1)    
  }
  
  # Close multisession workers by switching plan
  plan(sequential)
  
  local_var5 =  myFunction2(local_local_var1, function_var5)
  
  return(list(opt = unlist(local_var1),
              nw = unlist(local_var2),
              miv = unlist(local_var5)))
}

Running the following code produces the error message:

library(foreach)
library(doFuture)
library(dplyr)

global_var1 = matrix(sample(c(0, 1), size = 10000, replace = T), ncol = 100) %>%
  as.data.frame()
global_var2 = data.frame(C1 = rnorm(100))
global_var3 = "C1"
  
result = myFunction3(function_var4 = global_var1, function_var5 = global_var2, function_var6 = global_var3)

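My guess is that the error occurs because %dofuture% can only pick up variables and settings from the global environment, not from a local (function) environment. Is there any way to solve this?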

r function parallel-processing scope
1 Answer

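To me this looks like a bug in doFuture. The %dofuture% operator does record the environment it was called from and passes it on to doFuture:::doFuture2, which does all the actual work, but somewhere along the way that environment is not used. Henrik Bengtsson (the author of doFuture) knows what he is doing, so if you report this as an issue at https://github.com/HenrikBengtsson/doFuture/issues, he may be able to fix it. However, it could also be a design flaw in foreach that doFuture cannot work around.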

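I tried to put together a workaround, i.e. a way to run the code without the error message you saw: change myFunction3 so that the code executed in the foreach loop is moved into a local function. You also need to use "super-assignment" (i.e. <<-) in that function to assign to myFunction3's variables: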

myFunction3 = function(function_var4, function_var5, function_var6){
  
  # Claim some local variables
  local_var1 = vector("list", length = ncol(function_var4))
  local_var2 = vector("list", length = ncol(function_var4))
  local_var3 = function_var5 %>% pull(function_var6)
  local_var4 = data.frame(Var = seq(min(local_var3), max(local_var3), length.out = 10000))
  
  # Do some parallel calculation
  plan(multisession, workers = parallel::detectCores() - 2)
  
  local_local_var1 <- NULL
  
  loopcode <- function(i) {
    
    data_glm = data.frame(Var = local_var3,
                          PreAbs = function_var4[,i])
    
    mod_glm = glm(PreAbs ~ poly(Var, 3), family = binomial, data = data_glm)
    
    # Result of the calculation
    local_local_var1 <<- predict(mod_glm, newdata = local_var4, se = F, type = "response")
    
    # Some simple calculation using local_local_var1
    # Save the result
    local_var1[[i]] <<- mean(local_local_var1)  # <---- I guess this causes the error
    local_var2[[i]] <<- myFunction1(local_local_var1)    
  }
  
  foreach (i = 1:ncol(function_var4)) %dofuture% loopcode(i)
  
  # Close multisession workers by switching plan
  plan(sequential)
  
  local_var5 =  myFunction2(local_local_var1, function_var5)
  
  return(list(opt = unlist(local_var1),
              nw = unlist(local_var2),
              miv = unlist(local_var5)))
}
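For reference, <<- from a nested function does update the enclosing function's variables when everything runs in a single R process. A minimal base-R sketch of that behaviour (fill_squares() and record() are hypothetical names used only for illustration):

```r
# Base R only: <<- inside a nested function, called in the same R process.
fill_squares <- function(n) {
  squares <- vector("list", n)       # local to fill_squares()
  record <- function(i) {
    squares[[i]] <<- i^2             # <<- assigns into fill_squares()'s frame
  }
  for (i in seq_len(n)) record(i)    # ordinary sequential loop, no workers involved
  unlist(squares)
}

fill_squares(3)  # 1 4 9
```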

However, there is still a problem: the function's return values are always empty:

$opt
NULL

$nw
NULL

$miv
numeric(0)
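A likely explanation for the empty result (hedged, not verified against the doFuture internals): under plan(multisession) each iteration runs in a separate background R process, and loopcode is sent there together with a copy of its enclosing environment. The <<- assignments therefore modify that copy on the worker, and only the value of the loop body travels back to the main session. The sketch below shows the same effect in miniature; it assumes the foreach, future and doFuture packages are installed, and collect_squares() and record() are hypothetical names.

```r
# Sketch: assumes the foreach, future and doFuture packages are installed.
library(foreach)
library(future)
library(doFuture)

collect_squares <- function(n) {
  squares <- vector("list", n)   # lives in collect_squares()'s frame in the main session
  record <- function(i) {
    squares[[i]] <<- i^2         # on a worker, this updates a *copy* of that frame
    i^2                          # only the return value is sent back
  }
  returned <- foreach(i = seq_len(n)) %dofuture% record(i)
  list(side_effect = unlist(squares), returned = unlist(returned))
}

plan(multisession, workers = 2)  # separate background R processes
res <- collect_squares(3)
plan(sequential)                 # shut down the workers

res$side_effect  # NULL: the workers' assignments never reach the main session
res$returned     # 1 4 9
```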

I don't know the foreach and doFuture packages well enough to say whether this can be fixed.
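One possible way around both the error and the empty result, assuming the goal is simply to collect per-column results, is the usual foreach pattern: let each iteration return its results and combine them after the loop, instead of assigning into myFunction3's variables. This is a sketch under stated assumptions, not a tested fix for the original data: it assumes foreach, future and doFuture are installed, replaces dplyr::pull() with [[ to avoid a dplyr dependency, uses smaller toy data, and leaves the choice of plan() to the caller. The miv element is left out because local_local_var1 only exists inside the loop body; instead the per-column predictions are returned as preds (a hypothetical addition) so miv can be computed from whichever column is intended.

```r
# Sketch: assumes the foreach, future and doFuture packages are installed.
library(foreach)
library(future)
library(doFuture)

myFunction1 = function(x) x + 1

myFunction3 = function(function_var4, function_var5, function_var6) {
  local_var3 = function_var5[[function_var6]]  # same column as dplyr::pull(function_var5, function_var6)
  local_var4 = data.frame(Var = seq(min(local_var3), max(local_var3), length.out = 10000))

  # Each iteration *returns* its results; nothing is assigned to myFunction3's variables.
  per_column = foreach(i = seq_len(ncol(function_var4))) %dofuture% {
    data_glm = data.frame(Var = local_var3, PreAbs = function_var4[, i])
    mod_glm = glm(PreAbs ~ poly(Var, 3), family = binomial, data = data_glm)
    pred = predict(mod_glm, newdata = local_var4, se.fit = FALSE, type = "response")
    list(opt = mean(pred), nw = myFunction1(pred), pred = pred)
  }

  # foreach() returns a list with one element per iteration, in iteration order.
  list(opt   = unlist(lapply(per_column, `[[`, "opt")),
       nw    = unlist(lapply(per_column, `[[`, "nw")),
       preds = lapply(per_column, `[[`, "pred"))  # hypothetical: kept for computing miv afterwards
}

# Toy data, smaller than in the question; the caller chooses the plan.
set.seed(1)
global_var1 = as.data.frame(matrix(sample(c(0, 1), size = 500, replace = TRUE), ncol = 5))
global_var2 = data.frame(C1 = rnorm(100))

plan(multisession, workers = 2)
result = myFunction3(global_var1, global_var2, "C1")
plan(sequential)

length(result$opt)  # 5: one mean prediction per column
```

Returning values keeps the loop body free of side effects, so the result is the same whichever plan the caller picks (sequential, multisession, or a cluster).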
