lm(formula) in R behaves differently inside parLapply

Question (votes: 0, answers: 2)

First, I create a pair of example data frames:

df = data.frame("sample1" = runif(10), "sample2" = runif(10), "sample3" = runif(10), "sample4" = runif(10))
traits = data.frame("var1" = c(rep("group1", 2), rep("group2", 2)))
rownames(traits) = colnames(df)

If I build the formula as a text string, I can convert it with as.formula() and pass it to lm():

> row = t(df[1,])
> ModString = "row ~ traits$var1"
> Mod = lm(as.formula(ModString))
> Mod

Call:
lm(formula = as.formula(ModString))

Coefficients:
      (Intercept)  traits$var1group2  
           0.7799             0.1788  

But if I try to do the same thing with parLapply, I get an error suggesting that the traits argument is not being passed as expected:

> num_cores <- detectCores() - 1
> cl <- makeCluster(num_cores)
> results <- parLapply(cl = cl, seq(1:10), function(i, df, traits){
+     row = df[i,]
+     ModString = "vector ~ traits$factor1"
+     Mod = lm(ModString)
+     return(Mod)
+ }, df = df, traits = traits)
Error in checkForRemoteErrors(val) : 
  9 nodes produced errors; first error: object 'traits' not found

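Strangely, though, the traits argument does make it into the function I pass to parLapply, so this seems to be an issue with how lm() works. I can pass traits in and return it just fine: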

> cl <- makeCluster(num_cores)
> results <- parLapply(cl = cl, seq(1:10), function(i, df, traits){
+     row = df[i,]
+     traits2 = traits
+     ModString = "vector ~ traits$factor1"
+     return(list(traits2, row, ModString))
+ }, df = df, traits = traits)
> results
[[1]]
[[1]][[1]]
          var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[1]][[2]]
    sample1   sample2   sample3  sample4
1 0.6941108 0.8656177 0.9807334 0.936609

[[1]][[3]]
[1] "vector ~ traits$factor1"


[[2]]
[[2]][[1]]
          var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[2]][[2]]
    sample1   sample2   sample3   sample4
2 0.1007983 0.5599374 0.0208095 0.8082196

[[2]][[3]]
[1] "vector ~ traits$factor1"


[[3]]
[[3]][[1]]
          var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[3]][[2]]
    sample1   sample2  sample3   sample4
3 0.9633059 0.7564143 0.913617 0.4179525

[[3]][[3]]
[1] "vector ~ traits$factor1"


[[4]]
[[4]][[1]]
          var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[4]][[2]]
     sample1  sample2  sample3   sample4
4 0.06625104 0.390351 0.511572 0.8386714

[[4]][[3]]
[1] "vector ~ traits$factor1"


[[5]]
[[5]][[1]]
          var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[5]][[2]]
    sample1   sample2    sample3  sample4
5 0.6135228 0.4926991 0.08513074 0.105647

[[5]][[3]]
[1] "vector ~ traits$factor1"


[[6]]
[[6]][[1]]
          var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[6]][[2]]
    sample1   sample2   sample3   sample4
6 0.7121677 0.6554129 0.6409468 0.4906039

[[6]][[3]]
[1] "vector ~ traits$factor1"


[[7]]
[[7]][[1]]
          var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[7]][[2]]
    sample1  sample2   sample3   sample4
7 0.4651641 0.546514 0.4039608 0.1758802

[[7]][[3]]
[1] "vector ~ traits$factor1"


[[8]]
[[8]][[1]]
          var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[8]][[2]]
    sample1   sample2   sample3   sample4
8 0.5121237 0.4950444 0.9662431 0.6851582

[[8]][[3]]
[1] "vector ~ traits$factor1"


[[9]]
[[9]][[1]]
          var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[9]][[2]]
    sample1  sample2   sample3   sample4
9 0.2486208 0.135422 0.2128657 0.7332921

[[9]][[3]]
[1] "vector ~ traits$factor1"


[[10]]
[[10]][[1]]
          var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[10]][[2]]
      sample1   sample2   sample3   sample4
10 0.06203028 0.7916495 0.3528376 0.2259685

[[10]][[3]]
[1] "vector ~ traits$factor1"

What embarrassingly trivial detail am I missing here?

r parallel-processing apply lm
2 answers
1
vote

Here is how I would do it; note the completely different organization of the data:

library(dplyr)
library(tidyr)
library(tibble)
library(parallel)

#You seem to have rows of data that should be columns,
# this puts things in a form more suitable for work in R
df_new <- df %>% 
    mutate(row = 1:n()) %>% 
    gather(key = sample,value = val,sample1:sample4) %>% 
    arrange(row,sample)

#Data in rownames is not terribly useful
traits_new <- rownames_to_column(traits,"sample")

#Now we can put it all in *one* data frame
df_new <- left_join(df_new,
                    traits_new,
                    by = "sample")

#...and split it into a list representing each of the df's you
# want a lm() fit on
df_new_split <- split(df_new,df_new$row)

#Wrapper for lm with the only formula we need
fit_lm <- function(x){
    lm(val ~ var1,data = x)
}

num_cores <- detectCores() - 1
cl <- makeCluster(num_cores)

results <- parLapply(cl = cl,df_new_split,fit_lm)
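Once the fits come back, the usual next steps are to shut the cluster down and pull the coefficients out of the list. The following is only a minimal base-R sketch of that idea, not the code above: the names per_row and coefs, the fixed seed, and the choice of two workers are assumptions made for illustration.

```r
library(parallel)

set.seed(1)  # assumption: seed added only so the sketch is reproducible
df <- data.frame(sample1 = runif(10), sample2 = runif(10),
                 sample3 = runif(10), sample4 = runif(10))
traits <- data.frame(var1 = c(rep("group1", 2), rep("group2", 2)))
rownames(traits) <- colnames(df)

# One small data frame per row of df: the four sample values plus their group
per_row <- lapply(seq_len(nrow(df)), function(i) {
  data.frame(val = unlist(df[i, ]), var1 = traits$var1)
})

# Same wrapper idea as above: the formula only refers to columns of `data`
fit_lm <- function(x) lm(val ~ var1, data = x)

cl <- makeCluster(2)  # assumption: two workers, purely for illustration
results <- parLapply(cl, per_row, fit_lm)
stopCluster(cl)       # release the worker processes once the fits are back

# Matrix with one column per fitted model and one row per coefficient
coefs <- sapply(results, coef)
```

Because the formula only names columns of the data frame handed to lm(), nothing has to be looked up in the worker's environment, which sidesteps the "object not found" problem from the question.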

-1
vote

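OK, I feel really silly, but I am going to leave this question up because it is a good example of how you can get confused when copy-pasting and editing multiple versions of code. I was not consistently using as.formula inside my parLapply call, and I also forgot to change the variable name vector to row and to transpose it.

So. The following works just dandy: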

require(parallel)
num_cores <- detectCores() - 1
cl <- makeCluster(num_cores)
results <- parLapply(cl = cl, seq(1:10), function(i, df, traits){
    row = t(df[i,])
    ModString = "row ~ traits[,\"var1\"]"
    Mod = lm(as.formula(ModString))
    return(Mod)
}, df = df, traits = traits)
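A likely reason as.formula() makes the difference: a formula carries an environment, and as.formula() called inside a function attaches that function's environment, so lm() can find local objects such as row and traits. When a bare string is passed to lm(), the conversion happens deeper inside the modelling code, so, as far as I can tell, the function's local variables are no longer visible. The sketch below illustrates this without any cluster; the function names with_as_formula and with_bare_string and the toy data are hypothetical.

```r
with_as_formula <- function() {
  y <- c(1.0, 2.1, 2.9, 4.2)
  x <- c(1, 2, 3, 4)
  f <- as.formula("y ~ x")  # the formula remembers this function's environment
  list(same_env = identical(environment(f), environment()),
       fit = lm(f))
}

with_bare_string <- function() {
  y <- c(1.0, 2.1, 2.9, 4.2)
  x <- c(1, 2, 3, 4)
  # The string is converted to a formula inside the modelling code, not here,
  # so any error is caught and its message returned for inspection.
  tryCatch(lm("y ~ x"), error = function(e) conditionMessage(e))
}

out <- with_as_formula()
out$same_env       # is the formula's environment the function's own?
coef(out$fit)      # intercept and slope found via that environment

with_bare_string() # either a fitted model or the captured error message
```

An alternative that avoids the question entirely is to put the response and predictor into one data frame and pass it through lm()'s data argument, as the other answer does.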