Is there a way to vectorize these bootstrapping loops in R?

Question

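I'm new to R. I'm used to VB, where loops are used heavily, but I understand that R is more efficient if I can vectorize the data. I don't know whether what I've built here can be vectorized.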

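The general idea is, for n = 3:N:

1. Draw a random sample of size n, without replacement, from the original sample (of size N).
2. Resample it with B bootstrap replicates.
3. Repeat steps 1 and 2 X times.
4. Average all X parameter estimates and check for convergence (i.e. look at the standard deviation between the estimates, or something similar; TBD).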
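As a rough sketch of steps 1 to 3 for a single subsample size n, written as a function instead of nested loops (the helper `boot_estimates_for_n` and the toy data are hypothetical, not part of the question):

```r
# Hypothetical helper, not from the question: steps 1 and 2 for one value of n.
# x: original sample of size N; n: subsample size (at least 2); B: bootstrap replicates.
boot_estimates_for_n <- function(x, n, B, stat = mean) {
  # Step 1: random subsample of size n, drawn without replacement
  sub <- sample(x, size = n, replace = FALSE)
  # Step 2: B bootstrap resamples of sub (with replacement), each reduced to one estimate
  vapply(seq_len(B), function(b) stat(sample(sub, size = n, replace = TRUE)), numeric(1))
}

# Toy data shaped like the question's: N = 30 values between 8 and 13
set.seed(1)
x <- sample(8:13, 30, replace = TRUE)

# Step 3: repeat steps 1 and 2 X times for n = 5, keeping the mean estimate of each repeat
X <- 20
reps <- replicate(X, mean(boot_estimates_for_n(x, n = 5, B = 100)))
reps
```

The `stat` argument lets you swap in `sd`, matching the commented-out `p <- sd(bsamp)` line in the question's code.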

I haven't implemented step 4 yet, so the code below only performs steps 1 to 3. Step 4 should be simple enough to do outside the loop with rowMeans().
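A minimal sketch of step 4, assuming the estimates are laid out like `df_means` in the code below (one row per sample size n, one column per repeat j). The helper name `summarise_estimates` is hypothetical, and the standard deviation across repeats is just one possible convergence check:

```r
# Hypothetical helper for step 4: est has one row per sample size n and one
# column per repeat j (the layout of df_means in the question's code).
summarise_estimates <- function(est) {
  est <- as.matrix(est)  # also accepts a numeric data frame such as df_means
  data.frame(
    size = rownames(est),
    mean = rowMeans(est),      # average of the X estimates for each n
    sd   = apply(est, 1, sd),  # spread between repeats: a rough convergence check
    row.names = NULL
  )
}

# Toy input with the same layout: 3 sample sizes x 5 repeats of made-up estimates
set.seed(42)
toy <- matrix(rnorm(15, mean = 10), nrow = 3,
              dimnames = list(paste0("n", 3:5), paste0("j", 1:5)))
res <- summarise_estimates(toy)
res
```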

Note: I set B and X to 100 for testing, but for the final use both need to be 10,000 (or more).

# simulate observation of N=30
bdf <- data.frame(sample(8:13, 30, replace = TRUE))

# get number of observations (nrow, not length: length() of a data frame counts its columns)
N <- nrow(bdf)

# set number of bootstrap replicates
B <- 100

# set number of times to repeat the estimate
X <- 100

# create empty storage container for results
result_vec <- vector(length=B)

# this loop iterates over the number of times to repeat the estimate
for (j in 1:X) {
  
  # this loop iterates sample size from n=3 to n=N
  for (i in 3:N) {
    
    # random sample of size n = i from the N observations (without replacement)
    boot_samp <- bdf[sample(N, size=i, replace=FALSE), 1]
    
    # this loop does the bootstrap sampling
    for(b in 1:B) {
      # draw a bootstrap sample
      bsamp <- sample(boot_samp, size=i, replace=TRUE)
      
      # calculate your parameter
      p <- mean(bsamp)
      #p <- sd(bsamp)
      
      # save the calculated parameter
      result_vec[b] <- p
      
    }
    if (i==3) {
      # initiate data frame and store the results for the n=3 parameter estimate
      df_res <- data.frame(result_vec)
    }
    else {
      # add the results for n=i parameter estimate to the data frame
      df_temp <- data.frame(result_vec)
      df_res <- cbind(df_res, df_temp)
    }
    
    # rename the column in the data frame as n=i
    names(df_res)[ncol(df_res)] <- paste0("n = ", i)
  }
  
  # calculate the mean of the parameter estimates
  allmeans <- colMeans(df_res)
  
  if (j==1) {
    # initiate a new data frame to store the means
    df_means <- data.frame(allmeans)
  }
  else {
    # add the results to the existing data frame
    df_temp <- data.frame(allmeans)
    df_means <- cbind(df_means, df_temp)
  }
  
  # rename the column in the data frame with j
  names(df_means)[ncol(df_means)] <- j
}

Answer 1

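All of the loops can be replaced with a couple of mapply calls. Because the innermost loop samples with replacement, all B resamples for a given n can be drawn in a single sample() call, arranged in a B-by-i matrix (one resample per row), and averaged with rowMeans. (The \(x) lambda shorthand used below requires R 4.1.0 or later.)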
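To see why the matrix trick is equivalent to the inner loop, here is a small comparison for a single subsample (the toy values of `sub`, `i` and `B` are chosen only for illustration): the loop draws B resamples one at a time, while the matrix version draws all B * i values in one call and treats each row as one resample.

```r
# Toy subsample standing in for sample(bdf[,1], i) in the answer's code
set.seed(7)
sub <- sample(8:13, 6, replace = TRUE)
i <- length(sub)
B <- 1000

# Loop form (the question's inner loop): B separate resamples, one mean each
loop_means <- numeric(B)
for (b in seq_len(B)) {
  loop_means[b] <- mean(sample(sub, size = i, replace = TRUE))
}

# Matrix form: all B * i draws in one call, arranged as a B x i matrix
# so that each row is one bootstrap resample, then one mean per row
vec_means <- rowMeans(matrix(sample(sub, size = B * i, replace = TRUE), nrow = B, ncol = i))

# Both approximate the same bootstrap distribution of the mean
c(loop = mean(loop_means), matrix = mean(vec_means), original = mean(sub))
```

Because every draw with replacement is independent, it doesn't matter that matrix() fills the values column by column: each row is still an independent resample of `sub`.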

# simulate observation of N=30
bdf <- data.frame(sample(8:13, 30, rep = TRUE))

# get number of observations
N <- nrow(bdf)

# set number of bootstrap replicates
B <- 1e4

# set number of times to repeat the estimate
X <- 100

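# repeat the whole estimate X times; each repeat becomes one column of df_means
system.time({
  df_means <- mapply(
    \(j) colMeans(
      # for each n = i: subsample i values without replacement, then draw all
      # B bootstrap resamples at once into a B x i matrix (one resample per row)
      mapply(
        \(i) rowMeans(matrix(sample(sample(bdf[,1], i), B*i, replace = TRUE), B, i)), 3:N
      )
    ), 1:X
  )
  dimnames(df_means) <- list(paste0("n", 3:N), paste0("j", 1:X))
})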
#>    user  system elapsed 
#>   21.58    1.39   22.98

This process is also very easy to run in parallel:

library(parallel)

X <- 1e3

system.time({
  cl <- makeCluster(detectCores() - 1) # 15 workers on this machine
  clusterExport(cl, c("bdf", "N", "B"))
  df_means <- simplify2array(
    parLapply(cl, 1:X, \(j) colMeans(
      mapply(
        \(i) rowMeans(matrix(sample(sample(bdf[,1], i), B*i, replace = TRUE), B, i)), 3:N
      )
    ))
  )
  stopCluster(cl) # shut down the worker processes
  
  dimnames(df_means) <- list(paste0("n", 3:N), paste0("j", 1:X))
})
#>    user  system elapsed 
#>    0.02    0.14   22.08
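One thing to keep in mind with the parallel version is reproducibility: each worker is a separate R process with its own random state, so set.seed() in the main session does not control the workers' draws. The parallel package's clusterSetRNGStream() gives each worker an independent L'Ecuyer-CMRG stream derived from one seed. A minimal sketch with toy settings (2 workers and small B and X, chosen only so it runs quickly; `boot_column` and `run_parallel` are hypothetical names):

```r
library(parallel)

# Toy settings for illustration only
set.seed(1)
bdf <- data.frame(v = sample(8:13, 30, replace = TRUE))
B <- 200
X <- 6

# One repeat (j) of the answer's computation: mean bootstrap estimate for each n = 3:N
boot_column <- function(j, obs, B) {
  N <- length(obs)
  colMeans(mapply(
    function(i) rowMeans(matrix(sample(sample(obs, i), B * i, replace = TRUE), B, i)),
    3:N
  ))
}

run_parallel <- function(seed, workers = 2) {
  cl <- makeCluster(workers)
  on.exit(stopCluster(cl))               # always shut the workers down
  clusterSetRNGStream(cl, iseed = seed)  # one independent RNG stream per worker
  # data are passed as extra arguments instead of being exported as globals
  simplify2array(parLapply(cl, 1:X, boot_column, obs = bdf[, 1], B = B))
}

r1 <- run_parallel(123)
r2 <- run_parallel(123)
identical(r1, r2)  # same seed and same number of workers: same results
```

With the default static scheduling of parLapply, the same seed reproduces the results only when the number of workers is also the same, since that determines which stream handles which repeats.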

Running this in parallel with B = X = 10000 takes just under 4 minutes on my aging laptop.
