sf_distance in a foreach parallelization


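In general, I have a data frame of buildings with a spatial variable attached. I also have a second dataset, for example forests, of the same kind.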

Total_df:

ID  Variable  sfc POINT object            zipcode
1   10        POINT (543611.8 6389285)    2324
2   15        POINT (513611.8 6349285)    2324
3   12        POINT (533611.8 6359285)    2329

About 2 million observations.

forest_distance:

ID  Variable  sfc POLYGON object
1   10        POLYGON Z ((455302.7 6252026 9.09, 455292.6 6252034 9.09, 455274.8 6252036 9.9, 455246 6252113 14.25, 455286.1 6252124 14.15, 455293.5 6252126 14.13, 455317.8 6252068 14.13, 455331.5 6252073 14.13, 455345.5 6252044 14.78, 455302.7 6252026 9.09))

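forest_distance is stored in a list, in which the original forest_distance has been split into 10 equal parts.

I have already worked out how to compute the distances between the two datasets, and I have also split Total_df so that the computation runs on smaller subsets determined by zipcode.

Now, to speed up the computation, I want to parallelize it, which is why I have also subdivided forest_distance into smaller pieces.

My thinking is that it would be faster if each session handled one part of the subdivided forest_distance.

Also, is it possible to print from the different sessions in order to see the progress?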

registerDoParallel(cores = 6)

# Use foreach to loop over list.dfs in parallel
foreach(d = 1:length(list.dfs), .packages = "sf", .combine = 'c') %dopar% {
  # Get the data frame at position 'd' in the list
  df <- list.dfs[[d]]
  
  # Open a list to store combined inner results 
  grand_list <- list()
  
  # Initialize an empty list to store the results of the inner loop
  inner_results <- list()
  
  # zip_code 
  zipcode <- sort(unique(Total_df$zipcode))
  

  # Use a regular for loop to iterate over zipcode
  for(i in zipcode) {
    cat(i, "\n")
    start_time <- Sys.time()
    
    # Subset the data
    subset_df <- Total_df[Total_df$zipcode == i, ]
    
    if(nrow(subset_df) > 0) {
      # Calculate distances
      distances <- sf::st_distance(subset_df, df)
      
      # Define the 'miin' function, or replace it with an appropriate function
      miin <- function(x) min(x, na.rm = TRUE)
      
      # Calculate minimum distances
      min_distances <- apply(distances, 1, miin)
      
      # Store minimum distances in a new column
      subset_df$min_distances <- min_distances
    }
    
    end_time <- Sys.time()
    print(paste("Time for municipality Forest", i, ": ", end_time - start_time))
    
    # Store the updated subset_df in the inner_results list
    inner_results[[i]] <- subset_df
  }
  
  # Combine the results of the inner loop using do.call
  grand_list[[d]] <- do.call(rbind, inner_results)
  
}

It ran for several hours and I had to stop it, but no results had been saved along the way.

r geospatial distance parallel-foreach
1 Answer

This is untested, but a rewrite along these lines might work:


library(doParallel)  # also attaches foreach and parallel
registerDoParallel(cores = 6)

# Use foreach to loop over list.dfs in parallel
grand_list <- foreach(df = list.dfs, .packages = "sf") %dopar% {

  # Initialize an empty list to store the results of the inner loop
  inner_results <- list()

  # zip_code
  zipcode <- sort(unique(Total_df$zipcode))


  # Use a regular for loop to iterate over zipcode
  for(i in zipcode) {
    cat(i, "\n")
    start_time <- Sys.time()

    # Subset the data
    subset_df <- Total_df[Total_df$zipcode == i, ]

    if(nrow(subset_df) > 0) {
      # Calculate distances
      distances <- sf::st_distance(subset_df, df)

      # Define the 'miin' function, or replace it with an appropriate function
      miin <- function(x) min(x, na.rm = TRUE)

      # Calculate minimum distances
      min_distances <- apply(distances, 1, miin)

      # Store minimum distances in a new column
      subset_df$min_distances <- min_distances
    }

    end_time <- Sys.time()
    print(paste("Time for municipality Forest", i, ": ", end_time - start_time))

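    # Store the updated subset_df under the zipcode as a name; indexing with a
    # numeric zipcode such as 2324 would pad the list with thousands of NULLs
    inner_results[[as.character(i)]] <- subset_df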
  }

  # Combine the results of the inner loop using do.call
  do.call(rbind, inner_results)

}

(The printing you do may not work, though: output from parallel workers often does not reach the main R console.)
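On the progress question, one option is to create the cluster explicitly and ask the workers not to redirect their output. The following is a minimal sketch, not part of the tested answer: the toy `chunks` list is a made-up stand-in for `list.dfs`, and whether the messages actually appear depends on how R is run (a terminal session usually shows them, some GUIs do not).

```r
library(doParallel)  # also attaches foreach and parallel

# outfile = "" tells the workers not to redirect stdout/stderr,
# so cat() output from a worker can reach the launching terminal.
cl <- makeCluster(2, outfile = "")
registerDoParallel(cl)

# Hypothetical stand-in for list.dfs: three small chunks of numbers
chunks <- list(1:3, 4:6, 7:9)

res <- foreach(k = seq_along(chunks)) %dopar% {
  # Progress message, tagged with the worker's process id
  cat("worker", Sys.getpid(), "starting chunk", k, "\n")
  sum(chunks[[k]])
}

stopCluster(cl)
```

Passing a file name instead of `""` to `outfile` sends the worker output to that file, which can then be watched while the job runs.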

Tip: debug the code with %do% instead of %dopar%, and run it on the first two values only:

grand_list <- foreach(df = list.dfs[1:2], .packages = "sf") %do% { ... }

Add debugging statements and so on as you see fit. Once it works, remove the [1:2] and change %do% back to %dopar%.
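Two further points may be worth handling once the loop runs. Each forest chunk only yields the minimum distance to the forests in that chunk, so the overall minimum per building is the element-wise minimum across chunks. And because the original run lost hours of work, each chunk's result can be written to disk as soon as it is finished. The sketch below illustrates both with base R only. The distance matrices, the `forest_chunks` folder and the file names are made up for illustration; in the real code the matrix would come from `sf::st_distance()`, and the rows must be in the same building order for every chunk.

```r
# Made-up distances: rows = buildings, columns = forests in that chunk.
chunk_distances <- list(
  matrix(c(5, 9, 4,
           7, 2, 8), nrow = 3),   # chunk 1: 3 buildings x 2 forests
  matrix(c(6, 1, 10), nrow = 3)   # chunk 2: 3 buildings x 1 forest
)

out_dir <- file.path(tempdir(), "forest_chunks")  # hypothetical output folder
dir.create(out_dir, showWarnings = FALSE)

for (k in seq_along(chunk_distances)) {
  out_file <- file.path(out_dir, sprintf("min_dist_chunk_%02d.rds", k))
  if (file.exists(out_file)) next  # already finished in an earlier run

  # Minimum distance from each building to any forest in chunk k
  min_k <- apply(chunk_distances[[k]], 1, min, na.rm = TRUE)

  # Written immediately, so an interrupted run keeps the finished chunks
  saveRDS(min_k, out_file)
}

# Read the per-chunk results back and take the element-wise minimum
files <- list.files(out_dir, pattern = "^min_dist_chunk_.*\\.rds$",
                    full.names = TRUE)
per_chunk <- lapply(files, readRDS)
overall_min <- Reduce(pmin, per_chunk)
print(overall_min)  # 5 1 4
```

The same save-then-skip pattern should carry over to the body of the foreach loop, since each worker writes to its own file.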
