How to vectorize a nested loop in R?


I want to vectorize this operation:

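# For each row of full_df, count the rows of icd10_codes that share
# at least one code with that row's coding_19 vector
for (row1 in seq_len(nrow(full_df))) {
  for (row2 in seq_len(nrow(icd10_codes))) {
    if (any(full_df$coding_19[[row1]] %in% icd10_codes$present_icd10[[row2]])) {
      full_df$code_count[row1] <- full_df$code_count[row1] + 1
    }
  }
}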
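A minimal base-R sketch of the same double loop written with vapply(), assuming coding_19 and present_icd10 are both list columns of character vectors. The helper name count_matching_rows is hypothetical. It still iterates internally, but it computes the whole count vector at once instead of updating full_df one cell at a time:

```r
# Hypothetical helper: for each element of `coding`, count how many elements
# of `present` share at least one code with it (same logic as the double loop)
count_matching_rows <- function(coding, present) {
  vapply(coding, function(codes) {
    # one TRUE/FALSE per row of icd10_codes
    hits <- vapply(present, function(p) any(codes %in% p), logical(1))
    sum(hits)
  }, numeric(1))
}

# Sample values taken from the printed tables
coding_19 <- list("H353",
                  c("B20", "B21", "B22", "B23", "B24", "Z21", "F024", "O987"),
                  c("G30", "F00"),
                  "E780")
present_icd10 <- list("G30", "E781", "E780", c("H401", "H409"), "H353", "E780")

count_matching_rows(coding_19, present_icd10)
#> [1] 1 0 1 2
```

Applied to the real data this would be full_df$code_count <- full_df$code_count + count_matching_rows(full_df$coding_19, icd10_codes$present_icd10).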

full_df looks like this:

  coding_19  code_count
  <list>          <dbl>
1 H353                0
2 <chr [8]>           0
3 <chr [2]>           0
4 E780                0
4 E780              0

and:

> head(full_df$coding_19)
[[1]]
[1] "H353"

[[2]]
[1] "B20"  "B21"  "B22"  "B23"  "B24"  "Z21"  "F024" "O987"

[[3]]
[1] "G30" "F00"

[[4]]
[1] "E780" 

icd10_codes looks like this. eid is the person's ID, and present_icd10 holds the codes associated with that person.

      eid present_icd10
1 1          G30
2 2          E781
3 3          E780
4 4    H401, H409
5 5          H353
6 6          E780
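If present_icd10 is actually stored as comma-separated strings such as "H401, H409" (as in the answer's sample data) rather than as a list column, a sketch for turning it into a list of character vectors, assuming the separator is exactly ", ":

```r
# Sample values as they appear in the printed icd10_codes table
present_chr <- c("G30", "E781", "E780", "H401, H409", "H353", "E780")

# strsplit() returns a list: one character vector of codes per person
present_list <- strsplit(present_chr, ", ", fixed = TRUE)

present_list[[4]]
#> [1] "H401" "H409"
lengths(present_list)
#> [1] 1 1 1 2 1 1
```

On the real data frame this would be icd10_codes$present_icd10 <- strsplit(icd10_codes$present_icd10, ", ", fixed = TRUE).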

Note that present_icd10 and coding_19 are n-dimensional vectors: each entry can contain several codes.

For each row of full_df, I want to count the number of people (rows of icd10_codes) whose present_icd10 contains at least one element of that row's coding_19.
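This count can be viewed as an incidence matrix: entry (i, j) is TRUE when row i of full_df and row j of icd10_codes share a code, and code_count is the row sums of that matrix. A sketch using base R's outer() with Vectorize(), assuming both columns are lists of character vectors; shares_code is a hypothetical name. Vectorize() still loops over all n × m pairs internally, so this is more concise rather than faster:

```r
coding_19 <- list("H353",
                  c("B20", "B21", "B22", "B23", "B24", "Z21", "F024", "O987"),
                  c("G30", "F00"),
                  "E780")
present_icd10 <- list("G30", "E781", "E780", c("H401", "H409"), "H353", "E780")

# Hypothetical helper: does row i of full_df share a code with row j of icd10_codes?
shares_code <- Vectorize(function(i, j) any(coding_19[[i]] %in% present_icd10[[j]]))

# Logical matrix: rows = full_df rows, columns = icd10_codes rows
hits <- outer(seq_along(coding_19), seq_along(present_icd10), shares_code)

rowSums(hits)
#> [1] 1 0 1 2
```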

I tried this:

library(dplyr)

full_df <- full_df %>%
  rowwise() %>%
  # any() yields one TRUE/FALSE per row, so code_count grows by at most 1
  mutate(code_count = code_count + as.integer(any(coding_19 %in% icd10_codes$present_icd10)))

But I think this only works when there is a single loop, not a nested one.

Answer:

Based on the limited sample data:

library(tidyverse)

# Per person: how many of its codes occur anywhere in the (split) present_icd10 column
full_df %>%
  mutate(code_count = map_dbl(coding_19,
    ~ sum(.x %in% unlist(str_split(pull(icd10_codes, present_icd10), ", ")))))

# A tibble: 4 x 2
  coding_19 code_count
  <list>         <dbl>
1 <chr [1]>          1
2 <chr [8]>          0
3 <chr [2]>          1
4 <chr [1]>          1

Data:

full_df <- structure(list(coding_19 = list("H353", c("B20", "B21", "B22", 
"B23", "B24", "Z21", "F024", "O987"), c("G30", "F00"), "E780"), 
    code_count = c(0, 0, 0, 0)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4L))

icd10_codes <- structure(list(eid = c(1, 2, 3, 4, 5, 6), present_icd10 = c("G30", 
"E781", "E780", "H401, H409", "H353", "E780")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))
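The answer's code sums how many of a person's codes occur anywhere in present_icd10, whereas the nested loop in the question counts how many rows of icd10_codes contain at least one of the person's codes. The two differ when a code appears in several rows: E780 occurs in rows 3 and 6, so the loop gives 2 for the fourth person while the answer gives 1. A sketch that reproduces the loop's count without per-row iteration, by expanding both sides to long form (one code per line) and joining on the code with merge(). It assumes the comma-separated strings of this sample data, and the column names person and row are hypothetical:

```r
coding_19 <- list("H353",
                  c("B20", "B21", "B22", "B23", "B24", "Z21", "F024", "O987"),
                  c("G30", "F00"),
                  "E780")
present_chr  <- c("G30", "E781", "E780", "H401, H409", "H353", "E780")
present_list <- strsplit(present_chr, ", ", fixed = TRUE)

# Long form: one line per (person, code) and one line per (icd10 row, code)
people <- data.frame(person = rep(seq_along(coding_19), lengths(coding_19)),
                     code   = unlist(coding_19))
icd    <- data.frame(row  = rep(seq_along(present_list), lengths(present_list)),
                     code = unlist(present_list))

# Inner join on code gives every (person, icd10 row) pair sharing a code;
# unique() keeps each pair once even when several codes match
pairs <- unique(merge(people, icd, by = "code")[, c("person", "row")])

# Matching icd10 rows per person; nbins keeps people with zero matches
code_count <- tabulate(pairs$person, nbins = length(coding_19))
code_count
#> [1] 1 0 1 2
```

With the real data this result would be added to the existing column, e.g. full_df$code_count <- full_df$code_count + code_count.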