Completing incompletely linked documents (document trees) with a function and a lookup list


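I have linked documents (document trees) stored in a list (`list`). Some document trees contain incomplete items, which are flagged with `search=1`. A tree may contain several incomplete items, each flagged with `search=1`.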

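I want to extend/complete these incomplete trees using a lookup list of document trees (`list_lookup`). There is always exactly one matching tree in `list_lookup`. The `level` of the matching tree has to be adjusted so that it fits the document tree in `list`.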

Sample data and desired output:

library(tidyverse)

# initial df1, aaa is incomplete (it is in fact linked to other documents, but this information is stored in the lookup list)
 
df1 <- tibble(id_from=c(NA_character_,"111","222","333","444","444","bbb"),
             id_to=c("111","222","333","444","aaa","bbb","ccc"),
             level=c(0,1,2,3,4,4,5),
             search=c(0,0,0,0,1,0,0))
df1
#> # A tibble: 7 × 4
#>   id_from id_to level search
#>   <chr>   <chr> <dbl>  <dbl>
#> 1 <NA>    111       0      0
#> 2 111     222       1      0
#> 3 222     333       2      0
#> 4 333     444       3      0
#> 5 444     aaa       4      1
#> 6 444     bbb       4      0
#> 7 bbb     ccc       5      0


# lookup dfs, df2 contains the further document links of aaa
df2 <- tibble(id_from=c(NA,"aaa","x","x"),
             id_to=c("aaa","x","x1","x2"),
             level=c(0,1,2,2))

df3 <- tibble(id_from=c(NA,"thank"),
                     id_to=c("thank","you"),
                     level=c(0,1))

#list with df
list <- list(df1)

#list with lookups
list_lookup <- list(df2,df3)

list_lookup
#> [[1]]
#> # A tibble: 4 × 3
#>   id_from id_to level
#>   <chr>   <chr> <dbl>
#> 1 <NA>    aaa       0
#> 2 aaa     x         1
#> 3 x       x1        2
#> 4 x       x2        2
#> 
#> [[2]]
#> # A tibble: 2 × 3
#>   id_from id_to level
#>   <chr>   <chr> <dbl>
#> 1 <NA>    thank     0
#> 2 thank   you       1

# what I need: an updated list of dfs that includes the information from the lookup list

df1_wanted <- tibble(id_from=c(NA_character_,"111","222","333","444","444","aaa","bbb","x","x"),
                     id_to=c("111","222","333","444","aaa","bbb","x","ccc","x1","x2"),
                     level=c(0,1,2,3,4,4,5,5,6,6))

list(df1_wanted)
#> [[1]]
#> # A tibble: 10 × 3
#>    id_from id_to level
#>    <chr>   <chr> <dbl>
#>  1 <NA>    111       0
#>  2 111     222       1
#>  3 222     333       2
#>  4 333     444       3
#>  5 444     aaa       4
#>  6 444     bbb       4
#>  7 aaa     x         5  <- added from df2, level adjusted
#>  8 bbb     ccc       5  
#>  9 x       x1        6  <- added from df2, level adjusted
#> 10 x       x2        6  <- added from df2, level adjusted

Created on 2023-04-01 with reprex v2.0.2

My approach:

I thought about using `purrr::map` to map a function over each element of `list`, but I am not sure what that function should look like.

r purrr lookup-tables
1 Answer

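In this solution:

  1. I first define a recursive function, `get_tree()`, which takes a single `id` and a lookup table and returns the full tree for that `id` from the table.
  2. I then define a function, `complete_trees()`, which takes a data frame and a list of lookup tables, calls `get_tree()` for each `id_to` where `search == 1` and for each lookup table, adjusts `level`, and binds the results to the initial data frame.
  3. Finally, I map `complete_trees()` over each element of `list`.
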
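# Recursively collect all rows below `id` in a single lookup table.
get_tree <- function(id, lookup) {
  branch <- filter(lookup, id_from == id)
  # No children in this lookup: return NULL, which bind_rows() ignores.
  if (nrow(branch) == 0) return()
  bind_rows(
    branch, 
    map(branch$id_to, \(x) get_tree(x, lookup))
  )
}

# Append the matching lookup subtree for every row flagged with search == 1.
complete_trees <- function(data, lookups) {
  branches <- pmap(
    filter(data, search == 1),
    \(id_to, level, ...) {
      bind_rows(map(
          lookups, 
          \(lookup) get_tree(id_to, lookup)
        )) %>%
        # Shift the lookup levels by the level of the flagged row (.env$level).
        mutate(level = level + .env$level)
    }
  )
  # Combine original rows with the new branches and drop the helper column.
  bind_rows(data, branches) %>%
    select(!search) %>%
    arrange(level, id_from)
}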

map(list, \(x) complete_trees(x, lookups = list_lookup))

Result:

[[1]]
# A tibble: 10 × 3
   id_from id_to level
   <chr>   <chr> <dbl>
 1 <NA>    111       0
 2 111     222       1
 3 222     333       2
 4 333     444       3
 5 444     aaa       4
 6 444     bbb       4
 7 aaa     x         5
 8 bbb     ccc       5
 9 x       x1        6
10 x       x2        6
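
To make the level adjustment in step 2 more explicit, here is a minimal sketch of that single step in isolation. It reuses the `df2` data from the question; the names `lookup`, `parent_level`, and `shifted` are only illustrative and do not appear in the solution above. The assumption is that the root row of a lookup tree sits at level 0 and corresponds to the flagged row in the main tree, so every descendant level is shifted by the flagged row's level.

```r
library(tibble)
library(dplyr)

# Lookup tree rooted at "aaa" (same data as df2 in the question).
lookup <- tibble(id_from = c(NA, "aaa", "x", "x"),
                 id_to   = c("aaa", "x", "x1", "x2"),
                 level   = c(0, 1, 2, 2))

# Level of the flagged row (444 -> aaa) in the main tree; illustrative name.
parent_level <- 4

shifted <- lookup %>%
  filter(!is.na(id_from)) %>%            # root row is already in the main tree
  mutate(level = level + parent_level)   # shift descendants below the flagged row

shifted$level
#> [1] 5 6 6
```

In `complete_trees()` the same shift happens inside `pmap()`, where `.env$level` refers to the flagged row's level rather than the `level` column of the lookup rows.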