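I have linked documents (document trees) stored in a list (list). Some document trees contain incomplete entries, marked with search=1. A single tree may contain several incomplete entries, each of them marked with search=1.
I would like to extend/complete these incomplete trees using a lookup list that also contains document trees (list_lookup). There is always exactly one matching tree in list_lookup. The level of the matching tree has to be adjusted so that it fits the document tree in list.
Sample data and desired output: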
library(tidyverse)
# initial df1, aaa is incomplete (it is in fact linked to other documents, but this information is stored in the lookup list)
df1 <- tibble(id_from=c(NA_character_,"111","222","333","444","444","bbb"),
id_to=c("111","222","333","444","aaa","bbb","ccc"),
level=c(0,1,2,3,4,4,5),
search=c(0,0,0,0,1,0,0))
df1
#> # A tibble: 7 × 4
#> id_from id_to level search
#> <chr> <chr> <dbl> <dbl>
#> 1 <NA> 111 0 0
#> 2 111 222 1 0
#> 3 222 333 2 0
#> 4 333 444 3 0
#> 5 444 aaa 4 1
#> 6 444 bbb 4 0
#> 7 bbb ccc 5 0
# lookup dfs, df2 contains the further document links of aaa
df2 <- tibble(id_from=c(NA,"aaa","x","x"),
id_to=c("aaa","x","x1","x2"),
level=c(0,1,2,2))
df3 <- tibble(id_from=c(NA,"thank"),
id_to=c("thank","you"),
level=c(0,1))
#list with df
list <- list(df1)
#list with lookups
list_lookup <- list(df2,df3)
list_lookup
#> [[1]]
#> # A tibble: 4 × 3
#> id_from id_to level
#> <chr> <chr> <dbl>
#> 1 <NA> aaa 0
#> 2 aaa x 1
#> 3 x x1 2
#> 4 x x2 2
#>
#> [[2]]
#> # A tibble: 2 × 3
#> id_from id_to level
#> <chr> <chr> <dbl>
#> 1 <NA> thank 0
#> 2 thank you 1
# what I need: an updated list of dfs in which the information from the lookup list is included
df1_wanted <- tibble(id_from=c(NA_character_,"111","222","333","444","444","aaa","bbb","x","x"),
id_to=c("111","222","333","444","aaa","bbb","x","ccc","x1","x2"),
level=c(0,1,2,3,4,4,5,5,6,6))
list(df1_wanted)
#> [[1]]
#> # A tibble: 10 × 3
#> id_from id_to level
#> <chr> <chr> <dbl>
#> 1 <NA> 111 0
#> 2 111 222 1
#> 3 222 333 2
#> 4 333 444 3
#> 5 444 aaa 4
#> 6 444 bbb 4
#> 7 aaa x 5 <- added from df2, level adjusted
#> 8 bbb ccc 5
#> 9 x x1 6 <- added from df2, level adjusted
#> 10 x x2 6 <- added from df2, level adjusted
Created on 2023-04-01 with reprex v2.0.2
My approach:
I thought about using purrr::map to map a function over each element of list, but I am not sure what that function should look like.
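In this solution:
get_tree() takes a single id and a lookup table, and returns the full tree below that id from the table.
complete_trees() takes a data frame and a list of lookup tables, calls get_tree() for each id_to where search == 1 and for each lookup table, adjusts level, and binds the results to the initial data frame.
complete_trees() is then mapped over each element of list.
get_tree <- function(id, lookup) {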
branch <- filter(lookup, id_from == id)
if (nrow(branch) == 0) return()
bind_rows(
branch,
map(branch$id_to, \(x) get_tree(x, lookup))
)
}
complete_trees <- function(data, lookups) {
branches <- pmap(
filter(data, search == 1),
\(id_to, level, ...) {
bind_rows(map(
lookups,
\(lookup) get_tree(id_to, lookup)
)) %>%
mutate(level = level + .env$level)
}
)
bind_rows(data, branches) %>%
select(!search) %>%
arrange(level, id_from)
}
map(list, \(x) complete_trees(x, lookups = list_lookup))
Result:
[[1]]
# A tibble: 10 × 3
id_from id_to level
<chr> <chr> <dbl>
1 <NA> 111 0
2 111 222 1
3 222 333 2
4 333 444 3
5 444 aaa 4
6 444 bbb 4
7 aaa x 5
8 bbb ccc 5
9 x x1 6
10 x x2 6
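The level adjustment can be looked at in isolation. The following is only a minimal sketch, not part of the solution above: it assumes the lookup tree numbers its root as level 0 and that the incomplete entry sits at level 4 in the main tree, as aaa does in df1. The names lookup, anchor_level and shifted are made up for this illustration.

```r
library(tidyverse)

# Lookup tree rooted at "aaa" (same values as df2 in the question)
lookup <- tibble(
  id_from = c(NA, "aaa", "x", "x"),
  id_to   = c("aaa", "x", "x1", "x2"),
  level   = c(0, 1, 2, 2)
)

# Assumption: level of the incomplete entry "aaa" in the main tree (row 5 of df1)
anchor_level <- 4

# Drop the lookup's root row, which duplicates the incomplete row already
# present in the main tree, then shift the remaining levels by the anchor level
shifted <- lookup %>%
  filter(!is.na(id_from)) %>%
  mutate(level = level + anchor_level)

shifted$level
```

Because the root of the lookup tree is at level 0, its descendants' levels are already offsets relative to aaa, so adding the anchor level is enough to place them in the main tree.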