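I am pulling indicator data from an API that returns data frames containing nested data frames of varying sizes, which hold additional metadata. I am trying to rectangle it, without much luck.
Each indicator is tagged with several groups (up to 15, from a pool of up to 100), each with a group name and a group ID. Each group can in turn be categorised into a series of group sets.
I would like to transform my indicators data frame to add extra columns, one per group set, named after the group set. The value should be the matching group name for that indicator, or NA if the group set does not apply.
The data comes from the API like this: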
library(tibble)
indicators <- tibble::tibble(
  name = c("Number of lettuces", "Number of oranges"),
  groups = list(
    data.frame(
      name = c("Number"),
      id = c("uid001")),
    data.frame(
      name = c("Oranges", "Citrus", "Number"),
      id = c("uid003", "uid004", "uid001")
    )
  )
)
indicator_groups <- tibble::tibble(
  name = c("Number", "Oranges", "Citrus"),
  id = c("uid001", "uid003", "uid004"),
  sets = list(
    data.frame(name = c("Unit_of_measure"), id = c("gid003")),
    data.frame(name = c("Fruit"), id = c("gid001")),
    data.frame(name = c("Fruit", "Type"), id = c("gid001", "gid002"))
  )
)
indicator_group_sets <- tibble::tibble(
  name = c("Fruit", "Type", "Unit_of_measure"),
  id = c("gid001", "gid002", "gid003"),
  groups = list(
    data.frame(id = c("uid003", "uid004")),
    data.frame(id = c("uid003", "uid004")),
    data.frame(id = c("uid001"))
  )
)
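I am sure some combination of dplyr::left_join() and/or tidyr::hoist() should be able to do this, but I am struggling to see how. I know I only need one of indicator_groups or indicator_group_sets.
In previous attempts I managed to get one column per group (but not per group set), and/or one row per indicator/group combination with a lot of NAs that I could not collapse.
Desired result: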
indicators
#>                 name   Fruit   Type Unit_of_measure
#> 1 Number of lettuces    <NA>   <NA>          Number
#> 2  Number of oranges Oranges Citrus          Number
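Can anyone point me towards a sensible way of solving this?
I think there is something more elegant, but this seems to work. It is a bit tricky because each table has more than one field called name.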
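library(tidyverse)
# One row per indicator/group pair, with columns groups_name and groups_id
indicators_tidy <- indicators |>
  unnest_longer(groups) |>
  unnest_wider(groups, names_sep = "_")
# One row per group/set pair, with columns sets_name and sets_id
indicator_groups_tidy <- indicator_groups |>
  unnest_longer(sets) |>
  unnest_wider(sets, names_sep = "_")
# Attach the group sets to each indicator/group pair
indicators_tidy |>
  left_join(indicator_groups_tidy,
            join_by(groups_name == name, groups_id == id))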
Result:
# A tibble: 5 × 5
  name               groups_name groups_id sets_name       sets_id
  <chr>              <chr>       <chr>     <chr>           <chr>
1 Number of lettuces Number      uid001    Unit_of_measure gid003
2 Number of oranges  Oranges     uid003    Fruit           gid001
3 Number of oranges  Citrus      uid004    Fruit           gid001
4 Number of oranges  Citrus      uid004    Type            gid002
5 Number of oranges  Number      uid001    Unit_of_measure gid003
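The joined table is still one row per indicator/group/set combination, not the one row per indicator that the question asks for. One possible last step, which is not part of the answer as posted, is tidyr::pivot_wider(). The sketch below is self-contained and rests on a few assumptions of mine: it uses unnest(..., names_sep = "_") as a shorter way to get the same groups_name / sets_name columns, it passes relationship = "many-to-many" because a group can belong to several sets and also be shared by several indicators, and it collapses multiple groups in the same set into one comma-separated string. The names group_sets_long and indicators_wide are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question.
indicators <- tibble::tibble(
  name = c("Number of lettuces", "Number of oranges"),
  groups = list(
    data.frame(name = "Number", id = "uid001"),
    data.frame(
      name = c("Oranges", "Citrus", "Number"),
      id = c("uid003", "uid004", "uid001")
    )
  )
)

indicator_groups <- tibble::tibble(
  name = c("Number", "Oranges", "Citrus"),
  id = c("uid001", "uid003", "uid004"),
  sets = list(
    data.frame(name = "Unit_of_measure", id = "gid003"),
    data.frame(name = "Fruit", id = "gid001"),
    data.frame(name = c("Fruit", "Type"), id = c("gid001", "gid002"))
  )
)

# One row per group/set pair: name, id, sets_name, sets_id.
group_sets_long <- indicator_groups |>
  unnest(sets, names_sep = "_")

indicators_wide <- indicators |>
  # One row per indicator/group pair: name, groups_name, groups_id.
  unnest(groups, names_sep = "_") |>
  left_join(
    group_sets_long,
    join_by(groups_name == name, groups_id == id),
    relationship = "many-to-many"
  ) |>
  # One column per group set; the cell holds the group name(s) in that set.
  pivot_wider(
    id_cols = name,
    names_from = sets_name,
    values_from = groups_name,
    names_sort = TRUE,
    # Assumption: several groups in one set are joined into a single string.
    values_fn = \(x) paste(unique(x), collapse = ", ")
  )

indicators_wide
```

With the sample data, Citrus is listed under both Fruit and Type, so the Fruit cell for "Number of oranges" contains both Oranges and Citrus instead of the single value in the desired table. Swapping in values_fn = dplyr::first would keep only the first group per set, but that depends on row order, so it is worth checking against the real API data before relying on it.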