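I'm trying to help some friends create a formatted "checklist" of the plant species found in our state.
The data looks like this (except that there are more than 3,000 taxa):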
dat<- as.data.frame(cbind(clade = c("Clade x", "Clade x", "Clade x", "Clade y", "Clade y", "Clade z", "Clade z"),
family = c("FAMILY A", "FAMILY A", "FAMILY B", "FAMILY C", "FAMILY C", "FAMILY D", "FAMILY E"),
taxon = c("Juniperus osteosperma", "Ephedra viridis", "Achillea millefolium", "Artemisia tridentata var. tridentata", "Iva axillaris", "Pleiacanthus spinosus", "Packera multilobata"),
life_history = c("tree", "shrub", "forb", "shrub", "forb", "forb", "forb"),
County = c("All counties", "WP", "WP, CK", "EU, WP, WA", "CK", "DG", "DG, CC"),
non.native = c("", "", "Non-native", "", "Non-native", "", "")))
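I'd like to be able to parse this into a Word document in which each row becomes its own entry, with entries grouped by clade and then by family. I'd also like to format parts of the output text string (for example, every word in the taxon except "var." should be italic).
The output I'm looking for would be something like this:
I was able to combine the columns needed for each entry into a single string using: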
entry<- paste0(dat$taxon, ". ",
dat$life_history, ". ",
dat$County, ". ",
ifelse(!is.na(dat$non.native), paste0(dat$non.native, "")))
entry
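I tried using dplyr to group by clade and family, and a for loop to get a separate "entry" for each row, but I can't seem to get the for loop to recognize the grouping.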
dat %>% group_by(clade, family) %>%
for (clade in unique(dat$clade)) {
cat(glue::glue("\n\n# {clade} \n \n "))
for(family in unique(dat$family)) {
cat(glue::glue("\n\n# {family} \n \n "))
for(entry in unique(dat$entry)) {
cat(glue::glue("{entry} \n \n"))
}
}
}
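This produces the error: 4 arguments passed to 'for' which requires 3. If I remove the group_by line, I get output in which every family and entry is repeated under each clade, rather than only the families and entries that belong together.
How can I get it to print only what actually belongs to each group?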
You can't pipe a data.frame into for(). But you can use group_map() to loop over the groups:
library(tidyverse)
dat<- as.data.frame(cbind(clade = c("Clade x", "Clade x", "Clade x", "Clade y", "Clade y", "Clade z", "Clade z"),
family = c("FAMILY A", "FAMILY A", "FAMILY B", "FAMILY C", "FAMILY C", "FAMILY D", "FAMILY E"),
taxon = c("Juniperus osteosperma", "Ephedra viridis", "Achillea millefolium", "Artemisia tridentata var. tridentata", "Iva axillaris", "Pleiacanthus spinosus", "Packera multilobata"),
life_history = c("tree", "shrub", "forb", "shrub", "forb", "forb", "forb"),
County = c("All counties", "WP", "WP, CK", "EU, WP, WA", "CK", "DG", "DG, CC"),
non.native = c("", "", "Non-native", "", "Non-native", "", "")))
dat |>
group_by(clade) |>
group_map(\(clade_tbl, clade_grp_tbl) list(
str_glue("\n\n# {clade_grp_tbl$clade} \n\n"),
group_by(clade_tbl, family) |>
group_map(\(family_tbl, family_grp_tbl) list(
str_glue("\n\n# {family_grp_tbl$family} \n\n"),
str_glue_data(family_tbl, "{taxon}. {life_history}. {County}. {non.native}")
)
),
"\n---"
)
) |>
unlist() |>
paste0(collapse = "\n") |>
cat()
Result:
#>
#> # Clade x
#>
#>
#> # FAMILY A
#>
#> Juniperus osteosperma. tree. All counties.
#> Ephedra viridis. shrub. WP.
#>
#> # FAMILY B
#>
#> Achillea millefolium. forb. WP, CK. Non-native
#>
#> ---
#>
#> # Clade y
#>
#>
#> # FAMILY C
#>
#> Artemisia tridentata var. tridentata. shrub. EU, WP, WA.
#> Iva axillaris. forb. CK. Non-native
#>
#> ---
#>
#> # Clade z
#>
#>
#> # FAMILY D
#>
#> Pleiacanthus spinosus. forb. DG.
#>
#> # FAMILY E
#>
#> Packera multilobata. forb. DG, CC.
#>
#> ---
Created on 2024-01-31 with reprex v2.0.2
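
The question also asks for the taxon to be italic except for "var.", which the code above does not handle. A minimal sketch of one way to do that, assuming the text is eventually rendered from Markdown (where `*text*` is italic): `italicize_taxon()` and its `upright` argument are hypothetical names introduced here, not part of any package, and the list of rank abbreviations is an assumption to adjust for the real data.

```r
# Hypothetical helper: wrap a taxon name in Markdown italics while leaving
# rank abbreviations such as "var." upright.
italicize_taxon <- function(taxon, upright = c("var.", "subsp.", "f.")) {
  wrapped <- vapply(strsplit(taxon, " ", fixed = TRUE), function(words) {
    is_rank <- words %in% upright
    # Put asterisks around every word that is not a rank abbreviation
    words[!is_rank] <- paste0("*", words[!is_rank], "*")
    paste(words, collapse = " ")
  }, character(1))
  # Merge adjacent italic words: "*Iva* *axillaris*" becomes "*Iva axillaris*"
  gsub("* *", " ", wrapped, fixed = TRUE)
}

italicize_taxon(c("Iva axillaris", "Artemisia tridentata var. tridentata"))
#> [1] "*Iva axillaris*"
#> [2] "*Artemisia tridentata* var. *tridentata*"
```

With this defined, replacing `{taxon}` with `{italicize_taxon(taxon)}` in the `str_glue_data()` template should italicize each entry. To get an actual Word document, one option is to run the code in an R Markdown chunk with `results = "asis"` and knit with the `word_document` output format, so the `#` headings and asterisks are interpreted as Markdown rather than printed literally.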