Is there a way to parse strings that represent groups into a hierarchical structure in R?
Say my group structure looks like this:
"1", "1.1", "1.1.1", "1.1.1.1", "1.1.2", "1.1.3", "1.1.3.1", "1.1.3.2", "1.1.3.3", "1.2",
"1.2.1", "1.2.1.1", "1.2.1.2", "1.2.1.2.1", "1.2.2", "1.2.2.1", "1.2.2.2"
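Intuitively, the "highest" level is "1", followed by the two main splits "1.1" and "1.2", and so on.
Is it possible to parse this into a hierarchical structure in R and easily retrieve the levels? For example, as described above, if I want the second-highest level, R should return "1.1" and "1.2".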
One option is to split each string on the dots with strsplit and use the resulting pieces to identify the depth level.
library(dplyr)   # bind_rows() and the %>% pipe
library(purrr)   # map() and set_names()
strings <- c("1", "1.1", "1.1.1", "1.1.1.1", "1.1.2", "1.1.3", "1.1.3.1",
             "1.1.3.2", "1.1.3.3", "1.2", "1.2.1", "1.2.1.1", "1.2.1.2",
             "1.2.1.2.1", "1.2.2", "1.2.2.1", "1.2.2.2")
strings %>%
  strsplit("\\.") %>%                                           # split each id into its parts
  map(~set_names(.x, paste0("DepthLevel", seq_along(.x)))) %>%  # name each part by its depth
  bind_rows                                                     # one row per id, missing levels become NA
## A tibble: 17 x 5
# DepthLevel1 DepthLevel2 DepthLevel3 DepthLevel4 DepthLevel5
# <chr> <chr> <chr> <chr> <chr>
# 1 1 NA NA NA NA
# 2 1 1 NA NA NA
# 3 1 1 1 NA NA
# 4 1 1 1 1 NA
# 5 1 1 2 NA NA
# 6 1 1 3 NA NA
# 7 1 1 3 1 NA
# 8 1 1 3 2 NA
# 9 1 1 3 3 NA
#10 1 2 NA NA NA
#11 1 2 1 NA NA
#12 1 2 1 1 NA
#13 1 2 1 2 NA
#14 1 2 1 2 1
#15 1 2 2 NA NA
#16 1 2 2 1 NA
#17 1 2 2 2 NA
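Once the depth is known, retrieving one level is just a matter of counting the pieces. A minimal base-R sketch, assuming the same `strings` vector (re-created here so the block runs on its own); `depth` and `level` are illustrative variable names, not part of any package:

```r
strings <- c("1", "1.1", "1.1.1", "1.1.1.1", "1.1.2", "1.1.3", "1.1.3.1",
             "1.1.3.2", "1.1.3.3", "1.2", "1.2.1", "1.2.1.1", "1.2.1.2",
             "1.2.1.2.1", "1.2.2", "1.2.2.1", "1.2.2.2")

# Depth of each node = number of dot-separated pieces ("1" -> 1, "1.1" -> 2, ...)
depth <- lengths(strsplit(strings, ".", fixed = TRUE))

# Keep only the nodes on the requested level
level <- 2
strings[depth == level]
## [1] "1.1" "1.2"
```

This avoids building the wide tibble when only one level is needed; the tibble is more useful when you want every level side by side.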
We can use grep to get the direct children of a node without restructuring the data.
parent <- "1"
pat <- sprintf("^%s\\.\\d+$", parent)
grep(pat, x, value = TRUE)
## [1] "1.1" "1.2"
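The pattern above works for `parent <- "1"`, but for a deeper parent such as "1.1" the unescaped dot in `parent` would match any character. A sketch of a small wrapper that escapes the dots first; `children()` is a hypothetical helper name, and `[0-9]+` is used in place of `\\d+` as an equivalent digit class:

```r
x <- c("1", "1.1", "1.1.1", "1.1.1.1", "1.1.2", "1.1.3", "1.1.3.1",
       "1.1.3.2", "1.1.3.3", "1.2", "1.2.1", "1.2.1.1", "1.2.1.2",
       "1.2.1.2.1", "1.2.2", "1.2.2.1", "1.2.2.2")

# Hypothetical helper: direct children of `parent` within `x`.
# Dots in `parent` are escaped so that "1.1" is matched literally.
children <- function(parent, x) {
  escaped <- gsub(".", "\\.", parent, fixed = TRUE)
  pat <- sprintf("^%s\\.[0-9]+$", escaped)
  grep(pat, x, value = TRUE)
}

children("1.1", x)
## [1] "1.1.1" "1.1.2" "1.1.3"
children("1.2.1", x)
## [1] "1.2.1.1" "1.2.1.2"
```

A leaf such as "1.1.2" returns `character(0)`.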
Or, if we need all second-level values:
depth <- 2
pat2 <- sprintf("^\\d+(\\.\\d+){%d}$", depth-1)
grep(pat2, x, value = TRUE)
## [1] "1.1" "1.2"
or
depth <- 2
pat2 <- sprintf("^%s$", paste(rep("\\d+", depth), collapse = "\\."))
grep(pat2, x, value = TRUE)
## [1] "1.1" "1.2"
In all of the examples above, omit value = TRUE if you want the indices into x rather than the values themselves.
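Since the question asks for an actual hierarchical structure, another option (not part of the answer above, just a swapped-in technique) is an explicit parent-child table: each node's parent is its id with the last ".<number>" removed. A minimal sketch; the names `parent` and `tree` are illustrative only:

```r
x <- c("1", "1.1", "1.1.1", "1.1.1.1", "1.1.2", "1.1.3", "1.1.3.1",
       "1.1.3.2", "1.1.3.3", "1.2", "1.2.1", "1.2.1.1", "1.2.1.2",
       "1.2.1.2.1", "1.2.2", "1.2.2.1", "1.2.2.2")

# Strip the last ".<number>" to get the parent; the root has no dot, so NA
parent <- ifelse(grepl(".", x, fixed = TRUE), sub("\\.[0-9]+$", "", x), NA)

# One row per node: its id, its parent and its depth
tree <- data.frame(node = x,
                   parent = parent,
                   depth = lengths(strsplit(x, ".", fixed = TRUE)),
                   stringsAsFactors = FALSE)

tree$node[tree$depth == 2]              # all second-level nodes
## [1] "1.1" "1.2"
tree$node[which(tree$parent == "1.2")]  # direct children of "1.2"
## [1] "1.2.1" "1.2.2"
```

An edge list like this can be walked recursively or handed to a tree/graph package if more than level lookups are needed.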
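We can get all the leaf nodes like this (a node is a leaf if no other id starts with it followed by a dot, so "1.1" is not confused with a hypothetical "1.10"):
leaf <- x[sapply(x, function(st) !any(startsWith(x, paste0(st, "."))))]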
leaf
## [1] "1.1.1.1" "1.1.2" "1.1.3.1" "1.1.3.2" "1.1.3.3" "1.2.1.1"
## [7] "1.2.1.2.1" "1.2.2.1" "1.2.2.2"
and all the internal (non-leaf) nodes like this:
setdiff(x, leaf)
## [1] "1" "1.1" "1.1.1" "1.1.3" "1.2" "1.2.1" "1.2.1.2"
## [8] "1.2.2"
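An alternative that avoids comparing every pair of ids: a node is internal exactly when it is the parent of some other node, so it is enough to strip the last component of every non-root id once. A sketch under that assumption; `parents`, `internal` and `leaf_nodes` are illustrative names:

```r
x <- c("1", "1.1", "1.1.1", "1.1.1.1", "1.1.2", "1.1.3", "1.1.3.1",
       "1.1.3.2", "1.1.3.3", "1.2", "1.2.1", "1.2.1.1", "1.2.1.2",
       "1.2.1.2.1", "1.2.2", "1.2.2.1", "1.2.2.2")

# Every id that appears as the parent of some other id
parents <- unique(sub("\\.[0-9]+$", "", x[grepl(".", x, fixed = TRUE)]))

internal   <- x[x %in% parents]   # non-leaf nodes, in the original order
leaf_nodes <- x[!x %in% parents]  # everything else is a leaf
```

For this data, `internal` and `leaf_nodes` contain the same values as `setdiff(x, leaf)` and `leaf` above.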
Note: the input vector x used in this answer is:
x <- c("1", "1.1", "1.1.1", "1.1.1.1", "1.1.2", "1.1.3", "1.1.3.1",
       "1.1.3.2", "1.1.3.3", "1.2",
       "1.2.1", "1.2.1.1", "1.2.1.2", "1.2.1.2.1", "1.2.2", "1.2.2.1", "1.2.2.2")