How to parse strings into a hierarchy or tree in R


Is there a way to parse strings that represent groups into a hierarchy in R?

Say my groups are structured as follows:

"1", "1.1", "1.1.1", "1.1.1.1", "1.1.2", "1.1.3", "1.1.3.1", "1.1.3.2", "1.1.3.3", "1.2",       
"1.2.1", "1.2.1.1", "1.2.1.2", "1.2.1.2.1", "1.2.2", "1.2.2.1", "1.2.2.2"

We can see that the "top" level is "1", which then splits into the two main branches "1.1" and "1.2", and so on.

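Can this be parsed into a hierarchical structure in R so that the "levels" are easy to retrieve? For example, if I ask for the second-highest level, R should return "1.1" and "1.2".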

r string tree hierarchical-data
2 Answers
0
votes

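One option is to split each string on the dots with strsplit (stringr's str_split works the same way) and give every piece its own depth-level column.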

library(stringr)
library(dplyr)
library(purrr)
strings <- c("1", "1.1", "1.1.1", "1.1.1.1", "1.1.2", "1.1.3", "1.1.3.1", "1.1.3.2", "1.1.3.3", "1.2","1.2.1", "1.2.1.1", "1.2.1.2", "1.2.1.2.1", "1.2.2", "1.2.2.1", "1.2.2.2")


strings %>%
  strsplit("\\.") %>%
  map(~set_names(.x,paste0("DepthLevel",seq_along(.x)))) %>%
  bind_rows
## A tibble: 17 x 5
#   DepthLevel1 DepthLevel2 DepthLevel3 DepthLevel4 DepthLevel5
#   <chr>       <chr>       <chr>       <chr>       <chr>      
# 1 1           NA          NA          NA          NA         
# 2 1           1           NA          NA          NA         
# 3 1           1           1           NA          NA         
# 4 1           1           1           1           NA         
# 5 1           1           2           NA          NA         
# 6 1           1           3           NA          NA         
# 7 1           1           3           1           NA         
# 8 1           1           3           2           NA         
# 9 1           1           3           3           NA         
#10 1           2           NA          NA          NA         
#11 1           2           1           NA          NA         
#12 1           2           1           1           NA         
#13 1           2           1           2           NA         
#14 1           2           1           2           1          
#15 1           2           2           NA          NA         
#16 1           2           2           1           NA         
#17 1           2           2           2           NA   
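If only the depth of each label is needed, a lighter-weight variant is to count the dot-separated parts and group the labels by that count. This is a minimal base R sketch, not part of the original answer; the names depth and levels_by_depth are made up for illustration.

```r
x <- c("1", "1.1", "1.1.1", "1.1.1.1", "1.1.2", "1.1.3", "1.1.3.1",
       "1.1.3.2", "1.1.3.3", "1.2", "1.2.1", "1.2.1.1", "1.2.1.2",
       "1.2.1.2.1", "1.2.2", "1.2.2.1", "1.2.2.2")

# Depth of a label = number of dot-separated components
# (fixed = TRUE treats "." literally instead of as a regex wildcard)
depth <- lengths(strsplit(x, ".", fixed = TRUE))

# Group the labels by depth; the list names are "1", "2", ...
levels_by_depth <- split(x, depth)

# All labels on the second level
levels_by_depth[["2"]]
```

Indexing the list by the depth as a character string ("2") avoids confusion with positional indexing if some depth happens to be missing from the data.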

0
votes

We can use grep to get the direct children of a node without restructuring the data.

parent <- "1"
pat <- sprintf("^%s\\.\\d+$", parent)
grep(pat, x, value = TRUE)
## [1] "1.1" "1.2"

Or, if we need all second-level values, either of the following patterns works:

depth <- 2
pat2 <- sprintf("^\\d+(\\.\\d+){%d}$", depth-1)
grep(pat2, x, value = TRUE)
## [1] "1.1" "1.2"

# Alternative: build the pattern by repeating "\\d+" depth times, joined by literal dots
depth <- 2
pat2 <- sprintf("^%s$", paste(rep("\\d+", depth), collapse = "\\."))
grep(pat2, x, value = TRUE)
## [1] "1.1" "1.2"

In all of the examples above, omit value = TRUE if you need the indices into x rather than the values.
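The grep calls above answer one query at a time. If many parent/child lookups are needed, one possible approach is to compute each label's parent once by stripping its last component and to keep the result as an edge list. This is a sketch under that assumption; edges and children_of are hypothetical names, not part of the original answer.

```r
x <- c("1", "1.1", "1.1.1", "1.1.1.1", "1.1.2", "1.1.3", "1.1.3.1",
       "1.1.3.2", "1.1.3.3", "1.2", "1.2.1", "1.2.1.1", "1.2.1.2",
       "1.2.1.2.1", "1.2.2", "1.2.2.1", "1.2.2.2")

# A label has a parent only if it contains at least one dot
has_parent <- grepl(".", x, fixed = TRUE)

# Parent = label with its last ".<component>" removed; the root gets NA
parent <- ifelse(has_parent, sub("\\.[^.]+$", "", x), NA_character_)

# Edge list: one row per node, pointing at its parent
edges <- data.frame(parent = parent, child = x, stringsAsFactors = FALSE)

# Hypothetical helper: direct children of node p
children_of <- function(p) {
  edges$child[!is.na(edges$parent) & edges$parent == p]
}

children_of("1.1")
```

An edge list in this shape is also a common input format for tree and graph packages, should a richer structure be needed later.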

We can get all leaf nodes like this:

leaf <- x[sapply(x, function(st) sum(startsWith(x, st))) == 1]
leaf
## [1] "1.1.1.1"   "1.1.2"     "1.1.3.1"   "1.1.3.2"   "1.1.3.3"   "1.2.1.1"  
## [7] "1.2.1.2.1" "1.2.2.1"   "1.2.2.2"  

and all internal (non-leaf) nodes like this:

setdiff(x, leaf)
## [1] "1"       "1.1"     "1.1.1"   "1.1.3"   "1.2"     "1.2.1"   "1.2.1.2"
## [8] "1.2.2"  
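One caveat with the startsWith test: it compares raw prefixes, so a label such as "1.1" would also count as a prefix of a sibling "1.10". The sample data contains no such labels, so the results above are unaffected. For data with multi-digit components, a possible alternative is to treat as leaves all labels that never occur as someone's parent. A sketch, with find_leaves and y as made-up names:

```r
# Hypothetical helper: leaves are the labels that are nobody's parent
find_leaves <- function(labels) {
  # Keep only labels that have a parent (contain a dot) ...
  with_parent <- labels[grepl(".", labels, fixed = TRUE)]
  # ... and strip the last component to get that parent
  parents <- unique(sub("\\.[^.]+$", "", with_parent))
  setdiff(labels, parents)
}

# Small made-up example where "1.1" is a raw prefix of "1.10"
y <- c("1", "1.1", "1.10", "1.10.1")
find_leaves(y)

# For comparison, the prefix-based test does not report "1.1" as a leaf here
y[sapply(y, function(st) sum(startsWith(y, st))) == 1]
```

Another way to get the same effect would be to keep the startsWith approach but compare against paste0(st, "."), so that only true descendants match.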

Note

The input x used in the examples above is:

x <- c("1", "1.1", "1.1.1", "1.1.1.1", "1.1.2", "1.1.3", "1.1.3.1", 
"1.1.3.2", "1.1.3.3", "1.2",       
"1.2.1", "1.2.1.1", "1.2.1.2", "1.2.1.2.1", "1.2.2", "1.2.2.1", "1.2.2.2")