Can we get the expected output with a cross join, similar to SQL code?


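I am trying to write code in R that uses a cross join, similar to what we would do in PROC SQL.

However, I have not been able to write the code. Please help.

The data is as follows: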

data.frame(
  all_names = c("PARENT1", "CHILD1", "CHILD2", "CHILD3", "PARENT2", "PARENT3", "CHILD4", "CHILD5", "CHILD6", "PARENT7", "CHILD7", "CHILD8")
)

   all_names
1    PARENT1
2     CHILD1
3     CHILD2
4     CHILD3
5    PARENT2
6    PARENT3
7     CHILD4
8     CHILD5
9     CHILD6
10   PARENT7
11    CHILD7
12    CHILD8

Expected output:

data.frame(
  parent = c("PARENT1", "PARENT1", "PARENT1", "PARENT2", "PARENT2", "PARENT2", "PARENT3", "PARENT3", "PARENT3", "PARENT4", "PARENT4"),
  child = c("CHILD1", "CHILD2", "CHILD3", "CHILD4", "CHILD5", "CHILD6", "CHILD4", "CHILD5", "CHILD6", "CHILD7", "CHILD8")
)

    parent  child
1  PARENT1 CHILD1
2  PARENT1 CHILD2
3  PARENT1 CHILD3
4  PARENT2 CHILD4
5  PARENT2 CHILD5
6  PARENT2 CHILD6
7  PARENT3 CHILD4
8  PARENT3 CHILD5
9  PARENT3 CHILD6
10 PARENT4 CHILD7
11 PARENT4 CHILD8

I tried the following but got stuck and could not get any further. I am trying to do a cross join:

data %>% mutate(seq=row_number())

child <- data %>% filter(stringr::str_detect(all_names,'CHILD'))
parent <- data %>% filter(stringr::str_detect(all_names,'PARENT'))

parent %>% cross_join(child) %>% filter(seq.x <= seq.y)

1 Answer

In the example, the output contains PARENT4, but the input has no PARENT4. I am assuming that PARENT4 in the output should be PARENT7.

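The key here is to use consecutive_id() to form runs: each stretch of consecutive PARENT rows or consecutive CHILD rows gets its own run number.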
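To make the idea of runs concrete, here is a small sketch of what consecutive_id() returns for the question's data. The variable names `is_parent` and `runs` are only illustrative. It relies on the input starting with a PARENT row, so odd run numbers are parent runs and even run numbers are child runs.

```r
library(dplyr)  # consecutive_id() requires dplyr >= 1.1.0

all_names <- c("PARENT1", "CHILD1", "CHILD2", "CHILD3", "PARENT2", "PARENT3",
               "CHILD4", "CHILD5", "CHILD6", "PARENT7", "CHILD7", "CHILD8")

# TRUE for parent rows, FALSE for child rows
is_parent <- grepl("PARENT", all_names)

# The id goes up by one every time is_parent changes value
runs <- consecutive_id(is_parent)
runs
# [1] 1 2 2 2 3 3 4 4 4 5 6 6
```

PARENT2 and PARENT3 share run 3, which is why both of them end up paired with CHILD4 to CHILD6 (run 4).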

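library(dplyr)  # consecutive_id() and relationship= need dplyr >= 1.1.0

# dat is the input data frame from the question
tmp <- dat %>%
   mutate(row = row_number(), 
          runs = consecutive_id(grepl("PARENT", all_names)),  # one id per run of parents or children
          p = runs %% 2)  # 1 = parent run, 0 = child run (the data starts with a parent)

p <- tmp %>% filter(p == 1)
# shift child runs down by one so they share a key with the preceding parent run
ch <- tmp %>% filter(p == 0) %>% mutate(runs = runs - 1)
p %>%
  inner_join(ch, "runs", relationship = "many-to-many") %>%
  select(parent = all_names.x,  child = all_names.y)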

This gives:

    parent  child
1  PARENT1 CHILD1
2  PARENT1 CHILD2
3  PARENT1 CHILD3
4  PARENT2 CHILD4
5  PARENT2 CHILD5
6  PARENT2 CHILD6
7  PARENT3 CHILD4
8  PARENT3 CHILD5
9  PARENT3 CHILD6
10 PARENT7 CHILD7
11 PARENT7 CHILD8
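
Since the question asks specifically about a cross join, the same result can also be reached that way. This is only a sketch, and it makes the same assumptions as the approach above: the data starts with a PARENT row, and each run of parents is followed by the run of children that belongs to it. The names `tagged`, `parents`, `children`, `parent_run` and `child_run` are made up for illustration. It corresponds roughly to a SQL CROSS JOIN followed by a WHERE clause.

```r
library(dplyr)  # cross_join() and consecutive_id() require dplyr >= 1.1.0

dat <- data.frame(
  all_names = c("PARENT1", "CHILD1", "CHILD2", "CHILD3", "PARENT2", "PARENT3",
                "CHILD4", "CHILD5", "CHILD6", "PARENT7", "CHILD7", "CHILD8")
)

# Label each run of consecutive parent rows or consecutive child rows
tagged <- dat %>%
  mutate(is_parent = grepl("PARENT", all_names),
         run = consecutive_id(is_parent))

parents <- tagged %>%
  filter(is_parent) %>%
  select(parent = all_names, parent_run = run)

children <- tagged %>%
  filter(!is_parent) %>%
  select(child = all_names, child_run = run)

# Cartesian product of parents and children, then keep only the pairs
# where the child run directly follows the parent run
result <- parents %>%
  cross_join(children) %>%
  filter(child_run == parent_run + 1L) %>%
  select(parent, child)

result
```

The cross join builds every parent/child pair (4 x 8 = 32 rows here) before filtering, so for larger inputs the inner_join on the run key is likely the cheaper option.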