Find common entries in two data frames, grouped by the text in a column


I am trying to compare columns in two data frames to extract the items that appear in both. Specifically:

df1:
  state group species
1 CA    2     cat, dog, chicken, mouse
2 CA    1     cat
3 NV    1     dog, chicken
4 NV    2     chicken
5 WA    1     chicken, rat, mouse, lion
6 WA    2     dog, cat
7 WA    3     dog, chicken
8 WA    4     cat, chicken

df2:
  state special_species
1 CA    cat
2 CA    chicken
3 CA    mouse
4 WA    cat
5 WA    chicken
6 NV    dog

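I want to determine which "special_species" from df2 are present in df1 for the same state. The result should be a new data frame with the state and the special species. I think this should be some combination of join, group_by, and summarise, but I can't seem to get it to work.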

r group-by dplyr summarize r-faq
1 Answer

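We can split the 'species' column of 'df1' into one row per species with separate_rows, and then keep only the rows that also occur in 'df2' with intersect. Both data frames need the same column names for intersect, so the columns of 'df2' are renamed first.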

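library(tidyr)
library(dplyr)

# One row per species, drop 'group', remove duplicate state/species pairs,
# then keep the rows that also appear in df2.
# setNames() is base R; set_names() is not exported by dplyr or tidyr.
separate_rows(df1, species) %>%
    select(-group) %>%
    distinct() %>%
    intersect(setNames(df2, c('state', 'species')))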
#  state species
#1    CA     cat
#2    CA chicken
#3    CA   mouse
#4    NV     dog
#5    WA chicken
#6    WA     cat

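Alternatively, use inner_join, which matches 'species' against 'special_species' directly, so no renaming is needed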

# Same reshaping as above, then join on state and the species name
separate_rows(df1, species) %>%
    select(-group) %>%
    distinct() %>%
    inner_join(df2, by = c('state', 'species' = 'special_species'))
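
Since the question asks which rows of 'df2' are found in 'df1', a filtering join is another way to phrase it. The following is a minimal sketch, not part of the original answer, assuming tidyr and dplyr are installed; the names df1_long and found are made up for illustration, and the data is rebuilt inline so the block runs on its own.

```r
library(tidyr)
library(dplyr)

# Same data as in the question, rebuilt so the sketch is self-contained
df1 <- data.frame(
  state = c("CA", "CA", "NV", "NV", "WA", "WA", "WA", "WA"),
  group = c(2L, 1L, 1L, 2L, 1L, 2L, 3L, 4L),
  species = c("cat, dog, chicken, mouse", "cat", "dog, chicken", "chicken",
              "chicken, rat, mouse, lion", "dog, cat", "dog, chicken",
              "cat, chicken"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  state = c("CA", "CA", "CA", "WA", "WA", "NV"),
  special_species = c("cat", "chicken", "mouse", "cat", "chicken", "dog"),
  stringsAsFactors = FALSE
)

# One row per (state, species) pair that occurs anywhere in df1
df1_long <- df1 %>%
  separate_rows(species, sep = ",\\s*") %>%
  distinct(state, species)

# Keep the rows of df2 that have a match in df1_long;
# semi_join() never adds columns or duplicates rows of df2
found <- semi_join(df2, df1_long,
                   by = c("state", "special_species" = "species"))
found
```

With this data every row of 'df2' has a match, so found has the same six rows as 'df2', in the order and with the column names of 'df2'.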

Data

df1 <- structure(list(state = c("CA", "CA", "NV", "NV", "WA", "WA", 
"WA", "WA"), group = c(2L, 1L, 1L, 2L, 1L, 2L, 3L, 4L), species = c("cat, dog, chicken, mouse", 
"cat", "dog, chicken", "chicken", "chicken, rat, mouse, lion", 
"dog, cat", "dog, chicken", "cat, chicken")), 
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8"))

df2 <- structure(list(state = c("CA", "CA", "CA", "WA", "WA", "NV"), 
    special_species = c("cat", "chicken", "mouse", "cat", "chicken", 
    "dog")), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5", "6"))
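
If the tidyverse is not available, the same idea can be sketched in base R: split the strings with strsplit, build a long data frame, and use merge as the inner join. This is a swapped-in technique rather than the answer's method; the names parts, df1_long, and common are hypothetical, and the data is repeated so the block runs on its own.

```r
# Same data as above, rebuilt so the sketch is self-contained
df1 <- data.frame(
  state = c("CA", "CA", "NV", "NV", "WA", "WA", "WA", "WA"),
  group = c(2L, 1L, 1L, 2L, 1L, 2L, 3L, 4L),
  species = c("cat, dog, chicken, mouse", "cat", "dog, chicken", "chicken",
              "chicken, rat, mouse, lion", "dog, cat", "dog, chicken",
              "cat, chicken"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  state = c("CA", "CA", "CA", "WA", "WA", "NV"),
  special_species = c("cat", "chicken", "mouse", "cat", "chicken", "dog"),
  stringsAsFactors = FALSE
)

# Split each comma-separated string into a character vector of species
parts <- strsplit(df1$species, ",\\s*")

# Repeat each state once per species in its row, then drop duplicate pairs
df1_long <- unique(data.frame(
  state = rep(df1$state, lengths(parts)),
  species = unlist(parts),
  stringsAsFactors = FALSE
))

# merge() performs an inner join by default, keeping only matching rows
common <- merge(df1_long, df2,
                by.x = c("state", "species"),
                by.y = c("state", "special_species"))
common
```

merge sorts the result by the key columns by default, so the row order can differ from the dplyr versions even though the set of rows is the same.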