I am trying to compare columns in two data frames to extract the items that appear in both. Specifically:
df1:
  state group species
1    CA     2 cat, dog, chicken, mouse
2    CA     1 cat
3    NV     1 dog, chicken
4    NV     2 chicken
5    WA     1 chicken, rat, mouse, lion
6    WA     2 dog, cat
7    WA     3 dog, chicken
8    WA     4 cat, chicken

df2:
  state special_species
1    CA cat
2    CA chicken
3    CA mouse
4    WA cat
5    WA chicken
6    NV dog
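I want to find out which 'special_species' from df2 are present in df1 for the same state. The result should be a new data frame with the columns state and special_species. I think this needs some combination of join, group_by and summarise, but I can't get it to work.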
We can use separate_rows to split the 'species' column in 'df1' into one row per species, and then join:
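library(tidyr)
library(dplyr)
# split "cat, dog, ..." into one row per species, drop 'group',
# remove duplicate state/species pairs, then keep only the pairs also found in df2
separate_rows(df1, species) %>%
  select(-group) %>%
  distinct() %>%
  intersect(setNames(df2, c('state', 'species')))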
# state species
#1 CA cat
#2 CA chicken
#3 CA mouse
#4 NV dog
#5 WA chicken
#6 WA cat
Or use an inner_join:
separate_rows(df1, species) %>%
select(-group) %>%
distinct %>%
inner_join(df2, by = c('state', 'species' = 'special_species'))
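Since the question asks for a data frame with the columns state and special_species, a filtering join on df2 returns exactly that shape. This is a minimal sketch, assuming dplyr and tidyr are installed; semi_join keeps each row of df2 at most once, so no distinct() step is needed. The explicit sep = ",\\s*" is an assumption of this sketch (the answer above relies on the default separator of separate_rows).

```r
library(tidyr)
library(dplyr)

# Same example data as in the question
df1 <- data.frame(
  state = c("CA", "CA", "NV", "NV", "WA", "WA", "WA", "WA"),
  group = c(2L, 1L, 1L, 2L, 1L, 2L, 3L, 4L),
  species = c("cat, dog, chicken, mouse", "cat", "dog, chicken", "chicken",
              "chicken, rat, mouse, lion", "dog, cat", "dog, chicken",
              "cat, chicken"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  state = c("CA", "CA", "CA", "WA", "WA", "NV"),
  special_species = c("cat", "chicken", "mouse", "cat", "chicken", "dog"),
  stringsAsFactors = FALSE
)

# One row per (state, species) pair listed in df1
df1_long <- separate_rows(df1, species, sep = ",\\s*")

# Keep the rows of df2 whose (state, special_species) pair occurs in df1_long;
# the names in `by` refer to df2 columns, the values to df1_long columns
result <- semi_join(df2, df1_long,
                    by = c("state", "special_species" = "species"))
result
```

With this particular data every row of df2 has a match, so the result contains all six rows of df2; with other data the non-matching rows would simply be dropped.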
# data
df1 <- structure(list(state = c("CA", "CA", "NV", "NV", "WA", "WA",
"WA", "WA"), group = c(2L, 1L, 1L, 2L, 1L, 2L, 3L, 4L), species = c("cat, dog, chicken, mouse",
"cat", "dog, chicken", "chicken", "chicken, rat, mouse, lion",
"dog, cat", "dog, chicken", "cat, chicken")),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8"))
df2 <- structure(list(state = c("CA", "CA", "CA", "WA", "WA", "NV"),
special_species = c("cat", "chicken", "mouse", "cat", "chicken",
"dog")), class = "data.frame", row.names = c("1", "2", "3",
"4", "5", "6"))
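The same split-then-join idea can also be written in base R, for example if tidyr and dplyr are not available. This is a sketch, not part of the original answer: strsplit() plays the role of separate_rows, merge() performs the inner join on the shared columns, and unique() removes duplicate pairs (e.g. WA chicken appears in three groups of df1). The column name special_species in the long table is chosen here only so that both tables share the join columns.

```r
# Same example data as in the question (stringsAsFactors = FALSE so that
# strsplit() receives a character vector on R < 4.0 as well)
df1 <- data.frame(
  state = c("CA", "CA", "NV", "NV", "WA", "WA", "WA", "WA"),
  group = c(2L, 1L, 1L, 2L, 1L, 2L, 3L, 4L),
  species = c("cat, dog, chicken, mouse", "cat", "dog, chicken", "chicken",
              "chicken, rat, mouse, lion", "dog, cat", "dog, chicken",
              "cat, chicken"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  state = c("CA", "CA", "CA", "WA", "WA", "NV"),
  special_species = c("cat", "chicken", "mouse", "cat", "chicken", "dog"),
  stringsAsFactors = FALSE
)

# Split each comma-separated string into a character vector of species
pieces <- strsplit(df1$species, ",\\s*")

# Long format: repeat each state once per species it lists
df1_long <- data.frame(
  state = rep(df1$state, lengths(pieces)),
  special_species = unlist(pieces),
  stringsAsFactors = FALSE
)

# Inner join on both columns, then drop pairs matched more than once
result <- unique(merge(df2, df1_long, by = c("state", "special_species")))
result
```

By default merge() sorts the output by the key columns, so the row order differs from the dplyr versions.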