I have two data frames. The first has a single column; the second has 7 columns and 30,000 rows. Data frame 1:
coulmn1
otu_1
otu_2
otu_3
otu_4
otu_5
otu_6
otu_7
otu_8
otu_9
otu_10
Data frame 2:
otu1 otu2 Name.x Name.y
otu_1 otu_2 Gemmiger Bacteroides
OTU_3 otu_1 Bifido Gemmiger
otu_4 otu_5 Fusobacterium fags
otu_6 otu_7 Dialister gems
otu_8 otu_9 Streptococcus hen
OTU_10 OTU_6 Clostridium IV Dialister
OTU_11 OTU_16 Clostridium IV Dialister
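Now I need to compare data frame 1 against two columns of data frame 2 (otu1 and otu2) and pull the corresponding values from Name.x and Name.y: an ID found in otu1 takes its name from Name.x, and an ID found in otu2 takes its name from Name.y. Names are not unique, so several OTU IDs may share the same name, but each OTU ID is unique.
The expected output is: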
coulmn1 Name
otu_1 Gemmiger
otu_2 Bacteroides
otu_3 Bifido
otu_4 Fusobacterium
otu_5 fags
otu_6 Dialister
otu_7 gems
otu_8 Streptococcus
otu_9 hen
otu_10 Clostridium IV
otu_11 Clostridium IV
otu_16 Dialister
Here is my attempt.
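library(dplyr)
library(tidyr)

# Stack Name.x / Name.y into one "Name" column (two rows per original row).
# Note: taking the last row of each otu2 group as the otu2 ID only works
# while every otu2 value occurs in a single row of df2.
pivot_longer(data = df2, cols = starts_with("Name"),
             values_to = "Name") %>%
  group_by(otu2) %>%
  mutate(otu1 = tolower(if_else(row_number() == n(), otu2, otu1))) %>%
  ungroup() %>%
  select(-c(otu2, name)) %>%
  distinct(.keep_all = TRUE) %>%
  rename(col1 = otu1)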
col1 Name
<chr> <chr>
1 otu_1 Gemmiger
2 otu_2 Bacteroides
3 otu_3 Bifido
4 otu_4 Fusobacterium
5 otu_5 fags
6 otu_6 Dialister
7 otu_7 gems
8 otu_8 Streptococcus
9 otu_9 hen
10 otu_10 Clostridium IV
11 otu_11 Clostridium IV
12 otu_16 Dialister
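
For comparison, here is a minimal base R sketch of the same lookup that does not depend on how often an ID appears in otu2. It is only an illustration under a few assumptions: the object names df1, df2, and lookup are mine, the data is the small sample from the question rather than the full 30,000-row table, and IDs are lower-cased on the assumption that "OTU_3" and "otu_3" refer to the same OTU.

```r
# Sample data copied from the question (df1 / df2 are assumed object names).
df1 <- data.frame(coulmn1 = paste0("otu_", 1:10), stringsAsFactors = FALSE)

df2 <- data.frame(
  otu1   = c("otu_1", "OTU_3", "otu_4", "otu_6", "otu_8", "OTU_10", "OTU_11"),
  otu2   = c("otu_2", "otu_1", "otu_5", "otu_7", "otu_9", "OTU_6", "OTU_16"),
  Name.x = c("Gemmiger", "Bifido", "Fusobacterium", "Dialister",
             "Streptococcus", "Clostridium IV", "Clostridium IV"),
  Name.y = c("Bacteroides", "Gemmiger", "fags", "gems", "hen",
             "Dialister", "Dialister"),
  stringsAsFactors = FALSE
)

# Stack the (otu1, Name.x) and (otu2, Name.y) pairs into one long lookup table.
# IDs are lower-cased so that mixed-case spellings collapse to one key.
lookup <- data.frame(
  coulmn1 = tolower(c(df2$otu1, df2$otu2)),
  Name    = c(df2$Name.x, df2$Name.y),
  stringsAsFactors = FALSE
)

# Keep the first row for each ID (each ID is stated to map to a single name).
lookup <- lookup[!duplicated(lookup$coulmn1), ]

# Order by the numeric suffix so otu_10 sorts after otu_9, not after otu_1.
lookup <- lookup[order(as.integer(sub("^otu_", "", lookup$coulmn1))), ]
rownames(lookup) <- NULL

# Attach names to df1 in its original row order; unmatched IDs would get NA.
df1$Name <- lookup$Name[match(tolower(df1$coulmn1), lookup$coulmn1)]
```

Here lookup holds one row for every ID that occurs anywhere in df2, while df1$Name covers only the IDs listed in data frame 1. Using match() keeps df1 in its original order and leaves NA for any ID that df2 does not mention, which makes missing matches easy to spot.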