Compare two data frames (of different lengths) on one column and retrieve other columns in R


I have two data frames. The first has only one column; the second has 7 columns and 30,000 rows. Data frame 1:

coulmn1
otu_1
otu_2
otu_3
otu_4
otu_5
otu_6
otu_7
otu_8
otu_9
otu_10

Data frame 2:

    otu1    otu2    Name.x  Name.y
    otu_1   otu_2   Gemmiger    Bacteroides
    OTU_3   otu_1  Bifido   Gemmiger
    otu_4   otu_5   Fusobacterium   fags
    otu_6   otu_7   Dialister   gems
    otu_8   otu_9   Streptococcus   hen
    OTU_10  OTU_6   Clostridium IV  Dialister
    OTU_11  OTU_16  Clostridium IV  Dialister

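Now I need to compare data frame 1 against the two ID columns of data frame 2 (otu1 and otu2) and get the corresponding values from Name.x and Name.y. Names are not unique per ID, i.e. several OTU IDs may share the same name, but the OTU IDs themselves are unique.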

The expected output would be:

    coulmn1   Name
otu_1     Gemmiger
otu_2     Bacteroides
otu_3     Bifido
otu_4     Fusobacterium
otu_5     fags
otu_6     Dialister 
otu_7     gems
otu_8     Streptococcus
otu_9     hen
otu_10    Clostridium IV
otu_11    Clostridium IV    
otu_16    Dialister
r dataframe
1 Answer

Here is my attempt.

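library(dplyr)
library(tidyr)

# df2 is the second data frame from the question.
# Stack Name.x and Name.y into a single "Name" column (two rows per original row).
pivot_longer(data = df2, cols = starts_with("Name"),
             values_to = "Name") %>% 
  group_by(otu2) %>% 
  # The last row of each otu2 group holds Name.y, so it takes the otu2 ID.
  # This assumes each otu2 value occurs in only one row of df2.
  mutate(otu1 = tolower(if_else(row_number() == n(), otu2, otu1))) %>% 
  ungroup() %>% 
  select(-c(otu2, name)) %>% 
  distinct(.keep_all = TRUE) %>%
  rename(col1 = "otu1")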

    col1   Name          
   <chr>  <chr>         
 1 otu_1  Gemmiger      
 2 otu_2  Bacteroides   
 3 otu_3  Bifido        
 4 otu_4  Fusobacterium 
 5 otu_5  fags          
 6 otu_6  Dialister     
 7 otu_7  gems          
 8 otu_8  Streptococcus 
 9 otu_9  hen           
10 otu_10 Clostridium IV
11 otu_11 Clostridium IV
12 otu_16 Dialister  
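
If an otu2 value can appear in more than one row of the full 30,000-row table, the grouping trick above may assign the wrong ID. A minimal base R sketch of an alternative is shown below; it is not the original answer's method. It stacks the (otu1, Name.x) and (otu2, Name.y) pairs directly and then left-joins onto data frame 1. The object names df1, df2, lookup and result are assumptions, and the sample data is re-typed from the question with only the four columns shown there.

```r
# Sample data re-typed from the question (names df1/df2 are assumed).
df1 <- data.frame(coulmn1 = paste0("otu_", 1:10), stringsAsFactors = FALSE)
df2 <- data.frame(
  otu1   = c("otu_1", "OTU_3", "otu_4", "otu_6", "otu_8", "OTU_10", "OTU_11"),
  otu2   = c("otu_2", "otu_1", "otu_5", "otu_7", "otu_9", "OTU_6", "OTU_16"),
  Name.x = c("Gemmiger", "Bifido", "Fusobacterium", "Dialister",
             "Streptococcus", "Clostridium IV", "Clostridium IV"),
  Name.y = c("Bacteroides", "Gemmiger", "fags", "gems", "hen",
             "Dialister", "Dialister"),
  stringsAsFactors = FALSE
)

# Stack the (otu1, Name.x) and (otu2, Name.y) pairs into one long lookup table,
# lower-casing the IDs so that "OTU_3" and "otu_3" count as the same ID.
lookup <- data.frame(
  coulmn1 = tolower(c(df2$otu1, df2$otu2)),
  Name    = c(df2$Name.x, df2$Name.y),
  stringsAsFactors = FALSE
)

# Keep the first row seen for each ID (IDs are stated to be unique).
lookup <- lookup[!duplicated(lookup$coulmn1), ]

# Left join: keep every ID in df1 and attach its name (NA if absent from df2).
# merge() sorts by the key as text, so restore the order of df1 afterwards.
result <- merge(df1, lookup, by = "coulmn1", all.x = TRUE)
result <- result[match(df1$coulmn1, result$coulmn1), ]
rownames(result) <- NULL
print(result)
```

Because of the join, result contains only the ten IDs listed in data frame 1. The expected output in the question also lists otu_11 and otu_16, which are not in data frame 1; using lookup directly, without the merge step, keeps those rows as well.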