Left join and keep unique values


I have the following data frames:

t1<-structure(list(Country = c("France", "Spain", "England")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -3L))

t2<-structure(list(Country = c("France", "France", "France", "France", 
"France", "France", "Spain", "Spain", "Spain", "Spain", "Spain", 
"Spain", "Spain", "England", "England", "England", "England", 
"England", "England", "England", "England"), League = c("Ligue 1", 
"Ligue 1", "Ligue 1", "Ligue 2", "Ligue 2", "Ligue 2", "Liga", 
"Liga", "Liga", "Liga 2", "Liga 2", "Liga 2", "Liga 2", "PL1", 
"PL1", "PL1", "PL1", "PL2", "PL2", "PL2", "PL2"), ID = c("EUR", 
"EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", 
"EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", 
"EUR", "EUR")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-21L))

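I want to join the two by Country so that I get every league for each country. I can't use the multiple = "any" argument in left_join() because each country has several leagues. The desired output looks like this: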

# A tibble: 6 × 2
  Country League 
  <chr>   <chr>  
1 France  Ligue 1
2 France  Ligue 2
3 Spain   Liga   
4 Spain   Liga 2 
5 England PL1    
6 England PL2

I can get this result with the following code, but it raises a warning. I also have the impression that this could be done in a more idiomatic way with dplyr.

dtl <- dtl %>% 
  left_join(select(t2, Country, League), by = "Country") %>% 
  dplyr::distinct(Country, League, .keep_all = TRUE)

Thanks for your help.

r dataframe join dplyr left-join
2 Answers
2
votes

As Darren's comment points out, the problem is the duplicate rows in t2. Remove them and you get the desired result:

library(dplyr)

left_join(t1, distinct(t2), by = "Country") %>% 
  select(-ID)

# Output:
# A tibble: 6 × 2
  Country League 
  <chr>   <chr>  
1 France  Ligue 1
2 France  Ligue 2
3 Spain   Liga   
4 Spain   Liga 2 
5 England PL1    
6 England PL2  
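
A variation on the same idea, as a minimal sketch: passing the column names to distinct() deduplicates on Country and League only and drops ID in the same step, so the trailing select(-ID) is not needed. The small t1 and t2 below are shortened stand-ins for the question's data, and the name leagues is only illustrative.

```r
library(dplyr)

# Shortened stand-ins for the question's t1 and t2 (same columns, fewer rows)
t1 <- tibble(Country = c("France", "Spain", "England"))
t2 <- tibble(
  Country = c("France", "France", "France", "Spain", "Spain",
              "England", "England", "England"),
  League  = c("Ligue 1", "Ligue 1", "Ligue 2", "Liga", "Liga 2",
              "PL1", "PL2", "PL2"),
  ID      = "EUR"
)

# One row per Country/League pair; columns not named here (ID) are dropped
leagues <- distinct(t2, Country, League)

# Each country in t1 now matches each of its leagues exactly once
result <- left_join(t1, leagues, by = "Country")
print(result)
```

This also keeps working if ID ever differs between otherwise identical rows, a case where distinct(t2) without column names would leave duplicates behind.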

0
votes

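You can use dplyr's group_by() and summarize() to combine the leagues for each country. Note that this gives one row per country, with the leagues collapsed into a comma-separated string, rather than one row per league. The following code does this without a warning message: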

library(dplyr)

t1 <- structure(list(Country = c("France", "Spain", "England")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L))
t2 <- structure(list(Country = c("France", "France", "France", "France", "France", "France", "Spain", "Spain", "Spain", "Spain", "Spain", "Spain", "Spain", "England", "England", "England", "England", "England", "England", "England", "England"), League = c("Ligue 1", "Ligue 1", "Ligue 1", "Ligue 2", "Ligue 2", "Ligue 2", "Liga", "Liga", "Liga", "Liga 2", "Liga 2", "Liga 2", "Liga 2", "PL1", "PL1", "PL1", "PL1", "PL2", "PL2", "PL2", "PL2"), ID = c("EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -21L))

result <- t1 %>% 
  left_join(t2, by = "Country") %>% 
  group_by(Country) %>% 
  summarize(League = toString(unique(League)))

print(result)

Output:

# A tibble: 3 × 2
  Country League                      
  <chr>   <chr>                       
1 England PL1, PL2                    
2 France  Ligue 1, Ligue 2            
3 Spain   Liga, Liga 2
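
If one row per league is wanted instead of a comma-separated string, the grouped approach can be adapted. The sketch below assumes dplyr 1.1.0 or later, where reframe() is available for summaries that return more than one row per group; the small t1 and t2 are shortened stand-ins for the question's data.

```r
library(dplyr)

# Shortened stand-ins for the question's t1 and t2 (same columns, fewer rows)
t1 <- tibble(Country = c("France", "Spain", "England"))
t2 <- tibble(
  Country = c("France", "France", "France", "Spain", "Spain",
              "England", "England", "England"),
  League  = c("Ligue 1", "Ligue 1", "Ligue 2", "Liga", "Liga 2",
              "PL1", "PL2", "PL2"),
  ID      = "EUR"
)

result <- t1 %>%
  left_join(t2, by = "Country") %>%
  group_by(Country) %>%
  # reframe() allows several rows per group: one per distinct league
  reframe(League = unique(League))

print(result)
```

As with summarize(), the groups come back sorted by Country (England, France, Spain) rather than in the order of t1.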