How do I remove duplicates based on two columns (ID and Quarterly) in RStudio?


I am using RStudio. My data looks like this:

dataframe2 <- structure(list(
  ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
         3L, 3L, 3L, 3L, 3L, 3L, 3L),
  Quarterly = c(2006.1, 2006.1, 2006.1, 2006.2, 2006.3, 2006.4,
                2006.1, 2006.2, 2006.3, 2006.3, 2006.4,
                2006.1, 2006.1, 2006.1, 2006.2, 2006.2, 2006.3, 2006.4),
  Status = c("Employed", "Employed", "Employed", "Employed", "Null", "Employed",
             "Employed", "Employed", "Employed", "Employed", "Employed",
             "Null", "Null", "Employed", "Employed", "Employed", "Employed", "Employed")
), class = "data.frame", row.names = c(NA, -18L))
ID   Quarterly    Status 
1    2006.1     Employed 
1    2006.1     Employed 
1    2006.1     Employed 
1    2006.2     Employed 
1    2006.3     Null 
1    2006.4     Employed 
2    2006.1     Employed 
2    2006.2     Employed 
2    2006.3     Employed 
2    2006.3     Employed 
2    2006.4     Employed 
3    2006.1     Null 
3    2006.1     Null 
3    2006.1     Employed 
3    2006.2     Employed 
3    2006.2     Employed 
3    2006.3     Employed 
3    2006.4     Employed 

I would like it to look like this, so that each ID has only one observation per quarter:

ID   Quarterly    Status 
1    2006.1     Employed 
1    2006.2     Employed 
1    2006.3     Null 
1    2006.4     Employed 
2    2006.1     Employed 
2    2006.2     Employed 
2    2006.3     Employed 
2    2006.4     Employed 
3    2006.1     Null 
3    2006.2     Employed 
3    2006.3     Employed 
3    2006.4     Employed 

I tried several options I found on this site, but none of them worked as expected.

First I tried (a):

group_by(ID,Quarterly) %>% filter(n()>1)

Then I tried (b):

group_by(ID,Quarterly) %>%
distinct(ID, keep.all = TRUE) 

Option (b) simply dropped all the quarterly dates and left only 2023.1 (my most recent date).

r filter duplicates drop
1 Answer

One way to do this with dplyr is to use summarise:

library(dplyr)
result <- dataframe2 |> summarise(Status = first(Status), .by = c("ID", "Quarterly"))

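For simplicity, I use the first() function to collapse each group defined by ID and Quarterly down to its first Status value. Note that the .by argument requires dplyr 1.1.0 or later.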
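
If the goal is simply to keep the first row for each ID and Quarterly pair, a possible alternative is distinct() with the .keep_all argument (note the leading dot and the underscore, which differ from the keep.all spelling in attempt (b)), or duplicated() in base R. The sketch below is only an illustration under assumptions: sample_df is a hypothetical cut-down copy of the rows for ID 3, not the full dataframe2, and the names dedup_dplyr and dedup_base are made up for this example.

```r
library(dplyr)

# Hypothetical sample: the ID 3 rows from the question's data.
sample_df <- data.frame(
  ID = c(3L, 3L, 3L, 3L, 3L, 3L, 3L),
  Quarterly = c(2006.1, 2006.1, 2006.1, 2006.2, 2006.2, 2006.3, 2006.4),
  Status = c("Null", "Null", "Employed", "Employed",
             "Employed", "Employed", "Employed")
)

# dplyr: keep the first row of each ID/Quarterly combination.
# .keep_all = TRUE retains the remaining columns (here, Status).
dedup_dplyr <- sample_df %>%
  distinct(ID, Quarterly, .keep_all = TRUE)

# Base R: duplicated() marks every repeat of an ID/Quarterly pair
# after its first occurrence; negating it keeps only the first rows.
dedup_base <- sample_df[!duplicated(sample_df[c("ID", "Quarterly")]), ]

print(dedup_dplyr)
```

Both variants keep whichever row comes first in the data, so for ID 3 in 2006.1 the retained Status is "Null", matching the desired output in the question. If a different row should win (for example, preferring "Employed" over "Null"), the data would need to be sorted accordingly before deduplicating.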
