How to keep only unique rows while ignoring a column?

Question

If I have this data:

df1 <- data.frame(name = c("apple", "apple", "apple", "orange", "orange"),
       ID = c(1, 2, 3, 4, 5),
       is_fruit = c("yes", "yes", "yes", "yes", "yes"))

and I want to keep only the unique rows while ignoring the ID column, so that the output looks like this:

df2 <- data.frame(name = c("apple", "orange"),
       ID = c(1, 4),
       is_fruit = c("yes", "yes"))

df2
#    name ID is_fruit
#1  apple  1      yes
#2 orange  4      yes

How can I do this, preferably using dplyr?

Answer 1

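You can use the distinct function. By explicitly specifying the variables, you keep only the rows that are unique with respect to those columns. Also, from ?distinct:

If there are multiple rows for a given combination of inputs, only the first row will be preserved.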

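library(dplyr)

# .keep_all = TRUE keeps the remaining columns (here ID) from the first row of each combination
distinct(df1, name, is_fruit, .keep_all = TRUE)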
#    name ID is_fruit
#1  apple  1      yes
#2 orange  4      yes
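
To see what .keep_all changes, the following sketch runs distinct both ways on the data from the question. The names without_id and with_id are only illustrative.

```r
library(dplyr)

df1 <- data.frame(name = c("apple", "apple", "apple", "orange", "orange"),
                  ID = c(1, 2, 3, 4, 5),
                  is_fruit = c("yes", "yes", "yes", "yes", "yes"))

# Default (.keep_all = FALSE): only the columns named in distinct() are returned
without_id <- distinct(df1, name, is_fruit)

# .keep_all = TRUE: every column is returned, with ID taken from the
# first row of each name/is_fruit combination
with_id <- distinct(df1, name, is_fruit, .keep_all = TRUE)

names(without_id)
names(with_id)
```

Without .keep_all = TRUE the ID column is dropped entirely, which is why the flag is needed to get the output asked for in the question.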

Answer 2

Base R:

df1[!duplicated(df1[!names(df1) %in% c("ID")]),]
#    name ID is_fruit
#1  apple  1      yes
#4 orange  4      yes

Replace c("ID") with the names of the columns you want to ignore.
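
The same idea can be wrapped in a small helper so that the ignored columns are passed as an argument. This is only a sketch, and unique_ignoring is a hypothetical name, not a base R function.

```r
# Hypothetical helper: keep the first row for each combination of all
# columns except those listed in `ignore`
unique_ignoring <- function(df, ignore) {
  key_cols <- setdiff(names(df), ignore)   # columns that define uniqueness
  df[!duplicated(df[key_cols]), , drop = FALSE]
}

df1 <- data.frame(name = c("apple", "apple", "apple", "orange", "orange"),
                  ID = c(1, 2, 3, 4, 5),
                  is_fruit = c("yes", "yes", "yes", "yes", "yes"))

res <- unique_ignoring(df1, "ID")
res
```

As in the one-liner above, the original row names are preserved, so the second row is labelled 4 rather than 2.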

Answer 3

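You can use dplyr's distinct together with across, selecting every column except ID and keeping the remaining columns:

df1 |> distinct(across(-ID), .keep_all = TRUE)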
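
In dplyr 1.1.0 and later, pick() can be used for tidy-select column selection inside distinct(). A minimal sketch, assuming that version or newer is installed (and R 4.1 or later for the |> pipe), that treats every column except ID as the uniqueness key:

```r
library(dplyr)

df1 <- data.frame(name = c("apple", "apple", "apple", "orange", "orange"),
                  ID = c(1, 2, 3, 4, 5),
                  is_fruit = c("yes", "yes", "yes", "yes", "yes"))

# pick(-ID) selects every column except ID as the uniqueness key;
# .keep_all = TRUE keeps ID from the first row of each combination
result <- df1 |> distinct(pick(-ID), .keep_all = TRUE)
result
```

This avoids listing the key columns by name, which helps when the data frame has many columns and only one or two should be ignored.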
