How to keep only values that appear in a single group?

Question  Votes: 0  Answers: 4

I am working with a data frame like this:

groups  values
a       1
a       1
a       2
b       2
b       3
b       3
c       4
c       5
c       6
d       6
d       7
d       2

The goal is to turn it into something like this:

groups  values
a       1
a       1
b       3
b       3
c       4
c       5
d       7

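I want to keep the rows whose value appears in only one group. For example, the value 2 is removed because it appears in three different groups (a, b and d), and 6 is removed because it appears in both c and d. The value 1 is kept even though it occurs twice, because both occurrences are in ONLY ONE group.

Is there a function in the dplyr package that handles this, or do I have to write my own?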

r data-manipulation
4 Answers
1
vote

Since you asked for a dplyr solution:

library(dplyr)

df %>% group_by(values) %>% filter(n_distinct(groups) == 1)
# # A tibble: 7 x 2
# # Groups:   values [5]
# groups values
# <chr>   <int>
#1 a           1
#2 a           1
#3 b           3
#4 b           3
#5 c           4
#6 c           5
#7 d           7

# Data used above
df <- structure(list(groups = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "d", "d", "d"),
                     values = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 5L, 6L, 6L, 7L, 2L)),
                row.names = c(NA, -12L), class = "data.frame")

0
votes

Group by values and check whether the groups column has only one distinct element within each group. This can be done with ave.

i <- as.logical(with(df1, ave(as.numeric(groups), values, FUN = function(x) length(unique(x)) == 1)))
df1[i, ]
#   groups values
#1       a      1
#2       a      1
#5       b      3
#6       b      3
#7       c      4
#8       c      5
#11      d      7

Data in dput format.

df1 <-
structure(list(groups = structure(c(1L, 1L, 1L, 2L, 
2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L), .Label = c("a", "b", 
"c", "d"), class = "factor"), values = c(1L, 1L, 2L, 
2L, 3L, 3L, 4L, 5L, 6L, 6L, 7L, 2L)), 
class = "data.frame", row.names = c(NA, -12L))
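
To see why this kind of filter works, it can help to look at the number of distinct groups per value before subsetting. The following is a minimal base R sketch, not part of the original answer: it rebuilds the example data with groups as a character column, and `n_groups`, `keep` and `result` are names chosen here for illustration.

```r
# Rebuild the example data (same rows as df1, groups stored as character).
df1 <- data.frame(
  groups = rep(c("a", "b", "c", "d"), each = 3),
  values = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 5L, 6L, 6L, 7L, 2L),
  stringsAsFactors = FALSE
)

# Number of distinct groups in which each value occurs,
# indexed by the value (as a character name).
n_groups <- tapply(df1$groups, df1$values, function(g) length(unique(g)))
print(n_groups)

# Look up the count for every row and keep rows whose value
# occurs in exactly one group.
keep <- as.vector(n_groups[as.character(df1$values)] == 1)
result <- df1[keep, ]
print(result)
```

Here the values 2 and 6 have counts above 1, so their rows are dropped, while the two rows with value 1 survive because both sit in group a.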

0
votes

x[x$values %in% names(which(colSums(table(x)>0)==1)),]

where

x = structure(list(groups = c("a", "a", "a", "b", "b", "b", "c", 
  "c", "c", "d", "d", "d"), values = c(1L, 1L, 2L, 2L, 3L, 3L, 
    4L, 5L, 6L, 6L, 7L, 2L)), row.names = c(NA, -12L), class = "data.frame")
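
The one-liner above packs several steps into a single expression. As a rough sketch of what it appears to do, the steps can be written out one at a time; the intermediate names (`tab`, `groups_per_value`, `single_group_values`, `filtered`) are introduced here for illustration and are not from the original answer.

```r
# Same example data as x above.
x <- data.frame(
  groups = rep(c("a", "b", "c", "d"), each = 3),
  values = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 5L, 6L, 6L, 7L, 2L),
  stringsAsFactors = FALSE
)

# Contingency table: rows are groups, columns are values, cells are counts.
tab <- table(x)

# tab > 0 marks which (group, value) pairs occur at all; summing each
# column counts the distinct groups in which that value appears.
groups_per_value <- colSums(tab > 0)

# Values seen in exactly one group, as character column names.
single_group_values <- names(which(groups_per_value == 1))

# Keep the matching rows; %in% compares the integer column with the
# character names by coercing both to character.
filtered <- x[x$values %in% single_group_values, ]
print(filtered)
```

Comparing against `tab > 0` rather than `tab` itself matters: a value that occurs twice inside one group, such as 1, still counts as one group.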

Alternatively, a data.table solution:

library(data.table)
setDT(x)[, .SD[uniqueN(groups) == 1], by = values]

0
votes

Using the sqldf package on your original data frame df:

library(sqldf)
result <- sqldf("SELECT * FROM df
                 WHERE `values` IN (
                     SELECT `values` from (
                         SELECT `values`, groups, count(*) as num from df
                         GROUP BY `values`, groups) t
                      GROUP BY `values` 
                      HAVING COUNT(1) = 1
                 )")   