If a vector contains 3 of 5 elements, return TRUE for all rows of that ID in dplyr

Question

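I am trying to find all IDs whose values include at least three of the five elements c(2, 3, 4, 5, 6), so that TRUE is returned for every row of such an ID and FALSE for every row of the other IDs.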

id <- c(1,1,2,2,3,3,3,3)
time <- c(4,6,4,5,4,5,6,7)
df1 <- data.frame(id,time)

Expected result:

solution <-c(FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE)
df_w_sol <- data.frame(df1,solution)

I have been trying something along these lines:

df1 %>%
  group_by(id) %>%
  mutate(INCLUDE = any(2:6 %in% time))

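But I am struggling with the "at least 3 of 5" part. any() only tells me whether one or more of the values is present; I think the solution needs some kind of count compared against a threshold (an n > condition).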

Answer 1

You can use sum to count how many of the target values are matched:

library(dplyr)
df1 %>% group_by(id) %>% mutate(solution = sum(2:6 %in% time) >= 3)

#    id  time solution
#  <dbl> <dbl> <lgl>   
#1     1     4 FALSE   
#2     1     6 FALSE   
#3     2     4 FALSE   
#4     2     5 FALSE   
#5     3     4 TRUE    
#6     3     5 TRUE    
#7     3     6 TRUE    
#8     3     7 TRUE
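
To see why this works, it may help to evaluate the expression for a single group outside of dplyr. The sketch below uses the times of id 3 from the question; the variable names hits and n_hits are only illustrative.

```r
# Times recorded for id 3 in the example data
time <- c(4, 5, 6, 7)

# One logical per target value: is each of 2, 3, 4, 5, 6 present in `time`?
hits <- 2:6 %in% time
print(hits)         # FALSE FALSE TRUE TRUE TRUE

# sum() treats TRUE as 1, so this counts how many targets were found
n_hits <- sum(hits)
print(n_hits >= 3)  # TRUE
```

Because the target vector 2:6 is on the left-hand side of %in%, each target is counted at most once, even if a value repeats within a group.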

The equivalent in base R:

transform(df1, solution = ave(time, id, FUN = function(x)  sum(2:6 %in% x)) >= 3)

And with data.table:

library(data.table)
setDT(df1)[, solution := sum(2:6 %in% time) >= 3, id]

Answer 2

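One option could be:

library(dplyr)
df1 %>%
 group_by(id) %>%
 # match() returns NA for times outside 2:6; na.rm = TRUE keeps NA from counting as a value
 mutate(include = n_distinct(match(time, 2:6), na.rm = TRUE) >= 3)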

     id  time include
  <dbl> <dbl> <lgl>  
1     1     4 FALSE  
2     1     6 FALSE  
3     2     4 FALSE  
4     2     5 FALSE  
5     3     4 TRUE   
6     3     5 TRUE   
7     3     6 TRUE   
8     3     7 TRUE
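
One thing to watch with any match()-based count is that values outside the target set come back as NA, and NA is counted as a distinct value unless it is dropped. The base R sketch below illustrates this with a hypothetical group (not part of the question's data) that contains only two of the targets.

```r
targets <- 2:6
time <- c(4, 5, 7, 8)  # hypothetical group: only 4 and 5 are targets

# Position of each time within `targets`; NA where there is no match
pos <- match(time, targets)
print(pos)  # 3 4 NA NA

# Counting NA as a value would overstate the number of matched targets
n_with_na <- length(unique(pos))
n_without_na <- length(unique(pos[!is.na(pos)]))
print(n_with_na)     # 3
print(n_without_na)  # 2
```

With a threshold of 3, the first count would flag this group even though only two targets are present, so the NA values should be excluded before counting.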

Answer 3

We can use length together with intersect:

library(dplyr)
df1 %>% 
  group_by(id) %>%
  mutate(solution = length(intersect(time, 2:6))>=3)
# A tibble: 8 x 3
# Groups:   id [3]
#    id  time solution
#  <dbl> <dbl> <lgl>   
#1     1     4 FALSE   
#2     1     6 FALSE   
#3     2     4 FALSE   
#4     2     5 FALSE   
#5     3     4 TRUE    
#6     3     5 TRUE    
#7     3     6 TRUE    
#8     3     7 TRUE

Or with data.table:

library(data.table)   
setDT(df1)[, solution := length(intersect(time, 2:6))>=3, id]
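
If the target set or the threshold is likely to change, the intersect-based check can be wrapped in a small function. The following is only a sketch: has_at_least is a hypothetical helper name, and base R's ave is used here so the example runs without extra packages.

```r
# Hypothetical helper: TRUE if at least `k` distinct values of `targets` occur in `x`
has_at_least <- function(x, targets, k) {
  length(intersect(x, targets)) >= k
}

# Example data from the question
df1 <- data.frame(id   = c(1, 1, 2, 2, 3, 3, 3, 3),
                  time = c(4, 6, 4, 5, 4, 5, 6, 7))

# ave() applies the function per id and recycles the single result over the group;
# it returns 0/1 as numeric here, so convert back to logical
df1$solution <- as.logical(
  ave(df1$time, df1$id, FUN = function(x) has_at_least(x, 2:6, 3))
)
print(df1)
```

The same helper could presumably be called inside a grouped mutate, for example mutate(solution = has_at_least(time, 2:6, 3)) after group_by(id).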
