Labeling data IDs based on conditions in R

Question

I need to make conditional modifications to my dataset. Here is a sample dataset.

data <- data.frame(id = c(1,1,1,1,1,1, 2,2,2, 3,3,3),
                   cat1 = c("A","A","A","B","B","B", "A","A","A", "A","A","B"),
                   levels = c("L1","L3","L4","L2","L1","L3", "L1","L2","L2", "L1","L2","L1"))

> data
   id cat1 levels
1   1    A     L1
2   1    A     L3
3   1    A     L4
4   1    B     L2
5   1    B     L1
6   1    B     L3
7   2    A     L1
8   2    A     L2
9   2    A     L2
10  3    A     L1
11  3    A     L2
12  3    B     L1

a) For each id, if the cat1 == "A" rows contain L3 or L4, then the id should also have cat1 == "B" rows. This is the main rule. [Rule_satisfied]

b) If the cat1 == "A" rows contain L1 or L2, then the id should not have cat1 == "B" rows. [Rule_NotSatisfied]

c) If the cat1 == "A" rows contain L1 or L2 and the id does have cat1 == "B" rows, the rule is violated. [Rule_violation]

How can I get the desired output shown below?

> data.1
   id cat1 levels                  label
1   1    A     L1         Rule_satisfied
2   1    A     L3         Rule_satisfied
3   1    A     L4         Rule_satisfied
4   1    B     L2         Rule_satisfied
5   1    B     L1         Rule_satisfied
6   1    B     L3         Rule_satisfied
7   2    A     L1      Rule_NotSatisfied
8   2    A     L2      Rule_NotSatisfied
9   2    A     L2      Rule_NotSatisfied
10  3    A     L1      Rule_violation
11  3    A     L2      Rule_violation
12  3    B     L1      Rule_violation

Answer 1

This may be a use case for dplyr::group_by and dplyr::case_when.

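library(dplyr)

data %>%
  group_by(id) %>%  # evaluate the rules once per id
  mutate(
    # Each condition is a single TRUE/FALSE per group; mutate() recycles
    # the resulting label to every row of that id. The first match wins.
    label = case_when(
      any(cat1 == "A" & levels %in% c("L3", "L4")) && "B" %in% cat1 ~ "Rule_satisfied",
      any(cat1 == "A" & levels %in% c("L1", "L2")) && !"B" %in% cat1 ~ "Rule_NotSatisfied",
      any(cat1 == "A" & levels %in% c("L1", "L2")) && "B" %in% cat1 ~ "Rule_violation"
    )
  ) %>%
  ungroup()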
# # A tibble: 12 × 4
#       id cat1  levels label            
#    <dbl> <chr> <chr>  <chr>            
#  1     1 A     L1     Rule_satisfied   
#  2     1 A     L3     Rule_satisfied   
#  3     1 A     L4     Rule_satisfied   
#  4     1 B     L2     Rule_satisfied   
#  5     1 B     L1     Rule_satisfied   
#  6     1 B     L3     Rule_satisfied   
#  7     2 A     L1     Rule_NotSatisfied
#  8     2 A     L2     Rule_NotSatisfied
#  9     2 A     L2     Rule_NotSatisfied
# 10     3 A     L1     Rule_violation   
# 11     3 A     L2     Rule_violation   
# 12     3 B     L1     Rule_violation
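
For comparison, the same per-id logic can be sketched in base R without dplyr. This is only an illustration and not part of the original answer: label_one_id is a hypothetical helper name, and the order of the checks is assumed to mirror the case_when call above. An id that matches none of the three rules (for example, A rows containing only L3/L4 and no B rows) is labeled NA here, which is also what case_when returns when no condition matches.

```r
# Sample data from the question.
data <- data.frame(
  id = c(1,1,1,1,1,1, 2,2,2, 3,3,3),
  cat1 = c("A","A","A","B","B","B", "A","A","A", "A","A","B"),
  levels = c("L1","L3","L4","L2","L1","L3", "L1","L2","L2", "L1","L2","L1")
)

# Hypothetical helper: receives the rows of a single id, returns one label.
label_one_id <- function(d) {
  a_levels <- d$levels[d$cat1 == "A"]
  a_high <- any(a_levels %in% c("L3", "L4"))  # A rows contain L3 or L4
  a_low  <- any(a_levels %in% c("L1", "L2"))  # A rows contain L1 or L2
  has_b  <- "B" %in% d$cat1                   # id has at least one B row

  if (a_high && has_b) {
    "Rule_satisfied"
  } else if (a_low && !has_b) {
    "Rule_NotSatisfied"
  } else if (a_low && has_b) {
    "Rule_violation"
  } else {
    NA_character_  # none of the three rules applies
  }
}

# One label per id, named by id ("1", "2", "3").
labels_by_id <- vapply(split(data, data$id), label_one_id, character(1))

# Map the per-id label back onto every row of that id.
data.1 <- data
data.1$label <- unname(labels_by_id[as.character(data$id)])
```

Keeping the three flags (a_high, a_low, has_b) as named variables makes it easier to add a fourth label later if the NA case needs its own name.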
