Searching for a condition across multiple columns with `case_when` and `mutate`


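I'm trying to use the `case_when` function from `dplyr` to create a new column (`NEW`) in my data frame. The code below runs, but I'd like to know whether there is a way to build this new column from all four columns whose names start with `COL_`, instead of looking only at `COL_1` as it does now. Otherwise I'd have to repeat every case four times (once each for `COL_1`, `COL_2`, `COL_3` and `COL_4`).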

library(dplyr)
set.seed(1)

# Make sample data
data <- data.frame(STRATUM_ID = c(rep("C19", 5), rep("C20", 15), rep("C21", 4)),
                   COL_1 = sample(c(rep("X", 3), rep("T", 2), rep("Y", 7), rep("Z", 5), rep("D", 5), rep("G", 2)), 24, replace = TRUE),
                   COL_2 = sample(c(rep("T", 4), rep("G", 6), rep("Y", 3), rep("C", 2), rep("W", 6), rep("R", 3)), 24, replace = TRUE),
                   COL_3 = sample(c(rep("G", 1), rep("F", 5), rep("D", 3), rep("Z", 7), rep("C", 3), rep("E", 5)), 24, replace = TRUE),
                   COL_4 = sample(c(rep("E", 7), rep("G", 2), rep("Y", 7), rep("D", 5), rep("V", 1), rep("U", 2)), 24, replace = TRUE))

# Create new column based on COL columns
data <- data %>% mutate(NEW = case_when(
  STRATUM_ID == "C20" & COL_1 == "X" ~ "Class_A",
  STRATUM_ID == "C20" & COL_1 %in% c("C", "D", "E") ~ "Class_B",
  STRATUM_ID == "C20" & COL_1 %in% c("U", "V", "W", "Y") ~ "Class_C",
  STRATUM_ID == "C20" & COL_1 == "T" ~ "Class_D",
  STRATUM_ID == "C20" & COL_1 %in% c("G", "Z", "R") ~ "Class_E",
  STRATUM_ID == "C20" & COL_1 == "F" ~ "Class_F",
  STRATUM_ID == "C20" & is.na(COL_1) ~ "Unknown",  # refer to the column directly, not data$COL_1
  TRUE ~ STRATUM_ID
))

I tried the following to search all of the `COL` columns at once:

data <- data %>% mutate(test = case_when(
  STRATUM_ID == "C20" & grep("COL", colnames(data)) %in% c("C", "D", "E") ~ "CLASS_B"))
data <- data %>% mutate(test = case_when(
  STRATUM_ID == "C20" & vars(starts_with("COL")) %in% c("C", "D", "E") ~ "CLASS_B"))
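The first attempt cannot work because `grep()` searches the column *names* and returns their positions, one per matched column rather than one per row, instead of the values stored in those columns. The second uses `vars()`, which is meant for the older scoped verbs such as `mutate_at()`, not for use inside `case_when()`. A minimal base-R illustration of the `grep()` point, on a small hypothetical data frame:

```r
# Small hypothetical data frame with two COL_ columns
df <- data.frame(STRATUM_ID = c("C20", "C19"),
                 COL_1 = c("C", "X"),
                 COL_2 = c("Y", "D"),
                 stringsAsFactors = FALSE)

# grep() matches the pattern against the column names and returns their
# positions: here 2 and 3, not the letters stored in those columns
idx <- grep("COL", colnames(df))
print(idx)

# Comparing those positions with letters is therefore FALSE everywhere,
# even though row 1 really does hold "C" in COL_1
print(idx %in% c("C", "D", "E"))
```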

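Apologies for the simplification: the dataset I'm actually working with is much larger, and I've done my best to reduce the problem to its essentials here.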

Tags: r, dataframe, dplyr
1 answer

Is this what you're trying to do?

library(dplyr)

data %>% mutate(NEW = case_when(
  STRATUM_ID == "C20" & if_any(starts_with("COL"), ~ . == "X") ~ "Class_A",
  STRATUM_ID == "C20" & if_any(starts_with("COL"), ~ . %in% c("C", "D", "E"))  ~ "Class_B",
  STRATUM_ID == "C20" & if_any(starts_with("COL"), ~ . %in% c("U", "V", "W", "Y")) ~ "Class_C",
  STRATUM_ID == "C20" & if_any(starts_with("COL"), ~ . == "T") ~ "Class_D",
  STRATUM_ID == "C20" & if_any(starts_with("COL"), ~ . %in% c("G", "Z", "R")) ~ "Class_E",
  STRATUM_ID == "C20" & if_any(starts_with("COL"), ~ . == "F") ~ "Class_F",
  STRATUM_ID == "C20" & is.na(COL_1) ~ "Unknown",  # refer to the column directly, not data$COL_1
  TRUE ~ STRATUM_ID
))
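To see how `if_any()` interacts with the rule order of `case_when()`, here is the same pattern on a three-row hypothetical data frame (the Class_E, Class_F and Unknown rules are left out for brevity). `if_any()` is TRUE for a row when at least one selected column passes the test, and `case_when()` uses the first rule that is TRUE. Note that `if_any()` requires dplyr 1.0.4 or later.

```r
library(dplyr)

# Hypothetical toy data: row 1 has D in COL_1 and X in COL_2,
# row 2 has T in COL_1 and W in COL_2, row 3 is outside stratum C20
toy <- data.frame(STRATUM_ID = c("C20", "C20", "C19"),
                  COL_1 = c("D", "T", "X"),
                  COL_2 = c("X", "W", "X"),
                  stringsAsFactors = FALSE)

out <- toy %>% mutate(NEW = case_when(
  # Each condition checks every column whose name starts with "COL"
  STRATUM_ID == "C20" & if_any(starts_with("COL"), ~ . == "X") ~ "Class_A",
  STRATUM_ID == "C20" & if_any(starts_with("COL"), ~ . %in% c("C", "D", "E")) ~ "Class_B",
  STRATUM_ID == "C20" & if_any(starts_with("COL"), ~ . %in% c("U", "V", "W", "Y")) ~ "Class_C",
  STRATUM_ID == "C20" & if_any(starts_with("COL"), ~ . == "T") ~ "Class_D",
  TRUE ~ STRATUM_ID
))

out$NEW
# "Class_A" "Class_C" "C19"
```

Row 1 becomes Class_A although `COL_1` holds `D`, and row 2 becomes Class_C rather than Class_D, because in each case an earlier rule already matches another column. If a different priority is intended, the rules need to be reordered.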

I'm not 100% sure what you would repeat for each `COL` column, but based on your attempts, this looks like what you're after.
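If the list of classes grows, one alternative (a different technique from the `if_any()` chain, sketched under the assumption that rule order is the priority) is to keep the code-to-class mapping in a named vector and, for each row, pick the highest-priority class found in any `COL_` column. `lookup`, `priority` and `best_class()` are hypothetical names, and the Unknown rule is left out of this sketch.

```r
# Hypothetical lookup: each code mapped to its class, taken from the rules above
lookup <- c(X = "Class_A",
            C = "Class_B", D = "Class_B", E = "Class_B",
            U = "Class_C", V = "Class_C", W = "Class_C", Y = "Class_C",
            T = "Class_D",
            G = "Class_E", Z = "Class_E", R = "Class_E",
            F = "Class_F")
# Same order as the case_when rules: Class_A beats Class_B, and so on
priority <- paste0("Class_", LETTERS[1:6])

# Highest-priority class among one row's codes, or NA if none is mapped
best_class <- function(codes) {
  found <- lookup[codes[!is.na(codes)]]
  hits <- priority[priority %in% found]   # keeps the priority order
  if (length(hits) == 0) NA_character_ else hits[1]
}

toy <- data.frame(STRATUM_ID = c("C20", "C20", "C19"),
                  COL_1 = c("D", "T", "X"),
                  COL_2 = c("X", "W", "X"),
                  stringsAsFactors = FALSE)

cols <- grep("^COL_", names(toy))                 # every COL_ column, by position
best <- unname(apply(toy[cols], 1, best_class))  # one class (or NA) per row

# Only stratum C20 is reclassified; other rows keep their STRATUM_ID
toy$NEW <- ifelse(toy$STRATUM_ID == "C20" & !is.na(best), best, toy$STRATUM_ID)
toy$NEW
```

Adding a new code then means adding one entry to `lookup` rather than another `case_when()` line.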
