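I am trying to create a new column, NEW, in my data frame using the case_when function from dplyr. The code below runs, but it only looks at COL_1. Is there a way to build this new column from all four columns that start with COL_ instead? Otherwise I have to repeat every case four times (once each for COL_1, COL_2, COL_3, and COL_4).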
library(dplyr)
set.seed(1)
# Make sample data
data <- data.frame(
  STRATUM_ID = c(rep("C19", 5), rep("C20", 15), rep("C21", 4)),
  COL_1 = sample(c(rep("X", 3), rep("T", 2), rep("Y", 7), rep("Z", 5), rep("D", 5), rep("G", 2)), 24, replace = TRUE),
  COL_2 = sample(c(rep("T", 4), rep("G", 6), rep("Y", 3), rep("C", 2), rep("W", 6), rep("R", 3)), 24, replace = TRUE),
  COL_3 = sample(c(rep("G", 1), rep("F", 5), rep("D", 3), rep("Z", 7), rep("C", 3), rep("E", 5)), 24, replace = TRUE),
  COL_4 = sample(c(rep("E", 7), rep("G", 2), rep("Y", 7), rep("D", 5), rep("V", 1), rep("U", 2)), 24, replace = TRUE)
)
# Create new column based on COL columns
data <- data %>% mutate(NEW = case_when(
  STRATUM_ID == "C20" & COL_1 == "X" ~ "Class_A",
  STRATUM_ID == "C20" & COL_1 %in% c("C", "D", "E") ~ "Class_B",
  STRATUM_ID == "C20" & COL_1 %in% c("U", "V", "W", "Y") ~ "Class_C",
  STRATUM_ID == "C20" & COL_1 == "T" ~ "Class_D",
  STRATUM_ID == "C20" & COL_1 %in% c("G", "Z", "R") ~ "Class_E",
  STRATUM_ID == "C20" & COL_1 == "F" ~ "Class_F",
  # Inside mutate(), refer to the column directly rather than via data$
  STRATUM_ID == "C20" & is.na(COL_1) ~ "Unknown",
  # Rows outside C20, or with no matching value, keep their stratum ID
  TRUE ~ STRATUM_ID
))
I tried the following to make the lookup cover all of the COL columns, without success:
data <- data %>% mutate(test = case_when(
STRATUM_ID == "C20" & grep("COL", colnames(data)) %in% c("C", "D", "E") ~ "CLASS_B"))
data <- data %>% mutate(test = case_when(
STRATUM_ID == "C20" & vars(starts_with("COL")) %in% c("C", "D", "E") ~ "CLASS_B"))
Please bear with me: the data set I am actually working with is much larger, and I have simplified the problem here as best I can.
Is this what you are trying to do?
library(dplyr)
data %>% mutate(NEW = case_when(
  # if_any() is TRUE for a row when at least one selected column matches
  STRATUM_ID == "C20" & if_any(starts_with("COL"), ~ . == "X") ~ "Class_A",
  STRATUM_ID == "C20" & if_any(starts_with("COL"), ~ . %in% c("C", "D", "E")) ~ "Class_B",
  STRATUM_ID == "C20" & if_any(starts_with("COL"), ~ . %in% c("U", "V", "W", "Y")) ~ "Class_C",
  STRATUM_ID == "C20" & if_any(starts_with("COL"), ~ . == "T") ~ "Class_D",
  STRATUM_ID == "C20" & if_any(starts_with("COL"), ~ . %in% c("G", "Z", "R")) ~ "Class_E",
  STRATUM_ID == "C20" & if_any(starts_with("COL"), ~ . == "F") ~ "Class_F",
  # As in the question, the NA check still looks at COL_1 only
  STRATUM_ID == "C20" & is.na(COL_1) ~ "Unknown",
  TRUE ~ STRATUM_ID
))
I'm not 100% sure what you would be repeating for each "COL", but based on your attempts, this looks like what you are trying to do.
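One thing worth checking with this approach: a single row can match several classes once all COL columns are considered, and case_when returns the first branch that matches, so the order of the branches acts as a priority. Below is a minimal sketch on a made-up data frame (toy and its values are hypothetical, not the asker's data). It assumes dplyr 1.0.4 or later for if_any/if_all, and it also assumes that "Unknown" should mean every COL column is missing, which the question does not state.

```r
library(dplyr)

# Hypothetical toy data, only to illustrate branch order and NA handling
toy <- data.frame(
  STRATUM_ID = c("C20", "C20", "C20", "C19", "C20"),
  COL_1 = c("X", "T", "Q", "X", NA),
  COL_2 = c("T", "Q", "Q", "T", NA),
  stringsAsFactors = FALSE
)

toy <- toy %>% mutate(NEW = case_when(
  # Row 1 contains both "X" and "T"; the earlier branch (Class_A) wins
  STRATUM_ID == "C20" & if_any(starts_with("COL"), ~ .x == "X") ~ "Class_A",
  STRATUM_ID == "C20" & if_any(starts_with("COL"), ~ .x == "T") ~ "Class_D",
  # Assumption: "Unknown" means all COL columns are NA
  STRATUM_ID == "C20" & if_all(starts_with("COL"), is.na) ~ "Unknown",
  # Anything else keeps its stratum ID
  TRUE ~ STRATUM_ID
))

toy$NEW
# "Class_A" "Class_D" "C20" "C19" "Unknown"
```

If a different class should take precedence when a row matches more than one, reordering the branches is enough.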