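I have a data frame like the one shown below, and the output of my code follows it.
Question: the original data frame can contain several columns whose values are formatted like those in col1 and col2. How can I modify the code below to get the desired output?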
# STEP 1: example data
df <- data.frame(
  col1 = c("abc_1_102", "abc_1_103", "xyz_1_104"),
  col2 = c("107", "108", "106")
)
# STEP 2: split col1 on "_" and take the third piece (NA if there are fewer than three)
split_text <- strsplit(df$col1, "_")
third_elements <- sapply(split_text, function(x) if (length(x) >= 3) x[3] else NA)
# STEP 3: store the extracted value
df$col3 <- third_elements
# STEP 4: flag values that appear in the selection
selection <- c(107, 102, 108)
df$col4 <- ifelse(df$col2 %in% selection, "SELECT", "NOTSELECT")
df$col5 <- ifelse(df$col3 %in% selection, "SELECT", "NOTSELECT")
# STEP 5: combine the two flags
df$col6 <- paste(df$col4, df$col5, sep = ",")
Output of the above code:
       col1 col2 col3      col4      col5                col6
1 abc_1_102  107  102    SELECT    SELECT       SELECT,SELECT
2 abc_1_103  108  103    SELECT NOTSELECT    SELECT,NOTSELECT
3 xyz_1_104  106  104 NOTSELECT NOTSELECT NOTSELECT,NOTSELECT
Desired output:
       col1 col2                col6
1 abc_1_102  107       SELECT,SELECT
2 abc_1_103  108    SELECT,NOTSELECT
3 xyz_1_104  106 NOTSELECT,NOTSELECT
You can do this all in one go by pasting the two ifelse statements together. The ifelse for col2 is straightforward. The ifelse that replaces col3 uses grepl to search col1 for any of the numbers in selection. paste(..., sep = ",") puts them together:
df$col6 <- paste(ifelse(df$col2 %in% selection, "SELECT", "NOTSELECT"),
                 ifelse(grepl(paste(selection, collapse = "|"), df$col1), "SELECT", "NOTSELECT"),
                 sep = ",")
Output:
       col1 col2                col6
1 abc_1_102  107       SELECT,SELECT
2 abc_1_103  108    SELECT,NOTSELECT
3 xyz_1_104  106 NOTSELECT,NOTSELECT
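One caveat with the grepl approach: the pattern built by paste(selection, collapse = "|") matches substrings, so 102 would also be found inside a value such as abc_1_1025. A minimal sketch of one way around that, assuming the numbers always appear as whole "_"-delimited pieces, is to anchor the pattern. The sample values and the names loose, strict, loose_hit and strict_hit are made up for illustration:

```r
# Illustrative data: the second value contains "102" only as part of "1025".
df <- data.frame(
  col1 = c("abc_1_102", "abc_1_1025", "xyz_107_9"),
  col2 = c("107", "108", "106")
)
selection <- c(107, 102, 108)

# Unanchored pattern: "102" also matches inside "1025".
loose <- paste(selection, collapse = "|")

# Anchored pattern: a number only counts when it is a whole "_"-delimited piece.
strict <- paste0("(^|_)(", paste(selection, collapse = "|"), ")(_|$)")

loose_hit  <- grepl(loose, df$col1)
strict_hit <- grepl(strict, df$col1)

df$col6 <- paste(
  ifelse(df$col2 %in% selection, "SELECT", "NOTSELECT"),
  ifelse(strict_hit, "SELECT", "NOTSELECT"),
  sep = ","
)
```

Note that, like the grepl answer, this checks every piece of col1 rather than only the third one, so it is slightly broader than the original STEP 2.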
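# Alternative: split col1 on "_" and test each piece against selection.
# Note: the native pipe |> needs R 4.1 or later.
df$col6 <- paste(
  ifelse(df$col2 %in% selection, "SELECT", "NOTSELECT"),
  strsplit(df$col1, "_") |>
    sapply(`%in%`, selection) |>   # matrix with one TRUE/FALSE column per row of df
    colSums() |>                    # number of matching pieces per row
    ifelse("SELECT", "NOTSELECT"),
  sep = ",")
#        col1 col2                col6
# 1 abc_1_102  107       SELECT,SELECT
# 2 abc_1_103  108    SELECT,NOTSELECT
# 3 xyz_1_104  106 NOTSELECT,NOTSELECT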
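Since the question mentions that the real data can have several columns in either format, the same idea can be wrapped in a small helper and applied to any set of columns. This is only a sketch under assumptions: has_selected_token is a hypothetical helper name, colA is an invented extra column, and cols must be adapted to the actual data. It uses vapply with any() instead of sapply with colSums(), because sapply would not simplify to a matrix when the values split into different numbers of pieces:

```r
# Hypothetical helper: TRUE when any "_"-delimited piece of x is in `selection`.
# A plain value such as "107" is treated as a single piece, so both formats work.
has_selected_token <- function(x, selection, sep = "_") {
  tokens <- strsplit(as.character(x), sep, fixed = TRUE)
  vapply(tokens, function(tok) any(tok %in% selection), logical(1))
}

df <- data.frame(
  col1 = c("abc_1_102", "abc_1_103", "xyz_1_104"),
  col2 = c("107", "108", "106"),
  colA = c("q_108", "r_2_3_999", NA)  # invented column with ragged lengths and an NA
)
selection <- c(107, 102, 108)

# Columns to check, in the order the flags should appear in col6.
cols <- c("col2", "col1", "colA")

# One character vector of SELECT/NOTSELECT per column.
flags <- lapply(df[cols], function(x) {
  ifelse(has_selected_token(x, selection), "SELECT", "NOTSELECT")
})

# Paste the per-column flags together row by row.
df$col6 <- do.call(paste, c(unname(flags), sep = ","))
```

With cols <- c("col2", "col1") this should reproduce the col6 values of the desired output; missing values are flagged NOTSELECT here, which is a design choice rather than something the question specifies.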