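I'm trying to use grep to identify which observations match a regular expression. The tricky part is that the potential matches are spread across several columns.
Reproducible sample data: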
df <- as.data.frame(matrix(nrow = 12, ncol = 20))
names(df) <- paste0("col", 1:20)
for (i in 1:20) {
  df[, i] <- month.name[sample(1:12, 12, replace = TRUE)]
}
df$July.dummy <- 0
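I have been trying to loop a grep expression over the candidate variables to identify matches. In the sample data, the goal is to identify every observation where "July" appears in any of col1 through col20, and to set df$July.dummy to 1 for those rows.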
all.cols <- paste0("col", 1:20)
for (VAR in all.cols){
df$July.dummy[grep("July", noquote(paste0("df$", VAR)))] <- 1
}
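This snippet runs without error, and each component seems to "work"... but no observations are changed.
I'm happy to consider other non-loop/non-grep solutions, but I'd also appreciate it if someone could show me why my attempted solution doesn't work, for my own edification. Thanks!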
Assuming you are using grep because you actually need regular-expression matching, one possibility is:
df$July.dummy <- apply(df[-length(df)], 1, \(x) as.numeric(any(grep("July", x))))
However, if you don't need a regular expression and only want an equality comparison, you could do:
df$July.dummy <- as.numeric(rowSums(df[-length(df)] == "July") > 0)
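As for why the loop in the question changes nothing: paste0("df$", VAR) only builds the text "df$col1", and noquote() only changes how that text is printed. grep() therefore searches a one-element character vector holding the literal string "df$col1", never finds "July", and returns integer(0), so the assignment touches no rows. A minimal sketch of a corrected loop, assuming the df and all.cols from the question (the set.seed call is an addition, there only to make the sketch reproducible), indexes the column by name with [[ instead:

```r
# Rebuild the question's sample data.
# set.seed() is not in the original; it is added only for reproducibility.
set.seed(42)
df <- as.data.frame(matrix(nrow = 12, ncol = 20))
names(df) <- paste0("col", 1:20)
for (i in 1:20) {
  df[, i] <- month.name[sample(1:12, 12, replace = TRUE)]
}
df$July.dummy <- 0
all.cols <- paste0("col", 1:20)

# The original expression searches the string "df$col1", not the column,
# so it never matches.
grep("July", noquote(paste0("df$", "col1")))  # integer(0)

for (VAR in all.cols) {
  # df[[VAR]] extracts the column whose name is stored in VAR
  df$July.dummy[grep("July", df[[VAR]])] <- 1
}
```

After the loop, df$July.dummy should agree with the rowSums version above.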
library(tidyverse)
df |>
filter(if_any(any_of(all.cols), \(x) str_detect(x, "July")))
#> col1 col2 col3 col4 col5 col6 col7 col8
#> 1 October May May March March December July March
#> 2 October November October March July December May May
#> 3 May September December August October June January April
#> 4 March August February January October February May March
#> 5 January December July October March August June October
#> 6 April March December February June February December December
#> 7 February July October July June August September July
#> 8 March February January June November January July February
#> 9 April October November January February May July December
#> 10 December February October November August April September October
#> 11 January March January August June February March July
#> col9 col10 col11 col12 col13 col14 col15 col16
#> 1 April March January May January June November January
#> 2 November November December December March July April October
#> 3 June October March February July January May November
#> 4 June December May August June July June April
#> 5 July April December May November July February August
#> 6 January March April December July July May November
#> 7 January February September July May July October July
#> 8 March March July December August November December December
#> 9 July June July September May August May November
#> 10 March November July April February December February September
#> 11 July September November November March January April February
#> col17 col18 col19 col20 July.dummy
#> 1 January August March February 0
#> 2 November May November October 0
#> 3 December May June February 0
#> 4 April March June March 0
#> 5 January February October July 0
#> 6 July July May April 0
#> 7 September July November September 0
#> 8 April November October May 0
#> 9 December April February July 0
#> 10 March September September October 0
#> 11 May February February October 0
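Note that filter() keeps the matching rows but leaves July.dummy at 0, as the output shows. If the aim is to fill in the dummy rather than subset the data, one possible sketch, assuming dplyr 1.0.4 or later (where if_any() is available) and the all.cols vector from the question, wraps the same condition in mutate(). The set.seed call is again an addition for reproducibility only:

```r
library(dplyr)
library(stringr)

# Rebuild the question's sample data (seed added for reproducibility only)
set.seed(1)
df <- as.data.frame(matrix(nrow = 12, ncol = 20))
names(df) <- paste0("col", 1:20)
for (i in 1:20) {
  df[, i] <- month.name[sample(1:12, 12, replace = TRUE)]
}
all.cols <- paste0("col", 1:20)

# if_any() returns one logical per row: TRUE if any selected column matches
df <- df |>
  mutate(July.dummy = as.integer(
    if_any(all_of(all.cols), \(x) str_detect(x, "July"))
  ))
```

all_of() is used here instead of any_of() so that a misspelled column name raises an error rather than being silently skipped.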