library(tidyverse)
library(magrittr)
df <- data.frame(year = c(1977:1981), set852 = c(1,1,0,0,0), set857=c(0,0,1,1,0), set874=c(0,0,0,1,1))
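For each of the variables set852, set857, etc. (in the real dataset this is a long list), I want to create a variable that indicates whether the time series changes (the values would be "start", "end", and "no change"). The additional variables should look like this: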
df_final <- data.frame(year = c(1977:1981), c852 = c("start","end","no change","no change","no change"), c857=c("no change","no change","start","end","no change"), c874=c("no change","no change","no change","start","end"))
I tried this in the tidyverse with a for loop, mutate, paste, and case_when:
set_num <- as.integer(str_extract(colnames(df), "[0-9]+"))
for (i in 2:nrow(df))
{
df %<>% mutate(paste0("c", set_num[[i]]) = case_when(paste("set", set_num[[i]], sep="")==1 & year == 1977 ~ "start",
paste("set", set_num[[i]], sep="")==1 & lag(paste("set", set_num[[i]], sep=""))==0 ~ "start",
paste("set", set_num[[i]], sep="")==1 & lead(paste("set", set_num[[i]], sep=""))==0 ~ "end",
TRUE~"no change"))
}
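However, the paste0 call after mutate is not recognized as a function; it is treated as the name of a variable beginning with paste0("c"... and so on. How can I get the code to treat paste0 as a function call rather than as a string?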
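The left-hand side of `=` in a function call has to be a literal name, so a call such as `paste0(...)` cannot appear there, and a string built with `paste()` on the right-hand side is compared as a string rather than looked up as a column. A minimal sketch of one way to keep the loop, assuming the set/c naming scheme from the question: `!!` together with `:=` injects a computed name on the left, and the `.data[[ ]]` pronoun fetches a column by its string name on the right. The names `set_cols`, `col`, and `new_col` are introduced here for illustration only.

```r
library(dplyr, warn.conflicts = FALSE)

df <- data.frame(
  year   = 1977:1981,
  set852 = c(1, 1, 0, 0, 0),
  set857 = c(0, 0, 1, 1, 0),
  set874 = c(0, 0, 0, 1, 1)
)

# Columns to process: every name starting with "set" (assumed naming scheme)
set_cols <- grep("^set", names(df), value = TRUE)

for (col in set_cols) {
  # Build the output name as a string, e.g. "set852" -> "c852"
  new_col <- sub("^set", "c", col)

  df <- df |>
    mutate(
      # `!!new_col :=` uses the string as the new column's name;
      # `.data[[col]]` looks up the input column by its string name
      !!new_col := case_when(
        .data[[col]] == 1 & year == 1977 ~ "start",
        .data[[col]] == 1 & lag(.data[[col]]) == 0 ~ "start",
        .data[[col]] == 1 & lead(.data[[col]]) == 0 ~ "end",
        .default = "no change"
      )
    )
}
```

The loop iterates over column names rather than row indices, so it does not depend on the number of rows in the data frame.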
Instead of a for loop, you can use dplyr::across to achieve the desired result, like so:
library(dplyr, warn.conflicts = FALSE)
df <- data.frame(
  year = c(1977:1981),
  set852 = c(1, 1, 0, 0, 0),
  set857 = c(0, 0, 1, 1, 0),
  set874 = c(0, 0, 0, 1, 1)
)
myfun <- function(.x, year) {
  case_when(
    .x == 1 & year == 1977 ~ "start",
    .x == 1 & lag(.x) == 0 ~ "start",
    .x == 1 & lead(.x) == 0 ~ "end",
    .default = "no change"
  )
}
set_cols <- grep("\\d+$", names(df), value = TRUE)
df |>
  mutate(
    across(all_of(set_cols), ~ myfun(.x, year),
      .names = "{gsub('^.*?(\\\\d+)$', 'c\\\\1', .col)}"
    )
  ) |>
  select(-all_of(set_cols))
#> year c852 c857 c874
#> 1 1977 start no change no change
#> 2 1978 end no change no change
#> 3 1979 no change start no change
#> 4 1980 no change end start
#> 5 1981 no change no change no change
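Note that the last row of c874 comes out as "no change", whereas the desired df_final in the question has "end" there: `lead(.x)` is `NA` in the final row, so the "end" condition never evaluates to `TRUE`. A possible adjustment, assuming a series that is still 1 in the last observed year should count as ending there, is to pass `default = 0` to `lead()`. Doing the same for `lag()` also makes the hard-coded 1977 check unnecessary. `myfun2` and `res` are hypothetical names for this variant, not part of the code above.

```r
library(dplyr, warn.conflicts = FALSE)

df <- data.frame(
  year   = 1977:1981,
  set852 = c(1, 1, 0, 0, 0),
  set857 = c(0, 0, 1, 1, 0),
  set874 = c(0, 0, 0, 1, 1)
)

# Hypothetical variant of myfun: values before the first and after the
# last observation are treated as 0, so edge rows can be "start" or "end"
myfun2 <- function(x) {
  case_when(
    x == 1 & lag(x, default = 0) == 0 ~ "start",
    x == 1 & lead(x, default = 0) == 0 ~ "end",
    .default = "no change"
  )
}

res <- df |>
  mutate(
    # Apply myfun2 to every "set" column and rename set852 -> c852, etc.
    across(starts_with("set"), myfun2, .names = "{sub('set', 'c', .col)}")
  ) |>
  select(-starts_with("set"))
```

With the sample data, the last row of c874 in `res` is "end", which matches df_final.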