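I'm building a large data frame using mutate with many ifelse conditions. My approach is to not name the columns inside mutate, because I have hundreds of these conditions and would have to update the names every time I update a condition. Instead, I'd like to name the columns after the operation, outside of mutate.
Here is some code outlining what I'm trying to do: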
library(dplyr)
df <- data.frame(a = rnorm(20, 100, 1), b = rnorm(20, 100, 1), c = rnorm(20, 100, 1))
df2 <- df %>%
  mutate(# condition 1
         ifelse((lag(a, 1) - lag(c, 1)) < (lag(a, 2) - lag(b, 2)) &
                  (lag(a, 1) - lag(c, 1)) < (lag(a, 3) - lag(b, 3)) &
                  (lag(a, 1) - lag(c, 1)) < (lag(a, 4) - lag(b, 4)), 1, 0),
         # condition 2
         ifelse((lag(a, 1) - lag(c, 1)) < (lag(a, 2) - lag(b, 2)) &
                  (lag(a, 1) - lag(c, 1)) < (lag(a, 3) - lag(b, 3)) &
                  (lag(a, 1) - lag(c, 1)) < (lag(a, 4) - lag(b, 4)) &
                  (lag(a, 1) - lag(c, 1)) < (lag(a, 5) - lag(b, 5)) &
                  (lag(a, 1) - lag(c, 1)) < (lag(a, 6) - lag(b, 6)), 1, 0),
         # condition 3
         ifelse(a < b, 1, 0),
         .keep = 'none'
  )
c_names <- paste('df', rep(1:ncol(df2), 1), sep = '')
colnames(df2) <- c_names
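The problem is that mutate truncates the automatic column names of the long ifelse conditions (# condition 1 and # condition 2) and collapses both into ifelse(...), so I end up with only 2 columns instead of 3.
What can I do to prevent this behavior, or is there a more efficient way to achieve what I'm trying to do? I'd like to avoid manually typing hundreds of column names, one per condition, every time I need to update df.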
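The collapse described above can be reproduced without a data frame. The sketch below assumes that dplyr derives default names for unnamed expressions the same way rlang::as_label() does, which abbreviates a call whose deparsed text spans several lines. The names cond1, cond2, cond3 and labels are illustrative only.

```r
# Conditions from the question, captured unevaluated (nothing is computed here).
cond1 <- quote(
  ifelse((lag(a, 1) - lag(c, 1)) < (lag(a, 2) - lag(b, 2)) &
           (lag(a, 1) - lag(c, 1)) < (lag(a, 3) - lag(b, 3)) &
           (lag(a, 1) - lag(c, 1)) < (lag(a, 4) - lag(b, 4)), 1, 0)
)
cond2 <- quote(
  ifelse((lag(a, 1) - lag(c, 1)) < (lag(a, 2) - lag(b, 2)) &
           (lag(a, 1) - lag(c, 1)) < (lag(a, 3) - lag(b, 3)) &
           (lag(a, 1) - lag(c, 1)) < (lag(a, 4) - lag(b, 4)) &
           (lag(a, 1) - lag(c, 1)) < (lag(a, 5) - lag(b, 5)) &
           (lag(a, 1) - lag(c, 1)) < (lag(a, 6) - lag(b, 6)), 1, 0)
)
cond3 <- quote(ifelse(a < b, 1, 0))

# Label each expression the way an unnamed mutate() argument is assumed to be labelled.
labels <- vapply(list(cond1, cond2, cond3), rlang::as_label, character(1))
print(labels)

# TRUE when at least two conditions share a label and would overwrite each other.
print(anyDuplicated(labels) > 0)
```

If the two long conditions come back with the same abbreviated label, the second assignment overwrites the first inside mutate, which matches the 2 columns reported above. The short third condition keeps its full text as its name.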
You can give each expression a unique, random column name, for example a UUID, and rename the columns afterwards:
library(dplyr)
set.seed(123)
df <- data.frame(a = rnorm(20, 100, 1), b = rnorm(20, 100, 1), c = rnorm(20, 100, 1))
df2 <- df %>%
  mutate(# condition 1
         "{uuid::UUIDgenerate()}" :=
           ifelse((lag(a, 1) - lag(c, 1)) < (lag(a, 2) - lag(b, 2)) &
                    (lag(a, 1) - lag(c, 1)) < (lag(a, 3) - lag(b, 3)) &
                    (lag(a, 1) - lag(c, 1)) < (lag(a, 4) - lag(b, 4)), 1, 0),
         # condition 2
         "{uuid::UUIDgenerate()}" :=
           ifelse((lag(a, 1) - lag(c, 1)) < (lag(a, 2) - lag(b, 2)) &
                    (lag(a, 1) - lag(c, 1)) < (lag(a, 3) - lag(b, 3)) &
                    (lag(a, 1) - lag(c, 1)) < (lag(a, 4) - lag(b, 4)) &
                    (lag(a, 1) - lag(c, 1)) < (lag(a, 5) - lag(b, 5)) &
                    (lag(a, 1) - lag(c, 1)) < (lag(a, 6) - lag(b, 6)), 1, 0),
         # condition 3
         "{uuid::UUIDgenerate()}" :=
           ifelse(a < b, 1, 0),
         .keep = 'none'
  )
str(df2)
#> 'data.frame': 20 obs. of 3 variables:
#> $ 2175b2b7-511f-471a-94d5-d82116b12137: num NA NA NA 0 1 1 0 0 1 1 ...
#> $ 07e353a6-58b9-4c50-9c08-2b7c742cf28b: num NA NA NA 0 NA NA 0 0 1 1 ...
#> $ a4fb004b-f498-4da0-b60b-1fbf872670a5: num 0 1 0 0 0 0 1 1 0 1 ...
# Replace the throwaway UUID names with positional names df1, df2, ...
c_names <- paste0('df', seq_len(ncol(df2)))
colnames(df2) <- c_names
str(df2)
#> 'data.frame': 20 obs. of 3 variables:
#> $ df1: num NA NA NA 0 1 1 0 0 1 1 ...
#> $ df2: num NA NA NA 0 NA NA 0 0 1 1 ...
#> $ df3: num 0 1 0 0 0 0 1 1 0 1 ...
Created on 2024-01-30 with reprex v2.0.2
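A possible alternative that skips the placeholder names entirely is to capture the conditions as a list of unevaluated expressions, name the list by position, and splice it into mutate. This is only a sketch, assuming rlang::exprs() and the !!! splice operator are available. The names conds and df2_alt are hypothetical and not part of the answer above.

```r
library(dplyr)

set.seed(123)
df <- data.frame(a = rnorm(20, 100, 1), b = rnorm(20, 100, 1), c = rnorm(20, 100, 1))

# Capture the conditions without evaluating them; no names are needed here.
conds <- rlang::exprs(
  # condition 1
  ifelse((lag(a, 1) - lag(c, 1)) < (lag(a, 2) - lag(b, 2)) &
           (lag(a, 1) - lag(c, 1)) < (lag(a, 3) - lag(b, 3)) &
           (lag(a, 1) - lag(c, 1)) < (lag(a, 4) - lag(b, 4)), 1, 0),
  # condition 2
  ifelse((lag(a, 1) - lag(c, 1)) < (lag(a, 2) - lag(b, 2)) &
           (lag(a, 1) - lag(c, 1)) < (lag(a, 3) - lag(b, 3)) &
           (lag(a, 1) - lag(c, 1)) < (lag(a, 4) - lag(b, 4)) &
           (lag(a, 1) - lag(c, 1)) < (lag(a, 5) - lag(b, 5)) &
           (lag(a, 1) - lag(c, 1)) < (lag(a, 6) - lag(b, 6)), 1, 0),
  # condition 3
  ifelse(a < b, 1, 0)
)

# Name every expression by its position, so no label is ever auto-generated.
names(conds) <- paste0("df", seq_along(conds))

# Splice the named expressions into mutate(); each one becomes its own column.
df2_alt <- df %>%
  mutate(!!!conds, .keep = "none")

str(df2_alt)
```

Because the names are derived from the position in the list, adding, removing, or editing a condition needs no manual renaming, and the uuid package is not required.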