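I want to create a new column based on a value within each group of a grouping variable.
For example, given the dataset below, I want every row of a group to be TRUE if the color at that group's first time point (t1) is "blue".
Here is a sample of my dataset: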
group <- c("A","A", "A", "A", "B","B","B","B","C", "C", "C","C")
time <- c("t1","t2","t3","t4","t1","t2","t3","t4","t1","t2","t3","t4")
color <- c("blue", "red", "green", "yellow", "yellow","green","purple", "blue", "blue", "green", "yellow","red")
first_row_blue <- c(TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE)
df <- data.frame(group, time, color, first_row_blue)
df
The last column shows what I want the data to look like, but I don't want to hard-code it.
I tried this:
df %>%
  group_by(group) %>%
  mutate(all_blue = ifelse(time == "t1" & color == "blue", TRUE, FALSE))
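However, even though the data are grouped, this assigns TRUE only to the first row of each matching group rather than to every row in the group. Is there a function I'm missing that would do what I'm looking for?
I have also seen posts about related operations, such as counting how many times "blue" occurs, but that does not work here because "blue" also appears at time points other than t1.
Thanks in advance for any suggestions!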
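Try first(). Within each group, first(color) returns the color of the group's first row; comparing it to "blue" gives a single TRUE or FALSE that mutate() recycles to every row of that group: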
library(dplyr)

df %>%
  group_by(group) %>%
  # first() takes the first row of each group, so rows must be ordered by time
  mutate(x = first(color) == "blue")
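Note that first() depends on row order, so it gives the intended result only when each group's t1 row really comes first. If that cannot be guaranteed, one possible variant is to test the t1 row explicitly with any(). The following is a minimal sketch under that assumption; the column name first_is_blue and the object name result are made up for illustration and are not part of the original post.

```r
library(dplyr)

# Same sample data as in the question, without the hard-coded column
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 4),
  time  = rep(c("t1", "t2", "t3", "t4"), times = 3),
  color = c("blue", "red", "green", "yellow",
            "yellow", "green", "purple", "blue",
            "blue", "green", "yellow", "red")
)

result <- df %>%
  group_by(group) %>%
  # any() collapses the per-row test to one value per group,
  # which mutate() then recycles to every row of that group
  mutate(first_is_blue = any(time == "t1" & color == "blue")) %>%
  ungroup()

result
```

Because the condition names t1 directly, the result should not change if the rows are shuffled within a group.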
library(tidyverse)
group <- c("A","A", "A", "A", "B","B","B","B","C", "C", "C","C")
time <- c("t1","t2","t3","t4","t1","t2","t3","t4","t1","t2","t3","t4")
color <- c("blue", "red", "green", "yellow", "yellow","green","purple", "blue", "blue", "green", "yellow","red")
first_row_blue <- c(TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE)
df <- data.frame(group, time, color, first_row_blue)
df %>%
  group_by(group) %>%
  mutate(blue2 = ifelse(time == "t1" & color == "blue", TRUE, NA)) %>%
  fill(blue2) %>%
  mutate(blue2 = replace_na(blue2, FALSE))
#> # A tibble: 12 × 5
#> # Groups: group [3]
#> group time color first_row_blue blue2
#> <chr> <chr> <chr> <lgl> <lgl>
#> 1 A t1 blue TRUE TRUE
#> 2 A t2 red TRUE TRUE
#> 3 A t3 green TRUE TRUE
#> 4 A t4 yellow TRUE TRUE
#> 5 B t1 yellow FALSE FALSE
#> 6 B t2 green FALSE FALSE
#> 7 B t3 purple FALSE FALSE
#> 8 B t4 blue FALSE FALSE
#> 9 C t1 blue TRUE TRUE
#> 10 C t2 green TRUE TRUE
#> 11 C t3 yellow TRUE TRUE
#> 12 C t4 red TRUE TRUE
Created on 2024-05-06 with reprex v2.1.0
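For comparison, the same flag can probably be built without any packages. The sketch below is a base R alternative, not what either answer above proposes: it first collects the groups whose t1 row is "blue", then marks every row belonging to one of those groups. The names blue_groups and first_is_blue are hypothetical.

```r
# Same sample data as in the question, without the hard-coded column
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 4),
  time  = rep(c("t1", "t2", "t3", "t4"), times = 3),
  color = c("blue", "red", "green", "yellow",
            "yellow", "green", "purple", "blue",
            "blue", "green", "yellow", "red")
)

# Groups whose t1 row has color "blue"
blue_groups <- df$group[df$time == "t1" & df$color == "blue"]

# Flag every row that belongs to one of those groups
df$first_is_blue <- df$group %in% blue_groups

df
```

Like the any() approach, this does not rely on the rows being sorted by time.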