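I have a data frame that contains the number of clicks for each condition. I want to calculate the cumulative sum of these clicks per condition as the data come in. I'm currently doing this with the ifelse() function. However, for the "no" branch of the test, I'd like to repeat the value created by the previous "yes" branch until the next "yes" occurs. At the moment I'm using NA as a placeholder.
How can I repeat the value created for the previous "yes" until the next "yes" when the ifelse() test is "no"?
I've made a minimal example: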
library(dplyr)

clicked <- round(runif(n = 20), 0)
condition <- sample(c("Intervention", "Control"), size = 20, replace = TRUE)
df <- data.frame(clicked, condition)

# Running totals per condition; rows of the other condition get NA placeholders
df %>%
  select(clicked, condition) %>%
  group_by(condition) %>%
  mutate(successes.intervention = ifelse(condition == "Intervention", cumsum(clicked), NA),
         N.intervention = ifelse(condition == "Intervention", 1:n(), NA),
         successes.control = ifelse(condition == "Control", cumsum(clicked), NA),
         N.control = ifelse(condition == "Control", 1:n(), NA))
I'd like the output to look like this:
   clicked condition    successes.intervention N.intervention successes.control N.control
     <dbl> <chr>                         <dbl>          <int>             <dbl>     <int>
 1       0 Control                           0              0                 0         1
 2       1 Control                           0              0                 1         2
 3       0 Control                           0              0                 1         3
 4       1 Intervention                      1              1                 1         3
 5       0 Control                           1              1                 1         4
 6       0 Intervention                      1              2                 1         4
 7       0 Intervention                      1              3                 1         4
 8       0 Control                           1              3                 1         5
 9       0 Intervention                      1              4                 1         5
10       1 Intervention                      2              5                 1         5
How about this?
library(dplyr)

df %>%
  group_by(condition) %>%
  mutate(
    # one cumulative-success column per condition level, built dynamically
    data.frame(
      lapply(setNames(unique(df$condition), paste0("successes.", unique(df$condition))),
             function(z) cumsum(condition == z & clicked > 0))
    ),
    # one N.* column per successes.* column: row number within the group, minus 1
    across(starts_with("successes"), ~ row_number() - 1L, .names = "N{sub('successes','',.col)}")
  ) %>%
  ungroup()
# # A tibble: 20 × 6
#    clicked condition    successes.intervention successes.control N.intervention N.control
#      <dbl> <chr>                         <int>             <int>          <int>     <int>
#  1       1 Intervention                      1                 0              0         0
#  2       1 Intervention                      2                 0              1         1
#  3       0 Intervention                      2                 0              2         2
#  4       1 Intervention                      3                 0              3         3
#  5       1 Intervention                      4                 0              4         4
#  6       1 Control                           0                 1              0         0
#  7       1 Intervention                      5                 0              5         5
#  8       0 Intervention                      5                 0              6         6
#  9       1 Intervention                      6                 0              7         7
# 10       1 Intervention                      7                 0              8         8
# 11       0 Control                           0                 1              1         1
# 12       1 Control                           0                 2              2         2
# 13       1 Control                           0                 3              3         3
# 14       0 Control                           0                 3              4         4
# 15       0 Intervention                      7                 0              9         9
# 16       1 Control                           0                 4              5         5
# 17       1 Intervention                      8                 0             10        10
# 18       0 Control                           0                 4              6         6
# 19       0 Control                           0                 4              7         7
# 20       1 Control                           0                 5              8         8
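One thing to note about the output above: because of group_by(condition), each counter is computed inside its own group, so it reads 0 on rows that belong to the other condition rather than carrying the last value forward. If carry-forward behaviour like the question's expected output is what is wanted, one possible sketch is to drop the grouping and let cumsum() run over a logical test on every row. The data frame df10 below is a hypothetical one, typed in by hand from the ten rows shown in the question's expected output; it is not the set.seed(42) data, and the column names are simply copied from the question.

```r
library(dplyr)

# Hypothetical input: the clicked/condition columns of the question's expected output
df10 <- data.frame(
  clicked = c(0, 1, 0, 1, 0, 0, 0, 0, 0, 1),
  condition = c("Control", "Control", "Control", "Intervention", "Control",
                "Intervention", "Intervention", "Control", "Intervention", "Intervention")
)

# No group_by(): each cumsum() sees every row, so a counter only increases on
# rows of its own condition and otherwise repeats its previous value
res <- df10 %>%
  mutate(
    successes.intervention = cumsum(condition == "Intervention" & clicked > 0),
    N.intervention         = cumsum(condition == "Intervention"),
    successes.control      = cumsum(condition == "Control" & clicked > 0),
    N.control              = cumsum(condition == "Control")
  )

print(res)
```

On this input the four new columns match the ten rows of the question's expected output. The trade-off is that the condition levels are written out by hand here instead of being discovered dynamically.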
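Walk-through:
lapply(..) iterates over the string literals (determined dynamically) and produces a list; once that list is converted to a data.frame, mutate adds its columns dynamically.
Inside cumsum(..), we check that condition is the one being summarized, and then take the cumulative sum of the number of clicks.
across iterates over all selected columns and returns the row number (within the group) minus 1; it optionally renames the columns according to the .names "glue" string. For this, I selected the successes.* columns created earlier, since they are always split across the distinct condition levels.
Data, starting with set.seed(42) for reproducibility: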
set.seed(42)
df <- data.frame(clicked = round(runif(n = 20), 0),
                 condition = sample(c("Intervention", "Control"), size = 20, replace = TRUE))
head(df)
#   clicked    condition
# 1       1 Intervention
# 2       1 Intervention
# 3       0 Intervention
# 4       1 Intervention
# 5       1 Intervention
# 6       1      Control
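To see the moving parts of the answer in isolation, here is a minimal sketch on a tiny, hypothetical data frame (toy, with made-up columns g and x and a made-up "hits." prefix; none of these names come from the original post). It uses no grouping, and it splits the work into two mutate() calls purely for readability.

```r
library(dplyr)

# Hypothetical toy data: g plays the role of condition, x the role of clicked
toy <- data.frame(g = c("a", "b", "a"), x = c(1, 0, 1))

# Step 1: setNames() + lapply() give a named list, one element per level of g
lst <- lapply(setNames(unique(toy$g), paste0("hits.", unique(toy$g))),
              function(z) cumsum(toy$g == z & toy$x > 0))
names(lst)
# "hits.a" "hits.b"

# Step 2: an unnamed data.frame passed to mutate() is unpacked into columns
out <- toy %>% mutate(data.frame(lst))

# Step 3: across() with a .names glue string derives the new column names
# from the selected ones ("hits.a" becomes "N.a", and so on)
out2 <- out %>%
  mutate(across(starts_with("hits"), ~ row_number() - 1L,
                .names = "N{sub('hits', '', .col)}"))

names(out2)
# "g" "x" "hits.a" "hits.b" "N.a" "N.b"
```

Here hits.a is 1, 1, 2 and hits.b is 0, 0, 0, while both N.* columns are simply 0, 1, 2 because the row number does not depend on which column across() is visiting.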