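I have some longitudinal patient data with a column describing whether a patient is a current or former smoker. I only want to backfill missing values if the patient is later recorded as never having smoked. I can't simply use tidyr::fill, because it doesn't let me fill conditionally on the value. Given the example below, I would like the `NA` values for `id==1` to be replaced with `never_smoker`, while `id==2` should stay unchanged, because we can't accurately infer when that patient started smoking.
df <- tibble::tribble(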
~id, ~visit, ~smoking,
1, 1, NA,
1, 2, NA,
1, 3, "never_smoker",
2, 1, NA,
2, 2, NA,
2, 3, "current_smoker"
)
should result in
expected_result <- tibble::tribble(
~id, ~visit, ~smoking,
1, 1, "never_smoker",
1, 2, "never_smoker",
1, 3, "never_smoker",
2, 1, NA,
2, 2, NA,
2, 3, "current_smoker"
)
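To make the problem with a plain fill concrete, here is a minimal sketch, assuming the dplyr and tidyr packages are installed and using the same example data as above. The name `filled` is only for illustration. An unconditional upward fill also overwrites the `NA` values for `id==2`, which is what should be avoided.

```r
library(dplyr)
library(tidyr)

df <- tibble::tribble(
  ~id, ~visit, ~smoking,
  1, 1, NA,
  1, 2, NA,
  1, 3, "never_smoker",
  2, 1, NA,
  2, 2, NA,
  2, 3, "current_smoker"
)

# Unconditional upward fill within each patient:
# every NA takes the next non-missing value, whatever that value is
filled <- df %>%
  group_by(id) %>%
  fill(smoking, .direction = "up") %>%
  ungroup()

# id 2 is now "current_smoker" at every visit, which is not wanted
filled$smoking
```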
I came up with this solution, which seems to work, but it requires reversing the column twice. I would hope there is a better way to do this?
library(dplyr)
library(purrr) # for accumulate()
df %>%
  group_by(id) %>%
  mutate(smoking = rev(accumulate(rev(smoking), ~ ifelse(is.na(.y) & .x == "never_smoker", "never_smoker", .y))))
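If you evaluate `!(smoking != 'never_smoker' | is.na(smoking))`, you get `TRUE` where the entry is "never_smoker" and `FALSE` otherwise. If you reverse this vector, take its cumulative sum, and then reverse that result to put it back in the original order, every position at or before a "never_smoker" entry ends up with a value greater than 0. This allows a simple `ifelse` that labels the `smoking` column as a never-smoker where the value is positive and leaves it as-is otherwise.
library(dplyr)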
df %>%
  mutate(smoking = ifelse(rev(cumsum(
    rev(!(smoking != 'never_smoker' | is.na(smoking))))) > 0,
    'never_smoker', smoking), .by = 'id')
#> # A tibble: 6 x 3
#> id visit smoking
#> <dbl> <dbl> <chr>
#> 1 1 1 never_smoker
#> 2 1 2 never_smoker
#> 3 1 3 never_smoker
#> 4 2 1 <NA>
#> 5 2 2 <NA>
#> 6 2 3 current_smoker
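The reverse, cumulative sum, reverse idea can be traced step by step on a single vector. This is a minimal base R sketch with made-up values for one patient; the names `is_never`, `n_at_or_after`, and `filled` are only for illustration.

```r
# Smoking status for one patient in visit order (made-up example values)
smoking <- c(NA, NA, "never_smoker", NA)

# TRUE only where the entry is exactly "never_smoker"; NA entries become FALSE
is_never <- !(smoking != "never_smoker" | is.na(smoking))
# FALSE FALSE TRUE FALSE

# Number of "never_smoker" entries at or after each position
n_at_or_after <- rev(cumsum(rev(is_never)))
# 1 1 1 0

# Label positions at or before a "never_smoker" entry; keep everything else
filled <- ifelse(n_at_or_after > 0, "never_smoker", smoking)
# "never_smoker" "never_smoker" "never_smoker" NA
```

The last visit stays `NA` because no "never_smoker" entry occurs at or after it.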
You can use `which.max()` to identify if/when `"never_smoker"` occurs:
library(dplyr)
df %>%
  group_by(id) %>%
  mutate(smoking = if_else(
    row_number() < which.max(!is.na(smoking) & smoking == "never_smoker"),
    "never_smoker",
    smoking
  ))
Result:
# A tibble: 6 × 3
# Groups: id [2]
id visit smoking
<dbl> <dbl> <chr>
1 1 1 never_smoker
2 1 2 never_smoker
3 1 3 never_smoker
4 2 1 <NA>
5 2 2 <NA>
6 2 3 current_smoker
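Why the `which.max()` comparison leaves `id==2` alone may not be obvious, so here is a minimal base R sketch of the two cases. The vectors `has_never` and `no_never` are hypothetical stand-ins for the logical condition within each group. On a logical vector, `which.max()` returns the index of the first `TRUE`, and it returns 1 when every element is `FALSE`, so no row index is smaller than it.

```r
# Stand-ins for `!is.na(smoking) & smoking == "never_smoker"` in each group
has_never <- c(FALSE, FALSE, TRUE)   # like id 1
no_never  <- c(FALSE, FALSE, FALSE)  # like id 2

first_has <- which.max(has_never)  # 3: position of the first TRUE
first_no  <- which.max(no_never)   # 1: all values tie, so the first index wins

# Rows strictly before that position are the ones that get overwritten
replace_has <- seq_along(has_never) < first_has  # TRUE TRUE FALSE
replace_no  <- seq_along(no_never) < first_no    # FALSE FALSE FALSE
```

Note that this comparison selects every row before the first "never_smoker", not only the `NA` rows, so an earlier non-missing value would be overwritten as well. With the example data that makes no difference.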