I have a dataset like this:
structure(list(study_id = structure(c("P005", "P005", "P005",
"P008", "P008", "P008", "P021", "P021", "P021", "P028", "P028",
"P028", "P032", "P032", "P032", "P036", "P036", "P036", "P037",
"P037", "P037", "P049", "P049", "P049", "P053", "P053", "P053",
"P069", "P069", "P069", "P079", "P079", "P079", "P089", "P089",
"P089", "P093", "P093", "P093", "P096", "P096", "P096", "P104",
"P104", "P104", "P105", "P105", "P105"), label = "ISMART Study ID", format.stata = "%9s"),
phase = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), levels = c("Baseline", "Midterm",
"Final"), class = "factor"), selfeff1 = structure(c(3L, 3L,
3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, NA, 3L,
3L, 3L, 3L, 3L, 3L, NA, 3L, 3L, 3L, 3L, 3L, 3L, 3L, NA, 3L,
2L), levels = c("Not confident", "Somewhat confident", "Very confident"
), class = "factor"), selfeff3 = structure(c(3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 3L, 2L, 3L, 3L, 3L,
3L, 3L, NA, 3L, 2L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, NA, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L
), levels = c("Not confident", "Somewhat confident", "Very confident"
), class = "factor")), class = "data.frame", row.names = c(NA,
-48L))
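This is a dataset pivoted to long format. Each study_id has three rows: the baseline, midterm and final values. I want to impute the missing values with a carry-forward/carry-backward method. Because these are repeated measures, I also want to apply the following rules:
If baseline is missing but midterm is available: carry backward (i.e. replace baseline with midterm);
If midterm is missing but final is available: carry backward (i.e. replace midterm with final);
If final is missing but midterm is available: carry forward (i.e. replace final with midterm);
If both baseline and final are missing: carry backward and forward from midterm (i.e. replace both with midterm).
I tried to write a function to do this, because my real dataset has selfeff1-13. The code looks like this: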
impute_values <- function(x, phase) {
  # Carryback: Replace baseline with midterm if baseline is missing but midterm is available
  if (phase == "Baseline" & is.na(x) & phase == "Midterm" & !is.na(x)) {
    x <- na.locf(x)
  }
  # Carryback: Replace midterm with final if midterm is missing but final is available
  # Carryforward: Replace final with midterm if final is missing but midterm is available
  else if (phase == "Midterm" & is.na(x) & phase == "Final" & !is.na(x[3])) {
    x <- na.locf(x)
  } else if (phase == "Midterm" & !is.na(x) & phase == "Final" & is.na(x[3])) {
    x <- na.locf(x, option = "nocb")
  }
  # For the case where both baseline and final are missing but midterm is available,
  # we can simply carry forward the missing values from midterm
  else if (phase == "Baseline" & is.na(x) & phase == "Final" & is.na(x) & phase == "Midterm" & !is.na(x)) {
    x <- na.locf(x)
  }
  return(x)
}
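But when I test this function on a single variable, for example selfeff1, with this code:
df2 <- df %>%
  mutate(selfeff1 = impute_values(selfeff1, phase))
summary(is.na(df2$selfeff1))
I get the error message: Error in if (...) NULL : the condition has length > 1
Could someone show me how to fix this and make it work for my case? Many thanks!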
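On the error itself: `mutate()` passes whole columns to `impute_values()`, so every condition in it is a logical vector with one element per row, whereas `if` expects a single TRUE or FALSE (since R 4.2.0 a longer condition is an error, before that it was a warning). A minimal base-R sketch of the problem, using made-up values; the vectors `phase`, `x` and `cond` below are illustrative only and not taken from the data:

```r
# One participant, three rows, as mutate() would hand them to the function
phase <- factor(c("Baseline", "Midterm", "Final"),
                levels = c("Baseline", "Midterm", "Final"))
x <- c(NA, 3L, 3L)

# The comparison is vectorised: one TRUE/FALSE per row
cond <- phase == "Baseline" & is.na(x)
length(cond)

# `if` wants exactly one TRUE/FALSE; wrapped in tryCatch() so the sketch
# keeps running whether this is an error (R >= 4.2.0) or a warning (older R)
res <- tryCatch(if (cond) "yes" else "no",
                error = function(e) conditionMessage(e))
res

# A single row cannot be both "Baseline" and "Midterm",
# so a test that combines the two is FALSE for every row
any(phase == "Baseline" & phase == "Midterm")
```

So even with a length-one condition, tests of the form `phase == "Baseline" & ... & phase == "Midterm"` could never be TRUE; the comparison has to happen across the rows of one participant, which is what a grouped fill does.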
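There may be a specific reason why you want to use a loop on your actual data, but for your example an approach based on vec_fill_missing() might be more practical / straightforward: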
library(dplyr)
library(vctrs)
df <- structure(list(study_id = structure(c("P005", "P005", "P005",
"P008", "P008", "P008", "P021", "P021", "P021", "P028", "P028",
"P028", "P032", "P032", "P032", "P036", "P036", "P036", "P037",
"P037", "P037", "P049", "P049", "P049", "P053", "P053", "P053",
"P069", "P069", "P069", "P079", "P079", "P079", "P089", "P089",
"P089", "P093", "P093", "P093", "P096", "P096", "P096", "P104",
"P104", "P104", "P105", "P105", "P105"), label = "ISMART Study ID", format.stata = "%9s"),
phase = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), levels = c("Baseline", "Midterm",
"Final"), class = "factor"), selfeff1 = structure(c(3L, 3L,
3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, NA, 3L,
3L, 3L, 3L, 3L, 3L, NA, 3L, 3L, 3L, 3L, 3L, 3L, 3L, NA, 3L,
2L), levels = c("Not confident", "Somewhat confident", "Very confident"
), class = "factor"), selfeff3 = structure(c(3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 3L, 2L, 3L, 3L, 3L,
3L, 3L, NA, 3L, 2L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, NA, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L
), levels = c("Not confident", "Somewhat confident", "Very confident"
), class = "factor")), class = "data.frame", row.names = c(NA,
-48L))
df2 <- df %>%
  mutate(selfeff1 = vec_fill_missing(selfeff1, direction = "updown"), .by = study_id)
df2
#> study_id phase selfeff1 selfeff3
#> 1 P005 Baseline Very confident Very confident
#> 2 P005 Midterm Very confident Very confident
#> 3 P005 Final Very confident Very confident
#> 4 P008 Baseline Very confident Very confident
#> 5 P008 Midterm Very confident Very confident
#> 6 P008 Final Very confident Very confident
#> 7 P021 Baseline Somewhat confident Very confident
#> 8 P021 Midterm Very confident Very confident
#> 9 P021 Final Very confident Very confident
#> 10 P028 Baseline Somewhat confident Somewhat confident
#> 11 P028 Midterm Very confident Very confident
#> 12 P028 Final Very confident Very confident
#> 13 P032 Baseline Very confident Somewhat confident
#> 14 P032 Midterm Very confident Very confident
#> 15 P032 Final Very confident Somewhat confident
#> 16 P036 Baseline Very confident Very confident
#> 17 P036 Midterm Very confident Very confident
#> 18 P036 Final Very confident Very confident
#> 19 P037 Baseline Very confident Very confident
#> 20 P037 Midterm Very confident Very confident
#> 21 P037 Final Very confident <NA>
#> 22 P049 Baseline Very confident Very confident
#> 23 P049 Midterm Somewhat confident Somewhat confident
#> 24 P049 Final Very confident Very confident
#> 25 P053 Baseline Very confident Somewhat confident
#> 26 P053 Midterm Very confident Very confident
#> 27 P053 Final Very confident Very confident
#> 28 P069 Baseline Very confident Very confident
#> 29 P069 Midterm Very confident Very confident
#> 30 P069 Final Very confident Very confident
#> 31 P079 Baseline Very confident <NA>
#> 32 P079 Midterm Very confident Very confident
#> 33 P079 Final Very confident Very confident
#> 34 P089 Baseline Very confident Very confident
#> 35 P089 Midterm Very confident Very confident
#> 36 P089 Final Very confident Very confident
#> 37 P093 Baseline Very confident Very confident
#> 38 P093 Midterm Very confident Very confident
#> 39 P093 Final Very confident Very confident
#> 40 P096 Baseline Very confident Very confident
#> 41 P096 Midterm Very confident Very confident
#> 42 P096 Final Very confident Very confident
#> 43 P104 Baseline Very confident Very confident
#> 44 P104 Midterm Very confident Very confident
#> 45 P104 Final Very confident Very confident
#> 46 P105 Baseline Very confident Very confident
#> 47 P105 Midterm Very confident Very confident
#> 48 P105 Final Somewhat confident Somewhat confident
Created on 2024-04-24 with reprex v2.1.0
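Since the real data has selfeff1-13, the same call can probably be applied to all of those columns at once with `across()`. The following is a sketch under assumptions rather than something run on the real data: `toy` is a made-up data frame with the same layout, it assumes every column to fill starts with "selfeff", and the `arrange()` step is added here only because the fill direction depends on the rows being in Baseline, Midterm, Final order within each participant.

```r
library(dplyr)
library(vctrs)

# Hypothetical toy data in the same shape as the question's data
toy <- data.frame(
  study_id = rep(c("A", "B", "C"), each = 3),
  phase    = factor(rep(c("Baseline", "Midterm", "Final"), times = 3),
                    levels = c("Baseline", "Midterm", "Final")),
  selfeff1 = c(NA, 3, 3,   2, NA, 3,   NA, 2, NA),
  selfeff3 = c(3, 3, NA,   NA, NA, 1,  2, 2, 2)
)

filled <- toy %>%
  # make sure rows are ordered Baseline, Midterm, Final within each participant
  arrange(study_id, phase) %>%
  mutate(
    # "updown": first carry backward (fill from the next row),
    # then carry forward for anything still missing
    across(starts_with("selfeff"),
           ~ vec_fill_missing(.x, direction = "updown")),
    .by = study_id
  )

filled
```

With "updown", a missing midterm is taken from the final value when one exists, and a missing final falls back to the midterm, which matches the rules in the question. A participant with all three values missing for a variable stays NA, because there is nothing to carry.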