Using na.locf to impute missing values across multiple time points in a pivoted long-format dataset


I have a dataset like this:

df <- structure(list(study_id = structure(c("P005", "P005", "P005",
"P008", "P008", "P008", "P021", "P021", "P021", "P028", "P028",
"P028", "P032", "P032", "P032", "P036", "P036", "P036", "P037",
"P037", "P037", "P049", "P049", "P049", "P053", "P053", "P053",
"P069", "P069", "P069", "P079", "P079", "P079", "P089", "P089",
"P089", "P093", "P093", "P093", "P096", "P096", "P096", "P104",
"P104", "P104", "P105", "P105", "P105"), label = "ISMART Study ID", format.stata = "%9s"),
    phase = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), levels = c("Baseline", "Midterm",
    "Final"), class = "factor"), selfeff1 = structure(c(3L, 3L,
    3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, NA, 3L,
    3L, 3L, 3L, 3L, 3L, NA, 3L, 3L, 3L, 3L, 3L, 3L, 3L, NA, 3L,
    2L), levels = c("Not confident", "Somewhat confident", "Very confident"
    ), class = "factor"), selfeff3 = structure(c(3L, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 3L, 2L, 3L, 3L, 3L,
    3L, 3L, NA, 3L, 2L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, NA, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L
    ), levels = c("Not confident", "Somewhat confident", "Very confident"
    ), class = "factor")), class = "data.frame", row.names = c(NA,
-48L))

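This is a pivoted long-format dataset. Each study_id has three rows: the baseline, midterm, and final values. I want to impute the missing values with a carry-forward/carry-back approach. Because these are repeated measures, I also want to apply the following rules:

If baseline is missing but midterm is available: carry back (i.e. replace baseline with midterm);

If midterm is missing but final is available: carry back (i.e. replace midterm with final);

If final is missing but midterm is available: carry forward (i.e. replace final with midterm);

If both baseline and final are missing: carry back and forward from midterm (i.e. replace both with midterm).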
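Taken together, the four rules appear to amount to "carry back first, then carry forward" within each participant's three rows. The following is a minimal base R sketch of that reading, not the asker's code. The helper name fill_back_then_forward is hypothetical, and it assumes the values are already ordered Baseline, Midterm, Final. A case the rules do not cover (only baseline observed) would simply have baseline carried forward here.

```r
# Hypothetical helper: carry back (next observation), then carry forward
# (previous observation) for anything still missing.
fill_back_then_forward <- function(x) {
  n <- length(x)
  if (n < 2) return(x)

  # Carry back: walk from the second-to-last element towards the first
  for (i in rev(seq_len(n - 1))) {
    if (is.na(x[i])) x[i] <- x[i + 1]
  }

  # Carry forward: walk from the second element towards the last
  for (i in seq_len(n)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }

  x
}

# One participant per call, values ordered Baseline, Midterm, Final
r1 <- fill_back_then_forward(c(NA, "Mid", "Fin"))   # baseline missing
r2 <- fill_back_then_forward(c("Base", NA, "Fin"))  # midterm missing
r3 <- fill_back_then_forward(c("Base", "Mid", NA))  # final missing
r4 <- fill_back_then_forward(c(NA, "Mid", NA))      # baseline and final missing
```

Under these assumptions, r1 is "Mid", "Mid", "Fin"; r2 is "Base", "Fin", "Fin"; r3 is "Base", "Mid", "Mid"; and r4 is "Mid" three times.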

I tried to write a function for this, because my real dataset has selfeff1 through selfeff13. The code looks like this:

impute_values <- function(x, phase) {
  # Carryback: Replace baseline with midterm if baseline is missing but midterm is available
  if (phase == "Baseline" & is.na(x) & phase == "Midterm" & !is.na(x)) {
    x <- na.locf(x)
  }
  # Carryback: Replace midterm with final if midterm is missing but final is available
  # Carryforward: Replace final with midterm if final is missing but midterm is available
  else if (phase == "Midterm" & is.na(x) & phase == "Final" & !is.na(x[3])) {
    x <- na.locf(x)
  } else if (phase == "Midterm" & !is.na(x) & phase == "Final" & is.na(x[3])) {
    x <- na.locf(x, option = "nocb")
  }
  # For the case where both baseline and final are missing but midterm is available,
  # we can simply carry forward the missing values from midterm
  else if (phase == "Baseline" & is.na(x) & phase == "Final" & is.na(x) & phase == "Midterm" & !is.na(x)) {
    x <- na.locf(x)
  }
  return(x)
}

But when I test the function on a single variable, for example selfeff1, using this code:

df2 <- df %>%
  mutate(selfeff1 = impute_values(selfeff1, phase))

summary(is.na(df2$selfeff1))


I get the error message: Error in if (...) NULL : the condition has length > 1

Can someone show me how to fix it and make it work for my case? Thank you very much!
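For context, this error usually means that `if` received a logical vector where it expects a single TRUE or FALSE. Inside an ungrouped mutate(), both x and phase are whole columns, so every comparison in the function returns one value per row. The sketch below reproduces that situation on made-up three-row data (the values are illustrative, not taken from the dataset). It also shows that a condition requiring two phases at once can never be TRUE for any row.

```r
phase <- factor(c("Baseline", "Midterm", "Final"),
                levels = c("Baseline", "Midterm", "Final"))
x <- c(NA, 2, 3)

# The condition is a logical vector with one element per row
cond <- phase == "Baseline" & is.na(x)
length(cond)

# `if` needs exactly one TRUE/FALSE. On R >= 4.2.0 a longer condition is an
# error; older versions warn and use only the first element.
msg <- tryCatch(
  {
    if (cond) "imputed" else "unchanged"
  },
  error = function(e) conditionMessage(e),
  warning = function(w) conditionMessage(w)
)

# A row cannot be in two phases at once, so this is FALSE everywhere
never_true <- phase == "Baseline" & phase == "Midterm"
any(never_true)
```

Here msg holds the text of the condition that was raised, and any(never_true) is FALSE, which is why a row-wise if/else chain over phase cannot express the rules even once the length problem is solved.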

r
1 answer

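There may be a specific reason you want to use a loop with your real data, but for your example an approach based on vec_fill_missing() may be more practical and direct. With direction = "updown", applied per study_id, missing values are first filled from the next observation and then from the previous one, which matches your rules: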

library(dplyr)
library(vctrs)

df <- structure(list(study_id = structure(c("P005", "P005", "P005",
"P008", "P008", "P008", "P021", "P021", "P021", "P028", "P028",
"P028", "P032", "P032", "P032", "P036", "P036", "P036", "P037",
"P037", "P037", "P049", "P049", "P049", "P053", "P053", "P053",
"P069", "P069", "P069", "P079", "P079", "P079", "P089", "P089",
"P089", "P093", "P093", "P093", "P096", "P096", "P096", "P104",
"P104", "P104", "P105", "P105", "P105"), label = "ISMART Study ID", format.stata = "%9s"),
    phase = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), levels = c("Baseline", "Midterm",
    "Final"), class = "factor"), selfeff1 = structure(c(3L, 3L,
    3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, NA, 3L,
    3L, 3L, 3L, 3L, 3L, NA, 3L, 3L, 3L, 3L, 3L, 3L, 3L, NA, 3L,
    2L), levels = c("Not confident", "Somewhat confident", "Very confident"
    ), class = "factor"), selfeff3 = structure(c(3L, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 3L, 2L, 3L, 3L, 3L,
    3L, 3L, NA, 3L, 2L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, NA, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L
    ), levels = c("Not confident", "Somewhat confident", "Very confident"
    ), class = "factor")), class = "data.frame", row.names = c(NA,
-48L))

df2 <- df %>%
  mutate(selfeff1 = vec_fill_missing(selfeff1, direction = "updown"), .by = study_id)

df2
#>    study_id    phase           selfeff1           selfeff3
#> 1      P005 Baseline     Very confident     Very confident
#> 2      P005  Midterm     Very confident     Very confident
#> 3      P005    Final     Very confident     Very confident
#> 4      P008 Baseline     Very confident     Very confident
#> 5      P008  Midterm     Very confident     Very confident
#> 6      P008    Final     Very confident     Very confident
#> 7      P021 Baseline Somewhat confident     Very confident
#> 8      P021  Midterm     Very confident     Very confident
#> 9      P021    Final     Very confident     Very confident
#> 10     P028 Baseline Somewhat confident Somewhat confident
#> 11     P028  Midterm     Very confident     Very confident
#> 12     P028    Final     Very confident     Very confident
#> 13     P032 Baseline     Very confident Somewhat confident
#> 14     P032  Midterm     Very confident     Very confident
#> 15     P032    Final     Very confident Somewhat confident
#> 16     P036 Baseline     Very confident     Very confident
#> 17     P036  Midterm     Very confident     Very confident
#> 18     P036    Final     Very confident     Very confident
#> 19     P037 Baseline     Very confident     Very confident
#> 20     P037  Midterm     Very confident     Very confident
#> 21     P037    Final     Very confident               <NA>
#> 22     P049 Baseline     Very confident     Very confident
#> 23     P049  Midterm Somewhat confident Somewhat confident
#> 24     P049    Final     Very confident     Very confident
#> 25     P053 Baseline     Very confident Somewhat confident
#> 26     P053  Midterm     Very confident     Very confident
#> 27     P053    Final     Very confident     Very confident
#> 28     P069 Baseline     Very confident     Very confident
#> 29     P069  Midterm     Very confident     Very confident
#> 30     P069    Final     Very confident     Very confident
#> 31     P079 Baseline     Very confident               <NA>
#> 32     P079  Midterm     Very confident     Very confident
#> 33     P079    Final     Very confident     Very confident
#> 34     P089 Baseline     Very confident     Very confident
#> 35     P089  Midterm     Very confident     Very confident
#> 36     P089    Final     Very confident     Very confident
#> 37     P093 Baseline     Very confident     Very confident
#> 38     P093  Midterm     Very confident     Very confident
#> 39     P093    Final     Very confident     Very confident
#> 40     P096 Baseline     Very confident     Very confident
#> 41     P096  Midterm     Very confident     Very confident
#> 42     P096    Final     Very confident     Very confident
#> 43     P104 Baseline     Very confident     Very confident
#> 44     P104  Midterm     Very confident     Very confident
#> 45     P104    Final     Very confident     Very confident
#> 46     P105 Baseline     Very confident     Very confident
#> 47     P105  Midterm     Very confident     Very confident
#> 48     P105    Final Somewhat confident Somewhat confident

Created on 2024-04-24 with reprex v2.1.0
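The example above fills only selfeff1, so selfeff3 keeps its missing values in the printed output. Since the question mentions selfeff1 through selfeff13, one possible extension is to wrap the same call in across(). The sketch below uses a small made-up data frame named toy (hypothetical, not the original data). It assumes that every item column starts with "selfeff" and that dplyr 1.1.0 or later is installed for the `.by` argument. Sorting by phase first is a precaution so that "up" and "down" follow the Baseline, Midterm, Final order.

```r
library(dplyr)
library(vctrs)

# Hypothetical toy data with the same shape as the data in the question
toy <- data.frame(
  study_id = rep(c("A", "B"), each = 3),
  phase    = factor(rep(c("Baseline", "Midterm", "Final"), times = 2),
                    levels = c("Baseline", "Midterm", "Final")),
  selfeff1 = c(NA, 2, 3, 1, NA, 3),
  selfeff3 = c(1, 2, NA, NA, 2, NA)
)

filled <- toy %>%
  # Make sure rows are in phase order within each participant
  arrange(study_id, phase) %>%
  mutate(
    # Apply the same up-then-down fill to every selfeff* column
    across(starts_with("selfeff"),
           ~ vec_fill_missing(.x, direction = "updown")),
    .by = study_id
  )

filled
```

With these toy values, selfeff1 becomes 2, 2, 3 for participant A and 1, 3, 3 for participant B, and selfeff3 becomes 1, 2, 2 and 2, 2, 2.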
