Trying to write a loop that organizes complications into a new data frame

Question  Votes: 0  Answers: 1

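I'm fairly new to R and am trying to prepare a dataset so that I can merge it into another one. Each row of the original dataset, filtered_datum, represents one medical complication. The value record_id identifies the patient, so that each complication is linked to a patient. Complications are coded from 1 to 18 in the surg_complication variable. I'm trying to build a data frame that lists each record_id only once, as the patient identifier, and has one column per complication type. For example, a value of 1 in surg_complication is a surgical site infection; it would become a column named SSI, flagged 0 or 1 depending on whether the patient had that complication. I also want to go one step further and distinguish which side the complication occurred on. The code below tries to append _right or _left depending on whether complication_laterality equals 1 or 3 (right) or 2 or 3 (left): 1 is right, 2 is left, and 3 is bilateral, so it counts for both sides.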
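To make the laterality coding concrete, here is a minimal sketch of how a code could be expanded into the side or sides it covers. The helper name `laterality_to_sides` is hypothetical and not part of my actual code; it only assumes the coding described above (1 = right, 2 = left, 3 = bilateral).

```r
# Hypothetical helper: map one laterality code to the side(s) it covers.
# Assumes 1 = right, 2 = left, 3 = bilateral (counts for both sides).
laterality_to_sides <- function(code) {
  switch(as.character(code),
         "1" = "right",
         "2" = "left",
         "3" = c("right", "left"),
         character(0))  # any other code maps to no side
}

laterality_to_sides(1)    # a right-sided complication
laterality_to_sides("3")  # a bilateral complication covers both sides
```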

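My goal is a new data frame, Comp_summary_wide, initialized with the unique record IDs that appear in filtered_datum, with a column for each complication type and side, such as SSI_right, SSI_left, and so on. When I run this code it does create the columns I want, but I run into two main problems: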

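  1. The complications are not flagged correctly. Every record ID seems to end up with a complication, but when I check against the original dataset, the wrong one is flagged. For example, a patient often gets a 1 in the SSI_right column (which is for surg_complication = 1 on the right side) when it is the Hematoma_right column, for surg_complication = 5 with (complication_laterality = 1 | complication_laterality = 3) (right side), that should be flagged 1.
  2. In addition, the code returns only one type of complication per row, even though many patients have more than one complication.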
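For reference, the layout I am aiming for can be sketched as follows. This is only an illustration with made-up, shortened inputs (`ids`, `complication_names`, and `target` are placeholder names, not objects from my real script): one row per patient and one 0/1 column per complication and side, with every flag starting at 0.

```r
# Illustrative inputs only; the real data has more patients and 18 complications.
ids <- c("1", "2", "3")                     # unique record IDs
complication_names <- c("SSI", "Hematoma")  # subset of the real list

# Side-specific column names: SSI_right, SSI_left, Hematoma_right, ...
side_cols <- paste0(rep(complication_names, each = 2), c("_right", "_left"))

# One row per patient, every flag initialized to 0.
target <- data.frame(record_id = ids, stringsAsFactors = FALSE)
target[side_cols] <- 0L

names(target)
```

A patient with several complications would then simply have a 1 in several columns of the same row.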

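I made a data frame that mimics the larger, real filtered_datum file. It contains the variables used here plus one more, other_surg_compl, which stands in for the variables in the dataset that are not used. It is not relevant to the code.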

Thanks for your help, and apologies for my inexperience!

library(DT)
library(tidymodels)
library(ggpubr)
library(lubridate)  # tools for date/time work
library(pROC) # calculate log regression ROC curve
library(oddsratio)
library(lmtest)
library(broom)
library(broom.mixed)
library(survival)
library(survminer)
library(matrixStats)
library(arsenal)
library(glmnet)
library(knitr)
library(lme4)
library(lmerTest)
library(tidyverse)
library(rstatix)
library(glmmLasso)
library(rio)
library(hrbrthemes)
library(viridis)
library(plot3D)
library(ggrepel)
library(plotly)

library(plot3Drgl)
library(rgl)
library(magick)
library(fmsb)

library(tidyquant)
library(scales)
library(corrr)
library(showtext)
library(ragg)
library(dplyr)
#remotes::install_github("ngreifer/MatchIt", dependencies = TRUE, force = TRUE)
#devtools::install_github("ngreifer/MatchIt")
#devtools::install_github("lme4",user="lme4")
library(MatchIt)
library(reshape2)
library(extrafont)
library(ggforce)
library(afex)
library(performance)
library(ggpmisc)
library(lubridate)  # tools for date/time work
library(epitools)   
library(rmarkdown)
library(knitr)
library(pROC) # calculate log regression ROC curve
library(oddsratio)
library(car)
library(lmtest)
library(matrixStats)
library(glmnet)
library(lme4)
library(lmerTest)
library(glmmLasso)
library(tidyverse)
library(finalfit)
complications_list <- c("SSI", "DelayHeal", "Seroma", "Hematoma", "FatNecr", "FlapLoss", "HerniaBulg", "FascialDehis", "SBO", "PE", "DVT", "SkinNec", "ImplExtrInfec",
                   "ImplRupt", "Other", "InfectedMesh", "RecurrHern", "FlapLossVasc")

filtered_datum <- data.frame(
  record_id = c("1", "1", "2", "2", "2", "2", "3", "4"),
  surg_complication = c("SSI", "DelayHeal", "Seroma", "Seroma", "FatNecr", "Hematoma", "FascialDehis", "Hematoma"),
  complication_laterality = c("1", "2", "2", "2", "1", "1", "1", "2"),
  other_surg_compl = c("thrombosis", "vein thrombosis", "twisting of internal mammary", "Thrombosis of vein", "arterial thrombosis", "thrombosed IMA", "loss of artery", "vein thrombosis"),
  repeat_instance = c(1, 2, 1, 2, 3, 4, 1, 1)
)

data.table::setDT(filtered_datum)
Comp_summary_wide <- unique(filtered_datum[, c("record_id"), with = FALSE])

data.table::setDT(Comp_summary_wide)
for (complication in complications_list) {
    comp_time_var <- paste0("time_to_", complication)
    
    occurred_rows_right <- which(filtered_datum$surg_complication == complication & ((filtered_datum$complication_laterality == 1) | (filtered_datum$complication_laterality == 3)))
    occurred_rows_left <- which(filtered_datum$surg_complication == complication & ((filtered_datum$complication_laterality == 2) | (filtered_datum$complication_laterality == 3)))
    
    right_rows <- which((filtered_datum$complication_laterality == 1) | (filtered_datum$complication_laterality == 3))
    left_rows <- which((filtered_datum$complication_laterality == 2) | (filtered_datum$complication_laterality == 3))
    
    for (i in 1:nrow(Comp_summary_wide)) {
      
       matching_rows <- which(filtered_datum$record_id == Comp_summary_wide$record_id[i])

        if (complication > 0) {
            if (i %in% occurred_rows_right) {
                Comp_summary_wide[i, (paste0(complication, "_right"))] <- 1
               
            } 
        }
    
   
        if (complication > 0) {   
             if (i %in% occurred_rows_left) {
                Comp_summary_wide[i, (paste0(complication, "_left"))] <- 1
             } 
         
        }
    }
}

r for-loop data-cleaning
1 Answer
0 votes

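library(dplyr)
library(tidyr)

# work on a plain data frame copy of the question's sample data
df <- as.data.frame(filtered_datum)

# get all unique complications
# (drop other_surg_compl here and in pivot_longer() below if the free-text
# entries should not become columns)
complications <- unique(c(df$surg_complication, df$other_surg_compl))

# create a vector of column names for the left and right side of each complication
side_cols <- c(paste0(complications, "_left"), paste0(complications, "_right"))

# create an empty (zero-row) data frame with the columns from side_cols
all_complication_columns <- setNames(data.frame(matrix(ncol = length(side_cols), nrow = 0)), side_cols)

df %>%
  # convert complication_laterality to left/right, using the coding from the
  # question: 1 = right, 2 = left, 3 = bilateral (counted on both sides)
  mutate(complication_laterality = case_when(
    complication_laterality == "1" ~ "right",
    complication_laterality == "2" ~ "left",
    complication_laterality == "3" ~ "left,right"
  )) %>%
  # a bilateral complication becomes two rows, one per side
  separate_rows(complication_laterality, sep = ",") %>%

  # turn the surg_complication, other_surg_compl into one complication column
  pivot_longer(cols = c(surg_complication, other_surg_compl), values_to = "complication") %>%
  select(-name, -repeat_instance) %>%
  distinct() %>%
  pivot_wider(id_cols = record_id, names_from = c(complication, complication_laterality), values_from = complication, values_fn = length, values_fill = 0) %>%
  # add the complication/side columns that did not occur in the data
  full_join(all_complication_columns) %>%

  # sort columns in alphabetical order except for record_id
  select(record_id, sort(names(.)[-1])) %>%
  # replace NAs with 0s
  mutate(across(-record_id, ~as.integer(replace_na(., 0))))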
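As a side note on the loop in the question: `i` runs over the rows of Comp_summary_wide, while `occurred_rows_right` and `occurred_rows_left` hold row numbers of filtered_datum, so `i %in% occurred_rows_right` compares two unrelated row numberings. That would be consistent with both symptoms described. The `complication > 0` test compares a character string with a number, so it does not appear to act as a filter either. Below is a minimal base R sketch that matches on patient IDs instead. It uses a small made-up data frame in the same shape as the sample data (complication names rather than the codes 1 to 18), so treat it as an illustration rather than a drop-in replacement.

```r
# Small illustrative data in the same shape as the question's sample.
filtered_datum <- data.frame(
  record_id = c("1", "1", "2", "2", "3"),
  surg_complication = c("SSI", "Hematoma", "Seroma", "Hematoma", "SSI"),
  complication_laterality = c("1", "2", "3", "1", "2"),
  stringsAsFactors = FALSE
)
complications_list <- c("SSI", "Seroma", "Hematoma")

# One row per patient.
Comp_summary_wide <- data.frame(record_id = unique(filtered_datum$record_id),
                                stringsAsFactors = FALSE)

for (complication in complications_list) {
  is_comp  <- filtered_datum$surg_complication == complication
  on_right <- filtered_datum$complication_laterality %in% c("1", "3")
  on_left  <- filtered_datum$complication_laterality %in% c("2", "3")

  # IDs (not row numbers) of patients with this complication on each side.
  ids_right <- filtered_datum$record_id[is_comp & on_right]
  ids_left  <- filtered_datum$record_id[is_comp & on_left]

  # Flag every patient whose ID appears in the matching set; others get 0.
  Comp_summary_wide[[paste0(complication, "_right")]] <-
    as.integer(Comp_summary_wide$record_id %in% ids_right)
  Comp_summary_wide[[paste0(complication, "_left")]] <-
    as.integer(Comp_summary_wide$record_id %in% ids_left)
}

Comp_summary_wide
```

Because each column is filled for all patients at once, the inner loop over rows is no longer needed, and a patient with several complications gets a 1 in each corresponding column.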