How can I create multiple new data frames in R, derived from a single data frame and named sequentially?


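I have a single data frame containing information on many surgeons and their patients, which I use to generate Kaplan-Meier survival curves and fit Cox proportional hazards models. The data include surgeon ID (numbered sequentially from 1), patient age, patient sex, status (0 = censored, 1 = event), and the number of days between the index event (surgery) and either the end event (reoperation) or censoring (patient died, moved away, etc.).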

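I would like to generate a new data frame for each surgeon to support my analysis, creating a new variable ("SurgeonGroup") based on the surgeon's ID: SurgeonGroup is "You" for records with that surgeon's ID and "Other Surgeons" for all other values. The new data frames should be saved with sequential names (DataProvider1, DataProvider2, etc.) so that each surgeon can be compared with their peers in the survival curves and hazard ratio analyses. For example, the SurgeonGroup variable would be used to compare a surgeon with their peers using the coxph function as follows: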

 coxph(Surv(Days, Status) ~ PatientAge + PatientSex + SurgeonGroup, data = DataProvider1) %>%
                tbl_regression(exp = TRUE)

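The code below generates a smaller sample data frame with only 5 surgeons, defines a simple function, and creates 5 data frames for 5 different providers by calling that function 5 times. However, because my original data frame has many more surgeons, writing out a data frame assignment/function call for each one is unwieldy and risks copy/paste errors.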

Is there a simple way to repeat this "DataProviderX <- MyFunction(X)" pattern for any similar dataset, producing the same number of new data frames as there are unique surgeons? I have searched for loop and apply function approaches that could be used in this case, but can't seem to make any of them work (iteration is not my strength in R). Any advice would be much appreciated!

Here is my reproducible example:

# Load dplyr Package

    library(dplyr)


# Create Sample Data Frame
     
    Surgeon <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5)
    PatientAge <- c(69,84,94,67,92,76,74,92,76,89,96,99,94,95,84,85,99,93,89,84,74,86,77,88,81,82,89,88,88,81,83,95,81,72,80,92,83,83,96,82,98,79,84,88,91,82,89,88,78,88)
    PatientSex <- c("M","F","F","F","F","F","M","M","F","F","M","M","F","F","F","F","F","M","F","F","F","M","F","M","M","F","F","F","M","M","F","M","F","M","F","M","F","M","M","M","F","M","F","F","M","F","M","F","M","F")
    Status <- c(1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0)
    Days <- c(254,450,488,798,395,667,1836,220,3401,292,52,663,656,52,3797,1097,51,234,367,1641,1402,8,546,913,1849,2171,1474,312,2139,118,572,8,1175,2634,24,36,93,2627,312,1582,220,276,1329,135,116,933,2038,76,1018,1224)

    Data <- data.frame(Surgeon, PatientAge, PatientSex, Status, Days)
     
     
# Create Function
     
    MyFunction <- function(FunctionID) {
        FunctionData <- Data %>% mutate(SurgeonGroup = case_when(Surgeon == FunctionID ~ "You",
                                                                 TRUE ~ "Other Surgeons"))
        return(FunctionData)
    }

    DataProvider1 <- MyFunction(1)
    DataProvider2 <- MyFunction(2)
    DataProvider3 <- MyFunction(3)
    DataProvider4 <- MyFunction(4)
    DataProvider5 <- MyFunction(5)
r dataframe function dplyr mutate
1 Answer

I would do it like this:

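unique_ids <- unique(Data$Surgeon)

# Build one data frame per surgeon; the result is a list of data frames
DataProviders <- lapply(unique_ids, function(id) {
  # Label this surgeon's rows "You" and everyone else "Other Surgeons"
  Data$SurgeonGroup <- ifelse(Data$Surgeon == id, "You", "Other Surgeons")
  Data
})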
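
To get the sequential names the question asks for, the list returned by lapply() can be named, and, if separate objects are really needed, copied into the workspace with list2env(). This is only a minimal sketch: the small Data frame is an illustrative stand-in for the question's data, and add_surgeon_group() is a hypothetical helper name, not something from the original post.

```r
# Tiny stand-in for the question's Data (illustrative values only)
Data <- data.frame(
  Surgeon = c(1, 1, 2, 2, 3, 3),
  Days    = c(254, 450, 52, 663, 1402, 8),
  Status  = c(1, 0, 0, 0, 0, 1)
)

# Hypothetical helper: label one surgeon's rows "You", all others "Other Surgeons"
add_surgeon_group <- function(id, data) {
  data$SurgeonGroup <- ifelse(data$Surgeon == id, "You", "Other Surgeons")
  data
}

unique_ids <- sort(unique(Data$Surgeon))

# One data frame per surgeon, collected in a list
DataProviders <- lapply(unique_ids, add_surgeon_group, data = Data)

# Name the elements DataProvider1, DataProvider2, ...
names(DataProviders) <- paste0("DataProvider", unique_ids)

# Access a data frame by name without creating separate objects
head(DataProviders$DataProvider2)

# Optional: copy each list element into the global environment as its own object
invisible(list2env(DataProviders, envir = globalenv()))
```

Keeping the data frames in a named list is usually easier to work with than many separate objects, because the same lapply() pattern can then be applied to every element.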

You can also put your coxph() call inside the function passed to lapply().
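
As a rough sketch of that idea, the function passed to lapply() can build the SurgeonGroup variable and fit the model in one step, returning a list of fitted models instead of data frames. The data below are simulated purely so that the models have enough events to fit; they are not the question's data, and the object names (models, d) are illustrative assumptions.

```r
library(survival)  # provides Surv() and coxph()

# Simulated stand-in data (NOT the question's data)
set.seed(1)
n <- 200
Data <- data.frame(
  Surgeon    = rep(1:4, each = 50),
  PatientAge = round(runif(n, 60, 95)),
  PatientSex = sample(c("M", "F"), n, replace = TRUE),
  Status     = rbinom(n, 1, 0.5),
  Days       = ceiling(rexp(n, rate = 1 / 500))
)

unique_ids <- sort(unique(Data$Surgeon))

# Fit one Cox model per surgeon, comparing that surgeon ("You") with everyone else
models <- lapply(unique_ids, function(id) {
  d <- Data
  d$SurgeonGroup <- factor(
    ifelse(d$Surgeon == id, "You", "Other Surgeons"),
    levels = c("Other Surgeons", "You")  # peers are the reference level
  )
  coxph(Surv(Days, Status) ~ PatientAge + PatientSex + SurgeonGroup, data = d)
})
names(models) <- paste0("Surgeon", unique_ids)

# Hazard ratio for "You" vs "Other Surgeons", one value per surgeon
sapply(models, function(m) exp(coef(m)[["SurgeonGroupYou"]]))
```

Each element of models could then be passed to tbl_regression(exp = TRUE) as in the question, assuming the gtsummary package is loaded.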

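If you want to get up to speed with functional programming (i.e., programming that makes heavy use of functions such as lapply()), check out this chapter in Hadley's book: https://adv-r.hadley.nz/fp.html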
