Search across a range of variables, identify diseases of interest, and return the earliest diagnosis date (R)

Question (votes: 1, answers: 1)

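I have a df in R with multiple columns recording which ICD-10 diagnoses were assigned to a person during the study period. The dates of those diagnoses are recorded in separate variables: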

df = data.frame(ID = c(1001, 1002, 1003, 1004, 1005),
                Disease_code_1 = c('I802', 'G200','I802','', 'H356'),
                Disease_code_2 = c('A071','','G20','','H250'),
                Disease_code_3 = c('H250', '','','',''),
                Date_of_diagnosis_1 = c('12/06/1997','13/06/1997','14/02/2003','','18/20/2005'),
                Date_of_diagnosis_2 = c('12/06/1998','','18/09/2001','','12/07/1993'),
                Date_of_diagnosis_3 = c('17/09/2010','','','',''))

    ID Disease_code_1 Disease_code_2 Disease_code_3 Date_of_diagnosis_1 Date_of_diagnosis_2 Date_of_diagnosis_3
1 1001           I802           A071           H250          12/06/1997          12/06/1998          17/09/2010
2 1002           G200                                        13/06/1997
3 1003           I802            G20                         14/02/2003          18/09/2001
4 1004
5 1005           H356           H250                         18/20/2005          12/07/1993

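I would like to search across the Disease_code_* variables and, if a person has been assigned any of the disease codes of interest specified in codes_of_interest = c("H250", "H356"), return 1, together with the earliest date on which a code of interest was recorded. Ideally, my df would look like this: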

df = data.frame(ID = c(1001, 1002, 1003, 1004, 1005),
                Disease_of_interest = c('1','0','0','0','1'),
                Date_of_disease_interest = c('17/09/2010','','','','12/07/1993'),
                Disease_code_1 = c('I802', 'G200','I802','', 'H356'),
                Disease_code_2 = c('A071','','G20','','H250'),
                Disease_code_3 = c('H250', '','','',''),
                Date_of_diagnosis_1 = c('12/06/1997','13/06/1997','14/02/2003','','18/20/2005'),
                Date_of_diagnosis_2 = c('12/06/1998','','18/09/2001','','12/07/1993'),
                Date_of_diagnosis_3 = c('17/09/2010','','','',''))

    ID Disease_of_interest Date_of_disease_interest Disease_code_1 Disease_code_2 Disease_code_3 Date_of_diagnosis_1 Date_of_diagnosis_2 Date_of_diagnosis_3
1 1001                   1               17/09/2010           I802           A071           H250          12/06/1997          12/06/1998          17/09/2010
2 1002                   0                                    G200                                        13/06/1997
3 1003                   0                                    I802            G20                         14/02/2003          18/09/2001
4 1004                   0
5 1005                   1               12/07/1993           H356           H250                         18/20/2005          12/07/1993

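The code I currently use to identify the disease codes of interest is below (although it does not take the diagnosis dates into account):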

# Flag each row with 1 if any of its columns (other than ID) holds a code of interest
df$Disease_of_interest <- apply(df[, -1], 1, function(x) {
  if (any(x %in% codes_of_interest)) {
    return(1)
  } else {
    return(0)
  }
})

Many thanks for any advice you can offer!

r search assign
1 Answer
0 votes

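You can use %in% inside apply to get the positions where codes_of_interest were found, and then use those positions in mapply to get the min of the dates. If a date is returned, a code of interest was found; if none was found, NA is returned.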

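# i: 3 x 5 logical matrix, one column per person, TRUE where a code of interest was found
i <- apply(df[,2:4], 1, "%in%", codes_of_interest)
# Per person: min of the matching dates (compared as character strings, not as dates), else NA
mapply(function(x, i) if(any(i)) min(x[i]) else NA, asplit(df[,5:7], 1), asplit(i, 2))
#[1] "17/09/2010" NA           NA           NA           "12/07/1993"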
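
Because the dates are stored as dd/mm/yyyy text, min() on them compares strings rather than calendar dates, so it is not guaranteed to pick the chronologically earliest one. A minimal sketch of a date-aware variant follows. It assumes the dates are in dd/mm/yyyy format and that entries which cannot be parsed (such as '18/20/2005') should be treated as missing. The names code_cols, date_cols, hit, date_num and earliest are introduced here for illustration only and are not part of the original post.

```r
# Example data from the question
df <- data.frame(ID = c(1001, 1002, 1003, 1004, 1005),
                 Disease_code_1 = c('I802', 'G200', 'I802', '', 'H356'),
                 Disease_code_2 = c('A071', '', 'G20', '', 'H250'),
                 Disease_code_3 = c('H250', '', '', '', ''),
                 Date_of_diagnosis_1 = c('12/06/1997', '13/06/1997', '14/02/2003', '', '18/20/2005'),
                 Date_of_diagnosis_2 = c('12/06/1998', '', '18/09/2001', '', '12/07/1993'),
                 Date_of_diagnosis_3 = c('17/09/2010', '', '', '', ''))
codes_of_interest <- c("H250", "H356")

# Select the paired code and date columns by name (assumes matching suffix order)
code_cols <- grep("^Disease_code_", names(df), value = TRUE)
date_cols <- grep("^Date_of_diagnosis_", names(df), value = TRUE)

# Logical matrix (rows = people, columns = diagnosis slots):
# TRUE where the recorded code is one of the codes of interest
codes <- as.matrix(df[code_cols])
hit <- matrix(codes %in% codes_of_interest, nrow = nrow(codes))

# Parse all dates at once; empty or invalid strings become NA
dates <- as.Date(unlist(df[date_cols], use.names = FALSE), format = "%d/%m/%Y")
date_num <- matrix(as.numeric(dates), nrow = nrow(df))

# Keep only the dates that belong to a code of interest
date_num[!hit] <- NA

# Earliest remaining date per person, or NA when there is none
earliest <- apply(date_num, 1, function(d) {
  if (all(is.na(d))) NA_real_ else min(d, na.rm = TRUE)
})

df$Disease_of_interest <- as.integer(rowSums(hit) > 0)
df$Date_of_disease_interest <- as.Date(earliest, origin = "1970-01-01")
```

The new date column is of class Date. If the original text layout is preferred, format(df$Date_of_disease_interest, "%d/%m/%Y") converts it back. Note that in this sketch a person is still flagged with 1 when a code of interest is present but its date cannot be parsed.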