Filtering a dataset using multiple conditions


I have this dataset, which contains patient ID, diagnosis date, diagnosis, treatment date (trt_date), and drug code.

# Create the dataset
# Nine rows: the last one (patient 7, drug C on 1/27/10) appears in the desired output below
data <- data.frame(
  patient_id = c(1, 1, 1, 1, 5, 5, 7, 7, 7),
  diagnosis_date = as.Date(c("1/9/10", "1/9/10", "1/9/10", "1/9/10", "1/11/10", "1/11/10", "1/9/10", "1/9/10", "1/9/10"), format = "%m/%d/%y"),
  diagnosis = rep("breast cancer", 9),
  trt_date = as.Date(c("1/20/10", "1/20/10", "1/21/10", "1/21/10", "1/29/10", "1/30/10", "1/25/10", "1/26/10", "1/27/10"), format = "%m/%d/%y"),
  drug_code = c("A", "B", "A", "A", "B", "A", "A", "A", "C")
)

# Print the dataset
print(data)

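The filter conditions are as follows:

Patient ID 1: on the first treatment date (trt_date), 1/20/10, the patient received drug A (first row) and also received drug B on that same date (second row). The order does not matter: drug B could appear first and drug A second, as long as both fall on the same first treatment date. This is called first-line combination therapy with drugs A and B, and observations of this type need to be filtered out. So patient ID 1 is removed.

Patient ID 5: on the first treatment date, 1/29/10, the patient received only drug B. This is called first-line monotherapy with drug B.

Patient ID 7: on the first treatment date, 1/25/10, the patient received only drug A. This is called first-line monotherapy with drug A.

So the observations I want are those with first-line monotherapy with drug A or first-line monotherapy with drug B, not first-line combination therapy with drugs A and B.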
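The rule described above can be sketched in base R by labelling each patient's first-line regimen. This is only an illustration under one assumption: "first line" means every drug_code recorded on the patient's earliest trt_date. The helper name first_line_regimen is hypothetical, and the toy data keep only the columns the rule needs.

```r
# Toy data mirroring the example (only the columns the rule uses)
data <- data.frame(
  patient_id = c(1, 1, 1, 1, 5, 5, 7, 7, 7),
  trt_date = as.Date(c("1/20/10", "1/20/10", "1/21/10", "1/21/10",
                       "1/29/10", "1/30/10",
                       "1/25/10", "1/26/10", "1/27/10"), format = "%m/%d/%y"),
  drug_code = c("A", "B", "A", "A", "B", "A", "A", "A", "C")
)

# Hypothetical helper: label one patient's first-line regimen.
# Assumption: first line = all distinct drugs given on the earliest trt_date.
first_line_regimen <- function(dates, drugs) {
  first_drugs <- sort(unique(drugs[dates == min(dates)]))
  if (length(first_drugs) == 1) {
    paste("monotherapy", first_drugs)
  } else {
    paste("combination", paste(first_drugs, collapse = "+"))
  }
}

# One label per patient
regimen <- sapply(
  split(data, data$patient_id),
  function(d) first_line_regimen(d$trt_date, d$drug_code)
)
print(regimen)
```

Patients whose label starts with "combination" are the ones to drop; all rows of the remaining patients are kept.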

So the desired output would look like this:

patient_id    diagnosis_date     diagnosis          trt_date    drug_code
5             2010-01-11         breast cancer      2010-01-29  B
5             2010-01-11         breast cancer      2010-01-30  A
7             2010-01-09         breast cancer      2010-01-25  A
7             2010-01-09         breast cancer      2010-01-26  A
7             2010-01-09         breast cancer      2010-01-27  C

I would appreciate any help. Thank you!

r filter conditional-statements
1 Answer

A dplyr solution:

library(dplyr)

data |> group_by(patient_id) |> 
  slice_min(trt_date) |> 
  filter(n() == 1) |> 
  ungroup() |> 
  left_join(data, by = c("patient_id"), suffix = c(".x", "")) |> 
  select(patient_id, !contains('.x'))

Result:

# A tibble: 5 × 5
  patient_id diagnosis_date diagnosis     trt_date   drug_code
       <dbl> <date>         <chr>         <date>     <chr>    
1          5 2010-01-11     breast cancer 2010-01-29 B        
2          5 2010-01-11     breast cancer 2010-01-30 A        
3          7 2010-01-09     breast cancer 2010-01-25 A        
4          7 2010-01-09     breast cancer 2010-01-26 A        
5          7 2010-01-09     breast cancer 2010-01-27 C   
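The pipeline above works because slice_min() keeps ties by default, so a patient with two rows on the earliest trt_date fails the n() == 1 check. One side effect is that a patient given the same drug twice on that first date would also be dropped. As a possible variant, assuming "monotherapy" means exactly one distinct drug_code on the earliest trt_date, the check can be done in a single grouped filter() without the join:

```r
library(dplyr)

# Same nine-row data as in the dput() output
data <- data.frame(
  patient_id = c(1, 1, 1, 1, 5, 5, 7, 7, 7),
  diagnosis_date = as.Date(c(rep("1/9/10", 4), rep("1/11/10", 2), rep("1/9/10", 3)),
                           format = "%m/%d/%y"),
  diagnosis = rep("breast cancer", 9),
  trt_date = as.Date(c("1/20/10", "1/20/10", "1/21/10", "1/21/10",
                       "1/29/10", "1/30/10",
                       "1/25/10", "1/26/10", "1/27/10"), format = "%m/%d/%y"),
  drug_code = c("A", "B", "A", "A", "B", "A", "A", "A", "C")
)

# Keep every row of a patient whose earliest treatment date
# involves exactly one distinct drug (assumed definition of monotherapy)
mono <- data |>
  group_by(patient_id) |>
  filter(n_distinct(drug_code[trt_date == min(trt_date)]) == 1) |>
  ungroup()

print(mono)
```

On this data the variant keeps the same patients (5 and 7) as the join-based version; the two differ only when a drug is repeated on the first treatment date.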

Data used:

> dput(data)
structure(list(patient_id = c(1, 1, 1, 1, 5, 5, 7, 7, 7), diagnosis_date = structure(c(14618, 
14618, 14618, 14618, 14620, 14620, 14618, 14618, 14618), class = "Date"), 
    diagnosis = c("breast cancer", "breast cancer", "breast cancer", 
    "breast cancer", "breast cancer", "breast cancer", "breast cancer", 
    "breast cancer", "breast cancer"), trt_date = structure(c(14629, 
    14629, 14630, 14630, 14638, 14639, 14634, 14635, 14636), class = "Date"), 
    drug_code = c("A", "B", "A", "A", "B", "A", "A", "A", "C"
    )), class = "data.frame", row.names = c(NA, -9L))