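I want to reformat my data so that if a person has a negative test after testing positive, that positive test is changed to negative (it is treated as a false positive).
In my dataset, a positive test is recorded in the test column as equivocal, indeterminate, or positive.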
library(tidyverse)
library(data.table)
date=c("2023-01-01", "2023-02-07", "2023-02-20","2023-01-01", "2023-02-07", "2023-02-20", "2023-01-01", "2023-05-10", "2023-01-01", "2023-01-01", "2023-01-01", "2023-01-01", "2023-01-01", "2023-01-01", "2023-01-10", "2023-01-01", "2023-01-10")
ID=c("A", "A", "A","A2", "A2", "A2", "B", "B", "C", "D", "D", "D1", "D1", "E", "E", "F", "F")
test=c("negative", "equivocal", "negative", "negative", "indeterminate", "negative", "negative", "negative", "positive", "positive", "negative","indeterminate", "negative", "positive", "negative", "negative", "positive")
# Build the table directly; cbind() would first coerce everything to a character matrix
df = data.table(date = as.Date(date), ID = ID, test = test)
So each of these tests should be converted to negative whenever there is a negative test on the same day as, or after, the positive test.
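# Non-equi self-join: for each row, look up the same ID's tests dated on or after it
df[, test2 := df[.SD,
                 on = .(ID, date >= date),
                 # i.test is the current row's own result, so exactly one value comes back per row
                 if (any(test == "negative")) "negative" else i.test,
                 by = .EACHI]$V1]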
# date ID test test2
# <Date> <char> <char> <char>
# 1: 2023-01-01 A negative negative
# 2: 2023-02-07 A equivocal negative
# 3: 2023-02-20 A negative negative
# 4: 2023-01-01 A2 negative negative
# 5: 2023-02-07 A2 indeterminate negative
# 6: 2023-02-20 A2 negative negative
# 7: 2023-01-01 B negative negative
# 8: 2023-05-10 B negative negative
# 9: 2023-01-01 C positive positive
# 10: 2023-01-01 D positive negative
# 11: 2023-01-01 D negative negative
# 12: 2023-01-01 D1 indeterminate negative
# 13: 2023-01-01 D1 negative negative
# 14: 2023-01-01 E positive negative
# 15: 2023-01-10 E negative negative
# 16: 2023-01-01 F negative negative
# 17: 2023-01-10 F positive positive
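
The same rule can also be read as "a test becomes negative if it is dated on or before that ID's latest negative test". Below is a minimal sketch of that reading, not a replacement for the join above. It assumes a small hypothetical sample `dt` in the same shape as `df`; the helper names `neg` and `last_neg` are made up for illustration.

```r
library(data.table)

# Hypothetical sample with the same columns as df
dt <- data.table(
  date = as.Date(c("2023-01-01", "2023-02-07", "2023-02-20",
                   "2023-01-01", "2023-01-10", "2023-01-01")),
  ID   = c("A", "A", "A", "F", "F", "C"),
  test = c("negative", "equivocal", "negative",
           "negative", "positive", "positive")
)

# Latest negative test date per ID (IDs without a negative test are absent here)
neg <- dt[test == "negative", .(last_neg = max(date)), by = ID]

# Update join: attach last_neg to every row; IDs without a negative get NA
dt[neg, on = "ID", last_neg := i.last_neg]

# A row turns negative when a negative exists on the same day or later
dt[, test2 := fifelse(!is.na(last_neg) & date <= last_neg, "negative", test)]

# Drop the helper column
dt[, last_neg := NULL]
```

On this sample, the equivocal test for `A` becomes negative, while the positive tests for `F` (dated after its only negative) and `C` (no negative at all) are left unchanged. This version makes one grouped pass plus an update join, which may be easier to reason about than a non-equi self-join on larger tables, though that has not been benchmarked here.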