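I have a dataset of longitudinally collected samples, further divided into three time intervals. I'm doing a longitudinal analysis, but I'm also looking at each time interval separately. Some IDs donated more than one sample within a time interval, and I want to remove these duplicate IDs. However, within each time interval I want to keep the sample whose time point is closest to that interval's mean time point.
Any suggestions on how to approach this?
My dataset looks like this: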
df <- data.frame(ID           = c(1, 2, 3, 4, 5, 2, 3, 4, 5, 4, 3),
                 TimePoint    = c(10, 12, 13, 10, 12, 11, 15, 14, 13, 12, 13),
                 TimeInterval = c("T1", "T1", "T1", "T1", "T1", "T1", "T2", "T2", "T2", "T1", "T2"))
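Before deduplicating, it can help to see which ID/interval combinations actually have more than one sample. A minimal base R sketch using table() on the example data (the names counts and dups are only illustrative):

```r
# Example data from the question
df <- data.frame(ID           = c(1, 2, 3, 4, 5, 2, 3, 4, 5, 4, 3),
                 TimePoint    = c(10, 12, 13, 10, 12, 11, 15, 14, 13, 12, 13),
                 TimeInterval = c("T1", "T1", "T1", "T1", "T1", "T1", "T2", "T2", "T2", "T1", "T2"))

# Number of samples per ID within each interval; cells > 1 are the duplicates
counts <- table(ID = df$ID, TimeInterval = df$TimeInterval)
print(counts)

# Keep only the ID/interval combinations with more than one sample
dups <- as.data.frame(counts)
dups <- dups[dups$Freq > 1, ]
print(dups)
```

Here that flags ID 2 and ID 4 in T1 and ID 3 in T2, each with two samples.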
I have tried unique() and distinct(), but they just keep the first row they encounter for each ID.
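Base R:
df |>
  # mean time point of each interval, repeated on every row of that interval
  transform(meanTimePoint = ave(TimePoint, TimeInterval, FUN = mean)) |>
  # distance of each sample from its interval's mean
  transform(timeDist = abs(TimePoint - meanTimePoint)) |>
  # flag the closest sample per ID and interval (which.min takes the first on ties)
  transform(useThis = ave(timeDist, list(ID, TimeInterval), FUN = function(z) seq_along(z) == which.min(z))) |>
  # ave() returns a numeric vector, so the flag comes back as 1/0 rather than TRUE/FALSE
  subset(useThis > 0)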
#    ID TimePoint TimeInterval meanTimePoint  timeDist useThis
# 1   1        10           T1      11.42857 1.4285714       1
# 3   3        13           T1      11.42857 1.5714286       1
# 5   5        12           T1      11.42857 0.5714286       1
# 6   2        11           T1      11.42857 0.4285714       1
# 8   4        14           T2      13.75000 0.2500000       1
# 9   5        13           T2      13.75000 0.7500000       1
# 10  4        12           T1      11.42857 0.5714286       1
# 11  3        13           T2      13.75000 0.7500000       1
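Since unique(), distinct() and duplicated() all keep the first row they meet, another base R option is to sort each ID/interval group by distance from the interval mean first and then drop the later duplicates. A sketch on the example data (the names sorted and res are illustrative); because order() leaves unresolved ties in their original order, a tie goes to the earlier sample, as with which.min():

```r
# Example data from the question
df <- data.frame(ID           = c(1, 2, 3, 4, 5, 2, 3, 4, 5, 4, 3),
                 TimePoint    = c(10, 12, 13, 10, 12, 11, 15, 14, 13, 12, 13),
                 TimeInterval = c("T1", "T1", "T1", "T1", "T1", "T1", "T2", "T2", "T2", "T1", "T2"))

# Mean time point of each interval and each sample's distance from it
df$meanTimePoint <- ave(df$TimePoint, df$TimeInterval, FUN = mean)
df$timeDist <- abs(df$TimePoint - df$meanTimePoint)

# Sort so that the closest sample of each ID/interval comes first ...
sorted <- df[order(df$TimeInterval, df$ID, df$timeDist), ]
# ... then duplicated() flags every later row of the same ID/interval
res <- sorted[!duplicated(sorted[c("ID", "TimeInterval")]), ]
print(res)
```

This keeps the same eight rows as the ave() version above, just ordered by interval and ID.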
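With dplyr (1.1.0 or later, which added the .by/by arguments):
library(dplyr)
df |>
  # per-interval mean time point and each sample's distance from it
  mutate(
    .by = TimeInterval,
    meanTimePoint = mean(TimePoint),
    timeDist = abs(TimePoint - meanTimePoint)
  ) |>
  # closest sample per ID and interval; with_ties = FALSE guarantees exactly one row on ties
  slice_min(n = 1, order_by = timeDist, by = c(ID, TimeInterval), with_ties = FALSE)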
#   ID TimePoint TimeInterval meanTimePoint  timeDist
# 1  1        10           T1      11.42857 1.4285714
# 2  2        11           T1      11.42857 0.4285714
# 3  3        13           T1      11.42857 1.5714286
# 4  4        12           T1      11.42857 0.5714286
# 5  5        12           T1      11.42857 0.5714286
# 6  3        13           T2      13.75000 0.7500000
# 7  4        14           T2      13.75000 0.2500000
# 8  5        13           T2      13.75000 0.7500000
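The distinct() call from the question also works once the rows are in the right order: arrange() puts the closest sample first within each ID/interval, and distinct(..., .keep_all = TRUE) then keeps that first row. A sketch assuming dplyr 1.1.0 or later (for .by); res is an illustrative name:

```r
library(dplyr)

# Example data from the question
df <- data.frame(ID           = c(1, 2, 3, 4, 5, 2, 3, 4, 5, 4, 3),
                 TimePoint    = c(10, 12, 13, 10, 12, 11, 15, 14, 13, 12, 13),
                 TimeInterval = c("T1", "T1", "T1", "T1", "T1", "T1", "T2", "T2", "T2", "T1", "T2"))

res <- df |>
  # per-interval mean time point and each sample's distance from it
  mutate(
    .by = TimeInterval,
    meanTimePoint = mean(TimePoint),
    timeDist = abs(TimePoint - meanTimePoint)
  ) |>
  # closest sample first within each ID/interval ...
  arrange(TimeInterval, ID, timeDist) |>
  # ... so distinct(), which keeps the first row it meets, keeps the closest one
  distinct(ID, TimeInterval, .keep_all = TRUE)
print(res)
```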