Counting active outbreaks in R using start and end dates


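I have a dataset that tracks respiratory illness outbreaks in facilities. Each outbreak has a start and end date and indicates whether COVID-19, influenza, or RSV is present. More than one pathogen can be present at once, which I refer to as mixed pathogens. An outbreak is considered active from its notification date to its declaration date. My ultimate goal is to plot the number of active outbreaks per day, by pathogen, from the earliest notification date to today. I am having trouble counting the total number of active outbreaks per day rather than just the new outbreaks per day.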

Here is my current code:

ari_test <- ari_data %>%
  select(record_id, notification_date, declaration_date, c_cov_present, c_flu_present, c_rsv_present) |> 
  mutate(notification_date = as.Date(notification_date),
         declaration_date = as.Date(declaration_date)) %>%
  filter(!is.na(notification_date) & !is.na(declaration_date)) %>%
  # Generate a sequence of dates from notification_date to declaration_date for each facility
  rowwise() %>%
  mutate(date = list(seq(notification_date, declaration_date, by = "day"))) %>%
  unnest(date) %>%
  select(-notification_date, -declaration_date) %>%
  # Count the number of active outbreaks per day for each pathogen
  group_by(date) %>%
  summarise(active_covid = sum(c_cov_present == 1 & is.na(c_flu_present) & is.na(c_rsv_present)),
            active_influenza = sum(is.na(c_cov_present) & c_flu_present == 1 & is.na(c_rsv_present)),
            active_rsv = sum(is.na(c_cov_present) & is.na(c_flu_present) & c_rsv_present == 1),
            active_mixed = sum(rowSums(cbind(c_cov_present, c_flu_present, c_rsv_present), na.rm = TRUE) >= 2))

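But this counts each outbreak only once, whereas it should be counted on every day it is active between the notification date and the declaration date.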

I also tried the following, but I get an error saying that record_id cannot be found, even though record_id is definitely in the data frame.

ari_test <- ari_data %>%
  mutate(notification_date = as.Date(notification_date),
         declaration_date = as.Date(declaration_date)) %>%
  filter(!is.na(notification_date) & !is.na(declaration_date)) %>%
  mutate(across(starts_with("c_"), ~if_else(is.na(.), 0, 1))) %>%  # Convert NA to 0 for presence/absence
  group_by(record_id) %>%
  mutate(active_covid = +(any(c_cov_present == 1 & is.na(c_flu_present) & is.na(c_rsv_present))),
         active_influenza = +(any(is.na(c_cov_present) & c_flu_present == 1 & is.na(c_rsv_present))),
         active_rsv = +(any(is.na(c_cov_present) & is.na(c_flu_present) & c_rsv_present == 1)),
         active_mixed = +(any(rowSums(select(., starts_with("c_"))) >= 2))) %>%
  complete(record_id, date = seq.Date(min(notification_date), max(declaration_date), by = "day"), fill = list(active_covid = 0, active_influenza = 0, active_rsv = 0, active_mixed = 0)) %>%
  ungroup()

Here is some sample data:

structure(list(record_id = c(1, 2, 5, 6, 7, 8, 10, 11, 12, 13
), notification_date = structure(c(19523, 19524, 19535, 19535, 
19535, 19535, 19536, 19536, 19542, 19542), class = "Date"), declaration_date = structure(c(19544, 
19537, 19548, 19559, 19542, 19555, 19548, 19549, 19550, 19569
), class = "Date"), c_cov_present = c(1, 1, 1, 1, 0, 1, 1, 1, 
1, 0), c_flu_present = c(1, 0, 0, 0, 
0, 0, 0, 0, 0, 1), 
    c_rsv_present = c(0, 0, 0, 0, 1, 0, 0, 1, 0, 0)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

Any help would be much appreciated. Thanks!

r data-manipulation

1 Answer

# Libraries and data.

library(tidyverse)

ari_data <- structure(list(record_id = c(1, 2, 5, 6, 7, 8, 10, 11, 12, 13
), notification_date = structure(c(19523, 19524, 19535, 19535, 
19535, 19535, 19536, 19536, 19542, 19542), class = "Date"), declaration_date = structure(c(19544, 
19537, 19548, 19559, 19542, 19555, 19548, 19549, 19550, 19569
), class = "Date"), c_cov_present = c(1, 1, 1, 1, 0, 1, 1, 1, 
1, 0), c_flu_present = c(1, 0, 0, 0, 
0, 0, 0, 0, 0, 1), 
    c_rsv_present = c(0, 0, 0, 0, 1, 0, 0, 1, 0, 0)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

# Expand each outbreak into one row per day it was active
# (notification_date through declaration_date, inclusive).

days_data <- ari_data %>%
  rowwise() %>%
  mutate(date = list(seq(notification_date, declaration_date, by = "day"))) %>%
  ungroup() %>%
  unnest(date) %>%
  select(record_id, date, c_cov_present, c_flu_present, c_rsv_present)

# Count the active outbreaks involving each pathogen on each day.
# An outbreak with several pathogens is counted once per pathogen.

daily_counts <- days_data %>%
  group_by(date) %>%
  summarise(flu_count = sum(c_flu_present),
            covid_count = sum(c_cov_present),
            rsv_count = sum(c_rsv_present))

# Plot the daily counts of active outbreaks over time.
# linewidth replaces size for lines in ggplot2 3.4.0 and later.

ggplot(daily_counts) +
  geom_line(aes(x = date, y = flu_count, colour = "Flu"), linewidth = 1.5) +
  geom_line(aes(x = date, y = covid_count, colour = "COVID"), linewidth = 1.5) +
  geom_line(aes(x = date, y = rsv_count, colour = "RSV"), linewidth = 1.5) +
  theme_bw() +
  labs(x = "Date", 
       y = "Active outbreaks",
       colour = "Pathogen")

The output is a line chart with one line per pathogen.
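
The question also asks for mutually exclusive categories (COVID-19 only, influenza only, RSV only, and mixed), whereas the counts above include a mixed outbreak once under each of its pathogens. The following is a minimal sketch of one way to get exclusive categories, not a definitive implementation. It assumes that a missing presence flag means "absent", that outbreaks with no pathogen recorded should be dropped, and that rows whose declaration date precedes the notification date are data errors to exclude. The names `n_pathogens`, `category`, `active_outbreaks`, and `active_by_category` are introduced here for illustration.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (dates are days since 1970-01-01).
ari_data <- tibble(
  record_id = c(1, 2, 5, 6, 7, 8, 10, 11, 12, 13),
  notification_date = as.Date(c(19523, 19524, 19535, 19535, 19535,
                                19535, 19536, 19536, 19542, 19542),
                              origin = "1970-01-01"),
  declaration_date = as.Date(c(19544, 19537, 19548, 19559, 19542,
                               19555, 19548, 19549, 19550, 19569),
                             origin = "1970-01-01"),
  c_cov_present = c(1, 1, 1, 1, 0, 1, 1, 1, 1, 0),
  c_flu_present = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 1),
  c_rsv_present = c(0, 0, 0, 0, 1, 0, 0, 1, 0, 0)
)

active_by_category <- ari_data %>%
  # Assumption: drop rows with missing or reversed dates.
  filter(!is.na(notification_date), !is.na(declaration_date),
         declaration_date >= notification_date) %>%
  # Assumption: a missing presence flag is treated as "absent" (0).
  mutate(across(starts_with("c_"), ~ coalesce(.x, 0)),
         n_pathogens = c_cov_present + c_flu_present + c_rsv_present) %>%
  # Assumption: outbreaks with no pathogen recorded are excluded.
  filter(n_pathogens >= 1) %>%
  # Assign each outbreak to exactly one category.
  mutate(category = case_when(
    n_pathogens >= 2   ~ "Mixed",
    c_cov_present == 1 ~ "COVID-19",
    c_flu_present == 1 ~ "Influenza",
    c_rsv_present == 1 ~ "RSV"
  )) %>%
  # One row per outbreak per active day.
  rowwise() %>%
  mutate(date = list(seq(notification_date, declaration_date, by = "day"))) %>%
  ungroup() %>%
  unnest(date) %>%
  # Count active outbreaks per day and category.
  count(date, category, name = "active_outbreaks") %>%
  # Add explicit zeros for days on which a category has no active outbreak.
  complete(date = seq(min(date), max(date), by = "day"), category,
           fill = list(active_outbreaks = 0L))
```

The result is in long format, so it can be plotted with a single `geom_line()` layer by mapping `colour = category`. To extend the series to today, as the question describes, the upper bound of the `seq()` inside `complete()` could be replaced with `Sys.Date()`.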
