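Our weather station records daily weather data, so each week has about 7 rows (observations). We collect disease data once a week (one observation/row per week). How can I join disease_df to only the last row of each week in weather_df, while leaving the other cells blank? I tried left_join, but it incorrectly adds the value from disease_df to every day of the week instead of recording the disease data only at the end of the week.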
Reproducible example
weather_df <- structure(list(week = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), levels = c("1", "2"), class = "factor"),
date = structure(c(1401062400, 1401148800, 1401235200, 1401321600,
1401408000, 1401494400, 1401580800, 1401667200, 1402272000,
1402358400, 1402444800, 1402531200, 1402617600, 1402704000,
1402790400, 1402876800), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
rainfall = c(0.8, 0, 1.4, 3, 0, 1, 0, 0, 3, 0, 2.4, 1.2,
0, 0, 0, 0), temperature = c(23.6, 21.9, 22.6, 20.1, 21.9,
20.3, 17.3, 15.5, 23.1, 22.4, 21.1, 20.4, 21.2, 21.5, 20.2,
20.4)), row.names = c(NA, -16L), class = c("tbl_df", "tbl",
"data.frame"))
disease_df <- structure(list(week = structure(1:2, levels = c("1", "2"), class = "factor"),
disease_intensity = c(0.4, 0.3)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"))
library(dplyr)
combine_df <- left_join(weather_df, disease_df, by = "week")
In the output, every date in week 1 gets 0.4 and every date in week 2 gets 0.3. I only want each value added to the last day of its week, with the other cells left blank.
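You can join disease_df to weather_df keeping only the last match for each week, and then join the result back to weather_df. Note that the "last" match is the last row in row order, so this relies on weather_df being sorted by date within each week.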
library(dplyr)
left_join(disease_df, weather_df, by = "week", multiple = "last") %>%
left_join(weather_df, .)
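If the row order of the weather data cannot be trusted, the same idea can be made order-independent by sorting first and joining back on explicit keys. The following is only a sketch under assumptions: the small tibbles are hypothetical stand-ins for the data in the question (deliberately unsorted), and the names weather_sorted, last_rows and combined are made up for illustration. It needs dplyr 1.1.0 or later for the multiple argument.

```r
library(dplyr)

# Hypothetical stand-in data, deliberately NOT sorted by date
weather_df <- tibble(
  week = factor(c(1, 1, 1, 2, 2, 2)),
  date = as.Date(c("2014-05-28", "2014-05-26", "2014-05-27",
                   "2014-06-09", "2014-06-11", "2014-06-10")),
  rainfall = c(1.4, 0.8, 0, 3, 2.4, 0)
)
disease_df <- tibble(
  week = factor(c(1, 2)),
  disease_intensity = c(0.4, 0.3)
)

# Sort within week so the last row of each week is its latest date
weather_sorted <- arrange(weather_df, week, date)

# Keep only the last weather row per week, reduced to the join keys
last_rows <- disease_df %>%
  left_join(weather_sorted, by = "week", multiple = "last") %>%
  select(week, date, disease_intensity)

# Join back on explicit keys instead of relying on a natural join
combined <- left_join(weather_sorted, last_rows, by = c("week", "date"))
```

Spelling out by = c("week", "date") also avoids the "Joining with ..." message and keeps the join from silently depending on which columns the two tables happen to share.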
Another option:
Create a flag column in weather_df that marks the last day of each week, then join it to disease_df on week and flag.
weather_df %>%
mutate(flag = row_number() == which.max(date), .by = week) %>%
left_join(mutate(disease_df, flag = TRUE)) %>%
select(-flag)
# Joining with `by = join_by(week, date, rainfall, temperature)`
# # A tibble: 16 × 5
# week date rainfall temperature disease_intensity
# <fct> <dttm> <dbl> <dbl> <dbl>
# 1 1 2014-05-26 00:00:00 0.8 23.6 NA
# 2 1 2014-05-27 00:00:00 0 21.9 NA
# 3 1 2014-05-28 00:00:00 1.4 22.6 NA
# 4 1 2014-05-29 00:00:00 3 20.1 NA
# 5 1 2014-05-30 00:00:00 0 21.9 NA
# 6 1 2014-05-31 00:00:00 1 20.3 NA
# 7 1 2014-06-01 00:00:00 0 17.3 NA
# 8 1 2014-06-02 00:00:00 0 15.5 0.4
# 9 2 2014-06-09 00:00:00 3 23.1 NA
# 10 2 2014-06-10 00:00:00 0 22.4 NA
# 11 2 2014-06-11 00:00:00 2.4 21.1 NA
# 12 2 2014-06-12 00:00:00 1.2 20.4 NA
# 13 2 2014-06-13 00:00:00 0 21.2 NA
# 14 2 2014-06-14 00:00:00 0 21.5 NA
# 15 2 2014-06-15 00:00:00 0 20.2 NA
# 16 2 2014-06-16 00:00:00 0 20.4 0.3
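A quick sanity check on the joined result can confirm that each week carries exactly one disease value and that it sits on the latest date. This is a minimal sketch, assuming a reduced, hypothetical version of the data; the names combined, check, n_filled and filled_on_last_day are illustrative only, and .by requires dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical stand-in data: two weeks of daily observations
weather_df <- tibble(
  week = factor(c(1, 1, 1, 2, 2)),
  date = as.Date("2014-05-26") + c(0, 1, 2, 7, 8),
  rainfall = c(0.8, 0, 1.4, 3, 0)
)
disease_df <- tibble(
  week = factor(c(1, 2)),
  disease_intensity = c(0.4, 0.3)
)

# Flag approach, with the join keys written out explicitly
combined <- weather_df %>%
  mutate(flag = row_number() == which.max(date), .by = week) %>%
  left_join(mutate(disease_df, flag = TRUE), by = c("week", "flag")) %>%
  select(-flag)

# Per week: count filled cells and check they fall on the latest date
check <- combined %>%
  summarise(
    n_filled = sum(!is.na(disease_intensity)),
    filled_on_last_day = all(is.na(disease_intensity) | date == max(date)),
    .by = week
  )
```

If n_filled is not 1 for some week, or filled_on_last_day is FALSE, the join keys did not single out the last day as intended.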
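There are many join techniques you could use, but in this case it is easier to augment the join criteria. I added two conditions, the day of the week and a running count of that weekday within the week, because each of your weeks contains the same weekday twice.
From there, a regular left join works: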
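library(tidyverse)
# Tag each weather row with its weekday (lubridate's wday(): 1 = Sunday,
# so 2 = Monday by default) and a running count of Mondays within the week
weather_augmented_tbl <- weather_df |>
  group_by(week) |>
  mutate(
    wday = wday(date),
    n_wday = cumsum(if_else(wday == 2, 1, 0))
  )
# Disease values belong on the second Monday of each week
disease_augmented_tbl <- disease_df |>
  mutate(
    wday = 2,
    n_wday = 2
  )
# With the augmented keys, only the last row of each week finds a match
left_join(
  weather_augmented_tbl,
  disease_augmented_tbl,
  by = join_by(week, wday, n_wday)
)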
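The joined result still carries the helper columns wday and n_wday. A possible cleanup, sketched here under assumptions, drops them after the join and uses .by instead of group_by so the result is not left grouped. The one-week tibble is a hypothetical stand-in for the question's data, and the names weather_augmented, disease_augmented and combined are illustrative. As in the answer, it assumes every week starts on a Monday and ends on the following Monday.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in: 8 consecutive days starting Monday 2014-05-26
weather_df <- tibble(
  week = factor(rep(1, 8)),
  date = as.Date("2014-05-26") + 0:7,
  rainfall = c(0.8, 0, 1.4, 3, 0, 1, 0, 0)
)
disease_df <- tibble(week = factor(1), disease_intensity = 0.4)

weather_augmented <- weather_df %>%
  mutate(
    wday = wday(date),           # 1 = Sunday, 2 = Monday by default
    n_wday = cumsum(wday == 2),  # running count of Mondays in the week
    .by = week
  )

# Disease values go on the second Monday of each week
disease_augmented <- mutate(disease_df, wday = 2, n_wday = 2L)

combined <- weather_augmented %>%
  left_join(disease_augmented, by = c("week", "wday", "n_wday")) %>%
  select(-wday, -n_wday)  # drop the helper columns once the join is done
```

Because the keys are hard-coded to the second Monday, this only works while the sampling schedule stays fixed; a week that ends on a different weekday would get no disease value at all.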