Join two data frames using the last row


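Our weather station records daily weather data, which gives about 7 rows (observations) per week. We collect disease data once a week (one observation/row per week). How can I join disease_df to the last row of each week in weather_df while leaving the other cells blank? I tried left_join, but it incorrectly adds the value from disease_df to every day of the week instead of recording the disease data only at the end of the week.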

Reproducible example

weather_df <- structure(list(week = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), levels = c("1", "2"), class = "factor"), 
    date = structure(c(1401062400, 1401148800, 1401235200, 1401321600, 
    1401408000, 1401494400, 1401580800, 1401667200, 1402272000, 
    1402358400, 1402444800, 1402531200, 1402617600, 1402704000, 
    1402790400, 1402876800), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    rainfall = c(0.8, 0, 1.4, 3, 0, 1, 0, 0, 3, 0, 2.4, 1.2, 
    0, 0, 0, 0), temperature = c(23.6, 21.9, 22.6, 20.1, 21.9, 
    20.3, 17.3, 15.5, 23.1, 22.4, 21.1, 20.4, 21.2, 21.5, 20.2, 
    20.4)), row.names = c(NA, -16L), class = c("tbl_df", "tbl", 
"data.frame"))


disease_df <- structure(list(week = structure(1:2, levels = c("1", "2"), class = "factor"), 
    disease_intensity = c(0.4, 0.3)), row.names = c(NA, -2L), class = c("tbl_df", 
"tbl", "data.frame"))



library(dplyr)

combine_df <- left_join(weather_df, disease_df, by = "week")

In the output, 0.4 is added to every date in week 1 and 0.3 to every date in week 2. I only want these values added on the last day of each of the two weeks, with the other cells left blank.

r dataframe join dplyr merge
2 Answers

2 votes

You can join disease_df to weather_df keeping only the last match (multiple = "last"), and then join the result back to weather_df:

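library(dplyr)

# Step 1: for each week in disease_df, keep only the last matching weather row.
# Step 2: join that back onto weather_df using all shared columns.
left_join(disease_df, weather_df, by = "week", multiple = "last") %>%
  left_join(weather_df, .)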
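To see why this works, it can help to look at the intermediate result of the first join on its own. The sketch below uses small stand-in tibbles (weather_small and disease_small are hypothetical names and values, not the data from the question) and spells out the join columns explicitly. Note that multiple = "last" picks the last matching row in row order, so it assumes the weather rows are already sorted by date within each week.

```r
library(dplyr)

# Hypothetical stand-in data with the same shape as the question's data
weather_small <- tibble(
  week = factor(c(1, 1, 1, 2, 2, 2)),
  date = as.Date("2014-05-26") + c(0, 1, 2, 7, 8, 9),
  rainfall = c(0.8, 0, 1.4, 3, 0, 2.4)
)
disease_small <- tibble(
  week = factor(c(1, 2)),
  disease_intensity = c(0.4, 0.3)
)

# Intermediate step: one row per week, carrying the last weather row of that week
last_rows <- left_join(disease_small, weather_small, by = "week", multiple = "last")

# Joining back on all shared columns attaches the value to that last row only
combined <- left_join(weather_small, last_rows, by = c("week", "date", "rainfall"))
combined
```

If the weather data might not be sorted, arranging it by week and date before the first join would be a reasonable precaution.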

Another option: create a flag column in weather_df that marks the last day of each week, then join with disease_df:

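weather_df %>%
  # flag is TRUE only on the row holding the latest date within each week
  mutate(flag = row_number() == which.max(date), .by = week) %>%
  # disease rows carry flag = TRUE, so they match only the flagged weather rows
  left_join(mutate(disease_df, flag = TRUE)) %>%
  select(-flag)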
Output (the join message shown is the one printed by the first approach):
# Joining with `by = join_by(week, date, rainfall, temperature)`
# # A tibble: 16 × 5
#    week  date                rainfall temperature disease_intensity
#    <fct> <dttm>                 <dbl>       <dbl>             <dbl>
#  1 1     2014-05-26 00:00:00      0.8        23.6              NA  
#  2 1     2014-05-27 00:00:00      0          21.9              NA  
#  3 1     2014-05-28 00:00:00      1.4        22.6              NA  
#  4 1     2014-05-29 00:00:00      3          20.1              NA  
#  5 1     2014-05-30 00:00:00      0          21.9              NA  
#  6 1     2014-05-31 00:00:00      1          20.3              NA  
#  7 1     2014-06-01 00:00:00      0          17.3              NA  
#  8 1     2014-06-02 00:00:00      0          15.5               0.4
#  9 2     2014-06-09 00:00:00      3          23.1              NA  
# 10 2     2014-06-10 00:00:00      0          22.4              NA  
# 11 2     2014-06-11 00:00:00      2.4        21.1              NA  
# 12 2     2014-06-12 00:00:00      1.2        20.4              NA  
# 13 2     2014-06-13 00:00:00      0          21.2              NA  
# 14 2     2014-06-14 00:00:00      0          21.5              NA  
# 15 2     2014-06-15 00:00:00      0          20.2              NA  
# 16 2     2014-06-16 00:00:00      0          20.4               0.3
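
A further variation, which is not part of the answer above, is to run the ordinary left_join from the question and then blank out every row except the last day of each week. This is only a sketch on small stand-in tibbles (weather_small and disease_small are hypothetical names and values), and it assumes the latest date in each week is the row that should keep the value.

```r
library(dplyr)

# Hypothetical stand-in data with the same shape as the question's data
weather_small <- tibble(
  week = factor(c(1, 1, 1, 2, 2, 2)),
  date = as.Date("2014-05-26") + c(0, 1, 2, 7, 8, 9),
  rainfall = c(0.8, 0, 1.4, 3, 0, 2.4)
)
disease_small <- tibble(
  week = factor(c(1, 2)),
  disease_intensity = c(0.4, 0.3)
)

result <- weather_small %>%
  # Ordinary join: every day of a week receives that week's value
  left_join(disease_small, by = "week") %>%
  # Keep the value only on the latest date of each week, NA elsewhere
  mutate(
    disease_intensity = if_else(date == max(date), disease_intensity, NA_real_),
    .by = week
  )
result
```

Because it compares dates rather than row positions, this version does not depend on the rows being sorted.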

1 vote

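You can use any number of join techniques, but in this case it is easier to augment the join criteria. I added two conditions, the day of the week and a cumulative count of that day, because each of your weeks contains the same weekday more than once (each week runs from a Monday to the following Monday).

From there, a regular left join works: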

library(tidyverse)

weather_augmented_tbl <- weather_df |> 
  group_by(week) |> 
  mutate(
    wday=wday(date)
    ,n_wday=cumsum(if_else(wday==2,1,0))
  )

disease_augmented_tbl <- disease_df |> 
  mutate(
    wday=2
    ,n_wday=2
  )

left_join(
  weather_augmented_tbl
  ,disease_augmented_tbl
  ,by=join_by(
    week,wday,n_wday
  )
)
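
The result of the code above stays grouped by week and keeps the helper columns wday and n_wday. A possible cleanup is sketched below on small stand-in tibbles (weather_small and disease_small are hypothetical names and values); it uses .by instead of group_by so no ungroup is needed, and drops the helper columns after the join. Like the answer, it assumes lubridate's default week start (Sunday = 1, so Monday = 2) and that each week ends on its second Monday.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in data: two 8-day weeks, each running Monday to Monday
weather_small <- tibble(
  week = factor(rep(c(1, 2), each = 8)),
  date = as.Date("2014-05-26") + c(0:7, 14:21)
)
disease_small <- tibble(
  week = factor(c(1, 2)),
  disease_intensity = c(0.4, 0.3)
)

result <- weather_small |>
  mutate(
    wday = wday(date),            # Monday is 2 with the default week start
    n_wday = cumsum(wday == 2),   # running count of Mondays within the week
    .by = week
  ) |>
  left_join(
    # The disease value belongs to the second Monday of each week
    mutate(disease_small, wday = 2, n_wday = 2),
    by = join_by(week, wday, n_wday)
  ) |>
  select(-wday, -n_wday)          # drop the helper columns
result
```

Hard-coding the weekday makes this approach sensitive to weeks that end on a different day, so the date-based approaches may be more robust for irregular data.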