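I have a dataset with two columns, "start_date" and "end_date". The span between the two dates ranges from 7 to 40 days. I also have about 50 columns labeled with dates from 2021-01-01 to 2022-12-22 at 16-day intervals. I want to write code that finds which of the date-labeled columns fall between the start date and the end date. I then want to return the mean of those matching columns in a new column of the data frame, and drop all of the date-labeled columns.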
Here is the dataset I created:
rm(list = ls())
library(lubridate)
# Create date columns with 16-day intervals (ensure correct order here!)
date_cols <- as.character(seq(ymd("2021-01-01"), ymd("2022-12-30"), by = "16 days")) # Use as.character
# Determine the desired number of rows
num_rows <- 100 # Adjust this if needed
# Create an empty data frame with all columns (no changes here)
df <- data.frame(matrix(ncol = length(date_cols) + 2, nrow = num_rows))
colnames(df) <- c("start_date", "end_date", date_cols) # Assign column names for all columns
# Fill the start_date and end_date columns (no changes here)
start_date <- sample(seq(ymd("2021-01-01"), ymd("2022-12-30"), by = "day"), num_rows)
end_date <- start_date + sample(7:40, num_rows, replace = TRUE)
df$start_date <- start_date
df$end_date <- end_date
# Fill the date columns with random numbers (no changes here)
df[, date_cols] <- runif(num_rows * length(date_cols), min = 1, max = 2)
# Print the dataset
head(df)
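Thank you in advance.
For reference, the date-labeled columns hold raster values extracted at individual points (rows). I want to match each row to the data for the correct dates.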
library(tidyverse)
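df %>%
  # Reshape to long form: one row per (start_date, end_date, date column) combination
  pivot_longer(!c(start_date, end_date), names_to = "dates") %>%
  # Parse the character column names in `dates` into Date values
  type_convert() %>%
  # Average the values whose date lies within each window (`.by` needs dplyr >= 1.1.0)
  summarise(across(value, ~ mean(.x[between(dates, start_date, end_date)])), .by = c(start_date, end_date))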
# A tibble: 100 x 3
   start_date end_date   value
   <date>     <date>     <dbl>
 1 2021-06-19 2021-06-27  1.04
 2 2021-09-10 2021-10-18  1.44
 3 2021-05-22 2021-06-07  1.46
 4 2021-04-03 2021-04-28  1.65
 5 2022-04-11 2022-05-03  1.76
 6 2022-06-21 2022-07-11  1.55
 7 2022-09-20 2022-10-06  1.01
 8 2021-10-07 2021-11-01  1.71
 9 2022-02-07 2022-03-11  1.33
10 2022-01-11 2022-02-15  1.55
# i 90 more rows
# i Use `print(n = ...)` to see more rows
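If you would rather keep one output row per input row (even when two rows share the same start_date and end_date) and avoid reshaping, the same matching can be sketched in base R with a logical matrix. This is only a sketch on a made-up three-column toy data frame, not the dataset from the question. It assumes every column other than start_date and end_date is named with a date that as.Date() can parse, and it treats both ends of the window as inclusive. The helper names in_range and vals and the new column name value are arbitrary choices.

```r
# Toy data (hypothetical values): two rows, three date-labeled columns
df <- data.frame(
  start_date = as.Date(c("2021-01-10", "2021-01-20")),
  end_date   = as.Date(c("2021-02-05", "2021-01-30"))
)
df[["2021-01-01"]] <- c(1, 4)
df[["2021-01-17"]] <- c(2, 5)
df[["2021-02-02"]] <- c(3, 6)

# Assumption: everything except start_date/end_date is a date-labeled column
date_cols <- setdiff(names(df), c("start_date", "end_date"))
col_dates <- as.numeric(as.Date(date_cols))

# in_range[i, j] is TRUE when date column j lies inside row i's window (inclusive)
in_range <- outer(as.numeric(df$start_date), col_dates, "<=") &
  outer(as.numeric(df$end_date), col_dates, ">=")

# Blank out values outside each row's window, then take the row means
vals <- as.matrix(df[date_cols])
vals[!in_range] <- NA
df$value <- rowMeans(vals, na.rm = TRUE)  # NaN when no column falls in the window

# Drop the date-labeled columns
df[date_cols] <- NULL
df
```

In the toy data the first row matches the 2021-01-17 and 2021-02-02 columns, while the second row's window contains no date column at all, so its mean comes out as NaN. With 16-day column spacing and windows as short as 7 days, that case can occur in the real data too, so you may want to decide how to handle it.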