我有一个从2008年到2020年的随机日期的数据,以及它们的对应值。
Date Val
September 16, 2012 32
September 19, 2014 33
January 05, 2008 26
June 07, 2017 02
December 15, 2019 03
May 28, 2020 18
我想把2008年1月1日至2020年3月31日的缺失日期及其对应的数值填成1。
我参考了一些帖子,比如 员额1, 员额2 而我却无法在此基础上解决这个问题。我是一个R的初学者。
我正在寻找这样的数据
Date Val
January 01, 2008 1
January 02, 2008 1
January 03, 2008 1
January 04, 2008 1
January 05, 2008 26
........
使用 tidyr::complete
:
library(dplyr)
df %>%
mutate(Date = as.Date(Date, "%B %d, %Y")) %>%
tidyr::complete(Date = seq(as.Date('2008-01-01'), as.Date('2020-03-31'),
by = 'day'), fill = list(Val = 1)) %>%
mutate(Date = format(Date, "%B %d, %Y"))
# A tibble: 4,475 x 2
# Date Val
# <chr> <dbl>
# 1 January 01, 2008 1
# 2 January 02, 2008 1
# 3 January 03, 2008 1
# 4 January 04, 2008 1
# 5 January 05, 2008 26
# 6 January 06, 2008 1
# 7 January 07, 2008 1
# 8 January 08, 2008 1
# 9 January 09, 2008 1
#10 January 10, 2008 1
# … with 4,465 more rows
数据
df <- structure(list(Date = c("September 16, 2012", "September 19, 2014",
"January 05, 2008", "June 07, 2017", "December 15, 2019", "May 28, 2020"
), Val = c(32L, 33L, 26L, 2L, 3L, 18L)), class = "data.frame",
row.names = c(NA, -6L))
我们可以在所需的日期范围内创建数据框,然后将我们的数据框加入其中,并替换所有的 NAs
与1。
library(tidyverse)
days_seq %>%
left_join(df) %>%
mutate(Val = if_else(is.na(Val), as.integer(1), Val))
Joining, by = "Date"
# A tibble: 4,474 x 2
Date Val
<date> <int>
1 2008-01-01 1
2 2008-01-02 1
3 2008-01-03 1
4 2008-01-04 1
5 2008-01-05 33
6 2008-01-06 1
7 2008-01-07 1
8 2008-01-08 1
9 2008-01-09 1
10 2008-01-10 1
# ... with 4,464 more rows
数据
days_seq <- tibble(Date = seq(as.Date("2008/01/01"), as.Date("2020/03/31"), "days"))
df <- tibble::tribble(
~Date, ~Val,
"2012/09/16", 32L,
"2012/09/19", 33L,
"2008/01/05", 33L
)
df$Date <- as.Date(df$Date)