我有混合频率的数据,日期变量是一个字符向量。我正在尝试使用
case_when()
根据分配的频率来解析日期。如果我按频率拆分数据帧并且不使用条件逻辑,则 yq()
和 ym()
函数都可以正常工作。但是,当我在混合频率表上使用 case_when()
时,我收到有关解析失败的警告。具体来说,故障归因于月度数据中的 05-12 月。解析函数似乎返回了预期的结果,但我担心情况是否总是如此,以及我是否遗漏了一个陷阱。实际的解析失败是什么,为什么会发生?
代表:
library(dplyr, quietly = TRUE)
library(lubridate, quietly = TRUE)
library(purrr, quietly = TRUE)
library(stringr, quietly = TRUE)
### Data - construct date columns in the format of my data
qtrs <- paste(":0", 1:4, sep = "")
mnths <- c(paste(":0", 1:9, sep = ""), paste(":", 10:12, sep = ""))
yr_qtrs <- map(1979:1980, ~str_c(.x, qtrs)) %>% list_c()
yr_mnths <- map(1979:1980, ~str_c(.x, mnths)) %>% list_c()
#### Create separate tables and assign frequency variable
qtr_tbl <- tibble(date = yr_qtrs, freq = "quarterly")
mnth_tbl <- tibble(date = yr_mnths, freq = "monthly")
#### Create a mixed frequency tibble
mixed_tbl <- bind_rows(qtr_tbl, mnth_tbl)
### Quarters parse fine with yq()
qtr_tbl %>%
mutate(date_new = yq(date))
#> # A tibble: 8 × 3
#> date freq date_new
#> <chr> <chr> <date>
#> 1 1979:01 quarterly 1979-01-01
#> 2 1979:02 quarterly 1979-04-01
#> 3 1979:03 quarterly 1979-07-01
#> 4 1979:04 quarterly 1979-10-01
#> 5 1980:01 quarterly 1980-01-01
#> 6 1980:02 quarterly 1980-04-01
#> 7 1980:03 quarterly 1980-07-01
#> 8 1980:04 quarterly 1980-10-01
### Months parse fine with ym()
mnth_tbl %>%
mutate(date_new = ym(date))
#> # A tibble: 24 × 3
#> date freq date_new
#> <chr> <chr> <date>
#> 1 1979:01 monthly 1979-01-01
#> 2 1979:02 monthly 1979-02-01
#> 3 1979:03 monthly 1979-03-01
#> 4 1979:04 monthly 1979-04-01
#> 5 1979:05 monthly 1979-05-01
#> 6 1979:06 monthly 1979-06-01
#> 7 1979:07 monthly 1979-07-01
#> 8 1979:08 monthly 1979-08-01
#> 9 1979:09 monthly 1979-09-01
#> 10 1979:10 monthly 1979-10-01
#> # ℹ 14 more rows
### Chooses the parsing function based on the freq variable for the mixed freq tbl.
### It gives 16 warnings, which correspond to months 05:12 for each of the two years
mixed_tbl %>%
mutate(date_new = case_when(freq == "monthly" ~ ym(date),
freq == "quarterly" ~ yq(date),
TRUE ~ NA_Date_
)
) %>% print(n = 32)
#> Warning: There was 1 warning in `mutate()`.
#> ℹ In argument: `date_new = case_when(...)`.
#> Caused by warning:
#> ! 16 failed to parse.
#> # A tibble: 32 × 3
#> date freq date_new
#> <chr> <chr> <date>
#> 1 1979:01 quarterly 1979-01-01
#> 2 1979:02 quarterly 1979-04-01
#> 3 1979:03 quarterly 1979-07-01
#> 4 1979:04 quarterly 1979-10-01
#> 5 1980:01 quarterly 1980-01-01
#> 6 1980:02 quarterly 1980-04-01
#> 7 1980:03 quarterly 1980-07-01
#> 8 1980:04 quarterly 1980-10-01
#> 9 1979:01 monthly 1979-01-01
#> 10 1979:02 monthly 1979-02-01
#> 11 1979:03 monthly 1979-03-01
#> 12 1979:04 monthly 1979-04-01
#> 13 1979:05 monthly 1979-05-01
#> 14 1979:06 monthly 1979-06-01
#> 15 1979:07 monthly 1979-07-01
#> 16 1979:08 monthly 1979-08-01
#> 17 1979:09 monthly 1979-09-01
#> 18 1979:10 monthly 1979-10-01
#> 19 1979:11 monthly 1979-11-01
#> 20 1979:12 monthly 1979-12-01
#> 21 1980:01 monthly 1980-01-01
#> 22 1980:02 monthly 1980-02-01
#> 23 1980:03 monthly 1980-03-01
#> 24 1980:04 monthly 1980-04-01
#> 25 1980:05 monthly 1980-05-01
#> 26 1980:06 monthly 1980-06-01
#> 27 1980:07 monthly 1980-07-01
#> 28 1980:08 monthly 1980-08-01
#> 29 1980:09 monthly 1980-09-01
#> 30 1980:10 monthly 1980-10-01
#> 31 1980:11 monthly 1980-11-01
#> 32 1980:12 monthly 1980-12-01
提前感谢您的反馈!
case_when
不控制将要评估的内容,也不会根据条件屏蔽输入向量,这意味着 yq(date)
正是它的本质 - yq()
使用 date
向量调用。只是为了说明这里发生了什么,我们可以使 ym()
和 yq()
调用更冗长:
library(dplyr, quietly = TRUE)
library(lubridate, quietly = TRUE)
library(purrr, quietly = TRUE)
library(stringr, quietly = TRUE)
# override ym & yq to get an idea how those are called
ym <- \(dates, ...) {message(">> ym() called with: ", paste(dates, collapse = " ")); lubridate::ym(dates,...)}
yq <- \(dates, ...) {message(">> yq() called with: ", paste(dates, collapse = " ")); lubridate::yq(dates,...)}
# test how it works:
ym(c("2000:04","2000:05"))
#> >> ym() called with: 2000:04 2000:05
#> [1] "2000-04-01" "2000-05-01"
yq(c("2000:04","2000:05"))
#> >> yq() called with: 2000:04 2000:05
#> Warning: 1 failed to parse.
#> [1] "2000-10-01" NA
# now with case_when() to see if input vectors
# for ym() and yq() are different:
mixed_tbl %>%
mutate(date_new = case_when(freq == "monthly" ~ ym(date),
freq == "quarterly" ~ yq(date),
TRUE ~ NA_Date_
)
) %>%
glimpse()
#> >> ym() called with: 1979:01 1979:02 1979:03 1979:04 1980:01 1980:02 1980:03 1980:04 1979:01 1979:02 1979:03 1979:04 1979:05 1979:06 1979:07 1979:08 1979:09 1979:10 1979:11 1979:12 1980:01 1980:02 1980:03 1980:04 1980:05 1980:06 1980:07 1980:08 1980:09 1980:10 1980:11 1980:12
#> >> yq() called with: 1979:01 1979:02 1979:03 1979:04 1980:01 1980:02 1980:03 1980:04 1979:01 1979:02 1979:03 1979:04 1979:05 1979:06 1979:07 1979:08 1979:09 1979:10 1979:11 1979:12 1980:01 1980:02 1980:03 1980:04 1980:05 1980:06 1980:07 1980:08 1980:09 1980:10 1980:11 1980:12
#> Warning: There was 1 warning in `mutate()`.
#> ℹ In argument: `date_new = case_when(...)`.
#> Caused by warning:
#> ! 16 failed to parse.
#> Rows: 32
#> Columns: 3
#> $ date <chr> "1979:01", "1979:02", "1979:03", "1979:04", "1980:01", "1980:…
#> $ freq <chr> "quarterly", "quarterly", "quarterly", "quarterly", "quarterl…
#> $ date_new <date> 1979-01-01, 1979-04-01, 1979-07-01, 1979-10-01, 1980-01-01, …
一个可能的解决方案,按
freq
拆分,在每个部分上应用适当的函数并将结果列表绑定回单个 df:
mixed_tbl %>%
split(~freq) %>%
map_at("monthly", ~ mutate(.x, date_new = ym(date))) %>%
map_at("quarterly", ~ mutate(.x, date_new = yq(date))) %>%
list_rbind() %>%
glimpse()
#> >> ym() called with: 1979:01 1979:02 1979:03 1979:04 1979:05 1979:06 1979:07 1979:08 1979:09 1979:10 1979:11 1979:12 1980:01 1980:02 1980:03 1980:04 1980:05 1980:06 1980:07 1980:08 1980:09 1980:10 1980:11 1980:12
#> >> yq() called with: 1979:01 1979:02 1979:03 1979:04 1980:01 1980:02 1980:03 1980:04
#> Rows: 32
#> Columns: 3
#> $ date <chr> "1979:01", "1979:02", "1979:03", "1979:04", "1979:05", "1979:…
#> $ freq <chr> "monthly", "monthly", "monthly", "monthly", "monthly", "month…
#> $ date_new <date> 1979-01-01, 1979-02-01, 1979-03-01, 1979-04-01, 1979-05-01, …
输入数据:
mixed_tbl <- structure(list(
date = c("1979:01", "1979:02", "1979:03", "1979:04",
"1980:01", "1980:02", "1980:03", "1980:04", "1979:01", "1979:02",
"1979:03", "1979:04", "1979:05", "1979:06", "1979:07", "1979:08",
"1979:09", "1979:10", "1979:11", "1979:12", "1980:01", "1980:02",
"1980:03", "1980:04", "1980:05", "1980:06", "1980:07", "1980:08",
"1980:09", "1980:10", "1980:11", "1980:12"), freq = c("quarterly",
"quarterly", "quarterly", "quarterly", "quarterly", "quarterly",
"quarterly", "quarterly", "monthly", "monthly", "monthly", "monthly",
"monthly", "monthly", "monthly", "monthly", "monthly", "monthly",
"monthly", "monthly", "monthly", "monthly", "monthly", "monthly",
"monthly", "monthly", "monthly", "monthly", "monthly", "monthly",
"monthly", "monthly")), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -32L))
创建于 2023-05-19 与 reprex v2.0.2