问:在 R tidyverts/fable 框架中预测具有外生回归量的许多系列的情况下,是否有一种方法可以告诉
fable::ARIMA(y~x1+x2+...)
对于每个包含外生回归量组合的系列,故障转移到运行 ARIMA(y)
的例程x
是秩不足的(例如,因为 x1
、x2
...都只是零)而不是返回空模型?
下面我设置了一个最小的工作示例。正如我预期的那样,当特定的外生回归量只是一列零时,
fable::ARIMA()
只是将其丢弃,用剩余的外生回归量来估计模型。但是,令我惊讶的是,当所有外生回归量为零并且 ARIMA()
将它们全部丢弃时,它会产生一个 NULL
模型。理想情况下,在这些情况下,我只能告诉 ARIMA()
我想运行 ARIMA(y)
的例程。
问:但是如果这不是一个选择,我该怎么办?我是否尝试某种尝试捕获?有没有一种简洁的方法可以做到这一点?
这与我的问题相关为 ARIMA 中的每个系列有条件地指定外生回归量?。在那里,我询问了为不同系列指定不同回归量的“一般”情况,这可能会出现,例如,来自许多系列的最佳子集回归。当回归量全部为零时指定无回归量,当回归量不为零时指定完整集合可能是处理此问题的一种方法。但在这里,我真的在寻找直接、整洁的方法,旨在适用于这种特殊情况。
set.seed(123)
# create data
tibble(idx=1:12,group='aa',y=rnorm(12)) -> aa
tibble(idx=7:12,group='bb',y=rnorm(6)) -> bb
tibble(idx=1:6,group='cc',y=rnorm(6)) -> cc
tibble(idx=1:12,
x1=c(0,0,1,rep(0,9)),
x2=c(rep(0,9),1,0,0)) -> xreg
bind_rows(aa,bb,cc) %>%
inner_join(xreg,by='idx') %>%
as_tsibble(index=idx,key=group) -> dat
dat %>% print(n=Inf)
# # A tsibble: 24 x 5 [1]
# # Key: group [3]
# idx group y x1 x2
# <int> <chr> <dbl> <dbl> <dbl>
# 1 1 aa 1.15 0 0
# 2 2 aa -0.231 0 0
# 3 3 aa -0.365 1 0
# 4 4 aa 0.158 0 0
# 5 5 aa 0.766 0 0
# 6 6 aa 0.426 0 0
# 7 7 aa 0.927 0 0
# 8 8 aa 0.0461 0 0
# 9 9 aa -0.504 0 0
# 10 10 aa -0.300 0 1
# 11 11 aa 0.823 0 0
# 12 12 aa -0.849 0 0
# 13 7 bb -0.954 0 0
# 14 8 bb 0.313 0 0
# 15 9 bb 0.612 0 0
# 16 10 bb -1.69 0 1
# 17 11 bb 0.785 0 0
# 18 12 bb 0.0119 0 0
# 19 1 cc -0.181 0 0
# 20 2 cc 1.11 0 0
# 21 3 cc 1.48 1 0
# 22 4 cc -1.15 0 0
# 23 5 cc 1.01 0 0
# 24 6 cc -0.632 0 0
# Everything as I'd expect. No problem here.
dat %>% model(arima=ARIMA(y)) -> fit.y
fit.y
# # A mable: 3 x 2
# # Key: group [3]
# group arima
# <chr> <model>
# 1 aa <ARIMA(0,0,0)>
# 2 bb <ARIMA(0,0,0)>
# 3 cc <ARIMA(0,0,0)>
# Still, everything as I'd expect. Just a warning message which follows.
dat %>% model(arima=ARIMA(y~x1+x2)) -> fit.yx1x2
dat %>% model(arima=ARIMA(y,xreg=dat%>%select(x1,x2))) -> fit.yx1x2
# Warning messages:
# 1: Provided exogenous regressors are rank deficient, removing regressors: `x1`
# 2: Provided exogenous regressors are rank deficient, removing regressors: `x2`
# For group bb, x1 dropped and x2 retained, as I'd expect
# For group cc, x2 dropped and x1 retained, as I'd expect
fit.yx1x2 %>% pull(arima) %>% map(c('fit', 'par'))
# [[1]]
# # A tibble: 2 × 5
# term estimate std.error statistic p.value
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 x1 1.56 0.777 2.01 0.0679
# 2 x2 -0.446 0.777 -0.574 0.577
#
# [[2]]
# # A tibble: 1 × 5
# term estimate std.error statistic p.value
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 x2 1.79 0.875 2.04 0.0873
#
# [[3]]
# # A tibble: 1 × 5
# term estimate std.error statistic p.value
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 x1 -1.07 0.625 -1.71 0.139
# Now there's an error. I was surprised by this.
# Sure, for group bb, x1 is just a column of zeros.
# But if it's removed shouldn't the model estimated by the same as the one as
# if I had specified `ARIMA(y)`, as above, as if `x1` had never been mentioned?
dat %>% model(arima=ARIMA(y~x1)) -> fit.yx1
# Warning messages:
# 1: Provided exogenous regressors are rank deficient, removing regressors: `x1`
# 2: 1 error encountered for arima
# [1] subscript out of bounds
# A NULL model is estimated for group bb, rather than `<ARIMA(0,0,0)>` as above
# when I had specified `ARIMA(y)` without `x1`
fit.yx1 %>% pull(arima) %>% map(c('fit', 'par'))
# [[1]]
# # A tibble: 1 × 5
# term estimate std.error statistic p.value
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 x1 1.56 0.787 1.98 0.0712
#
# [[2]]
# NULL
#
# [[3]]
# # A tibble: 1 × 5
# term estimate std.error statistic p.value
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 x1 -1.07 0.625 -1.71 0.139
# the following is to create a tibble with one row for each group
# and a tibble for each group, to be used in `. %>% model(arima=ARIMA(<my formula>))`
# though I don't know how to continue this approach and, basically, say
# -ARIMA(y~x1+x2) for group aa,
# -ARIMA(y~x2) for group bb,
# -ARIMA(y~x1) for group cc
# in a tidy way.
# But, also, I don't think this is even the right approach.
# `fable::ARIMA()` was intended to be used once, for the entire dataset, with
# all series. It wasn't intended to be called once for each series, here, that
# being 3 times, once for each of series groups aa, bb, cc
dat %>%
as_tibble %>%
group_by(group) %>% summarise(across(x1:x2, \(.x) any(.x!=0))) %>% ungroup() %>%
pivot_longer(cols=x1:x2, names_to='x', values_to='keep') %>%
inner_join(
dat %>%
as_tibble() %>%
pivot_longer(cols=x1:x2, names_to='x', values_to='value'),
by=join_by(group, x)
) %>%
filter(keep) %>% select(-keep) %>%
nest(data=-group) %>%
mutate(data=map(data, \(.dat) pivot_wider(.dat, names_from=x, values_from=value) %>%
as_tsibble(index=idx))) -> dat.series
dat.series
# # A tibble: 3 × 2
# group data
# <chr> <list>
# 1 aa <tbl_ts [12 × 4]>
# 2 bb <tbl_ts [6 × 3]>
# 3 cc <tbl_ts [6 × 3]>
dat.series$data
# [[1]]
# # A tsibble: 12 x 4 [1]
# idx y x1 x2
# <int> <dbl> <dbl> <dbl>
# 1 1 -0.560 0 0
# 2 2 -0.230 0 0
# 3 3 1.56 1 0
# 4 4 0.0705 0 0
# 5 5 0.129 0 0
# 6 6 1.72 0 0
# 7 7 0.461 0 0
# 8 8 -1.27 0 0
# 9 9 -0.687 0 0
# 10 10 -0.446 0 1
# 11 11 1.22 0 0
# 12 12 0.360 0 0
#
# [[2]]
# # A tsibble: 6 x 3 [1]
# idx y x2
# <int> <dbl> <dbl>
# 1 7 0.401 0
# 2 8 0.111 0
# 3 9 -0.556 0
# 4 10 1.79 1
# 5 11 0.498 0
# 6 12 -1.97 0
#
# [[3]]
# # A tsibble: 6 x 3 [1]
# idx y x1
# <int> <dbl> <dbl>
# 1 1 0.701 0
# 2 2 -0.473 0
# 3 3 -1.07 1
# 4 4 -0.218 0
# 5 5 -1.03 0
# 6 6 -0.729 0
model(ARIMA(.))
为组
<NULL Model>
返回bb
是一个错误。
在我原来的帖子中,我的黑客解决方案是一些逻辑,它根据每个组的外生回归量的值构建不同的训练数据集,例如,为组bb
创建了不同的数据集。但我认为我下面所做的是一个更好的黑客解决方案。该错误会返回一个 <NULL Model>
,可以使用
fabletools::null_model()
轻松检查。我只是估计一个 safety
模型,并在这些 final
案例中将其用作我的 <NULL Model>
模型。当然,这适用于所有情况,而不仅仅是排名不足的外生回归量案例,它会导致
model(ARIMA(.))
返回具有相同 <NULL model>
模型的
safety
。但其实我对此很满意。
注意:如果一段时间后没有任何更好的答案,我会将其标记为已接受的答案。
# Package author pointed out in comments that `model(ARIMA(.))` returning a null
# model for group `bb` is a bug.
# In my original post, my hack solution was some logic which constructed
# different training data sets depending on the values of exogenous regressors
# over each group, e.g., which created a different data set for group `bb`.
# But I think what I've done below is a better hack solution.
dat %>%
model(
arima=ARIMA(y~x1),
safety=ARIMA(y)
) %>%
suppressWarnings() %>%
mutate(
final=if_else(is_null_model(arima), safety, arima)
) ->
fit.yx1
# Everything appears as I expected
print(fit.yx1)
# # A mable: 3 x 4
# # Key: group [3]
# group arima safety final
# <chr> <model> <model> <model>
# 1 aa <LM w/ ARIMA(0,0,0) errors> <ARIMA(0,0,0)> <LM w/ ARIMA(0,0,0) errors>
# 2 bb <NULL model> <ARIMA(0,0,0)> <ARIMA(0,0,0)>
# 3 cc <LM w/ ARIMA(0,0,0) errors> <ARIMA(0,0,0)> <LM w/ ARIMA(0,0,0) errors>
# Everything appears as I expected
fit.yx1 %>% pull(final) %>% map(c('fit', 'par'))
# [[1]]
# # A tibble: 1 × 5
# term estimate std.error statistic p.value
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 x1 1.56 0.787 1.98 0.0712
#
# [[2]]
# # A tibble: 0 × 5
# # ℹ 5 variables: term <chr>, estimate <dbl>, std.error <lgl>, statistic <dbl>, p.value <dbl>
#
# [[3]]
# # A tibble: 1 × 5
# term estimate std.error statistic p.value
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 x1 -1.07 0.625 -1.71 0.139