当它们完全排名不足时，在没有 xreg 的情况下运行 ARIMA？

Question

问：在 R tidyverts/fable 框架中预测具有外生回归量的许多系列的情况下，是否有一种方法可以告诉

fable::ARIMA(y~x1+x2+...)

对于每个包含外生回归量组合的系列，故障转移到运行

ARIMA(y)

的例程

是秩不足的（例如，因为

x1

、

x2

...都只是零）而不是返回空模型？

下面我设置了一个最小的工作示例。正如我预期的那样，当特定的外生回归量只是一列零时，

fable::ARIMA()

只是将其丢弃，用剩余的外生回归量来估计模型。但是，令我惊讶的是，当所有外生回归量为零并且

ARIMA()

将它们全部丢弃时，它会产生一个

NULL

模型。理想情况下，在这些情况下，我只能告诉

ARIMA()

我想运行

ARIMA(y)

的例程。

问：但是如果这不是一个选择，我该怎么办？我是否尝试某种尝试捕获？有没有一种简洁的方法可以做到这一点？

这与我的问题相关为 ARIMA 中的每个系列有条件地指定外生回归量？。在那里，我询问了为不同系列指定不同回归量的“一般”情况，这可能会出现，例如，来自许多系列的最佳子集回归。当回归量全部为零时指定无回归量，当回归量不为零时指定完整集合可能是处理此问题的一种方法。但在这里，我真的在寻找直接、整洁的方法，旨在适用于这种特殊情况。 set.seed(123) # create data tibble(idx=1:12,group='aa',y=rnorm(12)) -> aa tibble(idx=7:12,group='bb',y=rnorm(6)) -> bb tibble(idx=1:6,group='cc',y=rnorm(6)) -> cc tibble(idx=1:12, x1=c(0,0,1,rep(0,9)), x2=c(rep(0,9),1,0,0)) -> xreg bind_rows(aa,bb,cc) %>% inner_join(xreg,by='idx') %>% as_tsibble(index=idx,key=group) -> dat dat %>% print(n=Inf) # # A tsibble: 24 x 5 [1] # # Key: group [3] # idx group y x1 x2 # <int> <chr> <dbl> <dbl> <dbl> # 1 1 aa 1.15 0 0 # 2 2 aa -0.231 0 0 # 3 3 aa -0.365 1 0 # 4 4 aa 0.158 0 0 # 5 5 aa 0.766 0 0 # 6 6 aa 0.426 0 0 # 7 7 aa 0.927 0 0 # 8 8 aa 0.0461 0 0 # 9 9 aa -0.504 0 0 # 10 10 aa -0.300 0 1 # 11 11 aa 0.823 0 0 # 12 12 aa -0.849 0 0 # 13 7 bb -0.954 0 0 # 14 8 bb 0.313 0 0 # 15 9 bb 0.612 0 0 # 16 10 bb -1.69 0 1 # 17 11 bb 0.785 0 0 # 18 12 bb 0.0119 0 0 # 19 1 cc -0.181 0 0 # 20 2 cc 1.11 0 0 # 21 3 cc 1.48 1 0 # 22 4 cc -1.15 0 0 # 23 5 cc 1.01 0 0 # 24 6 cc -0.632 0 0 # Everything as I'd expect. No problem here. dat %>% model(arima=ARIMA(y)) -> fit.y fit.y # # A mable: 3 x 2 # # Key: group [3] # group arima # <chr> <model> # 1 aa <ARIMA(0,0,0)> # 2 bb <ARIMA(0,0,0)> # 3 cc <ARIMA(0,0,0)> # Still, everything as I'd expect. Just a warning message which follows. dat %>% model(arima=ARIMA(y~x1+x2)) -> fit.yx1x2 dat %>% model(arima=ARIMA(y,xreg=dat%>%select(x1,x2))) -> fit.yx1x2 # Warning messages: # 1: Provided exogenous regressors are rank deficient, removing regressors: `x1` # 2: Provided exogenous regressors are rank deficient, removing regressors: `x2` # For group bb, x1 dropped and x2 retained, as I'd expect # For group cc, x2 dropped and x1 retained, as I'd expect fit.yx1x2 %>% pull(arima) %>% map(c('fit', 'par')) # [[1]] # # A tibble: 2 × 5 # term estimate std.error statistic p.value # <chr> <dbl> <dbl> <dbl> <dbl> # 1 x1 1.56 0.777 2.01 0.0679 # 2 x2 -0.446 0.777 -0.574 0.577 # # [[2]] # # A tibble: 1 × 5 # term estimate std.error statistic p.value # <chr> <dbl> <dbl> <dbl> <dbl> # 1 x2 1.79 0.875 2.04 0.0873 # # [[3]] # # A tibble: 1 × 5 # term estimate std.error statistic p.value # <chr> <dbl> <dbl> <dbl> <dbl> # 1 x1 -1.07 0.625 -1.71 0.139 # Now there's an error. I was surprised by this. # Sure, for group bb, x1 is just a column of zeros. # But if it's removed shouldn't the model estimated by the same as the one as # if I had specified `ARIMA(y)`, as above, as if `x1` had never been mentioned? dat %>% model(arima=ARIMA(y~x1)) -> fit.yx1 # Warning messages: # 1: Provided exogenous regressors are rank deficient, removing regressors: `x1` # 2: 1 error encountered for arima # [1] subscript out of bounds # A NULL model is estimated for group bb, rather than `<ARIMA(0,0,0)>` as above # when I had specified `ARIMA(y)` without `x1` fit.yx1 %>% pull(arima) %>% map(c('fit', 'par')) # [[1]] # # A tibble: 1 × 5 # term estimate std.error statistic p.value # <chr> <dbl> <dbl> <dbl> <dbl> # 1 x1 1.56 0.787 1.98 0.0712 # # [[2]] # NULL # # [[3]] # # A tibble: 1 × 5 # term estimate std.error statistic p.value # <chr> <dbl> <dbl> <dbl> <dbl> # 1 x1 -1.07 0.625 -1.71 0.139 # the following is to create a tibble with one row for each group # and a tibble for each group, to be used in `. %>% model(arima=ARIMA(<my formula>))` # though I don't know how to continue this approach and, basically, say # -ARIMA(y~x1+x2) for group aa, # -ARIMA(y~x2) for group bb, # -ARIMA(y~x1) for group cc # in a tidy way. # But, also, I don't think this is even the right approach. # `fable::ARIMA()` was intended to be used once, for the entire dataset, with # all series. It wasn't intended to be called once for each series, here, that # being 3 times, once for each of series groups aa, bb, cc dat %>% as_tibble %>% group_by(group) %>% summarise(across(x1:x2, \(.x) any(.x!=0))) %>% ungroup() %>% pivot_longer(cols=x1:x2, names_to='x', values_to='keep') %>% inner_join( dat %>% as_tibble() %>% pivot_longer(cols=x1:x2, names_to='x', values_to='value'), by=join_by(group, x) ) %>% filter(keep) %>% select(-keep) %>% nest(data=-group) %>% mutate(data=map(data, \(.dat) pivot_wider(.dat, names_from=x, values_from=value) %>% as_tsibble(index=idx))) -> dat.series dat.series # # A tibble: 3 × 2 # group data # <chr> <list> # 1 aa <tbl_ts [12 × 4]> # 2 bb <tbl_ts [6 × 3]> # 3 cc <tbl_ts [6 × 3]> dat.series$data # [[1]] # # A tsibble: 12 x 4 [1] # idx y x1 x2 # <int> <dbl> <dbl> <dbl> # 1 1 -0.560 0 0 # 2 2 -0.230 0 0 # 3 3 1.56 1 0 # 4 4 0.0705 0 0 # 5 5 0.129 0 0 # 6 6 1.72 0 0 # 7 7 0.461 0 0 # 8 8 -1.27 0 0 # 9 9 -0.687 0 0 # 10 10 -0.446 0 1 # 11 11 1.22 0 0 # 12 12 0.360 0 0 # # [[2]] # # A tsibble: 6 x 3 [1] # idx y x2 # <int> <dbl> <dbl> # 1 7 0.401 0 # 2 8 0.111 0 # 3 9 -0.556 0 # 4 10 1.79 1 # 5 11 0.498 0 # 6 12 -1.97 0 # # [[3]] # # A tsibble: 6 x 3 [1] # idx y x1 # <int> <dbl> <dbl> # 1 1 0.701 0 # 2 2 -0.473 0 # 3 3 -1.07 1 # 4 4 -0.218 0 # 5 5 -1.03 0 # 6 6 -0.729 0

Answer 1

model(ARIMA(.))

为组

<NULL Model>

返回

bb

是一个错误。在我原来的帖子中，我的黑客解决方案是一些逻辑，它根据每个组的外生回归量的值构建不同的训练数据集，例如，为组

bb

创建了不同的数据集。但我认为我下面所做的是一个更好的黑客解决方案。

该错误会返回一个

<NULL Model>

，可以使用

fabletools::null_model()

轻松检查。我只是估计一个

safety

模型，并在这些

final

案例中将其用作我的

<NULL Model>

模型。当然，这适用于所有情况，而不仅仅是

排名不足的外生回归量案例，它会导致

model(ARIMA(.)) 返回具有相同 <NULL model>

 模型的

safety

。但其实我对此很满意。

注意：如果一段时间后没有任何更好的答案，我会将其标记为已接受的答案。

# Package author pointed out in comments that `model(ARIMA(.))` returning a null 
# model for group `bb` is a bug. 
# In my original post, my hack solution was some logic which constructed 
# different training data sets depending on the values of exogenous regressors 
# over each group, e.g., which created a different data set for group `bb`.
# But I think what I've done below is a better hack solution. 
dat %>% 
  model(
    arima=ARIMA(y~x1),
    safety=ARIMA(y)
    ) %>%
  suppressWarnings() %>%
  mutate(
    final=if_else(is_null_model(arima), safety, arima)
      ) -> 
  fit.yx1

# Everything appears as I expected
print(fit.yx1)
# # A mable: 3 x 4
# # Key:     group [3]
# group                       arima         safety                       final
# <chr>                     <model>        <model>                     <model>
#   1 aa    <LM w/ ARIMA(0,0,0) errors> <ARIMA(0,0,0)> <LM w/ ARIMA(0,0,0) errors>
#   2 bb                   <NULL model> <ARIMA(0,0,0)>              <ARIMA(0,0,0)>
#   3 cc    <LM w/ ARIMA(0,0,0) errors> <ARIMA(0,0,0)> <LM w/ ARIMA(0,0,0) errors>

# Everything appears as I expected
fit.yx1 %>% pull(final) %>% map(c('fit', 'par'))
# [[1]]
# # A tibble: 1 × 5
# term  estimate std.error statistic p.value
# <chr>    <dbl>     <dbl>     <dbl>   <dbl>
#   1 x1        1.56     0.787      1.98  0.0712
# 
# [[2]]
# # A tibble: 0 × 5
# # ℹ 5 variables: term <chr>, estimate <dbl>, std.error <lgl>, statistic <dbl>, p.value <dbl>
# 
# [[3]]
# # A tibble: 1 × 5
# term  estimate std.error statistic p.value
# <chr>    <dbl>     <dbl>     <dbl>   <dbl>
#  1 x1       -1.07     0.625     -1.71   0.139

当它们完全排名不足时，在没有 xreg 的情况下运行 ARIMA？

问题描述投票：0回答：1

1个回答

最新问题

当它们完全排名不足时，在没有 xreg 的情况下运行 ARIMA？

问题描述 投票：0回答：1

1个回答

最新问题

问题描述投票：0回答：1