当它们完全排名不足时,在没有 xreg 的情况下运行 ARIMA?

问题描述 投票:0回答:1

问:在 R tidyverts/fable 框架中预测具有外生回归量的许多系列的情况下,是否有一种方法可以告诉

fable::ARIMA(y~x1+x2+...)
对于每个包含外生回归量组合的系列,故障转移到运行
ARIMA(y)
的例程
x
是秩不足的(例如,因为
x1
x2
...都只是零)而不是返回空模型?

下面我设置了一个最小的工作示例。正如我预期的那样,当特定的外生回归量只是一列零时,

fable::ARIMA()
只是将其丢弃,用剩余的外生回归量来估计模型。但是,令我惊讶的是,当所有外生回归量为零并且
ARIMA()
将它们全部丢弃时,它会产生一个
NULL
模型。理想情况下,在这些情况下,我只能告诉
ARIMA()
我想运行
ARIMA(y)
的例程。

问:但是如果这不是一个选择,我该怎么办?我是否尝试某种尝试捕获?有没有一种简洁的方法可以做到这一点?

这与我的问题相关为 ARIMA 中的每个系列有条件地指定外生回归量?。在那里,我询问了为不同系列指定不同回归量的“一般”情况,这可能会出现,例如,来自许多系列的最佳子集回归。当回归量全部为零时指定无回归量,当回归量不为零时指定完整集合可能是处理此问题的一种方法。但在这里,我真的在寻找直接、整洁的方法,旨在适用于这种特殊情况。 set.seed(123) # create data tibble(idx=1:12,group='aa',y=rnorm(12)) -> aa tibble(idx=7:12,group='bb',y=rnorm(6)) -> bb tibble(idx=1:6,group='cc',y=rnorm(6)) -> cc tibble(idx=1:12, x1=c(0,0,1,rep(0,9)), x2=c(rep(0,9),1,0,0)) -> xreg bind_rows(aa,bb,cc) %>% inner_join(xreg,by='idx') %>% as_tsibble(index=idx,key=group) -> dat dat %>% print(n=Inf) # # A tsibble: 24 x 5 [1] # # Key: group [3] # idx group y x1 x2 # <int> <chr> <dbl> <dbl> <dbl> # 1 1 aa 1.15 0 0 # 2 2 aa -0.231 0 0 # 3 3 aa -0.365 1 0 # 4 4 aa 0.158 0 0 # 5 5 aa 0.766 0 0 # 6 6 aa 0.426 0 0 # 7 7 aa 0.927 0 0 # 8 8 aa 0.0461 0 0 # 9 9 aa -0.504 0 0 # 10 10 aa -0.300 0 1 # 11 11 aa 0.823 0 0 # 12 12 aa -0.849 0 0 # 13 7 bb -0.954 0 0 # 14 8 bb 0.313 0 0 # 15 9 bb 0.612 0 0 # 16 10 bb -1.69 0 1 # 17 11 bb 0.785 0 0 # 18 12 bb 0.0119 0 0 # 19 1 cc -0.181 0 0 # 20 2 cc 1.11 0 0 # 21 3 cc 1.48 1 0 # 22 4 cc -1.15 0 0 # 23 5 cc 1.01 0 0 # 24 6 cc -0.632 0 0 # Everything as I'd expect. No problem here. dat %>% model(arima=ARIMA(y)) -> fit.y fit.y # # A mable: 3 x 2 # # Key: group [3] # group arima # <chr> <model> # 1 aa <ARIMA(0,0,0)> # 2 bb <ARIMA(0,0,0)> # 3 cc <ARIMA(0,0,0)> # Still, everything as I'd expect. Just a warning message which follows. dat %>% model(arima=ARIMA(y~x1+x2)) -> fit.yx1x2 dat %>% model(arima=ARIMA(y,xreg=dat%>%select(x1,x2))) -> fit.yx1x2 # Warning messages: # 1: Provided exogenous regressors are rank deficient, removing regressors: `x1` # 2: Provided exogenous regressors are rank deficient, removing regressors: `x2` # For group bb, x1 dropped and x2 retained, as I'd expect # For group cc, x2 dropped and x1 retained, as I'd expect fit.yx1x2 %>% pull(arima) %>% map(c('fit', 'par')) # [[1]] # # A tibble: 2 × 5 # term estimate std.error statistic p.value # <chr> <dbl> <dbl> <dbl> <dbl> # 1 x1 1.56 0.777 2.01 0.0679 # 2 x2 -0.446 0.777 -0.574 0.577 # # [[2]] # # A tibble: 1 × 5 # term estimate std.error statistic p.value # <chr> <dbl> <dbl> <dbl> <dbl> # 1 x2 1.79 0.875 2.04 0.0873 # # [[3]] # # A tibble: 1 × 5 # term estimate std.error statistic p.value # <chr> <dbl> <dbl> <dbl> <dbl> # 1 x1 -1.07 0.625 -1.71 0.139 # Now there's an error. I was surprised by this. # Sure, for group bb, x1 is just a column of zeros. # But if it's removed shouldn't the model estimated by the same as the one as # if I had specified `ARIMA(y)`, as above, as if `x1` had never been mentioned? dat %>% model(arima=ARIMA(y~x1)) -> fit.yx1 # Warning messages: # 1: Provided exogenous regressors are rank deficient, removing regressors: `x1` # 2: 1 error encountered for arima # [1] subscript out of bounds # A NULL model is estimated for group bb, rather than `<ARIMA(0,0,0)>` as above # when I had specified `ARIMA(y)` without `x1` fit.yx1 %>% pull(arima) %>% map(c('fit', 'par')) # [[1]] # # A tibble: 1 × 5 # term estimate std.error statistic p.value # <chr> <dbl> <dbl> <dbl> <dbl> # 1 x1 1.56 0.787 1.98 0.0712 # # [[2]] # NULL # # [[3]] # # A tibble: 1 × 5 # term estimate std.error statistic p.value # <chr> <dbl> <dbl> <dbl> <dbl> # 1 x1 -1.07 0.625 -1.71 0.139 # the following is to create a tibble with one row for each group # and a tibble for each group, to be used in `. %>% model(arima=ARIMA(<my formula>))` # though I don't know how to continue this approach and, basically, say # -ARIMA(y~x1+x2) for group aa, # -ARIMA(y~x2) for group bb, # -ARIMA(y~x1) for group cc # in a tidy way. # But, also, I don't think this is even the right approach. # `fable::ARIMA()` was intended to be used once, for the entire dataset, with # all series. It wasn't intended to be called once for each series, here, that # being 3 times, once for each of series groups aa, bb, cc dat %>% as_tibble %>% group_by(group) %>% summarise(across(x1:x2, \(.x) any(.x!=0))) %>% ungroup() %>% pivot_longer(cols=x1:x2, names_to='x', values_to='keep') %>% inner_join( dat %>% as_tibble() %>% pivot_longer(cols=x1:x2, names_to='x', values_to='value'), by=join_by(group, x) ) %>% filter(keep) %>% select(-keep) %>% nest(data=-group) %>% mutate(data=map(data, \(.dat) pivot_wider(.dat, names_from=x, values_from=value) %>% as_tsibble(index=idx))) -> dat.series dat.series # # A tibble: 3 × 2 # group data # <chr> <list> # 1 aa <tbl_ts [12 × 4]> # 2 bb <tbl_ts [6 × 3]> # 3 cc <tbl_ts [6 × 3]> dat.series$data # [[1]] # # A tsibble: 12 x 4 [1] # idx y x1 x2 # <int> <dbl> <dbl> <dbl> # 1 1 -0.560 0 0 # 2 2 -0.230 0 0 # 3 3 1.56 1 0 # 4 4 0.0705 0 0 # 5 5 0.129 0 0 # 6 6 1.72 0 0 # 7 7 0.461 0 0 # 8 8 -1.27 0 0 # 9 9 -0.687 0 0 # 10 10 -0.446 0 1 # 11 11 1.22 0 0 # 12 12 0.360 0 0 # # [[2]] # # A tsibble: 6 x 3 [1] # idx y x2 # <int> <dbl> <dbl> # 1 7 0.401 0 # 2 8 0.111 0 # 3 9 -0.556 0 # 4 10 1.79 1 # 5 11 0.498 0 # 6 12 -1.97 0 # # [[3]] # # A tsibble: 6 x 3 [1] # idx y x1 # <int> <dbl> <dbl> # 1 1 0.701 0 # 2 2 -0.473 0 # 3 3 -1.07 1 # 4 4 -0.218 0 # 5 5 -1.03 0 # 6 6 -0.729 0


r forecasting arima rlang fable-r
1个回答
0
投票
model(ARIMA(.))

为组

<NULL Model>
返回
bb
是一个错误。 在我原来的帖子中,我的黑客解决方案是一些逻辑,它根据每个组的外生回归量的值构建不同的训练数据集,例如,为组
bb
创建了不同的数据集。但我认为我下面所做的是一个更好的黑客解决方案。
该错误会返回一个 

<NULL Model>

,可以使用

fabletools::null_model()
轻松检查。我只是估计一个
safety
模型,并在这些
final
案例中将其用作我的
<NULL Model>
模型。当然,这适用于所有情况,而不仅仅是
排名不足的外生回归量案例,它会导致 
model(ARIMA(.)) 返回具有相同 <NULL model>
 模型的 
safety
。但其实我对此很满意。
注意:如果一段时间后没有任何更好的答案,我会将其标记为已接受的答案。

# Package author pointed out in comments that `model(ARIMA(.))` returning a null # model for group `bb` is a bug. # In my original post, my hack solution was some logic which constructed # different training data sets depending on the values of exogenous regressors # over each group, e.g., which created a different data set for group `bb`. # But I think what I've done below is a better hack solution. dat %>% model( arima=ARIMA(y~x1), safety=ARIMA(y) ) %>% suppressWarnings() %>% mutate( final=if_else(is_null_model(arima), safety, arima) ) -> fit.yx1 # Everything appears as I expected print(fit.yx1) # # A mable: 3 x 4 # # Key: group [3] # group arima safety final # <chr> <model> <model> <model> # 1 aa <LM w/ ARIMA(0,0,0) errors> <ARIMA(0,0,0)> <LM w/ ARIMA(0,0,0) errors> # 2 bb <NULL model> <ARIMA(0,0,0)> <ARIMA(0,0,0)> # 3 cc <LM w/ ARIMA(0,0,0) errors> <ARIMA(0,0,0)> <LM w/ ARIMA(0,0,0) errors> # Everything appears as I expected fit.yx1 %>% pull(final) %>% map(c('fit', 'par')) # [[1]] # # A tibble: 1 × 5 # term estimate std.error statistic p.value # <chr> <dbl> <dbl> <dbl> <dbl> # 1 x1 1.56 0.787 1.98 0.0712 # # [[2]] # # A tibble: 0 × 5 # # ℹ 5 variables: term <chr>, estimate <dbl>, std.error <lgl>, statistic <dbl>, p.value <dbl> # # [[3]] # # A tibble: 1 × 5 # term estimate std.error statistic p.value # <chr> <dbl> <dbl> <dbl> <dbl> # 1 x1 -1.07 0.625 -1.71 0.139


© www.soinside.com 2019 - 2024. All rights reserved.