Statsmodels ARIMA：如何获得置信度/预测区间？

Question

如何生成“较低”和“较高”预测，而不仅仅是“yhat”？

import statsmodels
from statsmodels.tsa.arima.model import ARIMA

assert statsmodels.__version__ == '0.12.0'

arima = ARIMA(df['value'], order=order)
model = arima.fit()

现在我可以生成“yhat”预测

yhat = model.forecast(123)

并获取模型参数的置信区间（但不适用于预测）：

model.conf_int()

但是如何生成

yhat_lower

和

yhat_upper

预测呢？

Answer 1

一般来说，

forecast

和

predict

方法仅产生点预测，而

get_forecast

和

get_prediction

方法产生包括预测区间的完整结果。

在您的示例中，您可以执行以下操作：

forecast = model.get_forecast(123)
yhat = forecast.predicted_mean
yhat_conf_int = forecast.conf_int(alpha=0.05)

如果您的数据是 Pandas Series，那么

yhat_conf_int

将是一个包含两列的 DataFrame，

lower <name>

和

upper <name>

，其中

<name>

是 Pandas Series 的名称。

如果您的数据是 numpy 数组（或 Python 列表），那么

yhat_conf_int

将是

(n_forecasts, 2)

数组，其中第一列是区间的下部，第二列是上部。

Answer 2

使用 ARIMA，您需要自己在模型中包含季节性和外生变量。在使用 SARIMA（季节性 ARIMA）或 SARIMAX（也适用于外生因素）实施时，C.I.到摘要框架：

import statsmodels.api as sm
import matplotlib.pyplot as plt
import pandas as pd

dta = sm.datasets.sunspots.load_pandas().data[['SUNACTIVITY']]
dta.index = pd.Index(pd.date_range("1700", end="2009", freq="A"))
print(dta)
print("init data:\n")
dta.plot(figsize=(12,4));
plt.show()

##print("SARIMAX(dta, order=(2,0,0), trend='c'):\n")
result = sm.tsa.SARIMAX(dta, order=(2,0,0), trend='c').fit(disp=False)
print(">>> result.params:\n", result.params, "\n")

##print("SARIMA_model.plot_diagnostics:\n")
result.plot_diagnostics(figsize=(15,12))
plt.show()    
# summary stats of residuals
print(">>> residuals.describe:\n", result.resid.describe(), "\n")

# Out-of-sample forecasts are produced using the forecast or get_forecast methods from the results object
# The get_forecast method is more general, and also allows constructing confidence intervals.
fcast_res1 = result.get_forecast()
# specify that we want a confidence level of 90%
print(">>> forecast summary at alpha=0.01:\n", fcast_res1.summary_frame(alpha=0.10), "\n")

# plot forecast
fig, ax = plt.subplots(figsize=(15, 5))    
ax = dta.plot(label='Training Data')
# Construct the forecasts
fcast = result.get_forecast('2050Q4').summary_frame()
print(fcast)
fcast['mean'].plot(ax=ax, style='k--')
ax.fill_between(fcast.index, fcast['mean_ci_lower'], fcast['mean_ci_upper'], color='k', alpha=0.1);    
fig.tight_layout()
plt.show()

docs：“上面的预测可能看起来不太令人印象深刻，因为它几乎是一条直线。这是因为这是一个非常简单的单变量预测模型。尽管如此，请记住，这些简单的预测模型可能非常有竞争力”

附注此处“您可以通过将季节性项设置为零来以非季节性方式使用它。”

Answer 3

要生成预测区间而不是置信区间（您已巧妙地对其进行了区分，并且在 Hyndman 的博客文章中介绍了预测区间和置信区间之间的差异），那么您可以遵循本文档中提供的指导回答。

您还可以尝试计算自举预测区间，这在这个answer中列出。

下面是我实现这一点的尝试（当我有机会更详细地检查它时，我会更新它）：

def bootstrap_prediction_interval(y_train: Union[list, pd.Series],
                                  y_fit: Union[list, pd.Series],
                                  y_pred_value: float,
                                  alpha: float = 0.05,
                                  nbootstrap: int = None,
                                  seed: int = None):
    """
    Bootstraps a prediction interval around an ARIMA model's predictions.
    Method presented clearly here:
        - https://stats.stackexchange.com/a/254321
    Also found through here, though less clearly:
        - https://otexts.com/fpp3/prediction-intervals.html
    Can consider this to be a time-series version of the following generalisation:
        - https://saattrupdan.github.io/2020-03-01-bootstrap-prediction/

    :param y_train: List or Series of training univariate time-series data.
    :param y_fit: List or Series of model fitted univariate time-series data.
    :param y_pred_value: Float of the model predicted univariate time-series you want to compute P.I. for.
    :param alpha: float = 0.05, the prediction uncertainty.
    :param nbootstrap: integer = 1000, the number of bootstrap sampling of the residual forecast error.
        Rules of thumb provided here:
            - https://stats.stackexchange.com/questions/86040/rule-of-thumb-for-number-of-bootstrap-samples
    :param seed: Integer to specify if you want deterministic sampling.

    :return: A list [`lower`, `pred`, `upper`] with `pred` being the prediction
    of the model and `lower` and `upper` constituting the lower- and upper
    bounds for the prediction interval around `pred`, respectively.
    """

    # get number of samples
    n = len(y_train)

    # compute the forecast errors/resid
    fe = y_train - y_fit

    # get percentile bounds
    percentile_lower = (alpha * 100) / 2
    percentile_higher = 100 - percentile_lower

    if nbootstrap is None:
        nbootstrap = np.sqrt(n).astype(int)
    if seed is None:
        rng = np.random.default_rng()
    else:
        rng = np.random.default_rng(seed)

    # bootstrap sample from errors
    error_bootstrap = []
    for _ in range(nbootstrap):
        idx = rng.integers(low=n)
        error_bootstrap.append(fe[idx])

    # get lower and higher percentiles of sampled forecast errors
    fe_lower = np.percentile(a=error_bootstrap, q=percentile_lower)
    fe_higher = np.percentile(a=error_bootstrap, q=percentile_higher)

    # compute P.I.
    pi = [y_pred_value + fe_lower, y_pred_value, y_pred_value + fe_higher]

    return pi

Statsmodels ARIMA：如何获得置信度/预测区间？

问题描述投票：0回答：3

3个回答

最新问题

Statsmodels ARIMA：如何获得置信度/预测区间？

问题描述 投票：0回答：3

3个回答

最新问题

问题描述投票：0回答：3