如何生成“较低”和“较高”预测,而不仅仅是“yhat”?
import statsmodels
from statsmodels.tsa.arima.model import ARIMA
assert statsmodels.__version__ == '0.12.0'
arima = ARIMA(df['value'], order=order)
model = arima.fit()
现在我可以生成“yhat”预测
yhat = model.forecast(123)
并获取模型参数的置信区间(但不适用于预测):
model.conf_int()
但是如何生成
yhat_lower
和 yhat_upper
预测呢?
一般来说,
forecast
和predict
方法仅产生点预测,而get_forecast
和get_prediction
方法产生包括预测区间的完整结果。
在您的示例中,您可以执行以下操作:
forecast = model.get_forecast(123)
yhat = forecast.predicted_mean
yhat_conf_int = forecast.conf_int(alpha=0.05)
如果您的数据是 Pandas Series,那么
yhat_conf_int
将是一个包含两列的 DataFrame,lower <name>
和 upper <name>
,其中 <name>
是 Pandas Series 的名称。
如果您的数据是 numpy 数组(或 Python 列表),那么
yhat_conf_int
将是 (n_forecasts, 2)
数组,其中第一列是区间的下部,第二列是上部。
使用 ARIMA,您需要自己在模型中包含季节性和外生变量。在使用 SARIMA(季节性 ARIMA)或 SARIMAX(也适用于外生因素)实施时,C.I.到摘要框架:
import statsmodels.api as sm
import matplotlib.pyplot as plt
import pandas as pd
dta = sm.datasets.sunspots.load_pandas().data[['SUNACTIVITY']]
dta.index = pd.Index(pd.date_range("1700", end="2009", freq="A"))
print(dta)
print("init data:\n")
dta.plot(figsize=(12,4));
plt.show()
##print("SARIMAX(dta, order=(2,0,0), trend='c'):\n")
result = sm.tsa.SARIMAX(dta, order=(2,0,0), trend='c').fit(disp=False)
print(">>> result.params:\n", result.params, "\n")
##print("SARIMA_model.plot_diagnostics:\n")
result.plot_diagnostics(figsize=(15,12))
plt.show()
# summary stats of residuals
print(">>> residuals.describe:\n", result.resid.describe(), "\n")
# Out-of-sample forecasts are produced using the forecast or get_forecast methods from the results object
# The get_forecast method is more general, and also allows constructing confidence intervals.
fcast_res1 = result.get_forecast()
# specify that we want a confidence level of 90%
print(">>> forecast summary at alpha=0.01:\n", fcast_res1.summary_frame(alpha=0.10), "\n")
# plot forecast
fig, ax = plt.subplots(figsize=(15, 5))
ax = dta.plot(label='Training Data')
# Construct the forecasts
fcast = result.get_forecast('2050Q4').summary_frame()
print(fcast)
fcast['mean'].plot(ax=ax, style='k--')
ax.fill_between(fcast.index, fcast['mean_ci_lower'], fcast['mean_ci_upper'], color='k', alpha=0.1);
fig.tight_layout()
plt.show()
docs:“上面的预测可能看起来不太令人印象深刻,因为它几乎是一条直线。这是因为这是一个非常简单的单变量预测模型。尽管如此,请记住,这些简单的预测模型可能非常有竞争力”
附注此处“您可以通过将季节性项设置为零来以非季节性方式使用它。”
要生成预测区间而不是置信区间(您已巧妙地对其进行了区分,并且在 Hyndman 的博客文章中介绍了预测区间和置信区间之间的差异),那么您可以遵循本文档中提供的指导回答。
您还可以尝试计算自举预测区间,这在这个answer中列出。
下面是我实现这一点的尝试(当我有机会更详细地检查它时,我会更新它):
def bootstrap_prediction_interval(y_train: Union[list, pd.Series],
y_fit: Union[list, pd.Series],
y_pred_value: float,
alpha: float = 0.05,
nbootstrap: int = None,
seed: int = None):
"""
Bootstraps a prediction interval around an ARIMA model's predictions.
Method presented clearly here:
- https://stats.stackexchange.com/a/254321
Also found through here, though less clearly:
- https://otexts.com/fpp3/prediction-intervals.html
Can consider this to be a time-series version of the following generalisation:
- https://saattrupdan.github.io/2020-03-01-bootstrap-prediction/
:param y_train: List or Series of training univariate time-series data.
:param y_fit: List or Series of model fitted univariate time-series data.
:param y_pred_value: Float of the model predicted univariate time-series you want to compute P.I. for.
:param alpha: float = 0.05, the prediction uncertainty.
:param nbootstrap: integer = 1000, the number of bootstrap sampling of the residual forecast error.
Rules of thumb provided here:
- https://stats.stackexchange.com/questions/86040/rule-of-thumb-for-number-of-bootstrap-samples
:param seed: Integer to specify if you want deterministic sampling.
:return: A list [`lower`, `pred`, `upper`] with `pred` being the prediction
of the model and `lower` and `upper` constituting the lower- and upper
bounds for the prediction interval around `pred`, respectively.
"""
# get number of samples
n = len(y_train)
# compute the forecast errors/resid
fe = y_train - y_fit
# get percentile bounds
percentile_lower = (alpha * 100) / 2
percentile_higher = 100 - percentile_lower
if nbootstrap is None:
nbootstrap = np.sqrt(n).astype(int)
if seed is None:
rng = np.random.default_rng()
else:
rng = np.random.default_rng(seed)
# bootstrap sample from errors
error_bootstrap = []
for _ in range(nbootstrap):
idx = rng.integers(low=n)
error_bootstrap.append(fe[idx])
# get lower and higher percentiles of sampled forecast errors
fe_lower = np.percentile(a=error_bootstrap, q=percentile_lower)
fe_higher = np.percentile(a=error_bootstrap, q=percentile_higher)
# compute P.I.
pi = [y_pred_value + fe_lower, y_pred_value, y_pred_value + fe_higher]
return pi