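I'm having trouble applying a SARIMA model to a dataset in Python. I'm working with store sales data from a department store and want to forecast the quarterly figures for the next year. The data is stationary, and I've aggregated the history into quarters. The data runs through 31 December 2017.
Please see my Python code and output below.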
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.statespace.sarimax import SARIMAX
model = SARIMAX(quarterly_sales, order=order, seasonal_order=seasonal_order)
results = model.fit()
forecast = results.get_forecast(steps=4)
forecast_index = pd.date_range(start='2013-01-01', periods=4, freq='Q')
forecast_series = pd.Series(forecast.predicted_mean, index=forecast_index)
print(forecast_series)
# Plot the historical quarterly sales data
quarterly_sales.plot(kind='bar', figsize=(10, 6), label='Historical Quarterly Sales')
# Check if the forecast index aligns with the expected future quarters
print(forecast_series.index)
# Overlay the forecasted sales with more visibility
plt.plot(forecast_series.index, forecast_series, color='red', marker='o', linestyle='dashed', linewidth=2, label='Forecasted Quarterly Sales')
plt.ylim(0, max(quarterly_sales.max(), forecast_series.max()) * 1.1)
plt.title('Quarterly Sales with Forecast')
plt.xlabel('Quarter')
plt.ylabel('Sales')
plt.xticks(rotation=45)
plt.legend()
plt.show()
new_index = [f"Q{date.quarter} {str(date.year)[-2:]}" for date in forecast_series.index]
forecast_series.index = new_index
import matplotlib.pyplot as plt
historical_index = [f"Q{date.quarter} {str(date.year)[-2:]}" for date in quarterly_sales.index]
quarterly_sales.index = historical_index
quarterly_sales.plot(kind='bar', figsize=(10, 6), label='Historical Quarterly Sales')
plt.plot(forecast_series.index, forecast_series, color='red', marker='o', linestyle='dashed', label='Forecasted Quarterly Sales')
plt.title('Quarterly Sales with Forecast')
plt.xlabel('Quarter')
plt.ylabel('Sales')
plt.xticks(rotation=45)
plt.legend()
plt.show()
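The above is what I tried, but I can't see any of my forecast for the next 12 months on the chart, even though it appears in the legend in the screenshot.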
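One likely cause, assuming `quarterly_sales` has a datetime index: `forecast.predicted_mean` is already a pandas Series labelled with the future quarter dates, and `pd.Series(some_series, index=other_index)` reindexes by label instead of relabelling. If the hand-built index does not match those dates, every value becomes NaN, so nothing is drawn while the legend entry still appears. The sketch below reproduces that with made-up numbers standing in for `predicted_mean`; the names `misaligned` and `relabelled` are only for illustration.

```python
import pandas as pd

# Stand-in for forecast.predicted_mean: a Series that already carries
# future quarter-end dates as its index (the values are made up).
predicted_mean = pd.Series(
    [110.0, 120.0, 130.0, 140.0],
    index=pd.to_datetime(["2018-03-31", "2018-06-30", "2018-09-30", "2018-12-31"]),
)

# A hand-built index whose dates do not match the forecast dates.
wrong_index = pd.to_datetime(["2013-03-31", "2013-06-30", "2013-09-30", "2013-12-31"])

# Passing a Series together with index= reindexes by label.
# No label matches, so every value comes out as NaN.
misaligned = pd.Series(predicted_mean, index=wrong_index)
print(misaligned)

# Passing the raw values instead relabels them without any alignment.
relabelled = pd.Series(predicted_mean.to_numpy(), index=wrong_index)
print(relabelled)
```

In practice it is usually simplest to plot `forecast.predicted_mean` directly and keep the index that statsmodels generated.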
The example below shows forecasting and plotting on synthetic quarterly data. Is this the kind of output you're after?
trn window: 2000-03-31 to 2009-12-31 [ 40 samples] | abs % err: 47.23 | target: -0.493, forecast: -0.260
trn window: 2000-03-31 to 2010-03-31 [ 41 samples] | abs % err: 8.14 | target: -0.624, forecast: -0.573
trn window: 2000-03-31 to 2010-06-30 [ 42 samples] | abs % err: 51.47 | target: -0.515, forecast: -0.780
trn window: 2000-03-31 to 2010-09-30 [ 43 samples] | abs % err: 21.35 | target: -0.892, forecast: -0.702
trn window: 2000-03-31 to 2010-12-31 [ 44 samples] | abs % err: 28.81 | target: -0.717, forecast: -0.924
trn window: 2000-03-31 to 2011-03-31 [ 45 samples] | abs % err: 18.18 | target: -0.629, forecast: -0.743
...
import pandas as pd
import numpy as np
from matplotlib import dates as mdates
from matplotlib import pyplot as plt
import statsmodels.tsa.api as tsa
#
# Synthetic quarterly data
#
dates = pd.date_range('1/1/2000', '1/1/2025', freq='QE')
t = np.linspace(0, 2*np.pi * 4, len(dates))
sine = np.sin(t) * np.exp(-0.03 * t)
data = sine + np.random.normal(0, 0.1, dates.size)
df = pd.DataFrame({'signal': data}, index=dates) #index is the date
#
# Define SARIMA model
#
#Training start date
train_start = df.index[0]  # 2000-03-31
min_train_size = 40
# Forecasting starts after the first 40 samples and runs to the end of the data
forecast_start = train_start + min_train_size * df.index.freq
forecast_end = df.index[-1]  # 2024-12-31
forecasts = []
for train_end in pd.date_range(forecast_start, forecast_end, freq=df.index.freq).shift(-1):
    # The shift(-1) is so we stop one sample before the forecast date
    train_data = df[train_start:train_end]
    # Fit the model and get the results object
    results = tsa.ARIMA(
        train_data,
        order=(3, 1, 0),
        seasonal_order=(1, 0, 0, 20)
    ).fit()
    # Use the results object to forecast the next sample
    forecast = results.forecast()
    forecasts.append(forecast)
    #
    # Report results for each forecast
    #
    forecast_date = forecast.index.item().date()
    assert forecast_date == (train_end + df.index.freq).date()
    actual = df.loc[[forecast_date]].values.item()
    print(
        f'trn window: {train_start.date()} to {train_end.date()}',
        f'[{train_data.size:>3d} samples] |',
        f'abs % err: {abs(1 - forecast.item() / actual) * 100:>7.2f} |',
        f'target: {actual:>+5.3f},',
        f'forecast: {forecast.item():>+5.3f}',
    )
forecasts_df = pd.concat(forecasts, axis=0)
#
# Plot data and forecasts
#
#Plot the signal
ax = df.plot(
    y='signal', use_index=True, xlabel='date', ylabel='signal',
    marker='.', color='dodgerblue', linewidth=1, figsize=(8, 2)
)
#Plot the forecasts
forecasts_df.plot(
    use_index=True, label='forecast', color='crimson', ax=ax
)
#Add some visual lines marking various dates
for date in [train_start, forecast_start, forecast_end]:
    ax.axvline(date, ymax=0.05, linewidth=5, color='tab:green')
ax.legend(ncols=2, fontsize=8)