我正在尝试使用 ARMA ARIMA 模型预测每周销售额。我找不到调整
statsmodels
中的 order(p,d,q) 的函数。目前 R 有一个函数 forecast::auto.arima()
可以调整 (p,d,q) 参数。
如何为我的模型选择正确的顺序? python 中是否有用于此目的的库?
您可以实施多种方法:
ARIMAResults
包括aic
和bic
。根据他们的定义,(见这里和这里),这些标准惩罚模型中的参数数量。因此,您可以使用这些数字来比较模型。 scipy 也有 optimize.brute
它在指定的参数空间上进行网格搜索。所以像这样的工作流程应该可行:
def objfunc(order, exog, endog):
from statsmodels.tsa.arima.model import ARIMA
fit = ARIMA(endog, order, exog).fit()
return fit.aic()
from scipy.optimize import brute
grid = (slice(1, 3, 1), slice(1, 3, 1), slice(1, 3, 1))
brute(objfunc, grid, args=(exog, endog), finish=None)
确保你用
brute
打电话给finish=None
。
您可以从
pvalues
获得ARIMAResults
。因此,一种向前推进的算法很容易实现,其中模型的度数在维度上增加,从而为添加的参数获得最低 p 值。
ARIMAResults.predict
交叉验证替代模型。最好的方法是将时间序列的尾部(比如最近的 5% 的数据)保留在样本之外,并使用这些点来获得拟合模型的test error。
现在有一个合适的 python 包来做 auto-arima。 https://github.com/tgsmith61591/pmdarima
文件: http://alkaline-ml.com/pmdarima
示例用法:https://github.com/tgsmith61591/pmdarima/blob/master/examples/quick_start_example.ipynb
def evaluate_arima_model(X, arima_order):
# prepare training dataset
train_size = int(len(X) * 0.90)
train, test = X[0:train_size], X[train_size:]
history = [x for x in train]
# make predictions
predictions = list()
for t in range(len(test)):
model = ARIMA(history, order=arima_order)
model_fit = model.fit(disp=0)
yhat = model_fit.forecast()[0]
predictions.append(yhat)
history.append(test[t])
# calculate out of sample error
error = mean_squared_error(test, predictions)
return error
# evaluate combinations of p, d and q values for an ARIMA model
def evaluate_models(dataset, p_values, d_values, q_values):
dataset = dataset.astype('float32')
best_score, best_cfg = float("inf"), None
for p in p_values:
for d in d_values:
for q in q_values:
order = (p,d,q)
try:
mse = evaluate_arima_model(dataset, order)
if mse < best_score:
best_score, best_cfg = mse, order
print('ARIMA%s MSE=%.3f' % (order,mse))
except:
continue
print('Best ARIMA%s MSE=%.3f' % (best_cfg, best_score))
# load dataset
def parser(x):
return datetime.strptime('190'+x, '%Y-%m')
import datetime
p_values = [4,5,6,7,8]
d_values = [0,1,2]
q_values = [2,3,4,5,6]
warnings.filterwarnings("ignore")
evaluate_models(train, p_values, d_values, q_values)
这将为您提供 p、d、q 值,然后将这些值用于您的 ARIMA 模型
最简单的方法是通过
auto_arima
包(https://github.com/Nixtla/statsforecast)使用Nixtla的
statsforecast
模型。它是 forecast::auto.arima
函数的镜像实现,使用 numba
进行了优化。它具有更好的性能,并且比 R 和pmdarima
实现更快。
只需
pip
-使用pip install statsforecast
安装库。那么,
from statsforecast.core import StatsForecast
from statsforecast.models import auto_arima
fcst = StatsForecast(
df, #your data
models=[auto_arima],
freq='W', # frequency of your data
n_jobs=7, # you can also define the number of cores used for parallelizing
)
forecasts = fcst.forecast(12) #your horizon
这里是带有例子的笔记本。
我写了这些效用函数来直接计算pdq值 get_PDQ_parallel 需要三个输入数据,这些数据是以时间戳(日期时间)为索引的系列。 n_jobs 将提供并行处理器的数量。输出将是具有 aic 和 bic 值的数据帧,索引中的顺序=(P,D,Q) p 和 q 范围是 [0,12] 而 d 是 [0,1]
import statsmodels
from statsmodels import api as sm
from sklearn.metrics import r2_score,mean_squared_error
from sklearn.utils import check_array
from functools import partial
from multiprocessing import Pool
def get_aic_bic(order,series):
aic=np.nan
bic=np.nan
#print(series.shape,order)
try:
arima_mod=statsmodels.tsa.arima_model.ARIMA(series,order=order,freq='H').fit(transparams=True,method='css')
aic=arima_mod.aic
bic=arima_mod.bic
print(order,aic,bic)
except:
pass
return aic,bic
def get_PDQ_parallel(data,n_jobs=7):
p_val=13
q_val=13
d_vals=2
pdq_vals=[ (p,d,q) for p in range(p_val) for d in range(d_vals) for q in range(q_val)]
get_aic_bic_partial=partial(get_aic_bic,series=data)
p = Pool(n_jobs)
res=p.map(get_aic_bic_partial, pdq_vals)
p.close()
return pd.DataFrame(res,index=pdq_vals,columns=['aic','bic'])
到目前为止,我们可以直接使用 PyPI 中的 pyramid-arima 包。
可能的解决方案
df=pd.read_csv("http://vincentarelbundock.github.io/Rdatasets/csv/datasets/AirPassengers.csv")
# Define the p, d and q parameters to take any value between 0 and 2
p = d = q = range(0, 2)
print(p)
import itertools
import warnings
# Generate all different combinations of p, q and q triplets
pdq = list(itertools.product(p, d, q))
print(pdq)
# Generate all different combinations of seasonal p, q and q triplets
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in list(itertools.product(p, d, q))]
print('Examples of parameter combinations for Seasonal ARIMA...')
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[1]))
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[2]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[3]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[4]))
Examples of parameter combinations for Seasonal ARIMA...
SARIMAX: (0, 0, 1) x (0, 0, 1, 12)
SARIMAX: (0, 0, 1) x (0, 1, 0, 12)
SARIMAX: (0, 1, 0) x (0, 1, 1, 12)
SARIMAX: (0, 1, 0) x (1, 0, 0, 12)
y=df
#warnings.filterwarnings("ignore") # specify to ignore warning messages
for param in pdq:
for param_seasonal in seasonal_pdq:
try:
mod = sm.tsa.statespace.SARIMAX(y,
order=param,
seasonal_order=param_seasonal,
enforce_stationarity=False,
enforce_invertibility=False)
results = mod.fit()
print('ARIMA{}x{}12 - AIC:{}'.format(param, param_seasonal, results.aic))
except:
continue
ARIMA(0, 0, 0)x(0, 0, 1, 12)12 - AIC:3618.0303991426763
ARIMA(0, 0, 0)x(0, 1, 1, 12)12 - AIC:2824.7439963684233
ARIMA(0, 0, 0)x(1, 0, 0, 12)12 - AIC:2942.2733127230185
ARIMA(0, 0, 0)x(1, 0, 1, 12)12 - AIC:2922.178151133141
ARIMA(0, 0, 0)x(1, 1, 0, 12)12 - AIC:2767.105066400224
ARIMA(0, 0, 0)x(1, 1, 1, 12)12 - AIC:2691.233398643673
ARIMA(0, 0, 1)x(0, 0, 0, 12)12 - AIC:3890.816777796087
ARIMA(0, 0, 1)x(0, 0, 1, 12)12 - AIC:3541.1171286722
ARIMA(0, 0, 1)x(0, 1, 0, 12)12 - AIC:3028.8377323188824
ARIMA(0, 0, 1)x(0, 1, 1, 12)12 - AIC:2746.77973129136
ARIMA(0, 0, 1)x(1, 0, 0, 12)12 - AIC:3583.523640623017
ARIMA(0, 0, 1)x(1, 0, 1, 12)12 - AIC:3531.2937768990187
ARIMA(0, 0, 1)x(1, 1, 0, 12)12 - AIC:2781.198675746594
ARIMA(0, 0, 1)x(1, 1, 1, 12)12 - AIC:2720.7023088205974
ARIMA(0, 1, 0)x(0, 0, 1, 12)12 - AIC:3029.089945668332
ARIMA(0, 1, 0)x(0, 1, 1, 12)12 - AIC:2568.2832251221016
ARIMA(0, 1, 0)x(1, 0, 0, 12)12 - AIC:2841.315781459511
ARIMA(0, 1, 0)x(1, 0, 1, 12)12 - AIC:2815.4011044132576
ARIMA(0, 1, 0)x(1, 1, 0, 12)12 - AIC:2588.533386513587
ARIMA(0, 1, 0)x(1, 1, 1, 12)12 - AIC:2569.9453272483315
ARIMA(0, 1, 1)x(0, 0, 0, 12)12 - AIC:3327.5177587522303
ARIMA(0, 1, 1)x(0, 0, 1, 12)12 - AIC:2984.716706112334
ARIMA(0, 1, 1)x(0, 1, 0, 12)12 - AIC:2789.128542154043
ARIMA(0, 1, 1)x(0, 1, 1, 12)12 - AIC:2537.0293659293943
ARIMA(0, 1, 1)x(1, 0, 0, 12)12 - AIC:2984.4555708516436
ARIMA(0, 1, 1)x(1, 0, 1, 12)12 - AIC:2939.460958374472
ARIMA(0, 1, 1)x(1, 1, 0, 12)12 - AIC:2578.7862352774437
ARIMA(0, 1, 1)x(1, 1, 1, 12)12 - AIC:2537.771484229265
ARIMA(1, 0, 0)x(0, 0, 0, 12)12 - AIC:3391.5248913820797
ARIMA(1, 0, 0)x(0, 0, 1, 12)12 - AIC:3038.142074281268
C:\Users\Dell\Anaconda3\lib\site-packages\statsmodels\base\model.py:496: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
"Check mle_retvals", ConvergenceWarning)
ARIMA(1, 0, 0)x(0, 1, 0, 12)12 - AIC:2839.809192263449
ARIMA(1, 0, 0)x(0, 1, 1, 12)12 - AIC:2588.50367175184
ARIMA(1, 0, 0)x(1, 0, 0, 12)12 - AIC:2993.4630440139595
ARIMA(1, 0, 0)x(1, 0, 1, 12)12 - AIC:2995.049216326931
ARIMA(1, 0, 0)x(1, 1, 0, 12)12 - AIC:2588.2463284315304
ARIMA(1, 0, 0)x(1, 1, 1, 12)12 - AIC:2592.80110502723
ARIMA(1, 0, 1)x(0, 0, 0, 12)12 - AIC:3352.0350133621478
ARIMA(1, 0, 1)x(0, 0, 1, 12)12 - AIC:3006.5493366627807
ARIMA(1, 0, 1)x(0, 1, 0, 12)12 - AIC:2810.6423724894516
ARIMA(1, 0, 1)x(0, 1, 1, 12)12 - AIC:2559.584031948852
ARIMA(1, 0, 1)x(1, 0, 0, 12)12 - AIC:2981.2250436794675
ARIMA(1, 0, 1)x(1, 0, 1, 12)12 - AIC:2959.3142304724834
ARIMA(1, 0, 1)x(1, 1, 0, 12)12 - AIC:2579.8245645892207
ARIMA(1, 0, 1)x(1, 1, 1, 12)12 - AIC:2563.13922589258
ARIMA(1, 1, 0)x(0, 0, 0, 12)12 - AIC:3354.7462930846423
ARIMA(1, 1, 0)x(0, 0, 1, 12)12 - AIC:3006.702997636003
ARIMA(1, 1, 0)x(0, 1, 0, 12)12 - AIC:2809.3844175191666
ARIMA(1, 1, 0)x(0, 1, 1, 12)12 - AIC:2558.484602766447
ARIMA(1, 1, 0)x(1, 0, 0, 12)12 - AIC:2959.885810636943
ARIMA(1, 1, 0)x(1, 0, 1, 12)12 - AIC:2960.712709764296
ARIMA(1, 1, 0)x(1, 1, 0, 12)12 - AIC:2557.945907092698
ARIMA(1, 1, 0)x(1, 1, 1, 12)12 - AIC:2559.274166458508
ARIMA(1, 1, 1)x(0, 0, 0, 12)12 - AIC:3326.3285511700374
ARIMA(1, 1, 1)x(0, 0, 1, 12)12 - AIC:2985.868532151721
ARIMA(1, 1, 1)x(0, 1, 0, 12)12 - AIC:2790.7677149967103
ARIMA(1, 1, 1)x(0, 1, 1, 12)12 - AIC:2538.820635541546
ARIMA(1, 1, 1)x(1, 0, 0, 12)12 - AIC:2963.2789505804294
ARIMA(1, 1, 1)x(1, 0, 1, 12)12 - AIC:2941.2436984747465
ARIMA(1, 1, 1)x(1, 1, 0, 12)12 - AIC:2559.8258191422606
ARIMA(1, 1, 1)x(1, 1, 1, 12)12 - AIC:2539.712354465328
另见https://github.com/decisionstats/pythonfordatascience/blob/master/time%2Bseries%20(1).ipynb
在conda中,使用
conda install -c saravji pmdarima
安装。
用户
saravji
已经放到anaconda cloud中了
然后使用,
from pmdarima.arima import auto_arima
(注意名字
pyramid-arima
改为pmdarima
)。
其实
def objfunc(order,*params ):
from statsmodels.tsa.arima_model import ARIMA
p,d,q = order
fit = ARIMA(endog, order, exog).fit()
return fit.aic()
from scipy.optimize import brute
grid = (slice(1, 3, 1), slice(1, 3, 1), slice(1, 3, 1))
brute(objfunc, grid, args=params, finish=None)
RAPIDScuML 机器学习库 具有auto-ARIMA 函数。
来自链接文档:
为样本内和样本外实施批量自动 ARIMA 模型 时间序列预测。
此界面提供高度可定制的搜索功能 类似于 R 中的
和forecast
包。它提供了一个 围绕底层 ARIMA 模型进行抽象以预测和预测 就像使用单个模型一样。fable
一些示例代码:
from cuml.tsa.auto_arima import AutoARIMA
model = AutoARIMA(y)
model.search(s=12, d=(0, 1), D=(0, 1), p=(0, 2, 4), q=(0, 2, 4),
P=range(2), Q=range(2), method="css", truncate=100)
model.fit(method="css-ml")
fc = model.forecast(20)
这是一个 notebook,其中包含更详细的 cuML ARIMA 示例。
cuML 库需要支持的 NVIDIA GPU(加上适当的驱动程序和安装的 CUDA),所以它应该很快。