ARIMA / SARIMAX forecasting anomalous values


These are series of values sampled every hour over 30 days. I group them by hour of the day; two of those groups are shown below:

{'date':
['2019-11-09','2019-11-10','2019-11-11','2019-11-12','2019-11-13','2019-11-14','2019-11-15','2019-11-16','2019-11-17','2019-11-18','2019-11-19','2019-11-20','2019-11-21','2019-11-22','2019-11-23','2019-11-24','2019-11-25','2019-11-26','2019-11-27','2019-11-28','2019-11-29','2019-11-30','2019-12-01','2019-12-02','2019-12-03','2019-12-04','2019-12-05','2019-12-06','2019-12-07','2019-12-08'],
'hora0':[111666.5,121672.91666666667,87669.33333333333,89035.58333333333,91707.91666666667,94449.33333333333,103476.91666666667,123271.5,133306.58333333334,103149.91666666667,106310.25,91830.25,77733.75,96823.25,102880.25,118383.33333333333,95076.66666666667,93561.83333333333,97651.58333333333,112180.0,118051.75,135456.0,149553.0,125797.25,126098.0,128603.75,84631.08333333333,85683.16666666667,96377.16666666667,113161.16666666667],
'hora2':[83768.83333333333,83319.58333333333,72922.75,71893.75,73933.0,76598.83333333333,81021.75,93588.83333333333,94514.08333333333,87147.66666666667,91464.08333333333,74022.41666666667,63709.166666666664,75939.33333333333,79904.16666666667,84435.33333333333,76736.0,85237.33333333333,79162.75,91729.58333333333,99081.58333333333,106440.41666666667,112064.66666666667,111635.58333333333,110168.58333333333,111241.25,62634.083333333336,68203.33333333333,71515.16666666667,80674.66666666667]}
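If the raw data is a single hourly series, per-hour series like `hora0` and `hora2` can be built with a pandas `groupby` on date and hour of day. This is a sketch with hypothetical dummy data (two days of readings valued 0, 1, 2, ...); the names `hourly` and `table` are illustrative, and averaging the readings that fall within each hour is an assumption, not something the question states.

```python
import numpy as np
import pandas as pd

# Hypothetical input: 48 hourly readings (2 days) with dummy values 0, 1, 2, ...
idx = pd.date_range("2019-11-09", periods=48, freq=pd.offsets.Hour())
hourly = pd.Series(np.arange(48, dtype=float), index=idx)

# One row per day, one column per hour of the day; mean() also collapses any
# sub-hourly readings that share the same hour (an assumption about the raw data).
table = hourly.groupby([hourly.index.date, hourly.index.hour]).mean().unstack()
table.columns = [f"hora{h}" for h in table.columns]
table.index = pd.to_datetime(table.index)  # datetime.date objects -> DatetimeIndex

hora2 = table["hora2"]  # one value per day for hour 2, like the 'hora2' list above
```

Each column of `table` is then a daily series for one hour, which is the shape the ARIMA code below works with.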

The series have similar distributions (figure: "Hour samples for 30 days").

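AIC is the Akaike Information Criterion, which compares forecasting models with one another (lower is better). This is the code I use to fit a range of ARIMA orders and see which one has the lowest AIC: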

from itertools import product
from warnings import filterwarnings

import pandas as pd
from statsmodels.tsa.arima_model import ARIMA  # legacy API (statsmodels < 0.13) that accepts fit(disp=0)

def AIC_iteration_i(train):
    filterwarnings("ignore")
    #X = df2.values
    history = [x for x in train.iloc[:,0]]
    p = d = q = range(0,6)
    pdq = list(product(p,d,q))
    aic_results = []
    parameter = []
    for param in pdq:
        try:
            model = ARIMA(history, order=param)
            results = model.fit(disp=0)
            # Uncomment the line below to print each (p,d,q) order with its AIC
            #print('ARIMA{} - AIC:{}'.format(param, results.aic))
            aic_results.append(results.aic)
            parameter.append(param)
        except Exception:
            # some orders are invalid or fail to converge on this series; skip them
            continue
    d = dict(ARIMA=parameter, AIC=aic_results)
    results_table = pd.DataFrame(dict([ (k, pd.Series(v)) for k,v in d.items()]))
    # order with the minimum AIC value
    order = results_table.loc[results_table['AIC'].idxmin(), 'ARIMA']
    return order

For every series, the lowest AIC was obtained with the same order (p, d, q) = (0, 2, 1).
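For reference, AIC is defined below, where $k$ is the number of estimated parameters and $\hat{L}$ the maximised likelihood, so the $2k$ term penalises extra parameters. The selected order ARIMA(0, 2, 1) models the twice-differenced series as an MA(1) process, with $B$ the backshift operator ($B y_t = y_{t-1}$), $\varepsilon_t$ white noise and $\theta_1$ the MA coefficient. With $d = 2$ and no constant, forecasts beyond one step lie on a straight line, i.e. they extrapolate the most recent local trend, which may help explain overshooting forecasts. Note also that AIC values are generally not comparable across different orders of differencing $d$, so a common practice is to choose $d$ first (for example with a unit-root test) and use AIC only to pick $p$ and $q$.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}

(1 - B)^2\, y_t = (1 + \theta_1 B)\,\varepsilon_t
\;\Longleftrightarrow\;
y_t - 2y_{t-1} + y_{t-2} = \varepsilon_t + \theta_1\,\varepsilon_{t-1}
```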

My forecasts are produced with the code below, but for 2 of the hours the results still do not come out right:

import math

from statsmodels.tsa.arima_model import ARIMA  # legacy API (statsmodels < 0.13)

# time series hora0.iloc[:,0] and hora1.iloc[:,0] from pandas df
trained = list(hora0.iloc[:,0])

# order found above: (0,2,1)
orders = order

size = math.ceil(len(trained)*.8)
train, test = trained[:size], trained[size:]
# Fit only on data seen so far: the original fitted on `trained`, which already
# contains the test values, and then appended them a second time.
history = list(train)
predictions = []
predictionslower = []
predictionsupper = []
for k in range(len(test)):
    model = ARIMA(history, order=orders)
    model_fit = model.fit(disp=0)
    # legacy forecast() returns (forecast, stderr, conf_int) for one step ahead
    forecast, stderr, conf_int = model_fit.forecast()
    yhat = forecast[0]
    yhatlower = conf_int[0][0]
    yhatupper = conf_int[0][1]
    predictions.append(yhat)
    predictionslower.append(yhatlower)
    predictionsupper.append(yhatupper)
    obs = test[k]
    history.append(obs)  # walk forward: add the true value before the next fit
#error = mean_squared_error(test, predictions)
predictions

Predictions:

prediction for hour0: [113815.15072419723,128600.77967037176,131580.85654685542,83200.24743417211,83167.65192576911,95062.06180437957]
prediction for hour1: [79564.70753715932,112491.2694928094,114410.34654966182,60882.18766484651,nan,nan]

The AIC for series 2 was also checked with pmdarima, and it gives the same order for a SARIMAX model. Can anyone shed some light on this?

python-3.x time-series regression arima pyramid-arima
1 Answer

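The hour-2 values of the data (like those of the other hours) are not stationary over time. To remove the non-stationarity we can apply differencing or a natural log to the original data (the log mainly stabilises the variance):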

hora2 = np.log(hora2)  # element-wise natural log of the series, not of the string 'hora2'

{'date':['2019-11-09','2019-11-10','2019-11-11','2019-11-12','2019-11-13','2019-11-14','2019-11-15','2019-11-16','2019-11-17','2019-11-18','2019-11-19','2019-11-20','2019-11-21','2019-11-22','2019-11-23','2019-11-24','2019-11-25','2019-11-26','2019-11-27','2019-11-28','2019-11-29','2019-11-30','2019-12-01','2019-12-02','2019-12-03','2019-12-04','2019-12-05','2019-12-06','2019-12-07','2019-12-08'],
'hora2':[11.3358163,11.33043889,11.19715594,11.18294461,11.21091456,11.24633712,11.30247292,11.44666635,11.45650413,11.37535928,11.42370164,11.21212325,11.06208373,11.23769005,11.28858328,11.34374123,11.24812624,11.3531948,11.27926114,11.42660022,11.50369886,11.57534064,11.62683136,11.62299513,11.60976705,11.61945655,11.04506487,11.13024872,11.17766483,11.29817989]}
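`np.log` has to be applied to the data rather than to the string `'hora2'`. A small numpy sketch of the two transformations mentioned above, log and differencing, and of how to undo them, using the first five `hora2` values from the question:

```python
import numpy as np

# First five 'hora2' values from the question
hora2 = np.array([83768.83333333333, 83319.58333333333, 72922.75, 71893.75, 73933.0])

log_y = np.log(hora2)    # element-wise natural log (stabilises the variance)
dlog_y = np.diff(log_y)  # first difference of the log series (roughly the relative change)

# Undo both steps: a cumulative sum restores the log levels, exp restores the original scale
restored = np.exp(np.concatenate(([log_y[0]], log_y[0] + np.cumsum(dlog_y))))
```

`log_y` starts 11.3358163, 11.33043889, matching the log values listed above, and `restored` equals the input up to floating-point rounding.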

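Once the order of ARIMA(trained, order=orders) with the lowest AIC (Akaike Information Criterion) had been found for each "horaX" series, some series still returned NaN values in the forecast. For those I had to use the order with the second or third lowest AIC, which did return forecasts, and then applied the exponential (the inverse of the log) to recover the original values.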

{'hora2':[11.6948938,12.00191037,11.81401922,11.77476296,11.83965601,11.89443423]}

hora2 = np.exp(hora2)  # inverse of np.log: back to the original scale

{'hora2':[119957.62142129,163066.00981609,135133.60347713,129931.53854787,138642.78415756,146449.24980086]}
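`np.exp` likewise needs the array of log-scale forecasts, not a string. A minimal sketch of the back-transformation, using the six log-scale `hora2` forecasts listed above; the helper `back_transform` is hypothetical. Because `exp` is monotonic, log-scale interval bounds can be exponentiated directly.

```python
import numpy as np

# Log-scale forecasts for 'hora2' as listed in the answer
log_forecast = np.array([11.6948938, 12.00191037, 11.81401922,
                         11.77476296, 11.83965601, 11.89443423])


def back_transform(log_mean, log_lower=None, log_upper=None):
    """Hypothetical helper: map log-scale forecasts and optional interval
    bounds back to the original scale; missing bounds stay None."""
    def to_level(a):
        return None if a is None else np.exp(np.asarray(a, dtype=float))
    return to_level(log_mean), to_level(log_lower), to_level(log_upper)


level_forecast, _, _ = back_transform(log_forecast)
```

If the log-scale forecast errors are roughly normal, `exp` of the point forecast estimates the median on the original scale rather than the mean; the mean would be `exp(mu + sigma**2 / 2)`, where `sigma` is the log-scale forecast standard error.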

The forecasts on the test data are shown in the accompanying figure.


© www.soinside.com 2019 - 2024. All rights reserved.