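I am trying to use the PyCaret AutoML package for time series forecasting in Google Colab, using the data at the following link: parts_revenue_data. When I try to compare models to find the best one, the code hangs and stays at 20%.
The code is below: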
# Only enable critical logging (Optional)
import os
os.environ["PYCARET_CUSTOM_LOGGING_LEVEL"] = "CRITICAL"
def what_is_installed():
    from pycaret import show_versions
    show_versions()
try:
    what_is_installed()
except ModuleNotFoundError:
    !pip install pycaret
    what_is_installed()
import pandas as pd
import numpy as np
import pycaret
pycaret.__version__ # 3.1.0
df = pd.read_csv('parts_revenue.csv', delimiter=';')
from pycaret.utils.time_series import clean_time_index
cleaned = clean_time_index(data=df,
                           index_col='Posting Date',
                           freq='D')
# Verify the resulting DataFrame
print(cleaned.head(n=50))
# parts['MA12'] = parts['Parts Revenue'].rolling(12).mean()
# import plotly.express as px
# fig = px.line(parts, x="Posting Date", y=["Parts Revenue",
# "MA12"], template = 'plotly_dark')
# fig.show()
import time
import numpy as np
from pycaret.time_series import *
# We want to forecast the next 12 days of data and we will use 3
# fold cross-validation to test the models.
fh = 12 # or alternately fh = np.arange(1,13)
fold = 3
# Global Figure Settings for notebook ----
# Depending on whether you are using jupyter notebook, jupyter lab,
# Google Colab, you may have to set the renderer appropriately
# NOTE: Setting to a static renderer here so that the notebook
# saved size is reduced.
fig_kwargs = {
    # "renderer": "notebook",
    "renderer": "png",
    "width": 1000,
    "height": 600,
}
"""## EDA"""
eda = TSForecastingExperiment()
eda.setup(cleaned,
          fh=fh,
          numeric_imputation_target=0,
          fig_kwargs=fig_kwargs
)
eda.plot_model()
eda.plot_model(plot="diagnostics",
               fig_kwargs={"height": 800, "width": 1000}
)
eda.plot_model(
    plot="diff",
    data_kwargs={"lags_list": [[1], [1, 7]],
                 "acf": True,
                 "pacf": True,
                 "periodogram": True},
    fig_kwargs={"height": 800, "width": 1500}
)
"""## Modeling"""
exp = TSForecastingExperiment()
exp.setup(data=cleaned,
          fh=fh,
          numeric_imputation_target=0.0,
          fig_kwargs=fig_kwargs,
          seasonal_period=5
)
# compare baseline models
best = exp.compare_models(errors='raise')  # CODE HANGS HERE!
# plot the forecast for the next 24 periods (days, given freq='D')
exp.plot_model(best,
               plot='forecast',
               data_kwargs={'fh': 24}
)
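Is this caused by a bug in PyCaret, or is something wrong with my code?
Note: I don't have enough reputation to comment, so I am posting this quasi-workaround here. I can delete it later if needed, or move it to a comment once I have enough reputation.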
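I have also seen compare_models run unusually slowly for time series on an MBP with an M1 Max (over 10 minutes on a dataset of about 4000 records). I have not tried it in Colab.
I noticed that it was hanging on Auto ARIMA, so I excluded that model from the list as shown below. This cut the run time to about 1 minute.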
# compare baseline models
best = exp.compare_models(errors="raise", exclude=["auto_arima"])
I know this is not a real solution, but it may help unblock you.
Environment details:
Python 3.10.12
pycaret==3.1.0
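
If you want to check which models are slow on your own data rather than assume it is Auto ARIMA, one option is to fit each model once and time it. The sketch below is only an illustration: `time_each_model` and `slow_models` are hypothetical helpers, not part of PyCaret, and the commented usage assumes `exp` is the `TSForecastingExperiment` that was set up in the question. Note that it only measures run time; it cannot interrupt a model that truly never finishes.

```python
import time
from typing import Callable, Dict, Iterable, List


def time_each_model(model_ids: Iterable[str],
                    fit_fn: Callable[[str], object]) -> Dict[str, float]:
    """Call fit_fn once per model ID and record the wall-clock seconds taken."""
    timings: Dict[str, float] = {}
    for model_id in model_ids:
        start = time.perf_counter()
        try:
            fit_fn(model_id)
        except Exception as exc:
            # Skip models that fail so one error does not stop the survey.
            print(f"{model_id}: failed with {exc!r}")
            continue
        timings[model_id] = time.perf_counter() - start
        print(f"{model_id}: {timings[model_id]:.1f}s")
    return timings


def slow_models(timings: Dict[str, float], threshold_seconds: float) -> List[str]:
    """Return the IDs whose measured time exceeds the threshold, sorted by name."""
    return sorted(m for m, secs in timings.items() if secs > threshold_seconds)


# Assumed usage with the experiment from the question (not run here):
# timings = time_each_model(exp.models().index,
#                           lambda m: exp.create_model(m, verbose=False))
# best = exp.compare_models(errors="raise", exclude=slow_models(timings, 60))
```

The 60 second threshold in the commented usage is arbitrary; pick whatever run time you are willing to wait for per model.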