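I am trying to train some machine learning models to predict the price movement of 4 selected stocks from the NASDAQ-100 list.
I am very new to Python, so I have run into a few problems I can't solve. The first one came up when I tried to use an ARIMA model. Running the code produces the following error: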
None if faux_endog else np.any(np.isnan(self.endog)))
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
I have already used dropna(), fillna(), and isna() to find and remove any NaN or NULL values, so there should be none left.
Here is my code:
# Imports
import os
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import io
from PIL import Image
import statsmodels.api as sm
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error, mean_absolute_error
import math
from sklearn.preprocessing import MinMaxScaler
import tensorflow as tf
# Chosen stocks from NASDAQ-100
chosen_stocks = ['CTSH', 'BKNG', 'REGN', 'MSFT']
def get_data():
    # Get list of tickers
    tickers = open("dataset/nasdaq_100_tickers.txt", "r")
    data = tickers.read().splitlines()
    # Check if the data has already been downloaded, drop NaN values
    if os.path.exists('dataframe.csv'):
        dataframe = pd.read_csv('dataframe.csv', index_col="Date", parse_dates=True).dropna()
    else:
        # Download Close data from Yahoo Finance
        data = yf.download(tickers=data, period='1y', interval='1d')['Close']
        data.to_csv('dataframe.csv')
        # Convert array to Pandas dataframe, drop NaN values
        complete_data = data.dropna()
        dataframe = pd.DataFrame(complete_data)
        dataframe.drop(['GEHC'], axis=1, inplace=True)  # Dropping GEHC because it contains NULL values
    return dataframe
def arima_prediction(stock):
    train_data, test_data = stock[3:int(len(dataframe) * 0.5)], stock[int(len(dataframe) * 0.5):]
    train_arima = train_data
    test_arima = test_data
    history = [x for x in train_arima]
    y = test_arima
    predictions = list()
    model = ARIMA(history, order=(1, 1, 0))
    model_fit = model.fit()
    forecast = model_fit.forecast()[0]
    predictions.append(forecast)
    history.append(y[0])
    for i in range(1, len(y)):
        # Predict
        model = ARIMA(history, order=(1, 1, 0))
        model_fit = model.fit()
        forecast = model_fit.forecast()[0]
        # Invert transformed prediction
        predictions.append(forecast)
        # Observation
        observation = y[i]
        history.append(observation)
    # Report performance
    mean_squared = mean_squared_error(y, predictions)
    print('Mean Squared Error: ' + str(mean_squared))
    mean_absolute = mean_absolute_error(y, predictions)
    print('Mean Absolute Error: ' + str(mean_absolute))
    root_mean_squared = math.sqrt(mean_squared_error(y, predictions))
    print('Root Mean Squared Error: ' + str(root_mean_squared))
dataframe = get_data()
for stock in chosen_stocks:
    arima_prediction(stock)
My dataframe looks like this:
                  AAPL        ABNB  ...         ZM          ZS
Date                                ...
2022-12-15  136.500000   90.610001  ...  70.199997  117.169998
2022-12-16  134.509995   89.570000  ...  69.860001  114.209999
2022-12-19  132.369995   85.930000  ...  69.089996  112.269997
2022-12-20  132.300003   87.620003  ...  68.559998  113.540001
2022-12-21  135.449997   87.070000  ...  69.930000  112.769997
...                ...         ...  ...        ...         ...
2023-11-28  190.399994  127.559998  ...  67.529999  193.850006
2023-11-29  189.369995  126.480003  ...  67.949997  199.839996
2023-11-30  189.949997  126.339996  ...  67.830002  197.529999
2023-12-01  191.240005  135.020004  ...  70.290001  198.029999
2023-12-04  188.669998  134.539993  ...  67.720001  197.919998
The full traceback is:
Traceback (most recent call last):
  File "C:/Users/xxx/source/repos/Project/main.py", line 370, in <module>
    arima_prediction(stock)
  File "C:/Users/xxx/source/repos/Project/main.py", line 217, in arima_prediction
    model = ARIMA(history, order=(1, 1, 0))
  File "C:\Users\xxx\source\repos\Project\venv\lib\site-packages\statsmodels\tsa\arima\model.py", line 158, in __init__
    self._spec_arima = SARIMAXSpecification(
  File "C:\Users\xxx\source\repos\Project\venv\lib\site-packages\statsmodels\tsa\arima\specification.py", line 458, in __init__
    None if faux_endog else np.any(np.isnan(self.endog)))
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
Process finished with exit code 1
Any help would be greatly appreciated.
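It turned out that, as the commenters pointed out, the problem was that the stock was not being passed correctly: the loop passed each ticker name as a string rather than the price data.
So, to make sure the price data is passed instead of a string, rather than calling the function like this: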
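To see why a string leads to this particular error, here is a minimal sketch (the variable names mirror the question, but the snippet is only an illustration and does not call statsmodels). Iterating over a ticker string yields single characters, and np.isnan cannot operate on an array of strings:

```python
import numpy as np

stock = "CTSH"  # what the loop passed: a ticker name, not a price series

# Iterating over a string gives its characters, so `history` holds text, not prices
history = [x for x in stock]

try:
    np.isnan(np.asarray(history))
    error_message = None
except TypeError as exc:
    # Same TypeError that surfaces from inside the ARIMA constructor
    error_message = str(exc)

print(history)
print(error_message)
```

This is also why dropna() and fillna() made no difference: the dataframe itself was never handed to the model.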
for stock in chosen_stocks:
    arima_prediction(stock)
I now use:
def get_stock_data(dataframe):
    get_stock_data = dataframe.iloc[:, 30]
    return get_stock_data
stock_data = get_stock_data(dataframe)
arima_prediction(stock_data)
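Selecting by position (column 30) depends on the column order of the downloaded data. A possible alternative, sketched here under the assumption that every ticker in chosen_stocks is a column label in the dataframe, is to look each column up by name so the original loop can be kept. The small dataframe below is made-up sample data for illustration only, and select_stock is a hypothetical helper, not part of the original code:

```python
import pandas as pd

chosen_stocks = ['CTSH', 'BKNG', 'REGN', 'MSFT']

# Made-up stand-in for the real dataframe (illustrative values only)
dataframe = pd.DataFrame(
    {ticker: [100.0 + i, 101.0 + i, 102.0 + i] for i, ticker in enumerate(chosen_stocks)},
    index=pd.to_datetime(['2023-01-02', '2023-01-03', '2023-01-04']),
)

def select_stock(dataframe, ticker):
    """Hypothetical helper: return one ticker's price column as a float Series."""
    if ticker not in dataframe.columns:
        raise KeyError(f"{ticker} is not a column in the dataframe")
    return dataframe[ticker].astype(float)

selected = {ticker: select_stock(dataframe, ticker) for ticker in chosen_stocks}
for ticker, series in selected.items():
    # In the real script this is where arima_prediction(series) would be called
    print(ticker, series.dtype, len(series))
```

Because each Series keeps its date index, integer lookups such as y[0] inside arima_prediction rely on positional fallback; on recent pandas versions, y.iloc[0] or converting with .to_numpy() first is likely the safer choice.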
Thanks, everyone, for the help!