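TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''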


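I am trying to train some machine learning models to predict the price movements of 4 selected stocks from the NASDAQ-100 list.

I am very new to Python, so I have run into some problems I cannot solve. The first one occurs when I try to use an ARIMA model. Running the code produces the following error: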

None if faux_endog else np.any(np.isnan(self.endog))) TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

I have already tried using

dropna()
fillna()
isna()
to find or remove NaN or NULL values, so there should be none left.

Here is my code:

# Imports
import os
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import io
from PIL import Image
import statsmodels.api as sm
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error, mean_absolute_error
import math
from sklearn.preprocessing import MinMaxScaler
import tensorflow as tf

# Chosen stocks from NASDAQ-100
chosen_stocks = ['CTSH', 'BKNG', 'REGN', 'MSFT']

def get_data():
    # Get list of tickers
    tickers = open("dataset/nasdaq_100_tickers.txt", "r")
    data = tickers.read().splitlines()

    # Check if the data has already been downloaded, drop NaN values
    if os.path.exists('dataframe.csv'):
        dataframe = pd.read_csv('dataframe.csv', index_col="Date", parse_dates=True).dropna()
    else:
        # Download Close data from Yahoo Finance
        data = yf.download(tickers=data, period='1y', interval='1d')['Close']
        data.to_csv('dataframe.csv')
        # Convert array to Pandas dataframe, drop NaN values
        complete_data = data.dropna()
        dataframe = pd.DataFrame(complete_data)
    dataframe.drop(['GEHC'], axis=1, inplace=True) # Dropping GEHC because it contains NULL values

    return dataframe



def arima_prediction(stock):
    train_data, test_data = stock[3:int(len(dataframe) * 0.5)], stock[int(len(dataframe) * 0.5):]
    train_arima = train_data
    test_arima = test_data

    history = [x for x in train_arima]
    y = test_arima
    predictions = list()
    model = ARIMA(history, order=(1, 1, 0))
    model_fit = model.fit()
    forecast = model_fit.forecast()[0]
    predictions.append(forecast)
    history.append(y[0])

    for i in range(1, len(y)):
        # Predict
        model = ARIMA(history, order=(1, 1, 0))
        model_fit = model.fit()
        forecast = model_fit.forecast()[0]
        # Invert transformed prediction
        predictions.append(forecast)
        # Observation
        observation = y[i]
        history.append(observation)

    # Report performance
    mean_squared = mean_squared_error(y, predictions)
    print('Mean Squared Error: ' + str(mean_squared))
    mean_absolute = mean_absolute_error(y, predictions)
    print('Mean Absolute Error: ' + str(mean_absolute))
    root_mean_squared = math.sqrt(mean_squared_error(y, predictions))
    print('Root Mean Squared Error: ' + str(root_mean_squared))

dataframe = get_data()
for stock in chosen_stocks:
    arima_prediction(stock)

My dataframe looks like this:

                  AAPL        ABNB  ...         ZM          ZS
Date                                ...                       
2022-12-15  136.500000   90.610001  ...  70.199997  117.169998
2022-12-16  134.509995   89.570000  ...  69.860001  114.209999
2022-12-19  132.369995   85.930000  ...  69.089996  112.269997
2022-12-20  132.300003   87.620003  ...  68.559998  113.540001
2022-12-21  135.449997   87.070000  ...  69.930000  112.769997
...                ...         ...  ...        ...         ...
2023-11-28  190.399994  127.559998  ...  67.529999  193.850006
2023-11-29  189.369995  126.480003  ...  67.949997  199.839996
2023-11-30  189.949997  126.339996  ...  67.830002  197.529999
2023-12-01  191.240005  135.020004  ...  70.290001  198.029999
2023-12-04  188.669998  134.539993  ...  67.720001  197.919998

The full traceback is:


Traceback (most recent call last):
  File "C:/Users/xxx/source/repos/Project/main.py", line 370, in <module>
    arima_prediction(stock)
  File "C:/Users/xxx/source/repos/Project/main.py", line 217, in arima_prediction
    model = ARIMA(history, order=(1, 1, 0))
  File "C:\Users\xxx\source\repos\Project\venv\lib\site-packages\statsmodels\tsa\arima\model.py", line 158, in __init__
    self._spec_arima = SARIMAXSpecification(
  File "C:\Users\xxx\source\repos\Project\venv\lib\site-packages\statsmodels\tsa\arima\specification.py", line 458, in __init__
    None if faux_endog else np.any(np.isnan(self.endog)))
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Process finished with exit code 1

Any help would be greatly appreciated.

python pandas machine-learning arima
1 Answer

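As the commenters pointed out, the problem was that the stock was not being passed correctly: arima_prediction received the ticker symbol (a string such as 'CTSH') rather than that stock's column of prices, so ARIMA was given string data, which np.isnan cannot handle.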
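The failure can be reproduced without statsmodels or any price data. The sketch below assumes only what the question's code does with its argument (slice it, then build a list from it); the variable names are illustrative.

```python
import numpy as np

ticker = "CTSH"  # what the loop in the question actually passed in

# Slicing a string yields a string, and iterating over it yields characters.
history = [x for x in ticker[3:]]
print(history)

# Converting that list to an array gives a unicode-string dtype, not floats.
endog = np.asarray(history)
print(endog.dtype.kind)

# np.isnan is only defined for numeric dtypes, so it raises TypeError here.
try:
    np.isnan(endog)
    raised = False
except TypeError as exc:
    raised = True
    print(type(exc).__name__, exc)
```

This is why dropna() and fillna() made no difference: the data reaching ARIMA contained no NaN values, it simply was not numeric.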

So, to make sure the price data is passed rather than a string, instead of calling the function like this:

for stock in chosen_stocks:
    arima_prediction(stock)

I changed it to:

def get_stock_data(dataframe):
    # Select one stock's prices by column position (column 30) instead of passing a ticker string
    stock_data = dataframe.iloc[:, 30]
    return stock_data

stock_data = get_stock_data(dataframe)
arima_prediction(stock_data)
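Selecting by position works, but it ties the code to the column order of the download. As a possible alternative, here is a minimal sketch that selects each column by its ticker label and checks that it is numeric. The helper name select_prices and the small made-up frame are hypothetical and only stand in for the real dataframe.

```python
import numpy as np
import pandas as pd


def select_prices(frame, ticker):
    """Return the column labelled `ticker` as a float64 Series (hypothetical helper)."""
    if ticker not in frame.columns:
        raise KeyError(f"{ticker!r} is not a column of the dataframe")
    # pd.to_numeric raises ValueError if any value cannot be parsed as a number.
    return pd.to_numeric(frame[ticker], errors="raise").astype("float64")


# Made-up values standing in for the downloaded closing prices.
dates = pd.date_range("2023-01-02", periods=4, freq="D")
frame = pd.DataFrame(
    {"CTSH": [1.0, 2.0, 3.0, 4.0], "MSFT": [10.0, 11.0, 12.0, 13.0]},
    index=dates,
)

chosen = ["CTSH", "MSFT"]
series_by_ticker = {t: select_prices(frame, t) for t in chosen}

for t, s in series_by_ticker.items():
    # Each entry is now numeric, so np.isnan works on it.
    print(t, s.dtype, np.isnan(s.to_numpy()).any())
```

With the question's code, the loop would then read arima_prediction(select_prices(dataframe, stock)). Because the resulting Series is indexed by date, positional access inside arima_prediction is probably safer written as y.iloc[0] (or on y.to_numpy()) than as y[0], since recent pandas versions deprecate treating integer keys as positions on such a Series.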

Thanks, everyone, for the help!
