ValueError when predicting on new data, even though the model trained and tested successfully

Question (votes: 0, answers: 1)

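My goal is to see the next day's Bitcoin price prediction from my regression model. I don't think my raw data is the cause of the error, because it contains no values that match the description of NaN, infinity, or too large. I was able to build and evaluate the model on the training and test splits. I suspect I am using the wrong syntax, so that the predict function is not being asked to predict what I think it is here, unlike earlier in the code.

Offending part: X = BTC[-1:] -> print(regressor.predict(X))

Here is the relevant part that leads to this error. I am using DecisionTreeRegressor from sklearn, with no feature selection.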

#Import libraries and Data
import pandas as pd
import numpy as np
import talib
import matplotlib.pyplot as plt
%matplotlib inline
dataset = pd.read_csv(r'C:\Users\Admin\BTC.csv')
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

#name open, high, low, close and volume data from csv 

BTC = dataset = pd.read_csv(r'C:\Users\Admin\BTC.csv')


#Convert Data from Int to Float

BTC.Volume = BTC.Volume.astype(float)
BTC.High = BTC.High.astype(float)
BTC.Low = BTC.Low.astype(float)
BTC.Close = BTC.Close.astype(float)

#Create forward looking columns using shift

BTC['NextDayPrice'] = BTC['Close'].shift(-1)

#Copy dataframe and clean data (remove data consumed by lagging Indicators)

BTC_cleanData = BTC.copy()
BTC_cleanData.dropna(inplace=True)

#Split Data into Training and Testing Set
#separate the features and targets into separate datasets.
#split the data into training and testing sets using a 70/30 split 
#Using splicing, we will separate the features from the target into individual data sets.  
X_all = BTC_cleanData.iloc[:, BTC_cleanData.columns != 'NextDayPrice']  # feature values for all days
y_all = BTC_cleanData['NextDayPrice']  # corresponding targets/labels
print (X_all.head())  # print the first 5 rows
from sklearn.linear_model import LinearRegression

#Split the data into training and testing sets using the given feature as the target
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_all, y_all, test_size=0.30, random_state=42)

#Create a linear regression model and fit it to the training set
regressor = LinearRegression()

regressor.fit(X_train,y_train)

print ("Training set: {} samples".format(X_train.shape[0]))
print ("Test set: {} samples".format(X_test.shape[0]))

#Evaluate Model (in-sample Accuracy and Mean Squared Error)
from sklearn.model_selection import cross_validate
from sklearn.model_selection import cross_val_score

scores = cross_val_score(regressor, X_test, y_test, cv=10)
print ("accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() / 2))    

from sklearn.metrics import mean_squared_error

mse = mean_squared_error(y_test, regressor.predict(X_test))
print("MSE: %.4f" % mse)

#Predict Next Day Price

X=BTC[-1:]
print(regressor.predict(X))

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-59-d171bb7e543f> in <module>
      2 
      3 X=BTC[-1:]
----> 4 print(regressor.predict(X))

~\anaconda3\lib\site-packages\sklearn\linear_model\_base.py in predict(self, X)
    223             Returns predicted values.
    224         """
--> 225         return self._decision_function(X)
    226 
    227     _preprocess_data = staticmethod(_preprocess_data)

~\anaconda3\lib\site-packages\sklearn\linear_model\_base.py in _decision_function(self, X)
    205         check_is_fitted(self)
    206 
--> 207         X = check_array(X, accept_sparse=['csr', 'csc', 'coo'])
    208         return safe_sparse_dot(X, self.coef_.T,
    209                                dense_output=True) + self.intercept_

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    576         if force_all_finite:
    577             _assert_all_finite(array,
--> 578                                allow_nan=force_all_finite == 'allow-nan')
    579 
    580     if ensure_min_samples > 0:

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
     58                     msg_err.format
     59                     (type_err,
---> 60                      msg_dtype if msg_dtype is not None else X.dtype)
     61             )
     62     # for object dtype data, we only check for NaNs (GH-13254)

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Thanks for any insight or help with solving this problem.

python scikit-learn valueerror
1 Answer

0 votes

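Check whether X contains NaN. BTC[-1:] is the last row of the original frame, and shift(-1) leaves its NextDayPrice as NaN because there is no following day to pull a close from.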
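One way to run that check is sketched below on a small made-up frame. The column names and values are assumptions standing in for BTC.csv, which is not shown in the question; only the shift(-1) step and the BTC[-1:] slice are taken from the question's code.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for BTC.csv; the values are made up for illustration.
BTC = pd.DataFrame({
    "High": [105.0, 110.0, 108.0, 112.0],
    "Low": [95.0, 99.0, 101.0, 104.0],
    "Close": [100.0, 107.0, 103.0, 110.0],
    "Volume": [1000.0, 1200.0, 900.0, 1500.0],
})
BTC["NextDayPrice"] = BTC["Close"].shift(-1)

# Same slice as in the question: the last row of the uncleaned frame.
X = BTC[-1:]

# Count NaN per column in the row that would be passed to predict().
nan_counts = X.isna().sum()
print(nan_counts)

# Check for +/-inf as well, since the error message covers both cases.
has_inf = bool(np.isinf(X.to_numpy(dtype=float)).any())
print("contains inf:", has_inf)
```

If only NextDayPrice shows a non-zero count, the raw CSV is fine and the NaN comes from the shifted target column.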

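Remove the null values from X by dropping the NaN column (axis=1) rather than the row, since X has only one row and NextDayPrice is not one of the training features, then predict as follows.

print(regressor.predict(X.dropna(axis=1)))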
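A fuller sketch of building the prediction row, under stated assumptions: the data below is synthetic (a random walk standing in for BTC.csv), and feature_cols and X_new are names introduced here for illustration. It selects the training feature columns by name, so the row handed to predict() has the same columns the model was fitted on. Note that dropna() with its default arguments removes rows, which would leave an empty frame for a one-row X.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for BTC.csv (assumption: all columns are numeric).
rng = np.random.default_rng(0)
close = 100 + rng.normal(0, 1, 50).cumsum()
BTC = pd.DataFrame({
    "High": close + 2,
    "Low": close - 2,
    "Close": close,
    "Volume": rng.uniform(1000, 2000, 50),
})
BTC["NextDayPrice"] = BTC["Close"].shift(-1)

# Train on the rows that have a known next-day price.
BTC_cleanData = BTC.dropna()
feature_cols = [c for c in BTC.columns if c != "NextDayPrice"]
regressor = LinearRegression()
regressor.fit(BTC_cleanData[feature_cols], BTC_cleanData["NextDayPrice"])

# Default dropna() drops the only row, so nothing is left to predict on.
rows_left = len(BTC[-1:].dropna())
print("rows left after dropna():", rows_left)

# Instead, keep the last row and select only the training features.
X_new = BTC[feature_cols].iloc[[-1]]
prediction = regressor.predict(X_new)
print(prediction)
```

Selecting columns by name also avoids a feature-count mismatch if some other column happens to contain NaN in the last row.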