Converting a univariate time series forecast into a multivariate one with an LSTM

Question (votes: 1, answers: 1)

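I'm new to artificial neural networks, so please forgive any mistakes and correct me where I'm wrong. I want to use an LSTM model to predict the market price of Bitcoin. I'm aware of the practical limitations of such a model; I'm building it for educational purposes.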

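I'm not sure whether to call it a multi-layer model or a multivariate model (I'd be grateful if someone could explain the difference). Basically, I have a model trained on the closing price, the column called "close", that predicts the next day's closing price by looking at the previous 60 days.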

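I have no trouble building the model up to this point, as just described. The problem is that I'd like to train it on additional information, such as the volume or the day's high. What matters is being able to choose which kinds of information are fed into the model. I found a site with a detailed walkthrough, "Multivariate Time Series Forecasting with LSTMs in Keras", but I couldn't apply it to my specific case. Could you help me integrate the 'Volume' variable into the model, so I can see whether predictions of the future 'close' price get better or worse?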

The data can be downloaded from Kaggle.

import pandas as pd
import math
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
import matplotlib.pyplot as plt


#Create a new dataframe with only the 'close' column
data = df.filter(['close'])
#Convert the dataframe to a numpy array
dataset = data.values
#Get the number of rows to train the model on
training_data_len = math.ceil( len(dataset) * .8 )

#scale data
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(dataset)

#Create the scaled training data set
train_data = scaled_data[0:training_data_len , :]
#Split the data into x_train and y_train data sets
x_train = []
y_train = []

for i in range(60, len(train_data)):
  x_train.append(train_data[i-60:i, 0])
  y_train.append(train_data[i, 0])
  # if i<= 61:
    # print(x_train)
    # print(y_train)
    # print()

#Convert the x_train and y_train to numpy arrays
x_train, y_train = np.array(x_train), np.array(y_train)

#Reshape the data
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
print (x_train.shape)

#Build the LSTM model
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape= (x_train.shape[1], 1)))
model.add(LSTM(50, return_sequences= False))
model.add(Dense(25))
model.add(Dense(1))

#Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
#Train the model
model.fit(x_train, y_train, batch_size=1, epochs=1)

#Create the testing data set
#Create a new array containing scaled values 
test_data = scaled_data[training_data_len - 60: , :]
#Create the data sets x_test and y_test
x_test = []
y_test = dataset[training_data_len:, :]
for i in range(60, len(test_data)):
  x_test.append(test_data[i-60:i, 0])

#Convert the data to a numpy array
x_test = np.array(x_test)
#Reshape the data

x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1 ))

# print (len(x_test))
# #Get the models predicted price values
predictions = model.predict(x_test)

predictions = scaler.inverse_transform(predictions)
print(predictions)

#Get the root mean squared error (RMSE)
rmse=np.sqrt(np.mean(((predictions- y_test)**2)))
print (rmse)
python tensorflow keras lstm bitcoin
1 answer
0 votes

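From the code and comments, I understand that you are doing time series forecasting on univariate data (the only column is close) and now want to do it on multivariate data (the columns close and Volume).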
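On the terminology asked about in the question: "multi-layer" describes the network (the model here already stacks two LSTM layers), while "multivariate" describes the input, meaning more than one feature per time step. A minimal sketch of what that changes in the input array, using made-up random numbers in place of the real 'close' and 'Volume' columns:

```python
import numpy as np

timesteps = 60  # look-back window, as in the question

# Toy data: 100 days, two columns standing in for 'close' and 'Volume'
# (random values, purely illustrative).
rng = np.random.default_rng(0)
prices = rng.random((100, 2))

# Univariate: each window keeps only column 0 -> (samples, timesteps, 1)
uni = np.stack([prices[i - timesteps:i, :1]
                for i in range(timesteps, len(prices))])

# Multivariate: each window keeps both columns -> (samples, timesteps, 2)
multi = np.stack([prices[i - timesteps:i, :]
                  for i in range(timesteps, len(prices))])

print(uni.shape)    # (40, 60, 1)
print(multi.shape)  # (40, 60, 2)
```

Only the last axis differs, which is why the LSTM's input_shape has to carry the number of features rather than a fixed 1.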

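The part of the code that matters for you is the function multivariate_data. It returns features and labels built from the past 60 days of history, with a target 1 day ahead.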

past_history = 60
future_target = 1

The complete working code for the multivariate data (up to training) is shown below.

import pandas as pd
import math
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
import matplotlib.pyplot as plt

df = pd.read_csv('datasets_101543_240726.csv')

#Create a new dataframe with the 'close' and 'Volume' columns
data = df.filter(['close', 'Volume'])
#Convert the dataframe to a numpy array
dataset = data.values
#Get the number of rows to train the model on
training_data_len = math.ceil( len(dataset) * .8 )

# scale data: standardize each column using the training-set mean and std
data_mean = dataset[:training_data_len].mean(axis=0)
data_std = dataset[:training_data_len].std(axis=0)
dataset = (dataset-data_mean)/data_std

def multivariate_data(dataset, target, start_index, end_index, history_size,
                      target_size):
  data = []
  labels = []

  start_index = start_index + history_size
  if end_index is None:
    end_index = len(dataset) - target_size

  for i in range(start_index, end_index):
    indices = range(i-history_size, i)
    data.append(dataset[indices])

    labels.append(target[i:i+target_size])

  return np.array(data), np.array(labels)

past_history = 60
future_target = 1

x_train, y_train = multivariate_data(dataset, dataset[:, 0], 0,
                                                   training_data_len, past_history,
                                                   future_target)
x_val, y_val = multivariate_data(dataset, dataset[:, 0],
                                               training_data_len, None, past_history,
                                               future_target)


#No reshape is needed: x_train already has shape (samples, past_history, features)
print (x_train.shape)

#Build the LSTM model
model = Sequential()
#The last dimension must match the number of features (2: close and Volume)
model.add(LSTM(50, return_sequences=True, input_shape= (x_train.shape[1], x_train.shape[2])))
model.add(LSTM(50, return_sequences= False))
model.add(Dense(25))
model.add(Dense(1))

#Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
#Train the model
model.fit(x_train, y_train, batch_size=1, epochs=1)
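
To check what the windowing produces, it can help to run it on a tiny array first. The sketch below uses a hypothetical helper, make_windows, which is a simplified stand-in for multivariate_data (it always covers the whole array and has no start or end index); the toy values are invented.

```python
import numpy as np

def make_windows(features, target, history_size, target_size):
    """Simplified stand-in for multivariate_data (hypothetical helper).

    Uses the same end rule as the answer's function when end_index is None:
    the loop stops at len(features) - target_size.
    """
    data, labels = [], []
    for i in range(history_size, len(features) - target_size):
        data.append(features[i - history_size:i])   # past rows, all features
        labels.append(target[i:i + target_size])    # next value(s) of the target
    return np.array(data), np.array(labels)

# 6 time steps, 2 features: column 0 plays 'close', column 1 plays 'Volume'
toy = np.array([[1, 10], [2, 20], [3, 30],
                [4, 40], [5, 50], [6, 60]], dtype=float)

x, y = make_windows(toy, toy[:, 0], history_size=3, target_size=1)

print(x.shape, y.shape)  # (2, 3, 2) (2, 1)
print(x[0])              # rows for steps 0..2, both columns
print(y[0])              # the 'close' value at step 3
```

Each sample holds history_size rows of every feature, while the label comes only from the target column, here column 0.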

In your code, the test data x_test and y_test can be replaced with x_val and y_val, and you can run the predictions on that data.
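
One thing to watch: the code above standardizes with the training mean and standard deviation instead of MinMaxScaler, so scaler.inverse_transform from the question no longer applies. Predictions and y_val are both in standardized units and would need to be mapped back using the 'close' entries data_mean[0] and data_std[0]. A minimal sketch of that step, where the helper names unscale and rmse are hypothetical and the numbers are made-up stand-ins for model.predict(x_val), y_val, and the real statistics:

```python
import numpy as np

def unscale(values, mean, std):
    """Undo the standardization (x - mean) / std for a single column."""
    return values * std + mean

def rmse(pred, actual):
    """Root mean squared error between two arrays of the same shape."""
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

# Made-up stand-ins for data_mean[0] and data_std[0] (the 'close' column).
close_mean, close_std = 8000.0, 2000.0

# Made-up stand-ins for model.predict(x_val) and y_val, in standardized units.
pred_scaled = np.array([[0.5], [1.0], [-0.25]])
y_val_scaled = np.array([[0.4], [1.1], [-0.25]])

pred_price = unscale(pred_scaled, close_mean, close_std)
true_price = unscale(y_val_scaled, close_mean, close_std)

print(pred_price.ravel())
print(rmse(pred_price, true_price))
```

Computing the RMSE in price units this way makes it directly comparable with the RMSE of the univariate model, which is what the question wants to compare.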

Please refer to the TensorFlow tutorial on time series forecasting for complete code covering multivariate data.

Hope this helps. Happy learning!
