I'm trying to build an LSTM model that predicts a specific value (the first column of the dataset, idx 0) for the next 10 rows. The input sequences consist of 10 rows of the time series with 19 features:
for i in range(sequence_length, len(data) - 10):
    sequences.append(data.iloc[i-sequence_length:i, 2:2+input_size].values)
    labels.append(data.iloc[i + 1: i + 11, 0])
Sample data:
c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16,c17,c18,c19,c20,c21
1.084,1.08405,1.0841,1.08405,1.0841,1.084,11240,6.249999999985434e-05,-1.0164458235761842e-05,-5.1788748878102555e-05,1.0840285714285716,1.0840928571428572,1.0840280952380952,1.08405,-0.000937629492890638,0.8237791754445127,-0.009223815892633767,49.223395431868134,-3.13680151375703,0.010743580701520136,1000.2306464528247
1.084,1.08405,1.08405,1.08405,1.0841,1.08405,14158,-2.4999999999941735e-05,-9.32997172098382e-06,-6.046625792230974e-05,1.0840285714285716,1.0840857142857143,1.0840309523809522,1.084046103896104,-0.0008606520795521739,3.185291329162407,-0.009223815892633767,49.223395431868134,-2.9477598235694686,0.009208783458445832,1000.2306464528247
1.0839,1.08395,1.08405,1.08395,1.08405,1.08385,19095,-0.00015749999999981057,-1.6547055257998267e-05,-7.543797446324434e-05,1.0840142857142856,1.0840690476190478,1.0840204761904761,1.0840337662337662,-0.0015264100999568611,8.156945531675506,-0.009224666758912318,41.76004501048701,-4.958497925954123,-0.26489247132130206,1000.2306464528247
1.08395,1.084,1.08395,1.084,1.084,1.08385,12756,-0.0001474999999999671,-1.8024291017937344e-05,-9.06405060916429e-05,1.0840035714285714,1.0840547619047618,1.0840185714285715,1.084027489177489,-0.0016626858514864735,7.660743847017261,0.009225943352706798,46.15600965239905,-5.393125751532237,-0.13593640398949522,1000.2767846809004
The loss decreases substantially (I managed to get it down to 3.1…e-8), but the prediction for any given sequence is always the same number.
For example, the labels for a sequence might be
[1.084,1.0845,1.084,1.08395,1.0839,1.0838,1.0839,1.084,1.0845,1.084]
while the prediction I get is
[1.08395,1.08395,1.08395,1.08395,1.08395,1.08395,1.08395,1.08395,1.08395,1.08395]
I'm currently using a batch size of 32, so the results I get look roughly like this:
[
[1.08395,1.08395,..]
[1.0841,1.0841,..]
..
]
I don't understand why the predictions don't follow the momentum of the values they are trying to predict. Obviously, even if the loss keeps decreasing, getting the same value for every one of the next n rows is useless...
import torch
import pandas as pd
import torch.nn as nn
import numpy as np
from torch.utils.data import DataLoader, TensorDataset
from torch.optim.lr_scheduler import StepLR
import matplotlib.pyplot as plt
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
class CustomLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, dropout, num_layers):
        super(CustomLSTM, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers, batch_first=True, bidirectional=True)
        self.relu = nn.ReLU()  # ReLU activation layer
        self.bn = nn.BatchNorm1d(hidden_size * 2)  # Batch normalization layer
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden_size * 2, output_size)

    def forward(self, x):
        # 2 * num_layers because the LSTM is bidirectional
        h0 = torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size).double().to(x.device)
        c0 = torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size).double().to(x.device)
        x = torch.nn.functional.normalize(x)
        out, _ = self.lstm(x, (h0, c0))
        out = self.relu(out[:, -1, :])  # Apply ReLU to the last time step's output
        out = self.bn(out)              # Apply batch normalization
        out = self.dropout(out)         # Apply dropout
        out = self.fc(out)
        return out
input_size = 19 # Number of input features
# Loss calculation for regression model
criterion = nn.MSELoss()
data = pd.read_csv('chapter6/a_without_normalization.csv')
# Split the dataset into train and test sets
train_size = int(0.9 * len(data))
test_size = len(data) - train_size
train_dataset, test_dataset = data[:train_size], data[train_size:]
def create_sequences(data, sequence_length):
    sequences = []
    labels = []
    for i in range(sequence_length, len(data) - 10):
        sequences.append(data.iloc[i-sequence_length:i, 2:2+input_size].values)
        labels.append(data.iloc[i + 1: i + 11, 0])
    return np.array(sequences), np.array(labels)
sequence_length = 10
train_sequences, train_labels = create_sequences(train_dataset, sequence_length)
test_sequences, test_labels = create_sequences(test_dataset, sequence_length)
# Convert to PyTorch tensors
train_sequences = torch.from_numpy(train_sequences)
train_labels = torch.from_numpy(train_labels)
test_sequences = torch.from_numpy(test_sequences)
test_labels = torch.from_numpy(test_labels)
# Create a TensorDataset from sequences and labels
train_dataset = TensorDataset(train_sequences, train_labels)
test_dataset = TensorDataset(test_sequences, test_labels)
batch_size = 32
dropout = 0.2
hidden_size = 64
weight_decay = 0.001
lstm_layers = 2
lr = 0.001
output_size = 10 # Number of predicted steps (the next 10 rows)
num_epochs = 101
model_eval_every = 2
print_loss_every = 1
save_model_every = 2500
# Create a DataLoader with the current batch size
train_dataloader = DataLoader(train_dataset, batch_size=batch_size)
test_dataloader = DataLoader(test_dataset, batch_size=batch_size)
train_dataloader_len = len(train_dataloader)
# Instantiate the model
model = CustomLSTM(input_size, hidden_size, output_size, dropout, lstm_layers).double().to(device)
# Define the optimizer (weight_decay enables L2 regularization)
optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
# Define the scheduler
scheduler = StepLR(optimizer, step_size=30, gamma=0.6)
print(f'Training with weight_decay {weight_decay}')
for epoch in range(num_epochs):
    total_loss = 0
    for batch in train_dataloader:
        # Unpack the batch
        batch_sequences, batch_labels = batch[0].to(device), batch[1].to(device)
        # Pass the batch through the model
        output = model(batch_sequences).squeeze()
        # Compute the loss
        loss = criterion(output, batch_labels)
        total_loss += loss.item()
        # Backpropagate the loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # Update the learning rate
    scheduler.step()
Is this happening because the differences between the values are so small?
The problem you're running into is likely related to the scale of your input and output values. When working with LSTM networks, especially together with activation functions like ReLU, the scale of the input data and the target values can have a significant impact on training.
Here are some suggestions for addressing the issue:
1. Normalization:
Normalize both your input and output data. You already normalize the input inside the model (x = torch.nn.functional.normalize(x)), but you may want to check whether that is sufficient. Make sure the input and target values are scaled to similar ranges, for example with min-max scaling or z-score normalization.
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
data.iloc[:, 2:] = scaler.fit_transform(data.iloc[:, 2:])
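One caveat about the snippet above: fitting the scaler on the whole dataset leaks test-set statistics into training, and the target column (idx 0) is left unscaled even though the network is asked to predict it. A minimal leakage-free sketch, reusing train_size and the column layout from your script:
from sklearn.preprocessing import MinMaxScaler

# Fit the scalers on the training split only, to avoid test-set leakage
feature_scaler = MinMaxScaler()
target_scaler = MinMaxScaler()

train_dataset = data[:train_size].copy()
test_dataset = data[train_size:].copy()

# Scale the 19 feature columns (iloc columns 2:, i.e. c3..c21)
train_dataset.iloc[:, 2:] = feature_scaler.fit_transform(train_dataset.iloc[:, 2:])
test_dataset.iloc[:, 2:] = feature_scaler.transform(test_dataset.iloc[:, 2:])

# Scale the target column (idx 0) as well, so the loss is computed on a comparable range
train_dataset.iloc[:, [0]] = target_scaler.fit_transform(train_dataset.iloc[:, [0]])
test_dataset.iloc[:, [0]] = target_scaler.transform(test_dataset.iloc[:, [0]])

# After prediction, map the outputs back to the original price scale:
# preds_original = target_scaler.inverse_transform(preds.reshape(-1, 1)).reshape(preds.shape)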
2. Loss function:
Consider a loss function that is well suited to regression and less sensitive to the scale of the target values. You could try mean absolute error (MAE), which is somewhat less sensitive to outliers than mean squared error (MSE).
criterion = nn.L1Loss() # Mean Absolute Error
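It's also worth checking what a loss of that magnitude actually means at this scale. Your targets differ only in the 4th decimal place, so a flat, constant prediction already achieves an MSE on the order of 1e-8, the same order as the 3.1…e-8 you report. A quick check using the label and prediction arrays from your question:
import numpy as np

labels = np.array([1.084, 1.0845, 1.084, 1.08395, 1.0839,
                   1.0838, 1.0839, 1.084, 1.0845, 1.084])
preds = np.full(10, 1.08395)  # the constant prediction from the question

mse = np.mean((preds - labels) ** 2)
mae = np.mean(np.abs(preds - labels))
print(f'MSE: {mse:.3e}')  # ~6.4e-08: tiny even though the prediction is completely flat
print(f'MAE: {mae:.3e}')  # ~1.6e-04: closer to the scale of the actual price moves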
3. Model complexity:
Experiment with the model's complexity. An overly complex model can overfit the training data and generalize poorly. You could try reducing the number of layers or hidden units, for example:
hidden_size = 32
dropout = 0.5
lstm_layers = 1
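With those values reassigned, rebuild the model with the smaller configuration (keyword arguments spelled out for clarity; this reuses the CustomLSTM class from your question):
# A smaller model: one LSTM layer, half the hidden units, stronger dropout
model = CustomLSTM(input_size=19, hidden_size=32, output_size=10,
                   dropout=0.5, num_layers=1).double().to(device)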
4. Gradient clipping:
Implement gradient clipping to guard against exploding gradients, which can occur when gradient magnitudes grow too large. Here is the training loop updated to perform clipping; I would test clip values from [0.1, 0.2, 0.5, 1.0, 1.5, 2.0]:
clip_value = 1.0
for epoch in range(num_epochs):
    total_loss = 0
    for batch in train_dataloader:
        # Unpack the batch
        batch_sequences, batch_labels = batch[0].to(device), batch[1].to(device)
        # Pass the batch through the model
        output = model(batch_sequences).squeeze()
        # Compute the loss
        loss = criterion(output, batch_labels)
        total_loss += loss.item()
        # Backpropagate the loss with gradient clipping
        optimizer.zero_grad()
        loss.backward()
        # Clip gradients to prevent exploding gradients
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip_value)
        optimizer.step()
    # Keep the learning-rate schedule from your original loop
    scheduler.step()
Here is an article with an example of gradient clipping: https://towardsdatascience.com/what-is-gradient-clipping-b8e815cdfb48
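As an aside, PyTorch also provides torch.nn.utils.clip_grad_value_, which clamps each gradient element to a fixed range instead of rescaling by the total norm; it can be swapped into the loop above:
# Alternative: clamp every gradient element to [-clip_value, clip_value]
torch.nn.utils.clip_grad_value_(model.parameters(), clip_value)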