I'm trying to build an LSTM that predicts the outcome of team A's sixth game from the sequence of its previous 5 games. My data is structured like this: team A game 1 vs. a random team, team B game 1 vs. a random team, ..., team A game 5 vs. a random team, team B game 5 vs. a random team. Team B is the opponent team A faces in game 6, and the result of that game is the output. Each sequence element consists of 124 features, the combination of team A's i-th game and team B's i-th game.
My problem is that my test loss goes up immediately, and I can't get it to decrease consistently. I've messed around with the hyperparameters, but none of them seem to have a noticeable effect. What can I do?
import numpy as np
import pandas as pd
import torch
from torch import nn
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from torch.utils.data import TensorDataset, DataLoader
def main():
    # Device
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(torch.cuda.is_available())
    print(f'Running on device: {device}')

    # Process data
    data = pd.read_csv('matchup_data.csv')

    # Separate the label column from the features
    labels = data['label']
    data = data.drop('label', axis=1)

    num_features = 124
    samples = 531
    timesteps = 5

    # Convert features and labels to tensors
    dataT = torch.tensor(data.values).float()
    dataT = dataT.view(samples, timesteps, num_features)
    labelsT = torch.tensor(labels.values).float()
    labelsT = labelsT.unsqueeze(1)
    print(dataT)

    # Split into train and test data
    train_data, test_data, train_labels, test_labels = train_test_split(dataT, labelsT, test_size=.1)
    train_dataset = TensorDataset(train_data, train_labels)
    test_dataset = TensorDataset(test_data, test_labels)

    batch_size = 2  # Choose a batch size that fits your data and model
    train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
    test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)

    # Layer parameters
    input_size = 124
    hidden_size = 64
    num_layers = 2
    output_size = 1

    # Net and net parameters
    net = LSTMnet(input_size, output_size, hidden_size, num_layers).to(device)
    print(net)
    loss_function = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(net.parameters(), lr=0.00001)

    train_accuracy, train_losses, test_accuracy, test_losses = trainModel(100, net, optimizer, loss_function,
                                                                          train_loader, test_loader, device)
    print(np.max(train_accuracy))
    print(np.min(train_losses))
    print(np.max(test_accuracy))
    print(np.min(test_losses))

    # Plot accuracy and loss
    plt.figure(figsize=(12, 6))
    plt.subplot(1, 2, 1)
    plt.plot(train_accuracy, label='Train Accuracy')
    plt.plot(test_accuracy, label='Test Accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.legend()
    plt.subplot(1, 2, 2)
    plt.plot(train_losses, label='Train Loss')
    plt.plot(test_losses, label='Test Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()
    plt.tight_layout()
    plt.show()
class LSTMnet(nn.Module):
    def __init__(self, input_size, output_size, num_hidden, num_layers):
        super().__init__()
        self.input_size = input_size
        self.num_hidden = num_hidden
        self.num_layers = num_layers

        # RNN layer
        self.lstm = nn.LSTM(input_size, num_hidden, num_layers)
        self.dropout = nn.Dropout(0.6)

        # Linear layer for output
        self.out = nn.Linear(num_hidden, output_size)

    def forward(self, x):
        # Run through the RNN layer
        y, hidden = self.lstm(x)

        # Pass through dropout
        y = self.dropout(y)

        # Pass to the linear layer
        output = self.out(y)
        return output, hidden
def trainModel(num_epochs, net, optimizer, loss_function, train_data, test_data, device):
    # Variable initialization
    train_accuracy = np.zeros(num_epochs)
    train_losses = np.zeros(num_epochs)
    test_accuracy = np.zeros(num_epochs)
    test_losses = np.zeros(num_epochs)

    for epochi in range(num_epochs):
        net.train()
        segment_loss = []
        segment_accuracy = []
        for X, y in train_data:
            X = X.to(device)
            y = y.to(device)

            output, _ = net(X)  # Unpack the tuple to get the output
            output = output[:, -1, :]
            loss = loss_function(output, y)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # Convert output logits to probabilities using sigmoid
            probabilities = torch.sigmoid(output)

            # Convert probabilities to binary predictions
            predicted = (probabilities > 0.5).float()

            # Calculate accuracy
            acc = (predicted == y).float().mean() * 100
            segment_loss.append(loss.item())
            segment_accuracy.append(acc.item())

        train_losses[epochi] = np.mean(segment_loss)
        train_accuracy[epochi] = np.mean(segment_accuracy)

        net.eval()
        test_loss = []
        test_acc = []
        with torch.no_grad():
            for X, y in test_data:
                X = X.to(device)
                y = y.to(device)

                output, _ = net(X)  # Unpack the tuple to get the output
                output = output[:, -1, :]
                loss = loss_function(output, y)

                # Convert output logits to probabilities using sigmoid
                probabilities = torch.sigmoid(output)

                # Convert probabilities to binary predictions
                predicted = (probabilities > 0.5).float()

                # Calculate accuracy
                acc = (predicted == y).float().mean() * 100
                test_loss.append(loss.item())
                test_acc.append(acc.item())

        test_losses[epochi] = np.mean(test_loss)
        test_accuracy[epochi] = np.mean(test_acc)

    return train_accuracy, train_losses, test_accuracy, test_losses
if __name__ == "__main__":
    main()
My understanding is that X in net(X) has shape (2, 5, 62 + 62) and y in loss_function(output, y) has shape (2, 1). The input sequence is [A vs. rand + B vs. rand (game 1), ..., A vs. rand + B vs. rand (game 5)], and the output corresponds to [A vs. B (game 6)].
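(A quick way I could double-check this is to push a dummy batch through the model; dummy_X here is just a made-up tensor with the shapes I described, assuming net is the LSTMnet instance from main().)

dummy_X = torch.randn(2, 5, 124)  # (batch, seq, 62 + 62 features), as I understand it
dummy_out, _ = net(dummy_X)
print(dummy_out.shape)            # what the LSTM actually returns before slicing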
If your data is arranged as (seq_length, batch_size, n_features), which I believe is the case based on your comment above, then I think you need to change this line:

# output = output[:, -1, :]  # accesses the last sample only
output = output[-1, :, :]  # corrected - accesses the last frame of each sample

At the moment it reads out the last sample of each batch rather than the last frame of each sample, so the model is effectively being optimized on only one sample per batch, which could cause the overfitting-like behaviour you're seeing. The correction needs to be applied in both the training and the test loop.
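As a sketch of the difference the indexing makes (toy tensors only, assuming the default batch_first=False layout):

lstm = nn.LSTM(input_size=124, hidden_size=64, num_layers=2)  # batch_first=False by default
x = torch.randn(5, 2, 124)        # (seq_length, batch_size, n_features)
output, hidden = lstm(x)          # output: (5, 2, 64) = (seq, batch, hidden)

last_frames = output[-1, :, :]    # (2, 64): final timestep of every sample - what you want
last_sample = output[:, -1, :]    # (5, 64): every timestep of the last sample only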
To avoid overfitting, try reducing n_layers to 1 and hidden_size to 16 (and maybe remove the lr= argument from Adam to use the default). The input dimensionality looks somewhat large while the number of samples is relatively small, so this should keep the test loss from diverging so quickly.
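Concretely, that would look something like this (the exact values are just a starting point, not tuned):

net = LSTMnet(input_size=124, output_size=1, num_hidden=16, num_layers=1).to(device)
optimizer = torch.optim.Adam(net.parameters())  # Adam's default lr of 1e-3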
Another thing you could try, if it's feasible, is using Conv1d layers to "compress" the 124-dimensional input down to a smaller size. This should preserve the important information while reducing the dimensionality of the data and mitigating overfitting. Here is a simple way to modify your existing code without any other changes:
# Just the LSTM
# net = LSTMnet(input_size, output_size, hidden_size, num_layers).to(device)

# Conv1d expects (batch, channels, length), so the feature axis has to be moved
# into the channel position before the convs and moved back afterwards
class Permute(nn.Module):
    def __init__(self, *dims):
        super().__init__()
        self.dims = dims

    def forward(self, x):
        return x.permute(*self.dims)

# Conv1d to reduce the feature size from 124 down to 32, followed by an LSTM
net = nn.Sequential(
    Permute(1, 2, 0),  # (seq, batch, 124) -> (batch, 124, seq)
    nn.Conv1d(in_channels=124, out_channels=64, kernel_size=1),
    nn.BatchNorm1d(64),
    nn.ReLU(),
    nn.Conv1d(in_channels=64, out_channels=32, kernel_size=1),
    nn.BatchNorm1d(32),
    nn.ReLU(),
    Permute(2, 0, 1),  # (batch, 32, seq) -> (seq, batch, 32)
    # Finally, the LSTM
    LSTMnet(input_size=32, output_size=output_size, num_hidden=32, num_layers=2)
).to(device)
You can start with just the first nn.Conv1d and then add the other layers depending on whether you find them worthwhile. The input and output shapes of net don't change; internally it just shrinks the feature dimension to something smaller before feeding it into the LSTM.
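A quick smoke test with a dummy batch (shapes assumed to be (seq_length, batch_size, n_features) as above) to confirm the external interface is unchanged:

x = torch.randn(5, 2, 124).to(device)   # (seq_length, batch_size, n_features)
out, _ = net(x)
print(out.shape)                         # torch.Size([5, 2, 1]) - same layout as before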