Test loss on an LSTM rises immediately

Problem description

I am trying to build an LSTM that predicts the outcome of Team A's sixth game from a sequence of their previous 5 games. My data is structured like this: Team A game 1 vs. a random team, Team B game 1 vs. a random team, ..., Team A game 5 vs. a random team, Team B game 5 vs. a random team. Team B is the team Team A faces in the sixth game, and that result is the output. Each timestep in the sequence consists of 124 features, the combination of Team A's i-th game and Team B's i-th game.

My problem is that my test loss rises immediately, and I can't seem to get it to decrease consistently. I have played around with the hyperparameters, but none of them seem to have a noticeable effect. What can I do?
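Roughly, each sample is assembled like this (a sketch with dummy values; splitting the 124 features as 62 per team is an assumption):

import numpy as np

team_a_games = np.random.rand(5, 62)   # Team A's previous 5 games (assumed 62 features each)
team_b_games = np.random.rand(5, 62)   # Team B's previous 5 games (assumed 62 features each)
sequence = np.concatenate([team_a_games, team_b_games], axis=1)  # one sample: (5 timesteps, 124 features)
label = 1.0                            # outcome of game 6: Team A vs. Team B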

import numpy as np
import pandas as pd
import torch
from torch import nn
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from torch.utils.data import TensorDataset, DataLoader


def main():
    # Device
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(torch.cuda.is_available())
    print(f'Running on device: {device}')

    # Process data
    data = pd.read_csv('matchup_data.csv')

    # Separate the label column from the features
    labels = data['label']

    # Remove the label column from the feature data
    data = data.drop('label', axis=1)

    num_features = 124
    samples = 531
    timesteps = 5

    # Convert features and labels to tensors
    dataT = torch.tensor(data.values).float()
    dataT = dataT.view(samples, timesteps, num_features)

    labelsT = torch.tensor(labels.values).float()
    labelsT = labelsT.unsqueeze(1)

    print(dataT)

    # Split to test and train data
    train_data, test_data, train_labels, test_labels = train_test_split(dataT, labelsT, test_size=.1)

    train_dataset = TensorDataset(train_data, train_labels)
    test_dataset = TensorDataset(test_data, test_labels)

    batch_size = 2  # Choose a batch size that fits your data and model

    train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
    test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)

    # Layer parameters
    input_size = 124
    hidden_size = 64
    num_layers = 2
    output_size = 1

    # Net and net parameters
    net = LSTMnet(input_size, output_size, hidden_size, num_layers).to(device)
    print(net)
    loss_function = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(net.parameters(), lr=0.00001)

    train_accuracy, train_losses, test_accuracy, test_losses = trainModel(100, net, optimizer, loss_function,
                                                                          train_loader, test_loader, device)



    print(np.max(train_accuracy))
    print(np.min(train_losses))
    print(np.max(test_accuracy))
    print(np.min(test_losses))

    # Plot accuracy and loss
    plt.figure(figsize=(12, 6))
    plt.subplot(1, 2, 1)
    plt.plot(train_accuracy, label='Train Accuracy')
    plt.plot(test_accuracy, label='Test Accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.legend()

    plt.subplot(1, 2, 2)
    plt.plot(train_losses, label='Train Loss')
    plt.plot(test_losses, label='Test Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()

    plt.tight_layout()
    plt.show()


class LSTMnet(nn.Module):
    def __init__(self, input_size, output_size, num_hidden, num_layers):
        super().__init__()

        self.input_size = input_size
        self.num_hidden = num_hidden
        self.num_layers = num_layers

        # RNN layer
        self.lstm = nn.LSTM(input_size, num_hidden, num_layers)
        self.dropout = nn.Dropout(0.6)

        # linear layer for output
        self.out = nn.Linear(num_hidden, output_size)

    def forward(self, x):
        # Run through RNN layer
        y, hidden = self.lstm(x)

        # pass through dropout
        y = self.dropout(y)
        # Pass to linear layer
        output = self.out(y)

        return output, hidden


def trainModel(num_epochs, net, optimizer, loss_function, train_data, test_data, device):
    # Variable initialization
    train_accuracy = np.zeros(num_epochs)
    train_losses = np.zeros(num_epochs)
    test_accuracy = np.zeros(num_epochs)
    test_losses = np.zeros(num_epochs)

    for epochi in range(num_epochs):
        net.train()

        segment_loss = []
        segment_accuracy = []
        for X, y in train_data:
            X = X.to(device)
            y = y.to(device)
            output, _ = net(X)  # Unpack the tuple to get the output
            output = output[:, -1, :]
            loss = loss_function(output, y)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # Convert output logits to probabilities using sigmoid
            probabilities = torch.sigmoid(output)
            # Convert probabilities to binary predictions
            predicted = (probabilities > 0.5).float()
            # Calculate accuracy
            acc = (predicted == y).float().mean() * 100

            segment_loss.append(loss.item())
            segment_accuracy.append(acc.item())

        train_losses[epochi] = np.mean(segment_loss)
        train_accuracy[epochi] = np.mean(segment_accuracy)

        net.eval()
        test_loss = []
        test_acc = []

        with torch.no_grad():
            for X, y in test_data:
                X = X.to(device)
                y = y.to(device)
                output, _ = net(X)  # Unpack the tuple to get the output
                output = output[:, -1, :]
                loss = loss_function(output, y)

                # Convert output logits to probabilities using sigmoid
                probabilities = torch.sigmoid(output)
                # Convert probabilities to binary predictions
                predicted = (probabilities > 0.5).float()
                # Calculate accuracy
                acc = (predicted == y).float().mean() * 100

                test_loss.append(loss.item())
                test_acc.append(acc.item())

            test_losses[epochi] = np.mean(test_loss)
            test_accuracy[epochi] = np.mean(test_acc)

    return train_accuracy, train_losses, test_accuracy, test_losses


if __name__ == "__main__":
    main()

deep-learning pytorch neural-network lstm recurrent-neural-network
1 Answer

My understanding is that X in net(X) has shape (2, 5, 62 + 62) and y in loss_function(output, y) has shape (2, 1). The input sequence is [A vs. rand + B vs. rand (game 1), ..., A vs. rand + B vs. rand (game 5)], and the output corresponds to [A vs. B (game 6)].

If your data is arranged as (seq_length, batch_size, n_features), which I believe is the case based on your comment above, then I think you need to change this line:

#output = output[:, -1, :] #accesses last sample only
output = output[-1, :, :] #corrected - access last frame from each sample

Currently it reads out the last sample of each batch rather than the last frame of each sample, so it is effectively optimizing on only one sample per batch, which can produce overfitting-like behaviour.
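A quick standalone shape check (not part of the code above) makes the difference concrete, using the default batch_first=False layout:

import torch
from torch import nn

lstm = nn.LSTM(input_size=124, hidden_size=64, num_layers=2)  # batch_first=False by default
x = torch.randn(5, 2, 124)           # (seq_len=5, batch=2, features=124)
y, _ = lstm(x)                       # y: (seq_len, batch, hidden) = (5, 2, 64)

print(y[:, -1, :].shape)             # torch.Size([5, 64]) - every timestep of the last sample only
print(y[-1, :, :].shape)             # torch.Size([2, 64]) - the last timestep of every sample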

The correction needs to be applied in both the training and test loops.

To avoid overfitting, try reducing num_layers to 1 and hidden_size to 16 (and perhaps drop the lr= argument from Adam so it uses the default). The input dimension looks fairly large while the number of samples is relatively small, so this should keep the model from diverging so quickly.

If that works, another thing you might want to try is using Conv1d layers to "compress" the 124-dimensional input down to something smaller. This should preserve the important information while reducing the dimensionality of the data and easing overfitting. Here is a simple way to drop it into the existing code:

#Just the LSTM
#net = LSTMnet(input_size, output_size, hidden_size, num_layers).to(device)

#Conv1d to reduce feature size from 124 down to 32, followed by an LSTM
net = nn.Sequential(
  nn.Conv1d(in_channels=124, out_channels=64, kernel_size=1),
  nn.BatchNorm1d(64),
  nn.ReLU(),

  nn.Conv1d(in_channels=64, out_channels=32, kernel_size=1),
  nn.BatchNorm1d(32),
  nn.ReLU(),

  #Finally, the LSTM
  LSTMnet(input_size=32, output_size=output_size, num_hidden=32, num_layers=2)
).to(device)

You could start with just the first nn.Conv1d line and then decide whether it is worth adding the others. The input and output shapes of net do not change; internally it just shrinks the feature dimension to something smaller before feeding it into the LSTM.
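One detail to watch: nn.Conv1d expects its input as (batch, channels, length), so depending on how your tensors are laid out you may need a transpose on either side of the convolutional layers before they feed the LSTM. A minimal sketch of one way to wire that up, assuming the (seq_len, batch, features) layout discussed above (the ConvFrontEnd wrapper is a hypothetical helper, not part of the original suggestion):

class ConvFrontEnd(nn.Module):
    def __init__(self, lstm_net):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels=124, out_channels=32, kernel_size=1),
            nn.BatchNorm1d(32),
            nn.ReLU(),
        )
        self.lstm_net = lstm_net

    def forward(self, x):                # x: (seq_len, batch, 124)
        x = x.permute(1, 2, 0)           # -> (batch, 124, seq_len) for Conv1d
        x = self.conv(x)                 # -> (batch, 32, seq_len)
        x = x.permute(2, 0, 1)           # -> (seq_len, batch, 32) for the LSTM
        return self.lstm_net(x)

net = ConvFrontEnd(LSTMnet(input_size=32, output_size=1, num_hidden=32, num_layers=2)).to(device)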
