I'm fairly new to deep learning, and have built neural networks for the CIFAR-10 and MNIST datasets. I wanted to try a larger dataset with a different end goal, so I picked the Country211 dataset in PyTorch. I built the network below (I used three convolutional layers, since otherwise the flattened input would be huge), but the printed loss barely decreases. Am I simply not training long enough, or is there something fundamentally wrong with my network?
My model is as follows:
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# Use the CUDA device if available
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(device)

# Transform: convert the PIL image to a tensor, resize, and normalize
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Resize((300, 300)),
                                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

batch_size = 64

# Define training and test sets
trainset = torchvision.datasets.Country211(root='./data', split='train',
                                           transform=transform, download=True)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=0)
testset = torchvision.datasets.Country211(root='./data', split='test',
                                          download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=0)

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 8, 5)
        self.pool = nn.MaxPool2d(2)
        self.conv2 = nn.Conv2d(8, 12, 5)
        self.conv3 = nn.Conv2d(12, 16, 5)
        self.fc1 = nn.Linear(16 * 34 * 34, 4096)
        self.fc2 = nn.Linear(4096, 1024)
        self.fc3 = nn.Linear(1024, 211)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
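For what it's worth, rather than computing the `16 * 34 * 34` input size of `fc1` by hand, you can verify it by pushing a dummy tensor through the convolutional stack and inspecting the shape. This throwaway sketch reproduces just the conv/pool layers above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Same conv/pool stack as in the model above
conv1 = nn.Conv2d(3, 8, 5)
pool = nn.MaxPool2d(2)
conv2 = nn.Conv2d(8, 12, 5)
conv3 = nn.Conv2d(12, 16, 5)

x = torch.zeros(1, 3, 300, 300)     # one dummy 300x300 RGB image
x = pool(F.relu(conv1(x)))          # -> (1, 8, 148, 148)
x = pool(F.relu(conv2(x)))          # -> (1, 12, 72, 72)
x = pool(F.relu(conv3(x)))          # -> (1, 16, 34, 34)
print(torch.flatten(x, 1).shape)    # torch.Size([1, 18496]) = 16 * 34 * 34
```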
My training loop is as follows:
net = Net()
net.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)

epochs = 6
for epoch in range(epochs):
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data[0].to(device), data[1].to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0
The loss output is:
[1, 2000] loss: 5.352
[1, 4000] loss: 5.351
[1, 6000] loss: 5.350
[2, 2000] loss: 5.322
[2, 4000] loss: 5.320
[3, 2000] loss: 5.276
[3, 4000] loss: 5.272
[3, 6000] loss: 5.258
[4, 2000] loss: 5.211
[4, 4000] loss: 5.197
[4, 6000] loss: 5.212
[5, 4000] loss: 5.114
[5, 6000] loss: 5.140
Thanks in advance for your help!
As suggested in the comments, you may need to tune the hyperparameters a bit: the learning rate, the choice of optimizer, the kernel sizes, and so on. As for how many epochs to train: there is no magic number; instead, track your training and validation loss curves to monitor how training is going.
When the training loss keeps decreasing but the validation loss starts to increase, that is usually a sign of overfitting, and an indication that you should stop training or adjust the learning rate.
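A minimal sketch of what tracking both curves with early stopping could look like. The model and data here are tiny placeholders, not your network; the point is the per-epoch train/validation bookkeeping:

```python
import torch
import torch.nn as nn

# Placeholder model and synthetic data, just to illustrate the pattern
torch.manual_seed(0)
model = nn.Linear(10, 3)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

x_train, y_train = torch.randn(64, 10), torch.randint(0, 3, (64,))
x_val, y_val = torch.randn(32, 10), torch.randint(0, 3, (32,))

best_val, patience, bad_epochs = float('inf'), 3, 0
for epoch in range(50):
    # One training step per "epoch" on the toy data
    model.train()
    optimizer.zero_grad()
    train_loss = criterion(model(x_train), y_train)
    train_loss.backward()
    optimizer.step()

    # Evaluate on held-out data without tracking gradients
    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(x_val), y_val).item()

    print(f'epoch {epoch + 1}: train={train_loss.item():.3f} val={val_loss:.3f}')

    # Early stopping: quit once validation loss has not improved for
    # `patience` consecutive epochs (the overfitting sign described above)
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print('stopping early')
            break
```

In your case you would replace the single training step with your inner loop over `trainloader`, and compute the validation loss over a held-out loader.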