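I'm working on a project that classifies images of breast cancer tumors to predict whether they are benign (normal) or malignant (dangerous), and I'm using PyTorch. I've set up a network that extends the ResNet18 model with additional layers. I originally set the hyperparameters by hand, but I'd like to use something like random search to find the best ones. However, after running my validation loop to get the accuracy, I found that it exceeds 100%?
I load the data like this: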
import torch.utils.data
import torch.utils.data as data
from torchvision import transforms
transformed_data = transforms.Compose([  # using Compose() to chain data transformations
    transforms.ToTensor()
])
training_dataset = DataSetClass(split="train", transform=transformed_data, download=download)
# testing_dataset = DataSetClass(split="test", transform=transformed_data, download=download)
train_size = int(training_ratio * len(training_dataset))
validation_size = int(validation_ratio * len(training_dataset))
testing_size = len(training_dataset) - train_size - validation_size
train_set, validation_set, testing_dataset = torch.utils.data.random_split(
    training_dataset, [train_size, validation_size, testing_size]
)
# convert data into dataloader form
train_loader = data.DataLoader(dataset=train_set, batch_size=BATCH_SIZE, shuffle=True)
validation_loader = data.DataLoader(dataset=validation_set, batch_size=2*BATCH_SIZE, shuffle=False)
test_loader = data.DataLoader(dataset=testing_dataset, batch_size=BATCH_SIZE, shuffle=False)
print(training_dataset)
print("=======================")
print(testing_dataset)
I tried creating a list of candidate values for each hyperparameter, as shown below, and running the validation loop over them:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
# Define your hyperparameters space
lr_space = [0.01, 0.02, 0.03, 0.04, 0.05]
epochs_space = [10, 20, 30, 40, 50]
batch_size_space = [32, 64, 128, 256]
# Initialize your network
network = ExtendedNetwork(resnet18)
network.to(device=device)
# Define your loss function
loss_function = nn.BCEWithLogitsLoss()
network.eval()
best_accuracy = 0
best_hyperparameters = None
validation_accuracy = 0
# Perform random search
for _ in range(100):
    lr = np.random.choice(lr_space)
    epochs = np.random.choice(epochs_space)
    batch_size = np.random.choice(batch_size_space)
    # Use the selected hyperparameters to train your model
    optimizer = optim.Adam(network.parameters(), lr=lr)
    for epoch in range(epochs):
        with torch.no_grad():
            for inputs, targets in validation_loader:
                inputs, targets = inputs.to(device), targets.to(device)
                # Forward
                output = network(inputs)
                # Calculate and accumulate accuracy
                validation_accuracy += accuracy(output, targets)
        # Calculate average accuracy over all validation batches
        validation_accuracy /= len(validation_loader)
        print('Validation accuracy:', validation_accuracy)
        # If the current model is better than all previous models, update the best accuracy and best hyperparameters
        if validation_accuracy > best_accuracy:
            best_accuracy = validation_accuracy
            best_hyperparameters = {'lr': lr, 'epochs': epochs, 'batch_size': batch_size}
print('Best accuracy:', best_accuracy)
print('Best hyperparameters:', best_hyperparameters)
This produces the following output:
Validation accuracy: 0.045871559633027525
Validation accuracy: 0.09174311926605505
...
Validation accuracy: 135.73394495412717
Validation accuracy: 135.77981651376018
Best accuracy: 135.77981651376018
Best hyperparameters: {'lr': 0.03, 'epochs': 40, 'batch_size': 256}
But shouldn't accuracy be capped at 100%? I can't tell whether my mistake is in how I load the data or in the validation loop.
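You should reset validation_accuracy to 0 at the start of each epoch. It is initialized only once, before the random search loop, so every epoch adds its batch accuracies on top of the value left over from the previous epoch, and the reported number keeps growing past 1.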
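A minimal sketch of the reset, assuming nothing about the asker's model: it uses no PyTorch at all, and fake_batches and evaluate are hypothetical names standing in for one pass over validation_loader. It also counts correct predictions and samples rather than averaging per-batch accuracies, which is a swapped-in choice that avoids giving a smaller final batch the same weight as a full one.

```python
# Hypothetical stand-in for one pass over validation_loader:
# each tuple is (number of correct predictions, batch size).
fake_batches = [(50, 64), (48, 64), (20, 28)]


def evaluate(batches):
    """Return the accuracy in [0, 1] for one full pass over the batches."""
    correct = 0  # counters start from zero on every call, i.e. once per epoch
    total = 0
    for batch_correct, batch_size in batches:
        correct += batch_correct
        total += batch_size
    return correct / total


best_accuracy = 0.0
for epoch in range(3):
    # A fresh value is computed each epoch; nothing carries over.
    validation_accuracy = evaluate(fake_batches)
    best_accuracy = max(best_accuracy, validation_accuracy)
    print(f"epoch {epoch}: validation accuracy = {validation_accuracy:.4f}")
```

In the original loop, the equivalent change would be to set validation_accuracy = 0 as the first statement inside the for epoch in range(epochs) loop.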