Neural network with an evolution strategies optimizer keeps outputting the same accuracy on MNIST (PyTorch)

Problem description

My task is to create an artificial neural network that uses an evolution strategies (ES) algorithm as its optimizer (derivative-free). The dataset I'm using is MNIST. For now, I'm just trying to get this working with a simple fully connected network.

I found a Colab notebook that does the same thing, but on sklearn's "make_moons" dataset. I tried merging the notebook's code into mine, and it runs without errors, but it keeps outputting the same accuracy. Usually the first few outputs differ, and then it "converges" to 0.0987 on the training set and 0.098 on the test set, which is roughly chance level for 10 classes. Training also takes a very long time; maybe there are redundant iterations?

Here is the Colab notebook, if you want to take a look: https://colab.research.google.com/drive/1SY38Evy4U9HfUDkofPZ2pLQzEnwvYC81?usp=sharing

I've tried some suggestions from StackOverflow, such as tuning the hyperparameters (learning rate, hidden units) and switching to Leaky ReLU in case of "dying ReLU"; none of them worked. This leads me to believe the problem is in the ES optimizer.

I'm new to PyTorch, so please point out anything that is obviously wrong!

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
import torchvision.datasets as datasets
import torchvision.transforms as transforms
import numpy as np
from tqdm import tqdm
    
# Set device to CUDA if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# NN & DataLoader hyperparameters
input_size = 784
num_classes = 10
learning_rate = 0.01
batch_size = 64
num_epochs = 1 

# Load data
train_dataset = datasets.MNIST(root='dataset/', train=True, transform=transforms.ToTensor(), download=False)
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True) 

test_dataset = datasets.MNIST(root='dataset/', train=False, transform=transforms.ToTensor(), download=False) 
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=True) 

# Fully connected NN
model = nn.Sequential(
      nn.Linear(input_size, 40),
      nn.ReLU(0.1),  # note: nn.ReLU takes no slope argument; the 0.1 is bound to the inplace flag
      nn.Linear(40, 20),
      nn.ReLU(0.1),
      nn.Linear(20, num_classes),
      nn.ReLU(0.1),
)
model = model.float().to(device)

# Custom loss function
loss_func = nn.CrossEntropyLoss()

def loss(y_pred, y_true):
  # ES maximizes fitness, so return the reciprocal of the cross-entropy:
  # an increasing value means the model is improving
  return 1 / loss_func(y_pred, y_true)

# Fitness function
def fitness_func(solution, scores, targets):
  # Solution is a vector of parameters like mother_parameters
  nn.utils.vector_to_parameters(solution, model.parameters())
  return loss(scores, targets)

# In ES, our population is a slightly altered version of the mother parameters, so we implement a jitter function
def jitter(mother_params, state_dict):
  params_try = mother_params + SIGMA*state_dict.to(device)
  return params_try

# Now, we calculate the fitness of the entire population
def calculate_population_fitness(pop, mother_vector, scores, targets):
  fitness = torch.zeros(pop.shape[0])
  for i, params in enumerate(pop):
    p_try = jitter(mother_vector, params)
    fitness[i] = fitness_func(p_try, scores, targets)
  return fitness

# Calculating number of parameters
n_params = nn.utils.parameters_to_vector(model.parameters()).shape[0]

# Now, implement the training algorithm
mother_parameters = model.parameters()
mother_vector = nn.utils.parameters_to_vector(mother_parameters)

# ES hyperparameters
SIGMA = 0.01
LR = 0.01
POPULATION_SIZE=50
ITERATIONS = 500 

# Train network
for epoch in range(num_epochs):
    for batch_idx, (data, targets) in enumerate(train_loader):

        data = data.to(device=device)
        targets = targets.to(device=device)

        # Flatten the images into vectors
        data = data.reshape(data.shape[0], -1)

        scores = model(data)

        print(f"{batch_idx} out of {len(train_loader)}")
        
        # ES optimizer
        with torch.no_grad(): # No need for gradients
            for iteration in tqdm(range(ITERATIONS)):
                pop = torch.from_numpy(np.random.randn(POPULATION_SIZE, n_params)).float().to(device)
                fitness = calculate_population_fitness(pop, mother_vector, scores, targets)
                # Normalize the fitness
                normalized_fitness = ((fitness - torch.mean(fitness)) / torch.std(fitness)).to(device)
                # Update mother vector with the fitness values
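                # This matmul is the standard ES/NES gradient-ascent step:
                # mother_vector += LR/(POPULATION_SIZE*SIGMA) * sum_i fitness_i * eps_i,
                # where each row eps_i of pop is candidate i's standard-normal perturbation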
                mother_vector = mother_vector.to(device) + (LR / (POPULATION_SIZE * SIGMA)) * torch.matmul(pop.t(), normalized_fitness)

        # Update the model parameters
        nn.utils.vector_to_parameters(mother_vector, model.parameters())

        # Computing accuracy
        num_correct = 0
        num_samples = 0

        for x, y in train_loader:
              x = x.to(device=device)
              y = y.to(device=device)
              x = x.reshape(x.shape[0], -1)

              scores = model(x)
              _, predictions = scores.max(1)
              num_correct += (predictions == y).sum()
              num_samples += predictions.size(0)
        
        print(num_correct, num_samples)
        print(f"accuracy {float(num_correct)/float(num_samples)*100:.2f}")
        print("------------------------------------------")
Tags: python, pytorch, neural-network, mnist, evolutionary-algorithm
1 Answer

The most obvious problem is that you only evaluate the model once, at the line

scores = model(data)

before you start looping over the population.

You need to update and re-evaluate the model for each perturbation of the "mother" vector, so that each candidate's fitness reflects its own parameters.
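A minimal sketch of that fix, reusing the model, loss, and jitter helpers from the question: run the forward pass inside the fitness function, after the perturbed parameters have been loaded into the model, and pass the raw batch (data) rather than precomputed scores.

# Evaluate the model *after* loading each candidate's parameters,
# instead of reusing scores computed once from the mother vector
def fitness_func(solution, data, targets):
  nn.utils.vector_to_parameters(solution, model.parameters())
  scores = model(data)  # forward pass with the perturbed weights
  return loss(scores, targets)

def calculate_population_fitness(pop, mother_vector, data, targets):
  fitness = torch.zeros(pop.shape[0])
  for i, noise in enumerate(pop):
    p_try = jitter(mother_vector, noise)
    fitness[i] = fitness_func(p_try, data, targets)
  return fitness

In the training loop, drop the precomputed scores = model(data) line and call calculate_population_fitness(pop, mother_vector, data, targets). The existing nn.utils.vector_to_parameters(mother_vector, model.parameters()) call after the ES loop is still needed, since the model is otherwise left holding the last candidate's weights.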
