Softmax producing inf in a neural network


I am trying to write a simple neural network implementation using the Iris dataset. However, when I use softmax as the last activation layer, I get the following error:

RuntimeWarning: invalid value encountered in subtract
  exp_score = np.exp(X-np.max(X, axis=1,keepdims=True))

Also, if I change ReLU to cross-entropy, I get a similar error. I checked the dataset and there is no missing data. Can someone help me figure out why this error occurs?

class ReLU():
    def __call__(self, X):
        return np.maximum(0, X)
    
    def derivative(self, X):
        return (X>0).astype(float)

class SoftMax():
    def __call__(self, X):
        exp_score = np.exp(X-np.max(X, axis=1,keepdims=True))
        prob = exp_score/np.sum(exp_score, axis=1, keepdims=True)
        return prob
    

import numpy as np
class MultiLayerNet():
    def __init__(self, input_size, hidden_size, output_size, activation_func = ReLU(), loss = "mse", reg_lambda = 0.01, mini_batch =100):
        self.input_size = input_size
        self.output_size = output_size
        self.hidden_sizes = hidden_size
        self.activation = activation_func
        if loss == 'mse':
            self.loss_function = self.cross_entropy_loss
            self.loss_derivative = self.cross_entropy_loss_derivative
        self.minibatch = mini_batch
        self.params = {}
        self.reg_lambda = reg_lambda
        self.layers = [input_size] + hidden_size+[output_size]
        for i in range(1, len(self.layers)):
            self.params[f"W{i}"] = np.random.randn(self.layers[i-1], self.layers[i])/np.sqrt(self.layers[i-1])
            self.params[f"b{i}"] = np.zeros((1, self.layers[i]))
    
    
    def forward(self, X):
        A = X
        self.params["A0"] = X
        for i in range(1, len(self.layers)):
            Z = A @ self.params[f"W{i}"] + self.params[f"b{i}"]
            if i == len(self.layers) - 1:
                A = SoftMax()(Z)
            else:
                A = self.activation(Z)
            self.params[f"Z{i}"] = Z
            self.params[f"A{i}"] = A
        return A
    
    def backward(self, X, y):
        y = y.reshape(-1, 1) if len(y.shape) == 1 else y
        output = self.params[f"A{len(self.layers)-1}"]
        d = self.loss_derivative(output, y)
        for i in reversed(range(1, len(self.layers))):
            self.params[f"dw{i}"] = self.params[f"A{i-1}"].T @ d + self.reg_lambda * self.params[f"W{i}"]
            self.params[f"db{i}"] = np.sum(d, axis=0, keepdims = True)
            if i > 1:
                d = d @ self.params[f"W{i}"].T * self.activation.derivative(self.params[f"Z{i-1}"])
                
    
    def train(self, X, y, num_epochs=500, learning_rate=0.01):
        for i in range(num_epochs):
            self.forward(X)
            self.backward(X,y)
            for i in range(1, len(self.layers)):
                self.params[f"W{i}"] -= learning_rate * self.params[f"dw{i}"]
                self.params[f"b{i}"] -= learning_rate * self.params[f"db{i}"]
            loss = self.loss_function(self.params[f"A{len(self.layers)-1}"], y.reshape(-1, 1))
            if i%100 == 0:
                print("loss {}".format(loss))

Update: I found that with a small number of epochs the accuracy is very poor. However, if I increase the number of epochs, the input to SoftMax in the forward function starts to contain inf, which causes the error (a short reproduction of the warning follows the snippet below).

def forward(self, X):
        A = X
        self.params["A0"] = X
        for i in range(1, len(self.layers)):
            
            Z = A @ self.params[f"W{i}"] + self.params[f"b{i}"]
            if i == len(self.layers)-1:
                # problem here when having larger epoch number
                A = SoftMax()(Z)
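
For context, this particular warning appears as soon as the pre-activations Z already contain inf before softmax runs, because the max-subtraction then computes inf - inf = nan. A minimal sketch reproducing it with made-up logits (not the actual Iris data), plus a cheap finiteness check one could drop into forward():

import numpy as np

# Hypothetical exploded logits, standing in for a Z that diverged during training
Z = np.array([[np.inf, 1.0, 2.0]])

# The stable-softmax shift computes inf - inf = nan here, which raises
# "RuntimeWarning: invalid value encountered in subtract"
shifted = Z - np.max(Z, axis=1, keepdims=True)
print(shifted)  # [[ nan -inf -inf]]

# A simple guard to catch the divergence as soon as it happens
if not np.isfinite(Z).all():
    print("pre-activations contain inf/nan -- the weights have likely diverged")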
python numpy deep-learning

1 Answer

The SoftMax class is missing a derivative method. Without one, it cannot even work properly.
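
For illustration only, a sketch of what such a method could look like, following the same derivative(self, X) signature the ReLU class uses; this is not the poster's code. The per-sample Jacobian of softmax is diag(p) - p pᵀ, although in practice it is usually avoided by pairing softmax with cross-entropy, whose combined gradient simplifies to probs - labels:

import numpy as np

class SoftMax():
    def __call__(self, X):
        exp_score = np.exp(X - np.max(X, axis=1, keepdims=True))
        return exp_score / np.sum(exp_score, axis=1, keepdims=True)

    def derivative(self, X):
        # Per-sample Jacobian diag(p) - p p^T, shape (batch, classes, classes)
        p = self(X)
        eye = np.eye(p.shape[1])
        return np.einsum('ij,jk->ijk', p, eye) - np.einsum('ij,ik->ijk', p, p)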

Now, there is also a distinction between loss functions and activation functions. You cannot swap cross-entropy for ReLU, because cross-entropy is a loss function while ReLU is an activation function. Softmax itself is also an activation function, and it can be used together with cross-entropy (a loss function) for classification problems. Loss functions compute the loss of the network, so they usually take both predictions and labels, whereas activation functions only take an input.
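
To make the distinction concrete, here is a hedged sketch of what the cross_entropy_loss and cross_entropy_loss_derivative referenced in __init__ might look like; since that code was not posted, these are assumptions, shown as standalone functions rather than class methods. Note that the loss takes both softmax outputs and one-hot labels, while ReLU/softmax take only an input, and that the gradient of cross-entropy taken together with a softmax output layer reduces to (probs - labels) / batch_size:

import numpy as np

def cross_entropy_loss(probs, y_onehot, eps=1e-12):
    # probs: softmax outputs, shape (batch, classes); y_onehot: labels of the same shape
    return -np.mean(np.sum(y_onehot * np.log(probs + eps), axis=1))

def cross_entropy_loss_derivative(probs, y_onehot):
    # Gradient w.r.t. the pre-softmax logits when the output layer is softmax:
    # the softmax Jacobian and the log cancel down to (probs - labels) / batch_size
    return (probs - y_onehot) / probs.shape[0]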

You should post more of your code so we can look for the error. The code for your cross-entropy and the code you use to train the network would help a lot.
