I'm trying to write a simple neural network implementation using the Iris dataset. However, when I try to use softmax as the last activation layer, I get the following error:
RuntimeWarning: invalid value encountered in subtract exp_score = np.exp(X-np.max(X, axis=1,keepdims=True))
Also, if I change ReLU to cross-entropy, I get a similar error. I checked the dataset and there is no missing data. Can someone help me figure out why these errors occur?
import numpy as np

class ReLU():
    def __call__(self, X):
        return np.maximum(0, X)

    def derivative(self, X):
        return (X > 0).astype(float)

class SoftMax():
    def __call__(self, X):
        exp_score = np.exp(X - np.max(X, axis=1, keepdims=True))
        prob = exp_score / np.sum(exp_score, axis=1, keepdims=True)
        return prob
class MultiLayerNet():
    def __init__(self, input_size, hidden_size, output_size, activation_func=ReLU(), loss="mse", reg_lambda=0.01, mini_batch=100):
        self.input_size = input_size
        self.output_size = output_size
        self.hidden_sizes = hidden_size
        self.activation = activation_func
        if loss == 'mse':
            self.loss_function = self.cross_entropy_loss
            self.loss_derivative = self.cross_entropy_loss_derivative
        self.minibatch = mini_batch
        self.params = {}
        self.reg_lambda = reg_lambda
        self.layers = [input_size] + hidden_size + [output_size]
        for i in range(1, len(self.layers)):
            self.params[f"W{i}"] = np.random.randn(self.layers[i-1], self.layers[i]) / np.sqrt(self.layers[i-1])
            self.params[f"b{i}"] = np.zeros((1, self.layers[i]))
    def forward(self, X):
        A = X
        self.params["A0"] = X
        for i in range(1, len(self.layers)):
            Z = A @ self.params[f"W{i}"] + self.params[f"b{i}"]
            if i == len(self.layers) - 1:
                A = SoftMax()(Z)
            else:
                A = self.activation(Z)
            self.params[f"Z{i}"] = Z
            self.params[f"A{i}"] = A
        return A
    def backward(self, X, y):
        y = y.reshape(-1, 1) if len(y.shape) == 1 else y
        output = self.params[f"A{len(self.layers)-1}"]
        d = self.loss_derivative(output, y)
        for i in reversed(range(1, len(self.layers))):
            self.params[f"dw{i}"] = self.params[f"A{i-1}"].T @ d + self.reg_lambda * self.params[f"W{i}"]
            self.params[f"db{i}"] = np.sum(d, axis=0, keepdims=True)
            if i > 1:
                d = d @ self.params[f"W{i}"].T * self.activation.derivative(self.params[f"Z{i-1}"])
    def train(self, X, y, num_epochs=500, learning_rate=0.01):
        for epoch in range(num_epochs):
            self.forward(X)
            self.backward(X, y)
            for i in range(1, len(self.layers)):
                self.params[f"W{i}"] -= learning_rate * self.params[f"dw{i}"]
                self.params[f"b{i}"] -= learning_rate * self.params[f"db{i}"]
            loss = self.loss_function(self.params[f"A{len(self.layers)-1}"], y.reshape(-1, 1))
            # the epoch counter was shadowed by the layer-loop variable i,
            # so this condition never fired; it should test the epoch
            if epoch % 100 == 0:
                print("loss {}".format(loss))
Update: I found that with a small number of epochs the accuracy is very poor. However, if I increase the number of epochs, the input to SoftMax in the forward function starts to contain inf, which causes the error.
    def forward(self, X):
        A = X
        self.params["A0"] = X
        for i in range(1, len(self.layers)):
            Z = A @ self.params[f"W{i}"] + self.params[f"b{i}"]
            if i == len(self.layers) - 1:
                # problem here when having a larger epoch number
                A = SoftMax()(Z)
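A quick way to see this is to inspect the logits feeding the final SoftMax (a sketch; it reads the Z that forward() stores in params, and net is assumed to be a MultiLayerNet instance after some training epochs):

    # Hypothetical check: inspect the pre-softmax logits of the last layer.
    Z_last = net.params[f"Z{len(net.layers) - 1}"]
    print("max |Z|:", np.abs(Z_last).max())
    print("contains inf:", np.isinf(Z_last).any(), "contains nan:", np.isnan(Z_last).any())

Once Z contains inf, np.max(X, axis=1) is also inf, and inf - inf produces nan, which is exactly the "invalid value encountered in subtract" warning.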
The SoftMax class is missing a derivative function; without one, it can't even work properly.
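For reference, the derivative of softmax is a full Jacobian per sample, not an element-wise function like ReLU's. A minimal sketch of what such a method could look like (the method name mirrors the derivative interface your ReLU class uses; the batched (N, C, C) layout is my assumption):

    class SoftMax():
        def __call__(self, X):
            exp_score = np.exp(X - np.max(X, axis=1, keepdims=True))
            return exp_score / np.sum(exp_score, axis=1, keepdims=True)

        def derivative(self, X):
            # Per-sample Jacobian J[n] = diag(p[n]) - p[n] p[n]^T, shape (N, C, C),
            # where p = softmax(X). Note this is not a drop-in replacement for the
            # element-wise derivative that backward() applies to hidden layers.
            p = self(X)
            return p[:, :, None] * np.eye(p.shape[1]) - p[:, :, None] * p[:, None, :]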
There is also a distinction between loss functions and activation functions. You can't swap cross-entropy in for ReLU, because cross-entropy is a loss function while ReLU is an activation function. Softmax is itself an activation function, and it can be used together with cross-entropy (a loss function) for classification problems. Loss functions compute the loss of the network, so they usually take both predictions and labels, while activation functions only take an input.
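The softmax/cross-entropy pairing is also convenient because the gradient of the loss with respect to the pre-softmax logits collapses to prob - y, so the full softmax Jacobian above is never actually needed. A sketch of the two methods your __init__ references but that aren't shown (the names cross_entropy_loss and cross_entropy_loss_derivative come from your code; the bodies are my assumption, written for one-hot labels):

    def cross_entropy_loss(self, prob, y):
        # Mean negative log-likelihood for one-hot labels; clip to avoid log(0).
        prob = np.clip(prob, 1e-12, 1.0)
        return -np.sum(y * np.log(prob)) / y.shape[0]

    def cross_entropy_loss_derivative(self, prob, y):
        # Combined softmax + cross-entropy gradient w.r.t. the logits Z,
        # averaged over the batch: dL/dZ = (softmax(Z) - y) / N.
        return (prob - y) / y.shape[0]

This is consistent with your backward(), which uses d directly as the gradient at the last pre-activation and never multiplies by a softmax derivative there.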
You should post more code so we can look for the error. The code for your cross-entropy and the code you use to train the network would help a lot.
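If it helps, this is the kind of minimal driver that would make the problem reproducible (a sketch using sklearn's load_iris; it assumes the class is completed with loss methods like the ones above, standardizes the features, and one-hot encodes the labels):

    import numpy as np
    from sklearn.datasets import load_iris

    X, y = load_iris(return_X_y=True)
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize features
    Y = np.eye(3)[y]                          # one-hot labels, shape (150, 3)

    net = MultiLayerNet(input_size=4, hidden_size=[16], output_size=3)
    net.train(X, Y, num_epochs=500, learning_rate=0.01)

Note that if you pass one-hot labels like this, the unconditional y.reshape(-1, 1) in train()'s loss call would mangle their shape and has to leave 2-D labels alone, the same way backward() already does.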