NumPy neural network cost computation: result changes after the first run

Question (0 votes, 1 answer)

In Python 3.7, I have a problem with my neural network cost computation. The first time I run compute_cost_nn I get the correct cost, 0.28762916516131887, but on every subsequent run the cost becomes 0.3262751145707298, which is very annoying. The problem seems to come from my params: if I reload them before each cost computation, everything works fine. But that means I can't re-run the function with different parameters and get a correct cost without running the whole script again.
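As a general NumPy aside (not a diagnosis of this exact script): slicing and reshaping a flat vector returns a view whenever possible, so in-place assignment through the reshaped array writes back into the original vector, and the caller's array is silently modified between calls. A minimal sketch with a hypothetical 6-element vector:

```python
import numpy as np

params = np.arange(6, dtype=float)   # [0. 1. 2. 3. 4. 5.]
before = params.copy()

Theta = params.reshape(2, 3)         # a VIEW of params, not a copy
Theta[:, 1] = 0                      # writes through the view into params

# params has been mutated in place: [0. 0. 2. 3. 0. 5.]
assert not np.array_equal(params, before)
```

Calling `.copy()` on the reshaped array (or on `params` at the top of the function) avoids this kind of cross-call contamination.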

The network has 400 input units, one hidden layer with 25 units, and 10 output units.

Here are the inputs:

data = loadmat("ex4data1.mat")
y = data['y']
X = data['X']
X = np.c_[np.ones((X.shape[0], 1)), X]

weights = loadmat("ex4weights.mat")
Theta1 = weights['Theta1']
Theta2 = weights['Theta2']
params = np.r_[Theta1.ravel(), Theta2.ravel()]

Matrix shapes:

>> X: (5000, 401)
>> y: (5000, 1)
>> Theta1: (25, 401)
>> Theta2: (10, 26)
>> params: (10285,)

And the cost function:

def compute_cost_nn(params,
                    input_layer_size,
                    hidden_layer_size,
                    num_labels,
                    X, y, lambda_):

    m = len(y)

    # Retrieve Theta1 and Theta2 from flattened params
    t1_items = (input_layer_size + 1) * hidden_layer_size
    Theta1 = params[0:t1_items].reshape(
        hidden_layer_size, 
        input_layer_size+1
        )
    Theta2 = params[t1_items:].reshape(
        num_labels, 
        hidden_layer_size+1
        )

    # transform y vector column (5000x1) with labels 
    # into 5000x10 matrix with 0s and 1s
    y_mat = np.eye(num_labels)[(y-1).ravel(), :]

    # Forward propagation
    a1 = X
    z2 = a1 @ Theta1.T
    a2 = sigmoid(z2)
    a2 = np.c_[np.ones((m,1)), a2]
    z3 = a2 @ Theta2.T
    a3 = sigmoid(z3)

    # Compute cost
    func = y_mat.T @ np.log(a3) + (1-y_mat).T @ np.log(1-a3)
    cost = func.trace()
    t1reg = (Theta1[:,1:].T @ Theta1[:,1:]).trace()
    t2reg = (Theta2[:,1:].T @ Theta2[:,1:]).trace()
    cost_r = -1/m * cost + lambda_/(2*m) * (t1reg + t2reg)

    # Gradients (excluding Theta0)
    d3 = a3 - y_mat
    d2 = (d3 @ Theta2[:,1:]) * sigmoid_gradient(z2)  # shape (5000, 25)

    Delta1 = d2.T @ a1
    Delta2 = d3.T @ a2
    Theta1_grad = 1/m * Delta1
    Theta2_grad = 1/m * Delta2

    # Gradient regularization
    Theta1[:,1] = 0
    Theta2[:,1] = 0
    Theta1_grad = Theta1_grad + lambda_/m * Theta1
    Theta2_grad = Theta2_grad + lambda_/m * Theta2

    return cost_r, Theta1_grad, Theta2_grad

I get the cost by running:

compute_cost_nn(params, 400, 25, 10, X, y, 0)[0]

First run: 0.28762916516131887. Every run after that: 0.3262751145707298.
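For what it's worth, the one-hot encoding trick used inside the function (indexing an identity matrix with the zero-based labels) can be checked in isolation, here with small hypothetical labels:

```python
import numpy as np

num_labels = 4
y = np.array([[3], [1], [4]])            # MATLAB-style labels in 1..num_labels

# Each row of np.eye(num_labels) is a one-hot vector; fancy indexing
# with the zero-based labels picks the matching row for each sample.
y_mat = np.eye(num_labels)[(y - 1).ravel(), :]

# y_mat is (3, 4): row 0 encodes label 3 -> [0, 0, 1, 0], and so on.
```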

Any hints are much appreciated :)

python numpy machine-learning neural-network data-science
1 Answer

1 vote

I haven't tested your code with dummy data, but from a quick glance it looks like you are importing the weights from a .mat (MATLAB) file. MATLAB stores array elements in column-major order (a.k.a. Fortran-style order), whereas Python/NumPy defaults to row-major (C-style) order.

So when you first ravel() your weights, NumPy flattens the arrays assuming C-style order. The same story applies when you reshape the ravelled weights inside your function. You can pass the order as an argument to either function, so:

params = np.r_[Theta1.ravel(order='F'), Theta2.ravel('F')]

should fix your problem.
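The difference between the two orders is easy to see on a small hypothetical 2×3 array: C order walks row by row, Fortran order walks column by column.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

a.ravel()            # C (row-major) order:      [1, 2, 3, 4, 5, 6]
a.ravel(order='F')   # Fortran (column-major):   [1, 4, 2, 5, 3, 6]
```

Whichever order you pick for ravel(), the matching reshape must use the same order for the round trip to reproduce the original matrices.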

If you've never come across row- and column-major ordering before, this is worth a read: https://en.wikipedia.org/wiki/Row-_and_column-major_order
