Problem with gradient descent and batching when building a CNN using only Python and NumPy

Problem description

I am currently working through the book Grokking Deep Learning by Andrew W. Trask, but I am having trouble understanding the code in chapter 10 of the book, which builds a CNN using only Python and NumPy:

import numpy as np, sys
np.random.seed(1)
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
images, labels = (x_train[0:1000].reshape(1000, 28*28)/255, y_train[0:1000])
one_hot_labels = np.zeros((len(labels), 10))
for i, l in enumerate(labels):
    one_hot_labels[i][l] = 1
labels = one_hot_labels
test_images = x_test.reshape(len(x_test), 28*28)/255
test_labels = np.zeros((len(y_test), 10))
for i, l in enumerate(y_test):
    test_labels[i][l] = 1
def tanh(x):
    return np.tanh(x)
def tanh2deriv(output):
    return 1-(output**2)
def softmax(x):
    temp = np.exp(x)
    return temp/np.sum(temp, axis=1, keepdims=True)
alpha, iterations = (2, 300)
pixels_per_image, num_labels = (784, 10)
batch_size = 128
input_rows = 28
input_cols = 28
kernel_rows = 3
kernel_cols = 3
num_kernels = 16
hidden_size = ((input_rows-kernel_rows)*(input_cols-kernel_cols))*num_kernels
kernels = 0.02*np.random.random((kernel_rows*kernel_cols, num_kernels))-0.01   # each column is one flattened 3x3 kernel
weights_1_2 = 0.2*np.random.random((hidden_size, num_labels))-0.1
# Slice the same kernel-sized window out of every image in the batch
def get_image_section(layer, row_from, row_to, col_from, col_to):
    section = layer[:, row_from:row_to, col_from:col_to]
    return section.reshape(-1, 1, row_to-row_from, col_to-col_from)
for j in range(iterations):
    correct_cnt = 0
    for i in range(int(len(images)/batch_size)):
        batch_start, batch_end = ((i*batch_size), ((i+1)*batch_size))
        layer_0 = images[batch_start:batch_end]
        layer_0 = layer_0.reshape(layer_0.shape[0], 28, 28)
        sects = list()
        # Collect every kernel-sized patch of every image (an im2col-style expansion)
        for row_start in range(layer_0.shape[1]-kernel_rows):
            for col_start in range(layer_0.shape[2]-kernel_cols):
                sect = get_image_section(layer_0, row_start, row_start+kernel_rows, col_start, col_start+kernel_cols)
                sects.append(sect)
        expanded_input = np.concatenate(sects, axis=1)
        es = expanded_input.shape
        flattened_input = expanded_input.reshape(es[0]*es[1], -1)
        # Convolution as a single matrix multiply: every patch times every kernel
        kernel_output = flattened_input.dot(kernels)
        layer_1 = tanh(kernel_output.reshape(es[0], -1))
        # Dropout: zero out half the hidden units and double the survivors
        dropout_mask = np.random.randint(2, size=layer_1.shape)
        layer_1 *= dropout_mask*2
        layer_2 = softmax(np.dot(layer_1, weights_1_2)) 
        for k in range(batch_size):
            labelset = labels[batch_start+k:batch_start+k+1]
            _inc = int(np.argmax(layer_2[k:k+1])==np.argmax(labelset))
            correct_cnt+=_inc
        # Backpropagate the error and update both weight matrices
        layer_2_delta = (labels[batch_start:batch_end]-layer_2) / (batch_size*layer_2.shape[0])
        layer_1_delta = layer_2_delta.dot(weights_1_2.T)*tanh2deriv(layer_1)
        layer_1_delta*=dropout_mask
        weights_1_2 += alpha*layer_1.T.dot(layer_2_delta)
        l1d_reshape = layer_1_delta.reshape(kernel_output.shape)
        k_update = flattened_input.T.dot(l1d_reshape)
        kernels -= alpha*k_update 
    print("I:"+str(j), "Train-Acc:", correct_cnt/float(len(images)))

I know the code is very densely formatted and somewhat hard to read, but it runs correctly and reaches the expected test accuracy of about 87.5%. However, I would like to understand better why the code works, and I have a few questions about certain parts of it:

First, about the batching part.

layer_2_delta = (labels[batch_start:batch_end]-layer_2) / (batch_size*layer_2.shape[0])

I already understand that, since the batch size is 128, 128 images are processed at once, and that (labels[batch_start:batch_end]-layer_2) is effectively the combined delta of all 128 images. So normally the delta would be divided by 128 to get the average delta for this layer. However, the code then divides that delta a second time by layer_2.shape[0], which is also equal to the batch size of 128. I don't understand why that should be done. If I remove the seemingly redundant /128, the code no longer works: the program eventually raises a NumPy "overflow" warning in the exp function (softmax), and the training accuracy then stays at only 8.7%. Why is this extra "/128" essential for the code?
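As an aside on the overflow warning: it comes from np.exp being handed very large numbers once the weights grow out of control. For reference, a numerically stable softmax (not the version used in the book) subtracts the row-wise maximum before exponentiating; it would avoid the warning itself, though not necessarily the bad accuracy behind it:

import numpy as np

def softmax_stable(x):
    # Subtracting the per-row maximum does not change the result mathematically,
    # but it keeps np.exp from overflowing on large inputs
    shifted = x - np.max(x, axis=1, keepdims=True)
    temp = np.exp(shifted)
    return temp/np.sum(temp, axis=1, keepdims=True)

logits = np.array([[1000.0, 1001.0, 999.0]])
print(softmax_stable(logits))   # no overflow warning, rows still sum to 1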

Another question concerns gradient descent. Earlier in the book I learned that gradient descent looks like this:

delta = target_output - real_output
weight += input*delta*alpha

But here I am faced with this code:

kernels -= alpha*k_update 

At first I thought the code was simply wrong. But after "correcting" the "-" to "+", I got a similar result from the model, with a test accuracy of 86%. How is that possible? There must be some basic principle of gradient descent that I have not fully understood yet. What is the difference between the minus sign and the plus sign in gradient descent, and when should each be used?
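For reference, the two sign conventions are the same update written two ways; the sign of the delta and the sign in front of the learning rate cancel out. A tiny single-weight sketch (the numbers and names are only illustrative):

alpha = 0.1
inp, target, w = 0.5, 1.0, 0.2
pred = inp*w

# Convention used earlier in the book: delta = target - output, and the update adds
delta = target - pred
w_plus = w + alpha*inp*delta

# Standard form of gradient descent: subtract the gradient of 0.5*(pred - target)**2
grad = (pred - target)*inp
w_minus = w - alpha*grad

print(w_plus, w_minus)                  # the two updates give the same weight
print(abs(w_plus - w_minus) < 1e-12)    # True: '+' with (target - output) is '-' with (output - target)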

python deep-learning conv-neural-network gradient-descent
1 Answer

(1) It is just a constant being divided out; it effectively plays the role of the learning rate. If you remove it, you get exactly the behaviour of a learning rate that is too high.
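A minimal sketch of that point, reusing alpha and batch_size from the question (the error array is just a stand-in for labels - layer_2):

import numpy as np
np.random.seed(0)

alpha, batch_size = 2.0, 128
error = np.random.randn(batch_size, 10)          # stand-in for labels[batch_start:batch_end] - layer_2

# As written in the question: divide by the batch size twice, then scale by alpha = 2
update_as_written = alpha*(error/(batch_size*batch_size))

# Divide only once, but shrink the learning rate by the same factor
update_small_alpha = (alpha/batch_size)*(error/batch_size)

print(np.allclose(update_as_written, update_small_alpha))   # True

Dropping the second /batch_size while keeping alpha = 2 therefore makes every step 128 times larger, which is what drives the weights, and then np.exp inside softmax, into overflow.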


(2) If you change all of your weight updates from:

for weight in weights:
    weight -= learning_rate * dloss_dweight

to:

    weight += learning_rate * dloss_dweight

...then you are no longer looking for the minimum of the loss. You are now looking for the maximum of the loss, i.e. the worst possible model.

However, in your case you only changed the update for one layer's weights, so what is probably happening is that your other parameters compensate for it.
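A tiny illustration of that difference on a made-up one-dimensional loss:

# loss(w) = (w - 3)**2 has its minimum at w = 3, and dloss/dw = 2*(w - 3)
lr = 0.1
w_descent, w_ascent = 0.0, 0.0
for _ in range(100):
    w_descent -= lr*2*(w_descent - 3)   # minus the gradient: walks down to the minimum
    w_ascent  += lr*2*(w_ascent - 3)    # plus the gradient: walks away from it

print(w_descent)   # ~3.0
print(w_ascent)    # a huge negative number: the loss is being made worse and worse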
