Streaming training data with a Keras generator produces a strange tensor size mismatch error - the TensorFlow code is too opaque to debug the problem

Question (votes: 0, answers: 1)

I am training a neural network in TensorFlow, and because I was running out of memory when loading the entire training set (input images and "ground truth" images), I am trying to stream the data with a generator so that only a few images are loaded at a time. My code takes each image and subdivides it into an array of many smaller images. Here is the code for the generator class I am using, based on a tutorial I found online:

import keras.utils.all_utils  # newer TF versions expose this base class as tf.keras.utils.Sequence
import numpy as np

class DataGenerator(keras.utils.all_utils.Sequence):
        'Generates data for Keras'
        def __init__(self, 
                     channel,
                     pairs, 
                     prediction_size=200, 
                     input_normalizing_function_name='standardize', 
                     label="",
                     batch_size=1):
            'Initialization'
            self.channel = channel
            self.prediction_size = prediction_size
            self.batch_size = batch_size
            self.pairs = pairs
            self.id_list = list(self.pairs.keys())
            self.input_normalizing_function_name = input_normalizing_function_name
            self.label = label
            self.on_epoch_end()
    
        def __len__(self):
            'Denotes the number of batches per epoch'
            return int(np.floor(len(self.id_list) / self.batch_size))
    
        def __getitem__(self, index):
            'Generate one batch of data'
            # Generate indexes of the batch
            indexes = self.indexes[index*self.batch_size:(index+1)*self.batch_size]
            print("{} Indexes is {}".format(self.label, indexes))
            # Find list of IDs
            subset_pair_id_list = [self.id_list[k] for k in indexes]
            print("\t{} subset_pair_id_list is {}".format(self.label, subset_pair_id_list))
            # Generate data
            normalized_input_frames, normalized_gt_frames = self.__data_generation(subset_pair_id_list)
    
            print("in __getitem, returning data batch")
            return normalized_input_frames, normalized_gt_frames
    
        def on_epoch_end(self):
            'Updates indexes after each epoch'
            self.indexes = list(range(len(self.id_list)))
    
        def __data_generation(self, subset_pair_id_list):
            'subdivides each image into an array of multiple images'
            # Initialization
            normalized_input_frames, normalized_gt_frames = get_normalized_input_and_gt_dataframes(
                channel = self.channel, 
                pairs_for_training = self.pairs,
                pair_ids=subset_pair_id_list,
                input_normalizing_function_name = self.input_normalizing_function_name,
                prediction_size=self.prediction_size
            )
    
            print("\t\t\t~~~In data generation: input shape: {}, gt shape: {}".format(normalized_input_frames.shape, normalized_gt_frames.shape))
    
            return normalized_input_frames, normalized_gt_frames  # fixed: the original post returned the undefined names input_frames, gt_frames

I use this generator to produce a set of data for training, and then another instance of it for validation, for example:

training_data_generator = DataGenerator(
            pairs=pairs_for_training, 
            prediction_size=prediction_size,
            input_normalizing_function_name=input_normalizing_function_name,
            batch_size=batch_size,
            channel=channel,
            label="training generator"
        )
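Independent of `model.fit`, a `Sequence`-style generator can be indexed directly to confirm what each batch looks like. Below is a minimal, self-contained sketch with dummy NumPy arrays (`DummyGenerator` and its sizes are illustrative, not the asker's data loader; the real class would subclass `keras.utils.Sequence`):

```python
import numpy as np

class DummyGenerator:  # real code would subclass keras.utils.all_utils.Sequence
    """Yields fixed-size random (input, ground-truth) batches for shape checks."""
    def __init__(self, n_samples=4, batch_size=2, frame_shape=(100, 100, 1)):
        self.batch_size = batch_size
        self.frames = np.random.rand(n_samples, *frame_shape).astype("float32")

    def __len__(self):
        # Number of batches per epoch, as in DataGenerator.__len__
        return len(self.frames) // self.batch_size

    def __getitem__(self, index):
        # Slice out one batch, exactly as model.fit would request it
        batch = self.frames[index * self.batch_size:(index + 1) * self.batch_size]
        return batch, batch  # input and "ground truth" share a shape here

gen = DummyGenerator()
x, y = gen[0]  # index directly, outside of model.fit
print(len(gen), x.shape, y.shape)  # → 2 (2, 100, 100, 1) (2, 100, 100, 1)
```

Checking a batch this way isolates the generator from the model, which is useful here because (as the printout below shows) the generator's shapes were in fact fine.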

Then I start the training, which I run with model.fit:

        callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=patience, restore_best_weights=True)
        learning_rate = 0.0001

        opt = tf.keras.optimizers.Adam(learning_rate)

        l = tf.keras.losses.MeanSquaredError()
        print("Compiling model...")
        model.compile(loss=l, optimizer=opt)

        print('\tTraining model...')
        with tf.device('/device:GPU:0'):
            model_history = model.fit(
                training_data_generator,
                validation_data=validation_data_generator, 
                epochs=eps,
                callbacks=[callback]
            )

Here is the last bit of the printout before the failure:

Epoch 1/1000
training generator Indexes is [0]
        training generator subset_pair_id_list is ['A']
                Loading batch of 1 pairs...
                        ['A']
                num data is 1
               
                        ~~~In data generation: input shape: (5, 100, 100, 1), gt shape: (5, 100, 100, 1)
in __getitem, returning data batch

However, this step fails with a strange error about mismatched tensor sizes, which I attributed to my use of the generator (it did not happen before, without the generator):

 File "/root/micromamba/envs/training/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError:  All dimensions except 3 must match. Input 1 has shape [5 25 25 32] and doesn't match input 0 with shape [5 24 24 64].
         [[node gradient_tape/model/concatenate/ConcatOffset (defined at /bin/train.py:633) ]] [Op:__inference_train_function_1982]

I tried using breakpoints to dig into the TensorFlow code and figure out why it was producing these tensors, but I couldn't find the function that actually generates them and couldn't work out what was really going on. You can see that each returned input and ground-truth dataset has shape (5, 100, 100, 1), so I have no idea where the 25, 24, 32, and 64 values in that error message come from. What might be happening here? I assumed that each batch is returned, used for training, and then discarded before the generator fetches the next one, yet judging by the error message some kind of concatenation operation is being attempted.

python tensorflow keras streaming generator
1 Answer
0 votes

It turns out there was nothing wrong with the way I was using the generator. Rather, the size I had specified for the images meant that they were downsampled as they moved forward through the model's layers and then upsampled again (it's a U-Net), and because the image size was not a multiple of 16 there was a rounding error, so I ended up with differently sized layers that the model tried to concatenate. This explains it: https://stackoverflow.com/questions/68266736/tensorflow-python-framework-errors-impl-invalidargumenterror-all-dimensions-exc
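The rounding error is easy to reproduce with plain integer arithmetic. Below is a sketch (`unet_shape_trace` is a hypothetical helper, assuming one 2× max-pooling per encoder stage and one 2× up-sampling per decoder stage, as in a typical U-Net):

```python
def unet_shape_trace(size, depth):
    """Track spatial size through `depth` 2x down-samplings, then back up.

    Returns a list of (skip_size, upsampled_size) pairs, one per Concatenate,
    so any mismatch is immediately visible.
    """
    skips = []
    s = size
    for _ in range(depth):
        skips.append(s)        # skip connection saved before pooling
        s = s // 2             # MaxPooling2D floors odd sizes: 25 -> 12
    pairs = []
    for skip in reversed(skips):
        s = s * 2              # UpSampling2D doubles: 12 -> 24
        pairs.append((skip, s))
    return pairs

# With a 100x100 input and three pooling stages, as in the error message:
print(unet_shape_trace(100, 3))  # → [(25, 24), (50, 48), (100, 96)]
# 100 -> 50 -> 25 -> 12 (floored!) -> 24, so Concatenate sees 25 vs 24
```

With a 96×96 input (a multiple of 2³ for three pooling stages) every pair matches, which is why padding or resizing the input to a suitable multiple fixes the Concatenate error.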
