ValueError: Unable to create dataset (name already exists) when saving my model with ModelCheckpoint

Question (votes: 0, answers: 3)

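I am trying to run the official Keras code example "Image classification with Swin Transformers". The code worked fine at first, but after I added a ModelCheckpoint to the callbacks argument of model.fit in order to save the model as HDF5 {i.e. model.fit(..., callbacks=[ModelCheckpoint(...)], ...)}, I get the following error: [ValueError: Unable to create dataset (name already exists)]. What does "name" refer to here, and how can I fix this?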

I ran the code both on my local machine (Windows 10, TensorFlow 2.8.0) and on Google Colab (TensorFlow 2.8.2), and the error above occurs in both.

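The full code example can be found here [https://keras.io/examples/vision/swin_transformers/]. The only difference between my code and the example is the one line I added for ModelCheckpoint. The location of the added line and the error message are shown below.

Code snippet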

model = keras.Model(input, output)
model.compile(
    loss=keras.losses.CategoricalCrossentropy(label_smoothing=label_smoothing),
    optimizer=tfa.optimizers.AdamW(
        learning_rate=learning_rate, weight_decay=weight_decay
    ),
    metrics=[
        keras.metrics.CategoricalAccuracy(name="accuracy"),
        keras.metrics.TopKCategoricalAccuracy(5, name="top-5-accuracy"),
    ],
)

history = model.fit(
    x_train,
    y_train,
    batch_size=batch_size,
    epochs=num_epochs,
    validation_split=validation_split,
    # 👇 I added one line of code
    callbacks = keras.callbacks.ModelCheckpoint('lowest_loss.hdf5', monitor='loss', verbose=0, save_best_only=True, save_weights_only=True)
)

This is the error I get

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-10-c96b13609516> in <module>()
     18     validation_split=validation_split,
     19     # 👇 I added one line of code
---> 20     callbacks = keras.callbacks.ModelCheckpoint('lowest_loss.hdf5', monitor='loss', verbose=0, save_best_only=True, save_weights_only=True)
     21 )

2 frames
/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
     65     except Exception as e:  # pylint: disable=broad-except
     66       filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67       raise e.with_traceback(filtered_tb) from None
     68     finally:
     69       del filtered_tb

/usr/local/lib/python3.7/dist-packages/h5py/_hl/group.py in create_dataset(self, name, shape, dtype, data, **kwds)
    146                     group = self.require_group(parent_path)
    147 
--> 148             dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
    149             dset = dataset.Dataset(dsid)
    150             return dset

/usr/local/lib/python3.7/dist-packages/h5py/_hl/dataset.py in make_new_dset(parent, shape, dtype, data, name, chunks, compression, shuffle, fletcher32, maxshape, compression_opts, fillvalue, scaleoffset, track_times, external, track_order, dcpl, allow_unknown_filter)
    135 
    136 
--> 137     dset_id = h5d.create(parent.id, name, tid, sid, dcpl=dcpl)
    138 
    139     if (data is not None) and (not isinstance(data, Empty)):

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5d.pyx in h5py.h5d.create()

ValueError: Unable to create dataset (name already exists)
Tags: python tensorflow keras transformer-model

3 Answers

0 votes

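Most likely, this error occurs when layer names are duplicated between the namespace of the pretrained model and that of the downstream task network. It may help to give every layer of the downstream network a unique name. Add name='some_unique_name' to each layer you create to solve the problem.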
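To see which names collide before renaming anything, a quick check like the one below may help. This is only a minimal sketch: find_duplicate_names is a hypothetical helper, and the example names are made up to stand in for [w.name for w in model.weights]. As far as I understand, the HDF5 writer stores each weight as a dataset named after the weight, so two weights with the same name in the same layer would trigger the "name already exists" error.

```python
from collections import Counter


def find_duplicate_names(names):
    """Return the names that occur more than once, in first-seen order."""
    counts = Counter(names)
    return [name for name in counts if counts[name] > 1]


# Hypothetical weight names; with a real model you would pass
# [w.name for w in model.weights] instead.
example_names = [
    "dense/kernel:0",
    "dense/bias:0",
    "Variable:0",
    "Variable:0",
]

duplicates = find_duplicate_names(example_names)
print(duplicates)
```

Note that this checks the whole model at once, so it may also flag names that repeat across different layers and are harmless. Any name it reports inside a single custom layer is a good candidate for an explicit name=... argument.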


0 votes

Try this, it worked for me:

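# Workaround: append each weight's index to its name so that all names are unique.
# Note: _handle_name is a private attribute and may change between versions.
for i, weight in enumerate(model.weights):
    weight._handle_name = weight.name + "_" + str(i)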

-1 votes

If you are using TensorFlow 2.0 or later, you can try changing the ".hdf5" file to ".tf". I ran into the same problem and changed the file extension as follows:

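import os
from tensorflow.keras.callbacks import ModelCheckpoint

# Using a ".tf" suffix instead of ".hdf5" avoids the HDF5 writer.
save_dir = os.path.join(os.getcwd(), "save_models")
filepath = "cnn_cnn_weights.{epoch:02d}-{val_loss:.4f}--0fold.tf"
checkpoint = ModelCheckpoint(os.path.join(save_dir, filepath),
                             monitor="val_loss", verbose=1, save_best_only=False, mode='min')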
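
Why the extension matters, as far as I can tell: when no save format is given explicitly, TF 2.x Keras picks the format from the file suffix, and only HDF5-style suffixes go through h5py. The sketch below just mirrors that rule for illustration. guess_weight_format is a hypothetical helper, not a Keras function, and the suffix list is my assumption about the 2.x behaviour.

```python
# Assumption: suffixes that TF 2.x Keras treats as HDF5 when no save
# format is passed explicitly. This list is not taken from the question.
HDF5_SUFFIXES = (".h5", ".hdf5", ".keras")


def guess_weight_format(filepath):
    """Guess which format a checkpoint path would select: 'h5' or 'tf'."""
    return "h5" if filepath.endswith(HDF5_SUFFIXES) else "tf"


print(guess_weight_format("lowest_loss.hdf5"))
print(guess_weight_format("lowest_loss.tf"))
```

If that assumption holds, renaming 'lowest_loss.hdf5' from the question to 'lowest_loss.tf' would write a TensorFlow checkpoint instead of an HDF5 file, so the duplicate dataset names never reach h5py.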