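I present an example where a tf.keras model fails to learn from very simple data. I'm using tensorflow-gpu==2.0.0, keras==2.3.0 and Python 3.7. At the end of this post, I give the full Python code to reproduce the problem I observed.
The samples are NumPy arrays of shape (6, 16, 16, 16, 3). To keep things very simple, I only consider arrays filled entirely with 1s or entirely with 0s. Arrays of 1s get the label 1 and arrays of 0s get the label 0 (one-hot encoded in the code). I can generate some samples with the following code (here, n_samples = 240):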
def generate_fake_data():
    for j in range(1, 240 + 1):
        if j < 120:
            yield np.ones((6, 16, 16, 16, 3)), np.array([0., 1.])
        else:
            yield np.zeros((6, 16, 16, 16, 3)), np.array([1., 0.])
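As a quick sanity check on the generator above, the loop bounds can be replayed without NumPy to count how many samples fall into each branch. This is only a sketch of the index logic (the name count_branches is hypothetical, not from the post). It suggests the split is 119 all-ones samples versus 121 all-zeros samples rather than an exact 120/120, which is a small imbalance and unlikely on its own to explain a stuck loss.

```python
def count_branches(n_samples=240, threshold=120):
    """Replay the index logic of generate_fake_data and count each branch."""
    # j runs from 1 to n_samples inclusive, exactly as in the generator.
    n_ones = sum(1 for j in range(1, n_samples + 1) if j < threshold)
    n_zeros = n_samples - n_ones
    return n_ones, n_zeros


n_ones, n_zeros = count_branches()
print(n_ones, n_zeros)  # 119 121
```

Using `j <= 120` in the condition would give an even 120/120 split, if that was the intent.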
To feed this data into a tf.keras model, I create an instance of tf.data.Dataset using the code below. This essentially creates shuffled batches of BATCH_SIZE = 12 samples.
def make_tfdataset(for_training=True):
    dataset = tf.data.Dataset.from_generator(generator=lambda: generate_fake_data(),
                                             output_types=(tf.float32,
                                                           tf.float32),
                                             output_shapes=(tf.TensorShape([6, 16, 16, 16, 3]),
                                                            tf.TensorShape([2])))
    dataset = dataset.repeat()
    if for_training:
        dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
    return dataset
I propose the following model to classify my samples:
def create_model(in_shape=(6, 16, 16, 16, 3)):
    input_layer = Input(shape=in_shape)
    reshaped_input = Lambda(lambda x: K.reshape(x, (-1, *in_shape[1:])))(input_layer)
    conv3d_layer = Conv3D(filters=64, kernel_size=8, strides=(2, 2, 2), padding='same')(reshaped_input)
    relu_layer_1 = ReLU()(conv3d_layer)
    pooling_layer = GlobalAveragePooling3D()(relu_layer_1)
    reshape_layer_1 = Lambda(lambda x: K.reshape(x, (-1, in_shape[0] * 64)))(pooling_layer)
    expand_dims_layer = Lambda(lambda x: K.expand_dims(x, 1))(reshape_layer_1)
    conv1d_layer = Conv1D(filters=1, kernel_size=1)(expand_dims_layer)
    relu_layer_2 = ReLU()(conv1d_layer)
    reshape_layer_2 = Lambda(lambda x: K.squeeze(x, 1))(relu_layer_2)
    out = Dense(units=2, activation='softmax')(reshape_layer_2)
    return Model(inputs=[input_layer], outputs=[out])
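To make the two K.reshape calls easier to follow, here is a sketch that traces the tensor shape through create_model for a concrete batch size. It is plain Python and does not call Keras; the function name trace_shapes and the batch size of 12 are assumptions chosen for illustration. It relies on the rule that padding='same' with stride s yields ceil(size / s) along each spatial axis.

```python
from math import ceil


def trace_shapes(batch=12, in_shape=(6, 16, 16, 16, 3), filters=64, stride=2):
    """Return the tensor shape after each layer of create_model for one batch."""
    n_vols, d, h, w, c = in_shape
    shapes = {}
    shapes["input"] = (batch, n_vols, d, h, w, c)
    # K.reshape(x, (-1, 16, 16, 16, 3)) folds the 6 volumes into the batch axis.
    shapes["lambda"] = (batch * n_vols, d, h, w, c)
    # padding='same' with stride 2 halves each spatial axis (rounded up).
    shapes["conv3d"] = (batch * n_vols,
                        ceil(d / stride), ceil(h / stride), ceil(w / stride),
                        filters)
    shapes["global_average_pooling3d"] = (batch * n_vols, filters)
    # K.reshape(x, (-1, 6 * 64)) unfolds the volumes back into one row per sample.
    shapes["lambda_1"] = (batch, n_vols * filters)
    shapes["lambda_2"] = (batch, 1, n_vols * filters)  # expand_dims on axis 1
    shapes["conv1d"] = (batch, 1, 1)                   # filters=1, kernel_size=1
    shapes["lambda_3"] = (batch, 1)                    # squeeze axis 1
    shapes["dense"] = (batch, 2)
    return shapes


for name, shape in trace_shapes().items():
    print(f"{name:26s} {shape}")
```

One thing the trace makes visible: between conv1d and dense, each sample is reduced to a single scalar that then passes through a ReLU, so the final Dense layer sees only one non-negative number per sample.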
The model is optimized using Adam (with default parameters) and the categorical_crossentropy loss:
clf_model = create_model()
clf_model.compile(optimizer=Adam(),
                  loss='categorical_crossentropy',
                  metrics=['accuracy', 'categorical_crossentropy'])
The output of clf_model.summary() is:
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, 6, 16, 16, 16, 3) 0
_________________________________________________________________
lambda (Lambda)              (None, 16, 16, 16, 3)     0
_________________________________________________________________
conv3d (Conv3D)              (None, 8, 8, 8, 64)       98368
_________________________________________________________________
re_lu (ReLU)                 (None, 8, 8, 8, 64)       0
_________________________________________________________________
global_average_pooling3d (Gl (None, 64)                0
_________________________________________________________________
lambda_1 (Lambda)            (None, 384)               0
_________________________________________________________________
lambda_2 (Lambda)            (None, 1, 384)            0
_________________________________________________________________
conv1d (Conv1D)              (None, 1, 1)              385
_________________________________________________________________
re_lu_1 (ReLU)               (None, 1, 1)              0
_________________________________________________________________
lambda_3 (Lambda)            (None, 1)                 0
_________________________________________________________________
dense (Dense)                (None, 2)                 4
=================================================================
Total params: 98,757
Trainable params: 98,757
Non-trainable params: 0
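The parameter counts in the summary can be reproduced by hand, which confirms that the layers are wired as intended. The sketch below uses the standard weights-plus-biases formulas; the helper names conv_params and dense_params are hypothetical and not part of Keras.

```python
def conv_params(kernel_volume, in_channels, filters):
    """Weights (kernel volume * input channels * filters) plus one bias per filter."""
    return kernel_volume * in_channels * filters + filters


def dense_params(in_units, out_units):
    """Weights (in * out) plus one bias per output unit."""
    return in_units * out_units + out_units


conv3d = conv_params(8 * 8 * 8, 3, 64)  # kernel_size=8 in 3D, 3 input channels
conv1d = conv_params(1, 384, 1)         # kernel_size=1, 384 input channels
dense = dense_params(1, 2)              # a single input feature, two outputs
total = conv3d + conv1d + dense

print(conv3d, conv1d, dense, total)  # 98368 385 4 98757
```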
The model is trained for 500 epochs as follows:
train_ds = make_tfdataset(for_training=True)
history = clf_model.fit(train_ds,
                        epochs=500,
                        steps_per_epoch=ceil(240 / BATCH_SIZE),
                        verbose=1)
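Over the 500 epochs, the model loss stays around 0.69 and never goes below 0.69. The same happens if I set the learning rate to 1e-2 instead of 1e-3. The data is very simple (just 0s and 1s). Naively, I would expect the model to reach an accuracy better than 0.6. In fact, I would expect it to reach 100% accuracy quickly. Am I doing something wrong?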
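The plateau value itself is informative. For two classes, a model that predicts [0.5, 0.5] for every sample has a categorical cross-entropy of ln 2 ≈ 0.693, so a loss pinned at 0.69 is what one would expect if the network outputs the same probabilities regardless of its input. The sketch below is plain Python and does not call Keras; the helper name is hypothetical. It does not identify the cause, only what the number suggests.

```python
import math


def categorical_crossentropy(y_true, y_pred):
    """Cross-entropy for one sample with a one-hot target."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)


uniform = [0.5, 0.5]  # a prediction that ignores the input entirely
loss_class_0 = categorical_crossentropy([1.0, 0.0], uniform)
loss_class_1 = categorical_crossentropy([0.0, 1.0], uniform)

print(round(loss_class_0, 4), round(loss_class_1, 4))  # 0.6931 0.6931
```

A useful follow-up check would be to call clf_model.predict on one all-ones and one all-zeros sample and see whether the two outputs differ at all.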
Full code to reproduce the problem:
import numpy as np
import tensorflow as tf
import tensorflow.keras.backend as K
from math import ceil
from tensorflow.keras.layers import Input, Dense, Lambda, Conv1D, GlobalAveragePooling3D, Conv3D, ReLU
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

BATCH_SIZE = 12


def generate_fake_data():
    for j in range(1, 240 + 1):
        if j < 120:
            yield np.ones((6, 16, 16, 16, 3)), np.array([0., 1.])
        else:
            yield np.zeros((6, 16, 16, 16, 3)), np.array([1., 0.])


def make_tfdataset(for_training=True):
    dataset = tf.data.Dataset.from_generator(generator=lambda: generate_fake_data(),
                                             output_types=(tf.float32,
                                                           tf.float32),
                                             output_shapes=(tf.TensorShape([6, 16, 16, 16, 3]),
                                                            tf.TensorShape([2])))
    dataset = dataset.repeat()
    if for_training:
        dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
    return dataset


def create_model(in_shape=(6, 16, 16, 16, 3)):
    input_layer = Input(shape=in_shape)
    reshaped_input = Lambda(lambda x: K.reshape(x, (-1, *in_shape[1:])))(input_layer)
    conv3d_layer = Conv3D(filters=64, kernel_size=8, strides=(2, 2, 2), padding='same')(reshaped_input)
    relu_layer_1 = ReLU()(conv3d_layer)
    pooling_layer = GlobalAveragePooling3D()(relu_layer_1)
    reshape_layer_1 = Lambda(lambda x: K.reshape(x, (-1, in_shape[0] * 64)))(pooling_layer)
    expand_dims_layer = Lambda(lambda x: K.expand_dims(x, 1))(reshape_layer_1)
    conv1d_layer = Conv1D(filters=1, kernel_size=1)(expand_dims_layer)
    relu_layer_2 = ReLU()(conv1d_layer)
    reshape_layer_2 = Lambda(lambda x: K.squeeze(x, 1))(relu_layer_2)
    out = Dense(units=2, activation='softmax')(reshape_layer_2)
    return Model(inputs=[input_layer], outputs=[out])


train_ds = make_tfdataset(for_training=True)
clf_model = create_model(in_shape=(6, 16, 16, 16, 3))
clf_model.summary()
clf_model.compile(optimizer=Adam(lr=1e-3),
                  loss='categorical_crossentropy',
                  metrics=['accuracy', 'categorical_crossentropy'])
history = clf_model.fit(train_ds,
                        epochs=500,
                        steps_per_epoch=ceil(240 / BATCH_SIZE),
                        verbose=1)
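Since your labels can be either 0 or 1, I suggest changing the activation function to softmax and the number of output neurons to 2. The last (output) layer would then look like this: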
out = Dense(units=2, activation='softmax')(reshaped_conv_features)
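I ran into the same problem before and found that softmax is the better choice here: the probabilities of the label being 1 or 0 depend on each other, in the sense that this is not a multi-label classification problem. Sigmoid assigns each output a probability without taking the other possible output labels into account.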
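The difference between the two activations can be illustrated with a small, framework-free sketch. The logit values below are made up for illustration, and the helper functions are written by hand rather than taken from Keras.

```python
import math


def softmax(logits):
    """Normalize logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Squash a single logit into (0, 1), independently of any other logit."""
    return 1.0 / (1.0 + math.exp(-z))


logits = [2.0, 1.0]  # hypothetical pre-activation outputs for classes 0 and 1
soft = softmax(logits)
sig = [sigmoid(z) for z in logits]

print(soft, sum(soft))  # the two probabilities are coupled and sum to 1
print(sig, sum(sig))    # each value is independent; the sum here exceeds 1
```

With softmax, raising the probability of one class necessarily lowers the other, which matches a problem where each sample belongs to exactly one class.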