对于我的第一次机器学习经验,我有一个基本的分类要做。
我有3个不同的文件夹。
train_path = './dataset/pneumonia/train/'
test_path = './dataset/pneumonia/test/'
val_path = './dataset/pneumonia/val/
每个文件夹。
os.listdir(train_path)
返回
['NORMAL', 'PNEUMONIA']
在每一组:
我使用tensorflow 。
from tensorflow.keras.preprocessing.image import ImageDataGenerator
image_gen = ImageDataGenerator(
rotation_range=10, # rotate the image 10 degrees
width_shift_range=0.10, # Shift the pic width by a max of 5%
height_shift_range=0.10, # Shift the pic height by a max of 5%
rescale=1/255, # Rescale the image by normalzing it.
shear_range=0.1, # Shear means cutting away part of the image (max 10%)
zoom_range=0.1, # Zoom in by 10% max
horizontal_flip=True, # Allow horizontal flipping
fill_mode='nearest' # Fill in missing pixels with the nearest filled value
)
image_gen.flow_from_directory(train_path)
image_gen.flow_from_directory(test_path)
我创建一个模型(基本模型)。
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(image_width, image_height, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), input_shape=(image_width, image_height, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(128, (3, 3), input_shape=(image_width, image_height, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(256, (3, 3), input_shape=(image_width, image_height, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(512, (3, 3), input_shape=(image_width, image_height, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
# Dropouts help reduce overfitting by randomly turning neurons off during training.
# Here we say randomly turn off 50% of neurons.
model.add(Dropout(0.5))
# Last layer, remember its binary so we use sigmoid
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])
然后我训练它。
train_image_gen = image_gen.flow_from_directory(
train_path,
target_size=image_shape[:2],
color_mode='rgb',
batch_size=batch_size,
class_mode='binary'
)
results = model.fit_generator(train_image_gen,epochs=20,
validation_data=test_image_gen,
callbacks=[early_stop, board])
到目前为止,这么好的结果是正确的。
pred_probabilities = model.predict_generator(test_image_gen)
predictions = pred_probabilities > 0.5
confusion_matrix(test_image_gen.classes,predictions)
我获得了相当不错的结果 :
我的问题是,当我想预测图像时,它返回的结果远非正确。
val_image_gen = image_gen.flow_from_directory(
val_path,
target_size=image_shape[:2],
color_mode='rgb',
class_mode='binary',
)
pred_probabilities = model.predict_generator(val_image_gen)
predictions = pred_probabilities > 0.5
以下是我获得的一些输出结果 :
precision recall f1-score support
0 0.51 0.57 0.53 317
1 0.51 0.44 0.47 317
accuracy 0.51 634
macro avg 0.51 0.51 0.50 634
weighted avg 0.51 0.51 0.50 634
这个数据集的混乱矩阵如下。
[[180 137]
[176 141]]
你的代码有几个问题。
ImageDataGenerator(rescale=1/255)
用于测试和验证。您的测试结果看起来像是从未经训练的模型中得到的。检查是否有 model
您正在运行测试的对象与您正在训练的对象是相同的。您可能想使用 model.save()
和 load_model()
函数来保存训练后的模型权重。
我将.NET函数替换为
val_image_gen = image_gen.flow_from_directory(
val_path,
target_size=image_shape[:2],
color_mode='rgb',
class_mode='binary',
)
用。
val_image_gen = image_gen.flow_from_directory(
val_path,
target_size=image_shape[:2],
color_mode='rgb',
batch_size=batch_size,
class_mode='binary',
shuffle=False
)
我得到了很好的结果。
[[269 48]
[ 3 314]]