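I want to improve accuracy, and my dataset is imbalanced: akiec: 229, bcc: 360, bkl: 769, df: 81, mel: 779, vasc: 99. To address this, I added a class-weighted loss to the model. However, accuracy dropped after this change, which makes me suspect a mistake in my implementation. Could you help me find and fix any errors so that the model performs better?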
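import os
from sklearn.model_selection import ParameterGrid
from tensorflow.keras.applications import VGG19
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Define the directories
train_dir = '/content/drive/MyDrive/ikinciasamadataset/Train'
test_dir = '/content/drive/MyDrive/ikinciasamadataset/Test'
validation_dir = '/content/drive/MyDrive/ikinciasamadataset/Validation'
# Determine number of classes
numClasses = len(os.listdir(train_dir))
# Define grid of hyperparameters
param_grid = {
    'learning_rate': [0.001],
    'batch_size': [16],
}
best_accuracy = 0
best_params = None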
# Perform grid search
for params in ParameterGrid(param_grid):
    # Load pre-trained VGG19 model for each grid search iteration
    base_model = VGG19(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
    for layer in base_model.layers:
        layer.trainable = False
    # Define function to extract features from the last convolutional layer
    def extract_features(generator, model):
        features = model.predict(generator)
        return features.reshape((len(generator.filenames), -1))
    # Create data generators
    train_datagen = ImageDataGenerator(
        rescale=1./255,
        rotation_range=20,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        fill_mode='nearest')
    validation_datagen = ImageDataGenerator(rescale=1./255)
    # shuffle=False keeps the file order, so features extracted with
    # model.predict(generator) line up with generator.labels
    train_generator = train_datagen.flow_from_directory(
        train_dir,
        target_size=(224, 224),
        batch_size=params['batch_size'],
        class_mode='categorical',
        shuffle=False
    )
    validation_generator = validation_datagen.flow_from_directory(
        validation_dir,
        target_size=(224, 224),
        batch_size=params['batch_size'],
        class_mode='categorical',
        shuffle=False
    )
    # Probably there is a mistake here
    # Define class indices (these must match train_generator.class_indices,
    # which assigns indices to folder names in alphanumeric order)
    class_indices = {
        'akiec': 0,
        'bcc': 1,
        'bkl': 2,
        'df': 3,
        'mel': 4,
        'vasc': 5
    }
    # Calculate class counts, skipping stray entries that are not class folders
    class_counts = {}
    for class_name in os.listdir(train_dir):
        if class_name in class_indices:
            class_counts[class_name] = len(os.listdir(os.path.join(train_dir, class_name)))
    # Compute balanced class weights: total / (class count * number of classes)
    class_weights = {}
    total_samples = sum(class_counts.values())
    for class_name, class_count in class_counts.items():
        class_weights[class_indices[class_name]] = total_samples / (class_count * len(class_counts))
    # Define the model architecture to accept extracted features as input
    # (combined_data_train and combined_data_validation are built elsewhere; that code is not shown here)
    inputs = Input(shape=(combined_data_train.shape[1],))
    x = Dense(256, activation='relu')(inputs)
    predictions = Dense(numClasses, activation='softmax')(x)
    model = Model(inputs=inputs, outputs=predictions)
    # Compile the model with current hyperparameters; class weights are passed to fit(), not compile().
    # sample_weight_mode='temporal' is intended for per-timestep weights on sequence outputs, so it is dropped.
    model.compile(
        optimizer=SGD(learning_rate=params['learning_rate']),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    # Define early stopping
    early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
    # Train the model with early stopping
    num_epochs = 50  # You can adjust the number of epochs here
    history = model.fit(
        x=combined_data_train,
        y=train_generator.labels,
        epochs=num_epochs,
        batch_size=params['batch_size'],
        validation_data=(combined_data_validation, validation_generator.labels),
        callbacks=[early_stopping],
        class_weight=class_weights,
        verbose=1
    )
    # Evaluate the model on validation data
    _, val_accuracy = model.evaluate(combined_data_validation, validation_generator.labels, verbose=0)
    # Update best accuracy and best parameters if necessary,
    # and save the model only when it is the best one so far
    if val_accuracy > best_accuracy:
        best_accuracy = val_accuracy
        best_params = params
        model.save('best_vgg19_model_with_age.h5')
# Print best parameters and accuracy
print('Best parameters:', best_params)
print('Best validation accuracy:', best_accuracy)
# Load the best model
best_model = load_model('best_vgg19_model_with_age.h5')
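Adjust the class weight calculation: compute each class weight by dividing the total number of samples by that class's sample count, so that classes with fewer samples receive larger weights. Dividing additionally by the number of classes, as the code in the question does, keeps the average weight per training sample at 1.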
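As a rough check of the weighting described above, the following sketch computes the weights from the class counts listed in the question. The helper name `balanced_class_weights` is made up for illustration, and the index order assumes the alphanumeric folder ordering that Keras `flow_from_directory` uses.

```python
# Training-set counts as listed in the question.
class_counts = {"akiec": 229, "bcc": 360, "bkl": 769, "df": 81, "mel": 779, "vasc": 99}


def balanced_class_weights(counts):
    """Return {class_index: weight} with weight = total / (n_classes * count).

    Indices follow the sorted order of the class names (an assumption that
    matches how flow_from_directory numbers class folders).
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {
        index: total / (n_classes * counts[name])
        for index, name in enumerate(sorted(counts))
    }


class_weights = balanced_class_weights(class_counts)
for index, name in enumerate(sorted(class_counts)):
    print(f"{index} {name}: {class_weights[index]:.3f}")
```

The rarest class (df) ends up with the largest weight and the most common class (mel) with the smallest. A dictionary of this shape is what `model.fit(..., class_weight=...)` expects.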
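Review the model architecture and training parameters: make sure they suit your dataset. Try adjusting the number of layers, the number of units, the learning rate, the batch size, and the number of epochs.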
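The grid in the question contains a single value per parameter, so the search never compares alternatives. Below is a minimal sketch of a wider grid, expanded with the standard library in the same spirit as scikit-learn's `ParameterGrid`. The values and the `dense_units` key are hypothetical and only illustrate the idea; they are not tuned recommendations.

```python
from itertools import product

# Hypothetical search space; values are illustrative only.
param_grid = {
    "learning_rate": [0.01, 0.001, 0.0001],
    "batch_size": [16, 32],
    "dense_units": [128, 256],
}


def expand_grid(grid):
    """Yield one dict per combination of the listed values."""
    keys = sorted(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


combinations = list(expand_grid(param_grid))
print(len(combinations), "combinations, first:", combinations[0])
```

Each resulting dict can be used like `params` in the training loop, for example `Dense(params['dense_units'], activation='relu')`.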
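Match the loss function to your label format: 'sparse_categorical_crossentropy' expects integer class labels, 'categorical_crossentropy' expects one-hot labels, and 'binary_crossentropy' is meant for binary or multi-label targets.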
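To illustrate the label formats mentioned above, this small sketch computes the cross-entropy of one sample twice, once from an integer label and once from the equivalent one-hot vector. The probabilities are made up, and the functions are plain-Python stand-ins, not the Keras implementations.

```python
import math


def one_hot(label, num_classes):
    """Turn an integer class label into a one-hot list."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def sparse_cross_entropy(label, probs):
    """Cross-entropy for one sample given an integer class label."""
    return -math.log(probs[label])


def categorical_cross_entropy(target, probs):
    """Cross-entropy for one sample given a one-hot target vector."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


probs = [0.05, 0.10, 0.50, 0.05, 0.25, 0.05]  # made-up softmax output for 6 classes
label = 4  # integer label, e.g. the index of 'mel'

sparse_loss = sparse_cross_entropy(label, probs)
categorical_loss = categorical_cross_entropy(one_hot(label, len(probs)), probs)
print(sparse_loss, categorical_loss)
```

Both calls give the same loss, so switching between the two losses only makes sense when the label format changes. Since `train_generator.labels` holds integer indices, the sparse variant fits the code in the question.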
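Apply data augmentation: use techniques such as rotation, shifting, zooming, and flipping to increase the diversity of the training data and improve the model's ability to generalize.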
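Fine-tune the pretrained model: if your dataset is large enough for the model to learn more task-specific features, consider fine-tuning the pretrained VGG19 model instead of freezing all of its layers.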
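One common way to fine-tune is to unfreeze only the last convolutional block. The helper below is a sketch of that idea; `unfreeze_from` is a hypothetical name, and the stand-in layer objects exist only so the example runs without TensorFlow. The layer names follow the Keras VGG19 naming (`block5_conv1` and so on).

```python
from types import SimpleNamespace


def unfreeze_from(layers, first_trainable_name):
    """Freeze layers before `first_trainable_name`; unfreeze it and all later ones.

    Works on any objects with `.name` and `.trainable` attributes, which
    includes Keras layers. Returns the names of the layers left trainable.
    """
    names = [layer.name for layer in layers]
    if first_trainable_name not in names:
        raise ValueError(f"no layer named {first_trainable_name!r}")
    start = names.index(first_trainable_name)
    for position, layer in enumerate(layers):
        layer.trainable = position >= start
    return names[start:]


# Stand-in objects for the tail of VGG19; with Keras, pass base_model.layers instead.
demo_layers = [
    SimpleNamespace(name=n, trainable=False)
    for n in ["block4_conv4", "block4_pool", "block5_conv1", "block5_conv2",
              "block5_conv3", "block5_conv4", "block5_pool"]
]
trainable_names = unfreeze_from(demo_layers, "block5_conv1")
print(trainable_names)
```

With the code in the question, the call would be `unfreeze_from(base_model.layers, 'block5_conv1')`, followed by recompiling with a small learning rate. Fine-tuning only has an effect if the VGG19 base is part of the model being trained, rather than being used once to extract fixed features.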