Accuracy dropped after integrating a class-weighted loss function into my code


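I want to improve accuracy, and my dataset is imbalanced: akiec: 229, bcc: 360, bkl: 769, df: 81, mel: 779, vasc: 99. To address this, I integrated class weighting into the model's loss. However, after this change the accuracy dropped. This unexpected result makes me suspect a mistake in my implementation. Could you help me identify and fix any errors so that the model performs better?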

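# Imports assumed by this snippet (not shown in the original question)
import os
from sklearn.model_selection import ParameterGrid
from tensorflow.keras.applications import VGG19
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Define the directories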
train_dir = '/content/drive/MyDrive/ikinciasamadataset/Train'
test_dir = '/content/drive/MyDrive/ikinciasamadataset/Test'
validation_dir = '/content/drive/MyDrive/ikinciasamadataset/Validation'

# Determine number of classes
numClasses = len(os.listdir(train_dir))

# Define grid of hyperparameters
param_grid = {
    'learning_rate': [0.001],
    'batch_size': [16],
}

best_accuracy = 0
best_params = None

# Perform grid search
for params in ParameterGrid(param_grid):
    # Load pre-trained VGG19 model for each grid search iteration
    base_model = VGG19(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
    for layer in base_model.layers:
        layer.trainable = False

    # Define function to extract features from the last convolutional layer
    def extract_features(generator, model):
        features = model.predict(generator)
        return features.reshape((len(generator.filenames), -1))

    # Create data generators
    train_datagen = ImageDataGenerator(
        rescale=1./255,
        rotation_range=20,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        fill_mode='nearest')

    validation_datagen = ImageDataGenerator(rescale=1./255)

    train_generator = train_datagen.flow_from_directory(
        train_dir,
        target_size=(224, 224),
        batch_size=params['batch_size'],
        class_mode='categorical'
    )

    validation_generator = validation_datagen.flow_from_directory(
        validation_dir,
        target_size=(224, 224),
        batch_size=params['batch_size'],
        class_mode='categorical'
    )

    # Probably there is a mistake somewhere below

    # Define class indices
    class_indices = {
        'akiec': 0,
        'bcc': 1,
        'bkl': 2,
        'df': 3,
        'mel': 4,
        'vasc': 5
    }

    # Calculate class counts
    class_counts = {}
    for class_name in os.listdir(train_dir):
        class_counts[class_name] = len(os.listdir(os.path.join(train_dir, class_name)))

    # Compute class weights
    class_weights = {}
    total_samples = sum(class_counts.values())
    for class_name, class_count in class_counts.items():
        class_weights[class_indices[class_name]] = total_samples / (class_count * len(class_counts))



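    # Define the model architecture to accept extracted features as input
    # (combined_data_train and combined_data_validation are built elsewhere; not shown in the question)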
    inputs = Input(shape=(combined_data_train.shape[1],))
    x = Dense(256, activation='relu')(inputs)
    predictions = Dense(numClasses, activation='softmax')(x)
    model = Model(inputs=inputs, outputs=predictions)

    # Compile the model with current hyperparameters and class weights
    model.compile(optimizer=SGD(learning_rate=params['learning_rate']), loss='sparse_categorical_crossentropy', metrics=['accuracy'], sample_weight_mode='temporal')

    # Define early stopping
    early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

    # Train the model with early stopping
    num_epochs = 50  # You can adjust the number of epochs here
    history = model.fit(
        x=combined_data_train,
        y=train_generator.labels,
        epochs=num_epochs,
        batch_size=params['batch_size'],
        validation_data=(combined_data_validation, validation_generator.labels),
        callbacks=[early_stopping],
        class_weight=class_weights,
        verbose=1
    )

    model.save('best_vgg19_model_with_age.h5')

    # Evaluate the model on validation data
    _, val_accuracy = model.evaluate(combined_data_validation, validation_generator.labels, verbose=0)

    # Update best accuracy and best parameters if necessary
    if val_accuracy > best_accuracy:
        best_accuracy = val_accuracy
        best_params = params

# Print best parameters and accuracy
print('Best parameters:', best_params)
print('Best validation accuracy:', best_accuracy)


# Load the best model
best_model = load_model('best_vgg19_model_with_age.h5')

python machine-learning artificial-intelligence
1 Answer

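Adjust the class-weight calculation: compute each class's weight by dividing the total number of samples by that class's sample count, so that classes with fewer samples receive larger weights.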
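As a rough illustration of the weighting described above, the sketch below applies the "balanced" formula, total / (n_classes * count), to the class counts quoted in the question. The function name `balanced_class_weights` is made up for this example, and the index order is an assumption: alphabetical by class name, which is how Keras `flow_from_directory` assigns indices by default.

```python
def balanced_class_weights(class_counts):
    """Return {class_index: weight} using total / (n_classes * count).

    Class indices follow the sorted order of the class names (an assumption
    that mirrors the default ordering of Keras' flow_from_directory).
    """
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        index: total / (n_classes * class_counts[name])
        for index, name in enumerate(sorted(class_counts))
    }


# Class counts quoted in the question.
counts = {"akiec": 229, "bcc": 360, "bkl": 769, "df": 81, "mel": 779, "vasc": 99}
weights = balanced_class_weights(counts)

for index, name in enumerate(sorted(counts)):
    print(f"{name:>5} (index {index}): weight {weights[index]:.3f}")
```

With these counts the rarest class (df) receives the largest weight. Because the weights rescale each sample's contribution to the loss, overall accuracy can go down even while minority classes are classified better, so it may be worth checking per-class recall as well as accuracy.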

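Review the model architecture and training parameters: make sure they suit your dataset. Try adjusting the number of layers, the number of units, the learning rate, the batch size, and the number of epochs.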

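Choose the loss function to match your label format: 'sparse_categorical_crossentropy' expects integer class labels, while 'categorical_crossentropy' expects one-hot labels. 'binary_crossentropy' is only appropriate for two-class or multi-label problems, not for a six-class task like this one.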
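To make the label-format point concrete, here is a minimal plain-Python sketch (not the Keras implementation) showing that the sparse and one-hot forms of cross-entropy give the same value for a single sample once the labels are encoded consistently. The probabilities and the label are hypothetical values chosen for illustration.

```python
import math


def one_hot(label, num_classes):
    """Convert an integer class label to a one-hot list."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def sparse_categorical_crossentropy(label, probs):
    """Cross-entropy for an integer label: -log(p[label])."""
    return -math.log(probs[label])


def categorical_crossentropy(one_hot_label, probs):
    """Cross-entropy for a one-hot label: -sum(y_i * log(p_i))."""
    return -sum(y * math.log(p) for y, p in zip(one_hot_label, probs) if y > 0)


# Hypothetical softmax output for one image over the six classes.
probs = [0.05, 0.10, 0.40, 0.05, 0.30, 0.10]
label = 4  # integer label, e.g. 'mel' under alphabetical indexing

sparse_loss = sparse_categorical_crossentropy(label, probs)
dense_loss = categorical_crossentropy(one_hot(label, len(probs)), probs)
print(sparse_loss, dense_loss)
```

Under this reading, switching between the two losses should not change the result by itself; what matters is that the loss matches how the labels passed to `fit` are encoded.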

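Apply data augmentation: increase the diversity of the training data with techniques such as rotation, shifting, zooming, and flipping to improve the model's generalization.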

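Fine-tune the pretrained model: if your dataset is large enough for the model to learn more task-specific features, consider fine-tuning the pretrained VGG19 model instead of freezing all of its layers.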
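One way to sketch partial fine-tuning is to keep most of the network frozen and unfreeze only the last convolutional block. The helper below is hypothetical (`unfreeze_block` is not a Keras function) and relies only on the `.layers`, `.name`, and `.trainable` attributes that Keras models and layers expose. The `"block5"` prefix assumes the Keras VGG19 layer naming (`block5_conv1` through `block5_pool`). Stand-in classes are used so the example runs without TensorFlow.

```python
def unfreeze_block(model, prefix="block5"):
    """Freeze every layer, then unfreeze those whose name starts with prefix.

    Works on any object exposing .layers, where each layer has .name and
    .trainable (as Keras models do). Returns the names left trainable.
    """
    trainable_names = []
    for layer in model.layers:
        layer.trainable = layer.name.startswith(prefix)
        if layer.trainable:
            trainable_names.append(layer.name)
    return trainable_names


# Stand-in objects so the sketch runs without TensorFlow installed.
class FakeLayer:
    def __init__(self, name):
        self.name = name
        self.trainable = False


class FakeModel:
    def __init__(self, names):
        self.layers = [FakeLayer(n) for n in names]


demo = FakeModel(["block4_conv4", "block4_pool", "block5_conv1", "block5_pool"])
print(unfreeze_block(demo))
```

With the question's code, one would presumably call `unfreeze_block(base_model)` on the VGG19 instance and then recompile with a small learning rate. Note that fine-tuning only has an effect when the classifier is trained end to end on images; features that were extracted in advance from the frozen base are not updated.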
