My One Class Autoencoder's ROC curve appears to be inverted

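I am training a one-class autoencoder on 50,000 benign samples. At test time, I give the model 10,000 benign samples and 10,000 anomalous samples. Its F1 and recall scores are 0.974 and 0.992, respectively. It labels 9,600 samples as anomalous and 10,400 as benign (so not perfect, but very good). However, when I plot the ROC curve and compute the AUC, I get a value of 0.005.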

How is this possible?

I am using sklearn's roc_curve() function to compute fpr and tpr so that I can plot the curve, and then the auc() function to get the numeric value.
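One thing that may explain the gap: the evaluate() method posted below assigns label 1 (benign) when the reconstruction error is low, whereas roc_curve() treats higher y_score values as stronger evidence for pos_label. The following is only a minimal sketch on made-up toy values (not the real data), showing how a thresholded F1 and the ROC AUC can disagree for that reason:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc, f1_score

# Toy data, invented for illustration: 1 = benign, -1 = anomalous
true = np.array([1, 1, 1, -1, -1, -1])
mse = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])  # anomalies reconstruct worse

# Hard labels: LOW error -> 1 (benign), same rule as in evaluate()
threshold = np.mean(mse)
ae_test = np.where(mse <= threshold, 1, -1)
f1 = f1_score(true, ae_test)

# roc_curve assumes HIGH scores point to pos_label, so raw mse is backwards here
fpr, tpr, _ = roc_curve(true, mse, pos_label=1)
auc_raw = auc(fpr, tpr)

# Negating the score makes high values mean "benign", matching pos_label=1
fpr, tpr, _ = roc_curve(true, -mse, pos_label=1)
auc_flipped = auc(fpr, tpr)

print(f1, auc_raw, auc_flipped)
```

On this toy input the hard labels are all correct (F1 of 1.0), yet the AUC computed from the raw error is 0.0 and the AUC from the negated error is 1.0.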

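I don't understand how my model can perform this well and still have such a low AUC. Any tips would be greatly appreciated. I have attached my autoencoder model below; feel free to ask for more information.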

from keras.layers import Input, Dense, Dropout
from keras.models import Model 
from keras import regularizers
from keras.callbacks import EarlyStopping
from sklearn.metrics import roc_curve, auc, f1_score, recall_score
from sklearn.svm import OneClassSVM
from sklearn import metrics
import pickle
import numpy as np

class Autoencoder:
  
  def __init__(self, encoding_dim=64, activity_regularizer=10e-6):
    self.encoding_dim = encoding_dim
    self.activity_regularizer = activity_regularizer
    self.autoencoder = None
    self.threshold = None
    
  def fit(self, x_train, x_valid, epochs=50, batch_size=256, earlystop_patience=10):
    # Input Shape
    input_dim = x_train.shape[1]

    # Input Layer
    input_layer = Input(shape = (input_dim,))

    # Encoder Layers
    hidden_layer1 = Dense(512, activation='tanh', activity_regularizer=regularizers.l1(self.activity_regularizer))(input_layer)
    hidden_layer1 = Dropout(0.5)(hidden_layer1)

    hidden_layer2 = Dense(256, activation='tanh', activity_regularizer=regularizers.l1(self.activity_regularizer))(hidden_layer1)
    hidden_layer2 = Dropout(0.5)(hidden_layer2)

    encoded = Dense(self.encoding_dim, activation='tanh', activity_regularizer=regularizers.l1(self.activity_regularizer))(hidden_layer2)

    # Decoder Layers
    hidden_layer4 = Dense(256, activation='tanh', activity_regularizer=regularizers.l1(self.activity_regularizer))(encoded)
    hidden_layer4 = Dropout(0.5)(hidden_layer4)

    hidden_layer5 = Dense(512, activation='tanh', activity_regularizer=regularizers.l1(self.activity_regularizer))(hidden_layer4)
    hidden_layer5 = Dropout(0.5)(hidden_layer5)

    decoded = Dense(input_dim, activation='sigmoid')(hidden_layer5)

    # Define autoencoder
    self.autoencoder = Model(inputs = input_layer, outputs = decoded)

    # Compile Autoencoder
    self.autoencoder.compile(optimizer = 'adam', loss = 'mean_squared_error')

    # Early stopping
    earlystop_callback = EarlyStopping(monitor='val_loss', patience=earlystop_patience, verbose=1, mode='min')

    # Train
    self.autoencoder.fit(x_train, x_train, epochs = epochs, batch_size = batch_size, validation_data = (x_valid, x_valid), callbacks=[earlystop_callback])

  def evaluate(self, x_test, true):
    pred = self.autoencoder.predict(x_test)

    # Reconstruction Error
    mse = np.mean(np.power(x_test - pred, 2), axis = 1)
    np.savetxt('mse.csv', mse, delimiter=',')
    # Threshold Calculation
    self.threshold = np.mean(mse)
    print("Threshold: ")
    print(self.threshold)
    print("\n")

    # Predicted labels for AE: 1 = benign (low error), -1 = anomaly (high error)
    ae_test = np.where(mse <= self.threshold, 1, -1)
    np.savetxt('ae_test.csv', ae_test, delimiter=',')

    # Count predicted anomalies (-1) and predicted benign samples (1)
    anomoly_counter = int(np.sum(ae_test == -1))
    normal_counter = int(np.sum(ae_test == 1))

    # AUC Calculations
    fpr, tpr, _ = roc_curve(true, mse, pos_label=1)
    auc_num = auc(fpr, tpr)


    # F1 Score
    f1 = f1_score(true, ae_test)


    # Recall Score
    recall = recall_score(true, ae_test)

    print('AUC: {:.3f}'.format(auc_num))       
    print('Recall: {:.3f}'.format(recall))
    print('F1 Score: {:.3f}'.format(f1))
    print('Anomaly: {}'.format(anomoly_counter))
    print('Positive Class: {}'.format(normal_counter))

    return fpr, tpr

  def save_model(self, model_file):
    with open(model_file, 'wb') as file:
      pickle.dump(self, file)
      

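I compared the predicted binary values with the actual values, and they do line up: the values predicted as inliers (1) are actually inliers, and the values predicted as outliers (-1) are actually outliers.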

In addition, I am training an OCSVM on the same data and getting an AUC of 0.997, with F1 and recall scores that reflect it.
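A possible reason the OCSVM does not show the same symptom: sklearn's OneClassSVM.decision_function() returns larger values for inliers, so its score already points in the direction roc_curve() expects when pos_label=1 is the benign class. The post does not show how the OCSVM AUC was computed, so the sketch below is an assumption about that setup, run on synthetic 2-D Gaussian data rather than the real samples:

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(0)

# Synthetic stand-in data: benign points around 0, anomalies around 6
x_train = rng.normal(0.0, 1.0, size=(500, 2))
x_test = np.vstack([
    rng.normal(0.0, 1.0, size=(200, 2)),  # benign
    rng.normal(6.0, 1.0, size=(200, 2)),  # anomalous
])
true = np.concatenate([np.ones(200), -np.ones(200)])  # 1 = benign, -1 = anomaly

# Hyperparameters here are illustrative, not taken from the post
ocsvm = OneClassSVM(kernel='rbf', gamma='scale', nu=0.05).fit(x_train)

# decision_function: higher = more inlier-like, which agrees with pos_label=1
scores = ocsvm.decision_function(x_test)
fpr, tpr, _ = roc_curve(true, scores, pos_label=1)
ocsvm_auc = auc(fpr, tpr)
print('OCSVM AUC: {:.3f}'.format(ocsvm_auc))
```

The reconstruction error of an autoencoder runs the other way (higher = more anomalous), which would be consistent with one model scoring near 1 and the other near 0 on the same labels.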

I am almost 100% sure the ROC is simply inverted for the AE somehow, but I don't know how to prove it or how to invert it.
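One way to check this, sketched under assumptions: if the curve is only flipped, the AUC from the raw error and the AUC from the negated error should sum to 1, and treating the anomaly class as positive (pos_label=-1) with the raw error should give the same number as negating the score. The helper roc_auc and the gamma-distributed errors below are hypothetical stand-ins, not the real model output:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

def roc_auc(true, score, pos_label):
    """AUC of the ROC curve for the given score orientation (hypothetical helper)."""
    fpr, tpr, _ = roc_curve(true, score, pos_label=pos_label)
    return auc(fpr, tpr)

rng = np.random.default_rng(0)

# Synthetic reconstruction errors: benign low, anomalous high (illustrative only)
true = np.concatenate([np.ones(1000), -np.ones(1000)])  # 1 = benign, -1 = anomaly
mse = np.concatenate([
    rng.gamma(shape=2.0, scale=0.01, size=1000),  # benign
    rng.gamma(shape=8.0, scale=0.01, size=1000),  # anomalous
])

auc_as_posted = roc_auc(true, mse, pos_label=1)     # same call as in evaluate()
auc_negated = roc_auc(true, -mse, pos_label=1)      # flip the score
auc_anomaly_pos = roc_auc(true, mse, pos_label=-1)  # or flip the positive class

print('as posted:        {:.3f}'.format(auc_as_posted))
print('negated score:    {:.3f}'.format(auc_negated))
print('pos_label = -1:   {:.3f}'.format(auc_anomaly_pos))
print('sum of first two: {:.3f}'.format(auc_as_posted + auc_negated))
```

If the real mse array behaves the same way, replacing mse with -mse in the roc_curve() call (or passing pos_label=-1) should turn 0.005 into roughly 0.995 without touching the model.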

machine-learning autoencoder roc auc one-class-classification