Keras binary classifier always predicts the same class

Question

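I have run into a problem while building a neural network with Keras: the network essentially learns to always predict the class that has more instances in the training set.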

I have split the code into several parts:

Data preparation

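# Imports assumed from the names used in the code below
import numpy as np
import tensorflow as tf
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.initializers import he_normal
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential

# purchase_data is a pandas DataFrame loaded earlier (loading code not shown)
data = purchase_data.copy()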
labelencoder = LabelEncoder()
target_sum = 120
data.loc[data['sales'] <= target_sum, 'sales'] = False
data.loc[data['sales'] > target_sum, 'sales'] = True

print("\n\nColumn Names & formatting:\n")
for col in data.columns.values.tolist():
    if data[col].dtype == "object" or data[col].dtype == "bool":
        print("{:<30}".format(col), ":", "{:<30}".format(str(data[col].dtype)) , "Formatting to LabelEncoding")
        data[col] = labelencoder.fit_transform(data[col])
    else:
        print("{:<30}".format(col), ":", "{:<30}".format(str(data[col].dtype)) , "No formatting required.")

# Converting datetime to float
data['accessed_date'] = data['accessed_date'].apply(lambda x: x.timestamp())

array = data.values 
class_column = 'sales' # The column I want to predict

X = np.delete(array, data.columns.get_loc(class_column), axis=1) # Removing class_column column
Y = array[:,data.columns.get_loc(class_column)] # Selecting class_column column
Y = Y[:, np.newaxis] # Resetting the shape value

# Normalizing the input values (excluding the class value)
scaler = preprocessing.Normalizer().fit(X)
X = scaler.transform(X)

Splitting the dataset

seed = 1
X_train, X_test, Y_train, Y_test  = train_test_split(X, Y, test_size=0.33, random_state=seed, shuffle = True, stratify=(Y))

Neural network

tf.random.set_seed(seed)


# Building the neural network
modeldl = Sequential()

modeldl.add(Dense(64, input_dim=X.shape[1], activation='relu', kernel_initializer=he_normal()))
modeldl.add(Dropout(0.2))

modeldl.add(Dense(32, activation='relu', kernel_initializer=he_normal()))
modeldl.add(Dropout(0.2))

modeldl.add(Dense(1, activation='sigmoid', kernel_initializer=he_normal()))


# Compile model
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-04)
modeldl.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['acc'])

results = modeldl.fit(X_train, Y_train, epochs=80, batch_size=1000, verbose=1)

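Confusion matrix

At the end of training, I get this confusion matrix:

	Predicted positive	Predicted negative
Actual positive	0 (TP)	21719 (FN)
Actual negative	0 (FP)	22620 (TN)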
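
For reference, a matrix like the one above can be computed from the sigmoid outputs roughly as in the sketch below. The arrays y_true and y_prob are made-up stand-ins for Y_test and the output of modeldl.predict(X_test), and the 0.5 threshold is an assumption. Note that scikit-learn orders the result as [[TN, FP], [FN, TP]] for labels 0 and 1.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical stand-ins for Y_test and modeldl.predict(X_test).
y_true = np.array([1, 0, 1, 1, 0, 0])
y_prob = np.array([0.31, 0.22, 0.48, 0.40, 0.10, 0.35])  # all below 0.5

# Threshold the sigmoid outputs to get hard class labels.
y_pred = (y_prob >= 0.5).astype(int)

# With labels=[0, 1], ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
```

When every probability falls on the same side of the threshold, one whole column of the matrix is zero, which is the pattern shown above.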

Question: How can I solve this problem?

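I have already tried:

Playing with the hyperparameters
Predicting another column instead of 'sales'
Changing the size of the network
Setting a fixed learning rate
Changing the activation functions
Adding/removing dropout layers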

Additional info

The dataset contains roughly 50% class '0' and 50% class '1'. I am using this dataset, but I removed the columns and rows related to returns.

Answer 1

Looking at your code, I don't see any obvious mistakes. I would suggest a few things here:

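Work on the EDA part
Try fitting a linear regression to the promising features identified in the EDA (e.g. membership status), and improve your classifier by adding more features that you think are useful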
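
A simple linear baseline of this kind could look like the sketch below. It swaps in logistic regression, the usual linear model for a 0/1 target, in place of plain linear regression, and it runs on synthetic stand-in data: the "membership status" feature, the noise feature and the 80% label agreement are all made up for illustration, so replace them with your real columns.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Synthetic stand-in data: one informative feature (think "membership
# status") and one pure-noise feature. Replace with real columns.
n = 2000
informative = rng.integers(0, 2, size=n).astype(float)
noise = rng.normal(size=n)
X = np.column_stack([informative, noise])
# The label follows the informative feature 80% of the time.
flip = rng.random(n) < 0.2
y = np.where(flip, 1 - informative, informative).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=1, stratify=y
)

# Standardize each column, then fit a plain linear classifier.
baseline = make_pipeline(StandardScaler(), LogisticRegression())
baseline.fit(X_train, y_train)

accuracy = baseline.score(X_test, y_test)
print(f"baseline accuracy: {accuracy:.3f}")
```

If a baseline like this also collapses to a single class or stays near 50% accuracy on your data, the features are a more likely culprit than the network architecture.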

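Neural networks are generally used only once traditional statistical methods have been exhausted, typically because the amount of data is very large. Even then, there has to be an underlying correlation between the input features and the output you want to predict.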
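
One quick way to look for such a correlation is to rank the features by their correlation with the target, as in the sketch below. The DataFrame is synthetic and the column names other than 'sales' are hypothetical; on real data you would run the last step on your prepared data frame.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Synthetic stand-in for the prepared DataFrame; column names other than
# 'sales' are hypothetical.
n = 5000
df = pd.DataFrame({
    "member": rng.integers(0, 2, size=n),
    "duration": rng.normal(size=n),
})
# The target depends on 'member' but not on 'duration'.
df["sales"] = ((df["member"] + rng.normal(scale=0.5, size=n)) > 0.5).astype(int)

# Pearson correlation of every feature with the binary target
# (for a 0/1 target this is the point-biserial correlation).
corr = (
    df.drop(columns="sales")
    .corrwith(df["sales"])
    .abs()
    .sort_values(ascending=False)
)
print(corr)
```

This only measures linear association, so a value near zero does not rule out a nonlinear relationship, but features that all score near zero are a warning sign.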

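In your case, you are dealing with time-series data together with a mixed set of other inputs (e.g. bytes compressed), and there may simply be no useful correlation in these combined features for the network to learn.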
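
With a timestamp mixed in among much smaller inputs, it may also be worth checking what the scaling step does to the features. scikit-learn's Normalizer rescales each row to unit norm, while StandardScaler rescales each column. The sketch below uses made-up values to show the difference; whether this is what happens with your data is an assumption you would need to verify.

```python
import numpy as np
from sklearn.preprocessing import Normalizer, StandardScaler

rng = np.random.default_rng(1)

# Synthetic stand-in: a Unix-timestamp column (~1.6e9) next to two
# small-valued features. The values are made up for illustration.
n = 1000
timestamp = 1.6e9 + rng.uniform(0, 1e6, size=n)
small_a = rng.normal(size=n)
small_b = rng.integers(0, 10, size=n).astype(float)
X = np.column_stack([timestamp, small_a, small_b])

# Normalizer rescales each ROW to unit L2 norm, so the huge timestamp
# dominates: its column becomes ~1 and the others shrink towards 0.
X_rowwise = Normalizer().fit_transform(X)
print("Normalizer column std:    ", X_rowwise.std(axis=0))

# StandardScaler rescales each COLUMN to zero mean and unit variance.
X_colwise = StandardScaler().fit_transform(X)
print("StandardScaler column std:", X_colwise.std(axis=0))
```

If the row-wise version leaves every column almost constant, the network receives nearly identical inputs for every sample, and predicting a single class would be the expected outcome.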
