TF / Keras: volatility of the loss function

Question

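I am running a fairly simple Keras classification based on 30 features. What I don't understand yet is why the loss function becomes more volatile when I increase the number of rows fed into the model: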

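import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("cancer_classification.csv")
# keep only the first 50 rows (the second run uses the first 500)
df = df.iloc[:50]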

# split data
X = df.drop("benign_0__mal_1", axis=1).values
y = df["benign_0__mal_1"].values

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.50, random_state=101)

# scale
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
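# fit on the training data only; both calls return new arrays, so assign them back
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)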

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

print(X_train.shape)
# ---> (50, 30)

model = Sequential()
model.add(Dense(30, activation="relu"))
model.add(Dense(15, activation="relu"))
model.add(Dense(5, activation="relu"))

# binary classification - so last layer has sigmoid activation function
model.add(Dense(1, activation="sigmoid"))
model.compile(loss="binary_crossentropy", optimizer="adam")

# overfit on purpose to show what it looks like, hence 1000 epochs
model.fit(X_train, y_train, epochs=1000, validation_data=(X_test, y_test))

# plot the history, leaving out the first 10 epochs so the high initial loss does not skew the chart
loss_df = pd.DataFrame(model.history.history)

loss_df = loss_df.iloc[10:]
loss_df.plot()
plt.show()

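The original idea was to visualize overfitting, where val_loss starts to rise while the training loss keeps falling. I don't understand why feeding in 500 rows produces such large fluctuations in the loss function.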

[Figure: loss plot for 50 df rows]

[Figure: loss plot for 500 df rows]
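The two plots are compared by eye above; one way to make "overfitting starts here" and "more volatile" concrete is to compute both from the history table. The sketch below is only an illustration: the history values are made up (an assumption standing in for `model.history.history`), and `best_epoch` and `loss_volatility` are hypothetical helper names, not part of Keras.

```python
import pandas as pd

def best_epoch(history_df):
    """Return the index of the epoch with the lowest validation loss."""
    return int(history_df["val_loss"].idxmin())

def loss_volatility(history_df, column="loss"):
    """Standard deviation of the epoch-to-epoch changes in one column."""
    return float(history_df[column].diff().dropna().std())

# Made-up history (assumption: stands in for model.history.history)
history = pd.DataFrame({
    "loss":     [0.70, 0.50, 0.40, 0.30, 0.20, 0.10],
    "val_loss": [0.72, 0.55, 0.45, 0.50, 0.60, 0.75],
})

print("val_loss is lowest at epoch index:", best_epoch(history))
print("volatility of loss:", loss_volatility(history))
print("volatility of val_loss:", loss_volatility(history, "val_loss"))
```

Running the same two helpers on the 50-row and the 500-row histories would give one number per run to compare, instead of relying on the shape of the chart.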

tensorflow keras loss
1 Answer

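Try shuffling the data randomly, and check the class distribution in the first 50 rows and in the first 500 rows; I think this may be a result of the data not being shuffled. It could also be caused by the features you are feeding into the model.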
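A minimal sketch of that check, assuming the label column is `benign_0__mal_1` as in the question. The DataFrame here is synthetic and deliberately sorted by class (an assumption, since the real CSV is not available), and `class_distribution` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def class_distribution(df, label_col, n_rows):
    """Fraction of each class among the first n_rows rows."""
    return df[label_col].iloc[:n_rows].value_counts(normalize=True).sort_index()

# Synthetic stand-in for cancer_classification.csv (assumption),
# sorted so that all malignant rows come first.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "feature": rng.normal(size=600),
    "benign_0__mal_1": np.repeat([1, 0], [200, 400]),
})

# Class balance of the unshuffled head of the file
before_50 = class_distribution(demo, "benign_0__mal_1", 50)
before_500 = class_distribution(demo, "benign_0__mal_1", 500)

# Shuffle all rows first, then look at the head again
shuffled = demo.sample(frac=1, random_state=101).reset_index(drop=True)
after_50 = class_distribution(shuffled, "benign_0__mal_1", 50)
after_500 = class_distribution(shuffled, "benign_0__mal_1", 500)

print("unshuffled, first 50:", before_50.to_dict())
print("unshuffled, first 500:", before_500.to_dict())
print("shuffled, first 50:", after_50.to_dict())
print("shuffled, first 500:", after_500.to_dict())
```

If the real file shows a similar gap between the 50-row and 500-row class balance, shuffling before `df.iloc[:n]` makes the two runs comparable.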
