Keras CNN always predicts the same class on a balanced dataset, yet the accuracy is high. Why?

Problem description

I am facing the following problem. I want to show you the code first and then explain it in detail:

# Build and train a 1-D CNN for binary classification on 200-feature rows
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D
from keras.layers import Dense, Dropout, Activation, Flatten, BatchNormalization
from keras.optimizers import SGD
import numpy

# Load the data: 200 features per row, label in the last column
dataset = numpy.loadtxt("example.csv", delimiter=",")
X = dataset[:, 0:200]
Y = dataset[:, 200]

# Reshape to (samples, steps, channels) for the 1-D convolutions
X = numpy.reshape(X, (X.shape[0], X.shape[1], 1))

model = Sequential()
model.add(Conv1D(16, 3, padding="same", input_shape=(200, 1)))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling1D(pool_size=2))
model.add(Conv1D(32, 3, padding="same"))
model.add(Conv1D(32, 3, padding="same"))
model.add(Activation('relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Conv1D(32, 3, padding="same", activation='tanh'))
model.add(Conv1D(32, 3, padding="same", activation='tanh'))
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(100, activation='tanh'))
model.add(Dropout(0.2))
model.add(Dense(50, activation='tanh'))
model.add(Dropout(0.2))
model.add(Dense(20, activation='tanh'))
model.add(Dropout(0.2))
model.add(Dense(1))
model.add(Activation('sigmoid'))  # single sigmoid unit for the binary label

sgd = SGD(lr=0.1, decay=0.001, momentum=0.9, nesterov=True)
model.compile(loss="binary_crossentropy", optimizer=sgd, metrics=['accuracy'])

# Train for one epoch, then evaluate and predict on the same data
model.fit(X, Y, batch_size=64, epochs=1)
score = model.evaluate(X, Y, verbose=0)
print(score)
output = model.predict(X, batch_size=20, verbose=0)
print(output[0:100])

What I am doing is the following: as input (X) I feed the network DNA sequences (encoded as numbers), and the label (Y) is binary (0 or 1). I want to predict Y. When I run the model, it behaves very strangely, at least in a way I cannot understand:

[Screenshot: Keras training output showing a very high reported accuracy]

Now here is my problem: in the predicted label output (the result of the line print(output[0:100])), the model always predicts 0. Yet, as shown above, the accuracy appears to be very high. Why is that? Note that the dataset is balanced, meaning half of the observations are labelled 1 and half are labelled 0, so predicting everything as 0 should give an accuracy of 0.5.
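One way to double-check what the model actually predicts is to threshold the sigmoid outputs at 0.5 (the same threshold Keras' binary accuracy metric uses) and compare them against Y directly. This is a minimal sketch, reusing the X, Y and model variables from the code above:

import numpy

# Sigmoid outputs, shape (n_samples, 1)
probs = model.predict(X, batch_size=64, verbose=0)

# Hard labels, as the accuracy metric sees them
preds = (probs > 0.5).astype("int32").ravel()

# Compare the manual accuracy with model.evaluate() and inspect how often
# each class is actually predicted
print("manual accuracy:", numpy.mean(preds == Y))
print("predicted class counts:", numpy.bincount(preds, minlength=2))
print("true class counts:", numpy.bincount(Y.astype("int32"), minlength=2))

If the predicted class counts really are all 0 while the manual accuracy is well above 0.5 on a balanced Y, the data or the evaluation is suspect; if the manual accuracy comes out around 0.5, the high number in the training log is the misleading part.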

Edit:

Since I was asked to provide the data, here is a screenshot. The last number in each row is the label.

[Screenshot: sample rows of example.csv; the last number in each row is the label]

python keras
1 Answer

Maybe your data is not scaled properly. As a debugging step, you could use a linear activation function in the last layer and look at the results.
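As a concrete illustration of the scaling suggestion, one common option (a sketch only, assuming X is still the flat (samples, 200) array loaded from example.csv, before the reshape) is to standardize each input column to zero mean and unit variance:

import numpy

dataset = numpy.loadtxt("example.csv", delimiter=",")
X = dataset[:, 0:200]
Y = dataset[:, 200]

# Standardize each of the 200 input columns; unscaled inputs with large
# values can saturate the tanh/sigmoid units and stall learning
mean = X.mean(axis=0)
std = X.std(axis=0)
std[std == 0] = 1.0  # guard against constant columns
X = (X - mean) / std

# Reshape for the 1-D convolutions, as in the question
X = X.reshape(X.shape[0], X.shape[1], 1)

For the debugging step, you would build the same model but replace the final Activation('sigmoid') with Activation('linear') and look at the raw values returned by model.predict; if they barely vary across samples, the output has collapsed to a constant.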
