Keras fit_generator taking too much time

Question (votes: 2, answers: 2)

I am doing image classification with Keras. I have 8k images (inputs) in the training set and 2k images (inputs) in the test set, and epochs is set to 25. I noticed that the epochs are very slow (the first iteration took about an hour).

Can someone suggest how I can overcome this problem, and what is the reason it takes so much time?

Code below:

#PART 1 - Initialise the neural network
from keras.models import Sequential

#package to perform the first layer, which is convolution; using 2D as it is for images (for video it would be 3D)
from keras.layers import Convolution2D

#to perform max pooling on convolved layer
from keras.layers import MaxPool2D

#to convert the pooled feature map into a large feature vector, which will be the input for the ANN
from keras.layers import Flatten 

#to add layers to the ANN
from keras.layers import Dense

#STEP -1
#Initializing CNN
classifier = Sequential()

#add convolution layer
classifier.add(Convolution2D(filters=32,kernel_size=(3,3),strides=(1, 1),input_shape= (64,64,3),activation='relu'))

#filters - number of feature detectors we are going to apply to the image

#kernel_size - dimensions of the feature detector

#strides - moving through one unit at a time

#input_shape - shape of the input image on which we are going to apply the filter through the convolution operation;
#we will have to convert the image into this shape in image preprocessing before feeding it into the convolution
#channels: 3 for RGB and 1 for black-and-white, plus the pixel dimensions

#activation - function we use to introduce non-linearity

#STEP -2 

#add pooling
#this step significantly reduces the size of the feature map and makes computation easier

classifier.add(MaxPool2D(pool_size=(2,2)))

#pool_size - factor by which to downscale


#STEP -3
#flatten the feature map

classifier.add(Flatten())

#STEP -4 
#hidden layer
classifier.add(Dense(units=128,activation='relu',kernel_initializer='uniform'))

#output layer
classifier.add(Dense(units=1,activation='sigmoid'))


#Compiling the CNN using stochastic gradient descent

classifier.compile(optimizer='adam',loss = 'binary_crossentropy',
                  metrics=['accuracy'])

#loss function should be categorical_crossentropy if the output has more than 2 classes

#PART 2 - Fitting the CNN to the images

#copied from keras documentation 

from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)

training_set = train_datagen.flow_from_directory(
        '/Users/arunramji/Downloads/Sourcefiles/CNN_Imageclassification/Convolutional_Neural_Networks/dataset/training_set',
        target_size=(64, 64),
        batch_size=32,
        class_mode='binary')

test_set = test_datagen.flow_from_directory(
        '/Users/arunramji/Downloads/Sourcefiles/CNN_Imageclassification/Convolutional_Neural_Networks/dataset/test_set',
        target_size=(64, 64),
        batch_size=32,
        class_mode='binary')

classifier.fit_generator(
        training_set,
        steps_per_epoch=8000,   #number of training images
        epochs=25,
        validation_data=test_set,
        validation_steps=2000)  #number of test images

classifier.fit(
        training_set,
        steps_per_epoch=8000,   #number of training images
        epochs=25,
        validation_data=test_set,
        validation_steps=2000)

python-3.x tensorflow machine-learning keras deep-learning
2 Answers

-1 votes

You are setting steps_per_epoch to the wrong value (that is why it takes longer than necessary): it is not meant to be the number of data points. steps_per_epoch should be set to the size of the dataset divided by the batch size, which is 8000 / 32 = 250 for your training set and about 63 for the validation set (2000 / 32, rounded up).
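For reference, these values can be computed directly; a minimal sketch, where the variable names are illustrative and the sizes are taken from the question:

import math

batch_size = 32          # as set in flow_from_directory in the question
num_train_images = 8000
num_test_images = 2000

# ceil so that the last, possibly smaller, batch is still used
steps_per_epoch = math.ceil(num_train_images / batch_size)    # 250
validation_steps = math.ceil(num_test_images / batch_size)    # 63

print(steps_per_epoch, validation_steps)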


-1 votes

Update:

As Matias pointed out in his answer, your setting of the steps_per_epoch parameter in the fit method caused the huge slowdown per epoch. From the fit_generator documentation:

steps_per_epoch: Integer. Total number of steps (batches of samples) to yield from generator before declaring one epoch finished and starting the next epoch. It should typically be equal to ceil(num_samples / batch_size). Optional for Sequence: if unspecified, will use len(generator) as a number of steps.

validation_steps: Only relevant if validation_data is a generator. Total number of steps (batches of samples) to yield from the validation_data generator before stopping at the end of every epoch. It should typically be equal to the number of samples of your validation dataset divided by the batch size. Optional for Sequence: if unspecified, will use len(validation_data) as a number of steps.

Actually, Keras is inconsistent in handling these two parameters: if you use a plain dataset instead of a data generator and set the parameters like batch_size=batch_size, steps_per_epoch=num_samples, the fit method raises a ValueError:

ValueError: Number of samples 60000 is less than samples required for specified batch_size 200 and steps 60000

But it does not catch the same problem when the data comes from a data generator, which lets you run into an issue like the current one.

I wrote some example code to check these things.
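The original answer shows only the resulting logs. A minimal sketch that could produce runs like these, assuming TF 2.x (where fit accepts generators), MNIST from keras.datasets (which matches the 60000 samples in the logs) and a trivial dense model; the dataset and model are assumptions, not code from the answer:

import math
from tensorflow import keras

batch_size = 200

# assumed dataset: MNIST provides the 60000 training samples seen in the logs
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

datagen = keras.preprocessing.image.ImageDataGenerator()
train_gen = datagen.flow(x_train[..., None], y_train, batch_size=batch_size)
val_gen = datagen.flow(x_test[..., None], y_test, batch_size=batch_size)

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28, 1)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# wrong: one "epoch" becomes 60000 steps of 200 samples each
model.fit(train_gen, steps_per_epoch=len(x_train), epochs=5,
          validation_data=val_gen, validation_steps=50)

# right: one epoch is ceil(60000 / 200) = 300 steps
model.fit(train_gen, steps_per_epoch=math.ceil(len(x_train) / batch_size),
          epochs=5, validation_data=val_gen, validation_steps=50)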

The fit method with steps_per_epoch=num_samples:

Number of samples: 60000
Number of samples per batch: 200
Train for 60000 steps, validate for 50 steps
Epoch 1/5
263/60000 [..............................] - ETA: 4:07:09 - loss: 0.2882 - accuracy: 0.9116

The ETA (estimated time) is 4:07:09, because the epoch now consists of 60000 steps of 200 samples each, i.e. 60000 × 200 = 12,000,000 samples per epoch, or 200 full passes over the 60000-sample dataset instead of one.

The same fit with steps_per_epoch=num_samples // batch_size:

Number of samples: 60000
Number of samples per batch: 200
Train for 300 steps, validate for 50 steps
Epoch 1/5
28/300 [=>............................] - ETA: 1:15 - loss: 1.0946 - accuracy: 0.6446

with an ETA of 1:15.

Solution:

steps_per_epoch = training_set.samples // batch_size      # a DirectoryIterator exposes .samples, not .shape
validation_steps = test_set.samples // batch_size
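Plugged back into the training call from the question (batch_size=32, as set in flow_from_directory), the corrected code would look roughly like this; a sketch using the question's variable names, not a verbatim quote from the answer:

batch_size = 32
classifier.fit_generator(
        training_set,
        steps_per_epoch=training_set.samples // batch_size,    # 8000 // 32 = 250
        epochs=25,
        validation_data=test_set,
        validation_steps=test_set.samples // batch_size)       # 2000 // 32 = 62; use math.ceil to cover every sample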

Other possible issues regarding performance:

As @SajanGohil wrote in his comment, train_datagen.flow_from_directory performs some tasks before the actual training process, such as file operations and preprocessing, and this sometimes takes more time than the training itself. So, to avoid that extra time, you can perform the preprocessing separately, just once, before the whole training process, and then use the preprocessed data at training time (see the sketch below).
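A minimal sketch of that idea, assuming the directory layout from the question (the .npy file names are illustrative): run the generator once over the whole dataset, cache the decoded and resized images as NumPy arrays, and train on the in-memory data afterwards:

import numpy as np
from keras.preprocessing.image import ImageDataGenerator

# one deterministic pass over the directory to cache decoded, resized images
datagen = ImageDataGenerator(rescale=1./255)
gen = datagen.flow_from_directory('dataset/training_set',
                                  target_size=(64, 64),
                                  batch_size=32,
                                  class_mode='binary',
                                  shuffle=False)

images, labels = [], []
for _ in range(len(gen)):      # len(gen) is the number of batches in one pass
    x, y = next(gen)
    images.append(x)
    labels.append(y)

np.save('train_images.npy', np.concatenate(images))   # illustrative cache paths
np.save('train_labels.npy', np.concatenate(labels))

# later: x = np.load('train_images.npy'); y = np.load('train_labels.npy')
# classifier.fit(x, y, batch_size=32, epochs=25)

Note that this caches only the deterministic steps (decoding, resizing, rescaling); the random augmentations in the question's train_datagen (shear, zoom, flips) would still have to be applied at training time.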

In any case, a CNN with a large number of images is a rather time- and resource-consuming task, so it is assumed you use a GPU.
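A quick way to check whether TensorFlow actually sees a GPU (standard TF 2.x API; an empty list means everything runs on the CPU):

import tensorflow as tf

# lists the GPUs visible to TensorFlow
print(tf.config.list_physical_devices('GPU'))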
