Efficient inference with TensorFlow on both CPU and GPU


I want my model to run on either the CPU or the GPU, depending on an argument I pass to my program. Here is the relevant piece of code:

if flag == 1:
    with tf.device('/GPU:0'):
        start_time = time.time()
        result = test_model.predict(prepare_dataset.test_dataset)
        end_time = time.time()
        duration = end_time - start_time
else:
    with tf.device('/CPU:0'):
        start_time = time.time()
        result = test_model.predict(prepare_dataset.test_dataset)
        end_time = time.time()
        duration = end_time - start_time

Now, depending on the value of 'flag', the model runs on the CPU or the GPU. The value of 'flag' changes every epoch. However, when I check the inference time, the CPU clearly outperforms the GPU.

I think the reason is caching: because I keep switching the device the model runs on (CPU or GPU), the cached state on the CPU or GPU gets flushed, and I don't want that to happen.

Is there any way to keep the model parameters in each device's cache while the program is running?

In other words, is there a way to place all of the model parameters on both the CPU and the GPU to speed up inference?
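To make what I mean concrete, here is a rough sketch of the kind of setup I am imagining: one replica of the model kept resident on each device, so that flipping the flag does not force TensorFlow to move the weights again. The per-device replicas and the run_inference helper below are illustrative, not my actual code.

import tensorflow as tf

# sketch: build one replica per device so its variables are created
# (and stay) on that device between calls
with tf.device('/GPU:0'):
    gpu_model = build_model()

with tf.device('/CPU:0'):
    cpu_model = build_model()
    cpu_model.set_weights(gpu_model.get_weights())  # keep the replicas in sync

def run_inference(flag, dataset):
    # pick the replica whose weights already live on the requested device
    model, device = (gpu_model, '/GPU:0') if flag == 1 else (cpu_model, '/CPU:0')
    with tf.device(device):
        return model.predict(dataset)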

I have looked at many questions about this, but could not find a solution for TensorFlow 2.

Thanks

EDIT

I'm new to SO, sorry for the confusion.

1. Model structure

def build_model():
    # set input shape
    IMG_SIZE = (160, 160)
    IMG_SHAPE = IMG_SIZE + (3,)

    # define core model and additional layers
    rescale = tf.keras.layers.Rescaling(1./127.5, offset=-1)  # rescaling for MobileNet usage
    base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                                   include_top=False,
                                                   weights='imagenet')

    # turn off 'trainable' of base_model (MobileNetV2)
    base_model.trainable = False

    # declare additional layers
    global_average_layer = tf.keras.layers.GlobalAveragePooling2D()
    prediction_layer = tf.keras.layers.Dense(1)

    # define complete model structure
    inputs = tf.keras.Input(shape=(160, 160, 3))
    x = rescale(inputs)
    x = base_model(x, training=False)
    x = global_average_layer(x)
    x = tf.keras.layers.Dropout(0.2)(x)
    outputs = prediction_layer(x)
    model = tf.keras.Model(inputs, outputs)

    # set learning rate and compile
    base_learning_rate = 0.0001
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=base_learning_rate),
                  loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                  metrics=['accuracy'])

    # return the model
    return model

I did transfer learning with MobileNetV2.
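For reference, the test_model used in the timing code below is just an instance of this builder, fine-tuned on the dataset from the next section. The train_dataset attribute and the epoch count here are placeholders standing in for my actual prepare_dataset module and training setup:

test_model = build_model()
test_model.fit(prepare_dataset.train_dataset, epochs=10)  # placeholder: real training arguments omitted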

2. Dataset

I used the cats-and-dogs dataset:

URL = 'https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip'
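prepare_dataset builds test_dataset from this archive roughly the way the standard Keras transfer-learning tutorial does; the sketch below shows the idea (the batch size and the validation/test split are assumptions, not my exact code):

import os
import tensorflow as tf

URL = 'https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip'

# download and unpack the archive (it contains train/ and validation/ folders)
path_to_zip = tf.keras.utils.get_file('cats_and_dogs.zip', origin=URL, extract=True)
base_dir = os.path.join(os.path.dirname(path_to_zip), 'cats_and_dogs_filtered')

validation_dataset = tf.keras.utils.image_dataset_from_directory(
    os.path.join(base_dir, 'validation'),
    shuffle=True,
    batch_size=32,            # assumption
    image_size=(160, 160))

# hold out part of the validation split as the test set
val_batches = tf.data.experimental.cardinality(validation_dataset)
test_dataset = validation_dataset.take(val_batches // 5)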

3. Actual inference code

# create an infinitely iterable dataset
test_data = prepare_dataset.test_dataset
test_data_inf = test_data.repeat()

# counter variable for iteration management
count = 0

# list for inference time
gpu_inftime = []
cpu_inftime = []

for images, _ in test_data_inf:
    for image in tf.data.Dataset.from_tensor_slices(images).take(1):
        image = tf.expand_dims(image, axis=0)
        if count < 500:
            if count == 0:
                print("gpu inference start!")
            if count % 100 == 0:
                print(f"count : {count}")
            with tf.device('/GPU:0'):
                start_time = time.time()
                result = test_model.predict(image, verbose=0)
                end_time = time.time()
                duration = end_time - start_time
                gpu_inftime.append(duration)
                count += 1
        elif (count >= 500 and count < 1000):
            if count == 500:
                print("cpu inference start!")
            if count % 100 == 0:
                print(f'count : {count}')
            with tf.device('/CPU:0'):
                start_time = time.time()
                result = test_model.predict(image, verbose=0)
                end_time = time.time()
                duration = end_time - start_time
                cpu_inftime.append(duration)
                count += 1
        else:
            break
        
    if count == 1000:
        break
        
avg_gpu_inftime = round(sum(gpu_inftime) / len(gpu_inftime), 3)
avg_cpu_inftime = round(sum(cpu_inftime) / len(cpu_inftime), 3)

print(f"gpu avf inference time : {avg_gpu_inftime}")
print(f"cpu avg inference time : {avg_cpu_inftime}")

And the results:

gpu avg inference time : 0.034
cpu avg inference time : 0.035

+) I said the CPU clearly outperforms the GPU with just a single image, sorry for the confusion. However, their inference times are still very similar. I would expect the GPU to perform much better than the CPU, since MobileNet is built on dense convolution operations.
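One more thing I suspect: model.predict() has noticeable per-call overhead (it wraps the input in a tf.data pipeline internally), which can dominate single-image latency on either device. A timing sketch that calls the model directly and adds warm-up runs before measuring might show the real gap more fairly; the warmup/iters values below are illustrative, not my original code:

import time
import tensorflow as tf

def time_device(model, image, device, warmup=10, iters=100):
    # time single-image inference on one device, after warm-up runs
    with tf.device(device):
        for _ in range(warmup):                    # warm-up: tracing, weight transfer
            _ = model(image, training=False).numpy()
        start = time.time()
        for _ in range(iters):
            _ = model(image, training=False).numpy()  # .numpy() forces the result back to host
    return (time.time() - start) / iters

# usage with a (1, 160, 160, 3) image tensor as in my loop above:
# gpu_avg = time_device(test_model, image, '/GPU:0')
# cpu_avg = time_device(test_model, image, '/CPU:0')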

Tags: tensorflow, gpu, cpu, inference