TensorFlow hangs when running inference with the GPU enabled

Question

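I am new to AI and TensorFlow, and I am trying to use the TensorFlow Object Detection API on Windows. My current goal is real-time person detection in a video stream. To do this, I modified a Python example from the TensorFlow Model Garden (https://github.com/tensorflow/models). At the moment it detects all objects (not just people) and draws the bounding boxes using OpenCV.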

It works fine when I disable the GPU (os.environ["CUDA_VISIBLE_DEVICES"] = "-1"), but when I enable the GPU and start the script, it hangs on the first frame.
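
For reference, the CPU fallback mentioned above relies on the CUDA runtime reading `CUDA_VISIBLE_DEVICES` when it initializes, so the variable is safest to set before TensorFlow is imported. A minimal sketch of toggling it; the helper name `set_gpu_visibility` is made up for illustration and is not part of TensorFlow:

```python
import os


def set_gpu_visibility(use_gpu):
    """Hide or expose CUDA devices for libraries imported afterwards.

    Hypothetical helper. Call it before the first `import tensorflow`
    so the setting is in place when CUDA initializes.
    """
    if use_gpu:
        # Remove any earlier override so every GPU is visible again.
        os.environ.pop("CUDA_VISIBLE_DEVICES", None)
    else:
        # "-1" matches no device, so CUDA reports no usable GPU.
        os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
    return os.environ.get("CUDA_VISIBLE_DEVICES")


print(set_gpu_visibility(use_gpu=False))  # prints -1
```

Switching between `use_gpu=True` and `use_gpu=False` at the top of the script makes it easy to compare the CPU and GPU behaviour of the same code.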

Output:

2020-04-22 16:00:53.597492: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-22 16:00:56.942141: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-04-22 16:00:56.976635: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 960M computeCapability: 5.0
coreClock: 1.176GHz coreCount: 5 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 74.65GiB/s
2020-04-22 16:00:56.989129: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-22 16:00:57.000622: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-22 16:00:57.012247: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-22 16:00:57.020575: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-22 16:00:57.031536: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-22 16:00:57.042564: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-22 16:00:57.066289: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-22 16:00:57.075760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-04-22 16:00:59.239211: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-04-22 16:00:59.256577: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1f3f73cd670 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-04-22 16:00:59.264241: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-04-22 16:00:59.272280: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 960M computeCapability: 5.0
coreClock: 1.176GHz coreCount: 5 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 74.65GiB/s
2020-04-22 16:00:59.281409: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-22 16:00:59.288204: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-22 16:00:59.293112: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-22 16:00:59.298222: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-22 16:00:59.305446: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-22 16:00:59.310590: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-22 16:00:59.316250: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-22 16:00:59.324588: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-04-22 16:01:00.831569: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-22 16:01:00.839147: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0
2020-04-22 16:01:00.842279: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N
2020-04-22 16:01:00.846140: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1024 MB memory) -> physical GPU (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0, compute capability: 5.0)
2020-04-22 16:01:00.865546: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1f39174cba0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-04-22 16:01:00.873656: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 960M, Compute Capability 5.0
[<tf.Tensor 'image_tensor:0' shape=(None, None, None, 3) dtype=uint8>]
2020-04-22 16:01:10.876733: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-22 16:01:11.814909: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
2020-04-22 16:01:11.852909: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-22 16:01:12.149312: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.04GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.179484: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.04GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.209036: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.237205: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.05GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.266147: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.09GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.295182: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.08GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.325645: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.15GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.357550: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.15GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.405332: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.14GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.436336: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.27GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.

This is the code I am using:

#!/usr/bin/env python
# coding: utf-8

import os
import pathlib

if "models" in pathlib.Path.cwd().parts:
  while "models" in pathlib.Path.cwd().parts:
    os.chdir('..')

import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile

from collections import defaultdict
from io import StringIO
from PIL import Image
from IPython.display import display

import cv2 
cap = cv2.VideoCapture(1)

from object_detection.utils import ops as utils_ops
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

# patch tf1 into `utils.ops`
utils_ops.tf = tf.compat.v1

# Patch the location of gfile
tf.gfile = tf.io.gfile

# os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

def load_model(model_name):
  base_url = 'http://download.tensorflow.org/models/object_detection/'
  model_file = model_name + '.tar.gz'
  model_dir = tf.keras.utils.get_file(
    fname=model_name, 
    origin=base_url + model_file,
    untar=True)

  model_dir = pathlib.Path(model_dir)/"saved_model"

  model = tf.saved_model.load(str(model_dir))
  model = model.signatures['serving_default']

  return model

# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = 'models/research/object_detection/data/mscoco_label_map.pbtxt'
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)

model_name = 'ssd_mobilenet_v1_coco_2017_11_17'
# model_name= 'faster_rcnn_inception_v2_coco_2017_11_08';
detection_model = load_model(model_name)

print(detection_model.inputs)

detection_model.output_dtypes
detection_model.output_shapes

def run_inference_for_single_image(model, image):
    image = np.asarray(image)
    # The input needs to be a tensor, convert it using `tf.convert_to_tensor`.
    input_tensor = tf.convert_to_tensor(image)
    # The model expects a batch of images, so add an axis with `tf.newaxis`.
    input_tensor = input_tensor[tf.newaxis,...]

    # Run inference (it hangs here)
    output_dict = model(input_tensor)

    # All outputs are batched tensors.
    # Convert to numpy arrays, and take index [0] to remove the batch dimension.
    # We're only interested in the first num_detections.
    num_detections = int(output_dict.pop('num_detections'))
    output_dict = {key:value[0, :num_detections].numpy() 
                 for key,value in output_dict.items()}
    output_dict['num_detections'] = num_detections

    # detection_classes should be ints.
    output_dict['detection_classes'] = output_dict['detection_classes'].astype(np.int64)

    # Handle models with masks:
    if 'detection_masks' in output_dict:
        # Reframe the bbox mask to the image size.
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(output_dict['detection_masks'], output_dict['detection_boxes'],image.shape[0], image.shape[1])      
        detection_masks_reframed = tf.cast(detection_masks_reframed > 0.5,tf.uint8)
        output_dict['detection_masks_reframed'] = detection_masks_reframed.numpy()

    return output_dict

def show_inference(model):
    # the array based representation of the image will be used later in order to prepare the
    # result image with boxes and labels on it.
    ret, image_np = cap.read()

    #percent by which the image is resized
    #scale_percent = 30

    #calculate the scaled dimensions (scale_percent of the original)
    #width = int(image_np.shape[1] * scale_percent / 100)
    #height = int(image_np.shape[0] * scale_percent / 100)

    # dsize
    #dsize = (width, height)

    # resize image
    #image_np = cv2.resize(image_np, dsize)

    # Actual detection.
    output_dict = run_inference_for_single_image(model, image_np)

    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
      image_np,
      output_dict['detection_boxes'],
      output_dict['detection_classes'],
      output_dict['detection_scores'],
      category_index,
      instance_masks=output_dict.get('detection_masks_reframed', None),
      use_normalized_coordinates=True,
      line_thickness=8)

    cv2.imshow('object detection', cv2.resize(image_np, (800,600)))

while True:
  show_inference(detection_model)
  if cv2.waitKey(25) & 0xFF == ord('q'):
    cv2.destroyAllWindows()
    break

I have the following versions installed: Python 3.7 (64-bit), TensorFlow 2.2.0-rc3, CUDA 10.1, cuDNN 7.6.5.32.

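I tried this on two different machines.

Machine 1:
- CPU: i7-6700HQ
- RAM: 16 GB
- GPU: NVIDIA GeForce GTX 960M

Machine 2:
- CPU: i5-6400
- RAM: 16 GB
- GPU: NVIDIA GeForce GTX 960

I don't know how to debug this. The same code gives almost the same result on both machines; the only difference is when it hangs. Machine 1 hangs immediately, while machine 2 takes about 30 seconds. Machine 2 is able to process video and detect objects until it hangs.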
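
One generic way to see where a Python process is stuck is the standard-library `faulthandler` module, which can dump the stack of every thread once a timeout expires. The sketch below is an illustration only: the wrapper name `run_with_hang_report` and the 30-second default are assumptions, and `time.sleep` stands in for the `model(input_tensor)` call.

```python
import faulthandler
import sys
import time


def run_with_hang_report(fn, *args, timeout=30.0, file=None):
    """Call fn(*args); if it is still running after `timeout` seconds,
    dump the Python stack of every thread to `file` (default: stderr).
    The call itself is not interrupted."""
    target = file if file is not None else sys.stderr
    faulthandler.dump_traceback_later(timeout, exit=False, file=target)
    try:
        return fn(*args)
    finally:
        # The call returned (or raised), so cancel the pending dump.
        faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    # Stand-in for `model(input_tensor)`; finishes well within the timeout,
    # so no traceback is written.
    run_with_hang_report(time.sleep, 0.01, timeout=5.0)
```

Wrapping the inference call this way only shows the Python-level frames; if the hang is inside native GPU code, the dump still identifies which Python call never returned.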

I looked into the 'Allocator (GPU_0_bfc) ran out of memory' warnings. I tried several options that limit the amount of GPU memory available, but that did not help.
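
The question does not say which options were tried. Two common ones in TensorFlow 2.x are on-demand allocation (memory growth) and a hard per-GPU cap; a sketch of both follows, using the `tf.config.experimental` names from the TensorFlow 2.2 era. The helper name `limit_gpu_memory` and the idea of returning a count are assumptions for illustration, and the import guard is only there so the sketch can be loaded on a machine without TensorFlow.

```python
try:
    import tensorflow as tf
except ImportError:  # lets the sketch load on a machine without TensorFlow
    tf = None


def limit_gpu_memory(memory_limit_mb=None):
    """Configure GPU memory before TensorFlow first uses the GPU.

    memory_limit_mb=None -> allocate on demand (memory growth).
    memory_limit_mb=N    -> hard cap of N MB on the first GPU.
    Returns the number of GPUs that were configured.
    """
    if tf is None:
        return 0
    gpus = tf.config.experimental.list_physical_devices("GPU")
    if not gpus:
        return 0
    try:
        if memory_limit_mb is None:
            for gpu in gpus:
                tf.config.experimental.set_memory_growth(gpu, True)
            return len(gpus)
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(
                memory_limit=memory_limit_mb)])
        return 1
    except RuntimeError as err:
        # Raised when the GPU has already been initialized.
        print("GPU memory settings must be applied at program start:", err)
        return 0


print("GPUs configured:", limit_gpu_memory())
```

Either setting has to be applied before the model is loaded. The two modes are alternatives for a given GPU, which is why the function applies only one of them per call.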

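Several posts also suggest reducing the batch size. My understanding is that this only helps when training your own model; since I am using a pre-trained model, it does not apply.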

I also tried different models: ssd_mobilenet_v1_coco_2017_11_17 and faster_rcnn_inception_v2_coco_2017_11_08. The result is the same with both models.

The last thing I tried was reducing the image size before processing. That did not help either.
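
The commented-out resize code in the script computes the target size from a percentage. That arithmetic can be sketched on its own as below; the function name `scaled_size` is hypothetical, and the result is ordered (width, height) because that is the order `cv2.resize` expects for its `dsize` argument.

```python
def scaled_size(height, width, scale_percent):
    """Return (width, height) scaled to scale_percent of the original."""
    if scale_percent <= 0:
        raise ValueError("scale_percent must be positive")
    # Clamp to at least one pixel so very small frames stay valid.
    new_width = max(1, int(width * scale_percent / 100))
    new_height = max(1, int(height * scale_percent / 100))
    return new_width, new_height


# A 640x480 frame scaled to 30 percent.
print(scaled_size(480, 640, 30))  # prints (192, 144)
```

With a frame from `cap.read()`, the call would be `scaled_size(image_np.shape[0], image_np.shape[1], 30)`, and the result can be passed as `dsize` to `cv2.resize`.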

Any help would be greatly appreciated.

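Update: I also tried this on an RTX 2070 SUPER GPU. No memory-allocation warnings appear, but it still fails to complete a single inference. For completeness, here is the console output (the text 'inference start' is printed before inference runs; if inference finished, 'inference end' would be printed):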

2020-04-24 11:30:16.579805: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-24 11:30:18.916146: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-04-24 11:30:18.941805: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 SUPER computeCapability: 7.5
coreClock: 1.785GHz coreCount: 40 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2020-04-24 11:30:18.946134: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-24 11:30:18.951172: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-24 11:30:18.954809: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-24 11:30:18.957258: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-24 11:30:18.961662: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-24 11:30:18.965553: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-24 11:30:18.978671: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-24 11:30:18.980998: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-24 11:30:18.982226: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-04-24 11:30:18.984167: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 SUPER computeCapability: 7.5
coreClock: 1.785GHz coreCount: 40 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2020-04-24 11:30:18.987291: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-24 11:30:18.988809: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-24 11:30:18.990303: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-24 11:30:18.991792: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-24 11:30:18.993320: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-24 11:30:18.996960: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-24 11:30:18.998497: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-24 11:30:19.000191: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-24 11:30:19.430864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-24 11:30:19.433076: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0
2020-04-24 11:30:19.434566: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
2020-04-24 11:30:19.436400: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6281 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5)
[<tf.Tensor 'image_tensor:0' shape=(None, None, None, 3) dtype=uint8>]
inference start
2020-04-24 11:30:24.728554: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-24 11:30:25.608426: W tensorflow/stream_executor/gpu/redzone_allocator.cc:312] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation. This message will be only logged once.
2020-04-24 11:30:25.625904: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll

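Update 2: With eager mode disabled, everything runs fine (even on the GPU), but then I cannot retrieve the detected objects. Next I tried running it with a session (as in TensorFlow 1, I believe). There, session.run() blocks indefinitely on the GPU, while it again works fine on the CPU.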

Answer 1

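If you are using a GPU, try installing tensorflow-gpu. According to the documentation, the tensorflow package you are using appears to support GPUs already, but you can try specifying the GPU package explicitly. Try it in a Python virtual environment first.

Uninstall tensorflow: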

    pip uninstall tensorflow

Uninstall tensorflow-gpu (run this command even if you are not sure whether it is installed):

    pip uninstall tensorflow-gpu

Install a specific tensorflow-gpu version:

    pip install tensorflow-gpu==2.0.0
