Tensorflow无法注册CUDNN处理程序

问题描述 投票:2回答:1

我不确定这是否是此问题的正确stackexchange,但这里有。

我已经安装了最新的CUDA驱动程序和Tensorflow 1.14,但是当我尝试训练卷积层时,Tensorflow说它找不到实现,因为它无法创建cudnn处理程序。我不确定该怎么办。

Tensorflow错误。

2019-11-29 22:54:16.276690: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-29 22:54:16.321772: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3408000000 Hz
2019-11-29 22:54:16.322826: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b7b1203d70 executing computations on platform Host. Devices:
2019-11-29 22:54:16.322992: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-11-29 22:54:16.327949: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-11-29 22:54:17.028426: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning
NUMA node zero
2019-11-29 22:54:17.029075: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b7b1993230 executing computations on platform CUDA. Devices:
2019-11-29 22:54:17.029093: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce RTX 2080, Compute Capability 7.5
2019-11-29 22:54:17.030709: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning
NUMA node zero
2019-11-29 22:54:17.031088: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.785
pciBusID: 0000:01:00.0
2019-11-29 22:54:17.031304: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-11-29 22:54:17.032299: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-11-29 22:54:17.033255: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-11-29 22:54:17.033855: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-11-29 22:54:17.035048: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-11-29 22:54:17.036049: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-11-29 22:54:17.038877: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-11-29 22:54:17.039240: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning
NUMA node zero
2019-11-29 22:54:17.039666: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning
NUMA node zero
2019-11-29 22:54:17.040184: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-11-29 22:54:17.040583: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-11-29 22:54:17.042475: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-29 22:54:17.042517: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0
2019-11-29 22:54:17.042681: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N
2019-11-29 22:54:17.043278: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning
NUMA node zero
2019-11-29 22:54:17.043675: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning
NUMA node zero
2019-11-29 22:54:17.044663: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7466 MB memory) -> physical GPU (device: 0,
name: GeForce RTX 2080, pci bus id: 0000:01:00.0, compute capability: 7.5)
Epoch 1/3
2019-11-29 22:54:18.565862: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-11-29 22:54:18.776295: I tensorflow/core/kernels/cuda_solvers.cc:159] Creating CudaSolver handles for stream 0x55b7b1d9a810
2019-11-29 22:54:18.776389: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-11-29 22:54:19.028300: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-11-29 22:54:19.037596: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-11-29 22:54:19.678671: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-11-29 22:54:19.685432: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "mnist_example.py", line 268, in <module>
    res_dict_interpolated = run_model(build_interpolated, "Interpolated", verbose)
  File "mnist_example.py", line 216, in run_model
    validation_data=(x_test, y_test))
  File "/home/kasperfred/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 780, in fit
    steps_name='steps_per_epoch')
  File "/home/kasperfred/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 363, in model_iteration
    batch_outs = f(ins_batch)
  File "/home/kasperfred/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/backend.py", line 3292, in __call__
    run_metadata=self.run_metadata)
  File "/home/kasperfred/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1458, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
         [[{{node interpolated_conv2d/Conv2D}}]]
  (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
         [[{{node interpolated_conv2d/Conv2D}}]]
         [[loss/mul/_73]]
0 successful operations.
0 derived errors ignored.

nvidia-smi输出

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21       Driver Version: 435.21       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2080    Off  | 00000000:01:00.0 Off |                  N/A |
| 38%   41C    P0    N/A /  N/A |      0MiB /  7979MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

cat /usr/local/cuda-10.0/include/cudnn.h | grep CUDNN_MAJOR -A 2的输出

#define CUDNN_MAJOR 7
#define CUDNN_MINOR 4
#define CUDNN_PATCHLEVEL 2
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#include "driver_types.h"

tensorflow cudnn
1个回答
1
投票

[TF 1.14中曾经有some problem,可以通过设置来解决

config.gpu_options.allow_growth = True

或对于TF 2.0:

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession
config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)
© www.soinside.com 2019 - 2024. All rights reserved.