TensorFlow GPU CUDA: "Could not load dynamic library 'libcufft.so.10'" error


I'm afraid this will be flagged as a duplicate, but the examples I found are about libcudart and libcublas, not libcufft (which is my problem).

I installed TensorFlow and want to use the GPU, so I ran the script at this link.

When I run TensorFlow to train a network, I get the following messages:

2021-09-23 11:19:22.158959: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-23 11:19:22.162563: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory
2021-09-23 11:19:22.162651: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory
2021-09-23 11:19:22.162730: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory
2021-09-23 11:19:22.162806: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2021-09-23 11:19:22.162989: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1835] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-09-23 11:19:22.163345: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
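When the log is long, it can help to pull out just the names of the libraries that failed to load. The following is a minimal sketch, assuming the warning text keeps the "Could not load dynamic library '...'" format shown above; the function name missing_libraries is made up for illustration.

```python
import re

# Matches the dso_loader warning format seen in the log above (an assumption:
# other TensorFlow versions may word this message differently).
MISSING_LIB = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text):
    """Return the library names from the warnings, in order, without duplicates."""
    seen = []
    for name in MISSING_LIB.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    # Two warning fragments copied from the log above.
    sample = (
        "Could not load dynamic library 'libcufft.so.10'; dlerror: "
        "libcufft.so.10: cannot open shared object file: No such file or directory\n"
        "Could not load dynamic library 'libcurand.so.10'; dlerror: "
        "libcurand.so.10: cannot open shared object file: No such file or directory\n"
    )
    for name in missing_libraries(sample):
        print(name)
```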

Calling tf.config.list_physical_devices() gives:

2021-09-23 11:30:18.327648: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-23 11:30:18.329447: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/extras/CUPTI/lib64
2021-09-23 11:30:18.329510: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/extras/CUPTI/lib64
2021-09-23 11:30:18.329573: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/extras/CUPTI/lib64
2021-09-23 11:30:18.329687: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/extras/CUPTI/lib64
2021-09-23 11:30:18.329814: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1835] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
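The warnings come from the dynamic loader, so the same check can be reproduced without TensorFlow. The sketch below is one way to do that with the standard ctypes module, assuming a Linux system; the library names are copied from the log, and the helper name unloadable is hypothetical.

```python
import ctypes

# Library names copied from the warnings above.
LIBRARIES = [
    "libcufft.so.10",
    "libcurand.so.10",
    "libcusolver.so.11",
    "libcusparse.so.11",
]


def unloadable(names):
    """Return the subset of names that the dynamic loader cannot open."""
    failed = []
    for name in names:
        try:
            # CDLL asks the system loader to open the library, which searches
            # LD_LIBRARY_PATH and the standard library directories.
            ctypes.CDLL(name)
        except OSError:
            failed.append(name)
    return failed


if __name__ == "__main__":
    for name in unloadable(LIBRARIES):
        print("cannot load:", name)
```

If this prints the same four names, the problem is in the CUDA installation or the library search path rather than in TensorFlow itself.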

I have a folder called /usr/local/cuda-11.0, but not one called just cuda, and it has no extras folder in it either. Admittedly, the script says it is for Ubuntu 18.04, and I have Ubuntu 20.04.
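Note that the LD_LIBRARY_PATH printed in the log points at /usr/local/cuda/extras/CUPTI/lib64, a directory under the plain cuda folder that does not exist here. A rough sketch for checking this kind of mismatch follows; both helper names are hypothetical, and searching under /usr/local/cuda* is an assumption about where the toolkit would be installed.

```python
import glob
import os


def missing_search_dirs(ld_library_path):
    """Return LD_LIBRARY_PATH entries that are not existing directories."""
    # Empty entries (such as the leading ':' in the log) are ignored here.
    entries = [p for p in ld_library_path.split(":") if p]
    return [p for p in entries if not os.path.isdir(p)]


def find_library(pattern, roots):
    """Recursively search each root for files matching pattern."""
    hits = []
    for root in roots:
        hits.extend(glob.glob(os.path.join(root, "**", pattern), recursive=True))
    return sorted(hits)


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    print("missing search dirs:", missing_search_dirs(path))
    roots = glob.glob("/usr/local/cuda*")  # assumed install location
    print("CUDA folders:", roots)
    print("libcufft copies:", find_library("libcufft.so*", roots))
```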

If I try to run

sudo apt install nvidia-cuda-toolkit

as suggested here, I get:

Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 nvidia-cuda-toolkit : Depends: nvidia-cuda-dev (= 10.1.243-3) but it is not going to be installed
                       Recommends: nsight-compute (= 10.1.243-3)
                       Recommends: nsight-systems (= 10.1.243-3)
E: Unable to correct problems, you have held broken packages.

The output of whereis cuda is just cuda: (empty).

Output of nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   40C    P8    31W / 300W |    626MiB / 11016MiB |     15%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1141      G   /usr/lib/xorg/Xorg                 59MiB |
|    0   N/A  N/A      1749      G   /usr/lib/xorg/Xorg                315MiB |
|    0   N/A  N/A      1886      G   /usr/bin/gnome-shell               59MiB |
|    0   N/A  N/A      1907      G   ...mviewer/tv_bin/TeamViewer        2MiB |
|    0   N/A  N/A      2463      G   ...ble-features=SpareRendere        4MiB |
|    0   N/A  N/A      3825      G   ...AAAAAAAAA= --shared-files      105MiB |
|    0   N/A  N/A      4682      G   .../debug.log --shared-files       36MiB |
|    0   N/A  N/A     20600      G   ...AAAAAAAAA= --shared-files       24MiB |
+-----------------------------------------------------------------------------+

I'm afraid of installing things to fix this and ending up in the typical situation of 20 versions of CUDA conflicting with each other.

tensorflow cuda
1 answer

So I did as suggested in the comments and uninstalled everything in a very aggressive way:

sudo apt clean
sudo apt update
sudo apt purge cuda
sudo apt purge 'nvidia-*'
sudo apt autoremove

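Then I followed the installation instructions for:

  • CUDA
  • CUDA Toolkit (although I think it is the same as the above, with just one extra command, sudo apt-get install nvidia-gds, and I don't know whether it is needed)
  • cuDNN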

Now it seems to work.
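One way to confirm the result is to repeat the device query from the question and look for a GPU entry. This is a small sketch, not part of the original answer; gpu_names and report are made-up helper names, and the only TensorFlow call used is tf.config.list_physical_devices() from the question.

```python
def gpu_names(devices):
    """Return the names of the GPU entries in a list of PhysicalDevice objects."""
    return [d.name for d in devices if d.device_type == "GPU"]


def report():
    """Print and return the GPUs TensorFlow can see, if TensorFlow is installed."""
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment")
        return []
    names = gpu_names(tf.config.list_physical_devices())
    print("GPUs visible to TensorFlow:", names or "none")
    return names


if __name__ == "__main__":
    report()
```

Before the reinstall this would have listed no GPU, matching the CPU-only device list shown in the question.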
