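I have been struggling to install all the drivers required for the tensorflow-gpu library. I want to compile my models on the GPU instead of the CPU. I am using Linux Mint. Here is my neofetch output: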
OS: Linux Mint 21.3 x86_64
Kernel: 5.15.0-101-generic
Uptime: 2 hours, 33 mins
Packages: 3307 (dpkg), 13 (flatpak)
Shell: bash 5.1.16
Resolution: 1920x1080
DE: Cinnamon
WM: Mutter (Muffin)
WM Theme: WhiteSur-Dark (Sweet-Dark-v40)
Theme: Sweet-Dark-v40 [GTK2/3]
Icons: candy-icons [GTK2/3]
Terminal: gnome-terminal
CPU: Intel i5-3570 (4) @ 3.800GHz
GPU: NVIDIA GeForce GTX 1060 6GB
Memory: 2070MiB / 7883MiB
Here is my nvidia-smi output:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1060 6GB    Off |   00000000:01:00.0  On |                  N/A |
| 25%   40C    P8              7W /  120W |     316MiB /   6144MiB |     11%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1142      G   /usr/lib/xorg/Xorg                            148MiB |
|    0   N/A  N/A      1881      G   cinnamon                                       45MiB |
|    0   N/A  N/A      9746      G   /app/extra/viber/Viber                         27MiB |
|    0   N/A  N/A     14935      G   ...seed-version=20240322-165906.502000         90MiB |
+-----------------------------------------------------------------------------------------+
I have also installed TensorRT, cuDNN, and tensorflow-gpu. Most of the installations were done with pip. Here is my TensorRT version:
import tensorrt
print(tensorrt.__version__)
8.6.1
assert tensorrt.Builder(tensorrt.Logger())
The error I get is as follows:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2024-03-25 12:49:24.151959: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-25 12:49:24.939265: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-03-25 12:49:24.973806: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]
I'm not sure whether the conflict comes from a version mismatch or from the paths, but when I echo the path, I get:
echo $LD_LIBRARY_PATH
:/home/vuk/miniconda3/lib/python3.1/site-packages/tensorrt_libs
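One way to narrow down whether the path is at fault is to split LD_LIBRARY_PATH into its entries and check which of them exist on disk. The sketch below is only a diagnostic aid: the helper name inspect_library_path is made up for illustration, and it says nothing about whether the libraries inside a directory are the right versions.

```python
import os


def inspect_library_path(value):
    """Split an LD_LIBRARY_PATH-style string and report which entries exist.

    Returns a list of (entry, is_directory) tuples. Empty entries, such as
    the one produced by a leading ':', are kept so that they stay visible.
    """
    return [(entry, os.path.isdir(entry)) for entry in value.split(":")]


if __name__ == "__main__":
    # Inspect whatever the current shell exported (empty string if unset).
    current = os.environ.get("LD_LIBRARY_PATH", "")
    for entry, exists in inspect_library_path(current):
        label = "ok" if exists else "missing"
        print(f"{label:8} {entry!r}")
```

An entry reported as missing, or an unexpected empty entry, points at the path rather than at the library versions.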
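This has been bothering me for months...
I have tried installing and uninstalling the libraries many times, configuring the paths, and installing TensorFlow GPU through Docker, but nothing has worked so far. The problem may be a mismatch between the libraries I am using, but I'm not sure...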
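To check for a version mismatch, it can help to look at the CUDA and cuDNN versions that the installed TensorFlow was built against. This is a minimal sketch, assuming a TensorFlow release that provides tf.sysconfig.get_build_info(); the wrapper name tensorflow_build_info is illustrative only.

```python
def tensorflow_build_info():
    """Return the CUDA-related build info of the installed TensorFlow, or None.

    Uses tf.sysconfig.get_build_info(); keys that a given build does not
    report are returned as None.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None  # TensorFlow is not importable in this environment
    info = tf.sysconfig.get_build_info()
    keys = ("is_cuda_build", "cuda_version", "cudnn_version")
    return {key: info.get(key) for key in keys}


if __name__ == "__main__":
    print(tensorflow_build_info())
```

The reported versions can then be compared with the CUDA and cuDNN packages that are actually installed. Keep in mind that the "CUDA Version: 12.4" field in nvidia-smi is the newest CUDA version the driver supports, not the version of any installed CUDA libraries.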
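I solved this by creating two bash scripts for my conda environment. In the conda environment's directory, navigate to the etc/conda/activate.d and etc/conda/deactivate.d directories. If these directories do not exist, you can create them. Then create a script file (for example, set_env_vars.sh) in each of the two directories.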
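The directory setup can also be scripted. Here is a minimal sketch, assuming the environment's root directory is passed in explicitly; the function name ensure_hook_dirs is hypothetical, and the script contents still have to be written separately.

```python
from pathlib import Path


def ensure_hook_dirs(env_prefix, script_name="set_env_vars.sh"):
    """Create activate.d and deactivate.d under env_prefix if missing.

    Returns the two paths where the hook scripts are expected to live;
    the scripts themselves are not written here.
    """
    script_paths = []
    for hook in ("activate.d", "deactivate.d"):
        hook_dir = Path(env_prefix) / "etc" / "conda" / hook
        # parents=True also creates etc/ and etc/conda/ when they are absent.
        hook_dir.mkdir(parents=True, exist_ok=True)
        script_paths.append(hook_dir / script_name)
    return script_paths
```

With the target environment activated, conda exports CONDA_PREFIX, so a call such as ensure_hook_dirs(os.environ["CONDA_PREFIX"]) (after import os) would point at the right place.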
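The first one, in activate.d, looks like this:
#!/bin/sh
# NVIDIA_DIR is the nvidia package directory, i.e. the parent of the nvidia/cudnn module directory.
export NVIDIA_DIR=$(dirname $(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)")))
# Prepend every ${NVIDIA_DIR}/*/lib/ directory to LD_LIBRARY_PATH, colon-separated.
export LD_LIBRARY_PATH=$(echo ${NVIDIA_DIR}/*/lib/ | sed -r 's/\s+/:/g')${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}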
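The second one, in deactivate.d, contains:
#!/bin/sh
# Remove the variables set on activation. Note that this clears LD_LIBRARY_PATH
# entirely, including any value it had before the environment was activated.
unset NVIDIA_DIR
unset LD_LIBRARY_PATH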
Then I gave both scripts execute permission:
chmod +x /path/to/your/conda/env/etc/conda/activate.d/set_env_vars.sh
chmod +x /path/to/your/conda/env/etc/conda/deactivate.d/set_env_vars.sh
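The hooks only run when the environment is activated, so the environment has to be deactivated and activated again before the python3 -c check from the question is repeated. As an extra check that the activate.d script did what was intended, here is a rough sketch that recomputes the same */lib directories in Python and reports any that are missing from LD_LIBRARY_PATH. The function names are illustrative, and the usage part assumes NVIDIA_DIR was exported by the activate.d script.

```python
import glob
import os


def nvidia_lib_dirs(nvidia_dir):
    """Return the sorted */lib directories below nvidia_dir.

    This mirrors the ${NVIDIA_DIR}/*/lib/ glob in the activate.d script.
    """
    pattern = os.path.join(nvidia_dir, "*", "lib")
    return sorted(path for path in glob.glob(pattern) if os.path.isdir(path))


def missing_from_path(lib_dirs, ld_library_path):
    """Return the entries of lib_dirs that do not appear in ld_library_path."""
    # normpath drops the trailing slash that the shell glob leaves on entries.
    present = {os.path.normpath(e) for e in ld_library_path.split(":") if e}
    return [d for d in lib_dirs if os.path.normpath(d) not in present]


if __name__ == "__main__":
    # Assumption: the activate.d script exported NVIDIA_DIR.
    nvidia_dir = os.environ.get("NVIDIA_DIR")
    if nvidia_dir:
        missing = missing_from_path(
            nvidia_lib_dirs(nvidia_dir), os.environ.get("LD_LIBRARY_PATH", "")
        )
        print("missing entries:", missing)
```

An empty list means every directory the script was supposed to add is present on the path.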