Stuck enabling GPU for TensorFlow in WSL2 under Windows 10

Question

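I cannot get TensorFlow 2 to use my GPU under WSL2. I am aware of this question, but GPU support is now (supposedly) no longer experimental.

Windows is on the required 21H2 version, which should support the WSL2 GPU connection.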

Windows 10 Pro, 21H2, build 19044.1706

The PC has two GPUs:

GPU 0: NVIDIA GeForce GTX 1080 Ti (UUID: GPU-19c8549a-4b8d-5d70-456b-776ceece4b0f)
GPU 1: NVIDIA GeForce GTX 1080 Ti (UUID: GPU-2a946756-0472-fb90-f1a4-b40cce1bba4f)

I installed Ubuntu under WSL2 some time ago:

PS C:\Users\jem-m> wsl --status
Default Distribution: Ubuntu-20.04
Default Version: 2
...
Kernel version: 5.10.16

In Windows PowerShell, I can run nvidia-smi.exe, which gives me

PS C:\Users\jem-m> nvidia-smi.exe
Mon May 16 18:13:27 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 512.77       Driver Version: 512.77       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ... WDDM  | 00000000:08:00.0  On |                  N/A |
| 23%   31C    P8    10W / 250W |    753MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ... WDDM  | 00000000:41:00.0 Off |                  N/A |
| 23%   31C    P8    12W / 250W |    753MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

while nvidia-smi in the WSL2 Ubuntu shell gives

(testenv) jem-mosig:~/ $ nvidia-smi                                   [17:48:30]
Mon May 16 17:49:53 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.68.02    Driver Version: 512.77       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:08:00.0  On |                  N/A |
| 23%   34C    P8    10W / 250W |    784MiB / 11264MiB |      8%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:41:00.0 Off |                  N/A |
| 23%   34C    P8    13W / 250W |    784MiB / 11264MiB |     12%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Note that the driver and CUDA versions are the same in both outputs, but the NVIDIA-SMI version differs.
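The comparison can be made mechanical. The sketch below parses the two header lines copied from the outputs above and reports which fields differ. The helper names (parse_smi_header, differing_fields) are made up for this illustration. Keep in mind that the "CUDA Version" field in nvidia-smi reports the highest CUDA version the driver supports, not a CUDA toolkit installed inside the distribution, so matching values here do not by themselves show that TensorFlow can find its CUDA libraries.

```python
import re

# Header lines copied verbatim from the two nvidia-smi outputs in the question.
WIN_HEADER = "| NVIDIA-SMI 512.77       Driver Version: 512.77       CUDA Version: 11.6     |"
WSL_HEADER = "| NVIDIA-SMI 510.68.02    Driver Version: 512.77       CUDA Version: 11.6     |"

HEADER_RE = re.compile(
    r"NVIDIA-SMI\s+(?P<smi>[\d.]+)\s+"
    r"Driver Version:\s+(?P<driver>[\d.]+)\s+"
    r"CUDA Version:\s+(?P<cuda>[\d.]+)"
)


def parse_smi_header(line):
    """Return the three version fields of an nvidia-smi header line as a dict."""
    match = HEADER_RE.search(line)
    if match is None:
        raise ValueError("not an nvidia-smi header line")
    return match.groupdict()


def differing_fields(a, b):
    """Return the sorted names of the fields whose values differ between a and b."""
    return sorted(key for key in a if a[key] != b[key])


win = parse_smi_header(WIN_HEADER)
wsl = parse_smi_header(WSL_HEADER)
print(differing_fields(win, wsl))  # ['smi']
```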

The nvidia-smi output seems to indicate that CUDA works as expected under WSL2. But when I run

import tensorflow as tf

print(tf.config.list_physical_devices('GPU'))

# 2022-05-17 12:13:05.016328: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
# []

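in Python inside WSL2, I get [], so TensorFlow does not recognize any GPU. This is with a freshly installed Python 3.8.0 and TensorFlow 2.4.1 in a new Miniconda environment inside Ubuntu WSL2. I don't know what is going wrong. Any suggestions?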

Addendum

I don't get any error messages when importing TensorFlow, but it does produce some log messages when I use it. For example, when I run

import tensorflow as tf

print(tf.__version__)
model = tf.keras.Sequential([tf.keras.layers.Dense(3)])
model.compile(loss="mse")
print(model.predict([[0.]]))

I get

2.4.1
2022-05-17 10:38:28.792209: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-05-17 10:38:28.792411: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-05-17 10:38:28.794356: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2022-05-17 10:38:28.853557: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2022-05-17 10:38:28.860126: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3792975000 Hz
[[0. 0. 0.]]

These don't seem to be related to the GPU, though.
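A quick way to triage such output is to look at the severity letter that follows the timestamp: TensorFlow's C++ log lines use I, W, E and F for info, warning, error and fatal. The sketch below counts the severities in three lines copied from the log above plus its one continuation line. The helper name count_levels is made up for this illustration.

```python
import re
from collections import Counter

# Matches "YYYY-MM-DD HH:MM:SS.ffffff: <letter> " at the start of a log line.
LOG_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+: (?P<level>[IWEF]) ")
LEVEL_NAMES = {"I": "INFO", "W": "WARNING", "E": "ERROR", "F": "FATAL"}

# Three log lines and one continuation line, copied from the output above.
SAMPLE = [
    "2022-05-17 10:38:28.792209: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set",
    "To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.",
    "2022-05-17 10:38:28.853557: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)",
    "2022-05-17 10:38:28.860126: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3792975000 Hz",
]


def count_levels(lines):
    """Count log lines per severity, skipping continuation lines without a prefix."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.match(line)
        if match:
            counts[LEVEL_NAMES[match.group("level")]] += 1
    return dict(counts)


print(count_levels(SAMPLE))  # {'INFO': 3}
```

Every prefixed line in the sample is informational, which is consistent with the observation that nothing here points at a GPU problem.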

Answer 1

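Dr. Snoopy put me on the right track. Although the TF website says that "the TensorFlow pip package includes GPU support for CUDA®-enabled cards", I still needed to run

conda install tensorflow-gpu

and it worked! Now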

import tensorflow as tf
from tensorflow.python.client import device_lib

print("devices: ", [d.name for d in device_lib.list_local_devices()])
print("GPUs:    ", tf.config.list_physical_devices('GPU'))
print("TF v.:   ", tf.__version__)

prints a lot of debug messages and

devices:  ['/device:CPU:0', '/device:GPU:0', '/device:GPU:1']
GPUs:     [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')]
TF v.:    2.4.1
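To tell "TensorFlow was built without CUDA" apart from "TensorFlow has CUDA support but finds no device", a small report like the following may help. This is only a sketch: it assumes the package that was installed first was a CPU-only build, which the answer suggests but does not confirm, and the function name gpu_report is made up for this illustration. tf.test.is_built_with_cuda() and tf.config.list_physical_devices() are regular TensorFlow 2 APIs.

```python
def gpu_report():
    """Summarize whether TensorFlow is importable, CUDA-enabled, and sees any GPU."""
    try:
        import tensorflow as tf
    except ImportError:
        # Nothing more to check without TensorFlow in this environment.
        return {"tensorflow_installed": False}
    return {
        "tensorflow_installed": True,
        "version": tf.__version__,
        # False means this build has no CUDA support, whatever the driver reports.
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpus": [device.name for device in tf.config.list_physical_devices("GPU")],
    }


print(gpu_report())
```

If built_with_cuda is False, the driver side can be fine and the GPU list will still be empty.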

Answer 2

As specified on the TensorFlow website:

https://www.tensorflow.org/install/pip#windows-wsl2

python3 -m pip install tensorflow[and-cuda]
# Verify the installation:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

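The [and-cuda] extra installs the current CUDA runtime (as of November 6, 2023, that is nvidia-cuda-runtime-cu11==11.8.89), so I think installing it separately by hand would also work, though I have not tested that.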
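To see what pip actually installed, the package versions can be read from the environment's metadata. This is a minimal sketch using the standard library's importlib.metadata (Python 3.8+). The helper name installed_version is made up for this illustration, and the runtime package name is the one quoted above, which may differ for other TensorFlow releases.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a pip package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# The second name is the runtime package mentioned in this answer.
for name in ("tensorflow", "nvidia-cuda-runtime-cu11"):
    print(name, "->", installed_version(name))
```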

A note for others: as you said, you need Windows 10 at the November 2021 update (21H2, build 19044) or later, or Windows 11.
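The build requirement can be checked against the string that Windows reports, such as the "19044.1706" shown in the question. The following is only a sketch: the threshold is the one stated in this answer, the function name is made up for this illustration, and the second example string is hypothetical.

```python
MIN_BUILD = 19044  # Windows 10 21H2, the minimum stated in this answer


def build_supports_wsl_gpu(build):
    """Compare the part before the dot of a build string like '19044.1706' to MIN_BUILD."""
    return int(build.split(".")[0]) >= MIN_BUILD


print(build_supports_wsl_gpu("19044.1706"))  # True (build from the question)
print(build_supports_wsl_gpu("19043.1000"))  # False (hypothetical older build)
```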

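I strongly recommend against using the conda package specified in the other answer: official support for Windows-native GPU use was dropped in 2.10, so that is the version the tensorflow-gpu package will be targeting.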
