I created a conda environment and installed TensorFlow as follows:
conda create -n foo python=3.10
conda activate foo
conda install mamba
mamba install tensorflow -c conda-forge
mamba install cudnn cudatoolkit
This installed TensorFlow 2.10.0. I already have CUDA 11.2 and cuDNN 8.1 installed. I then tried to run the following:
import tensorflow as tf
print(f"GPUs available: {tf.config.list_physical_devices('GPU')}")
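But it only returns an empty list. I want to use my 3060 Ti for my ML projects, but TensorFlow does not detect it. I found questions similar to mine, such as this, this and this, but they use older versions of TensorFlow that install tensorflow-gpu, which is no longer supported. How can I fix this, or at least troubleshoot it?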
I am on a Windows 10 machine.
Output of nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 528.24       Driver Version: 528.24       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ... WDDM  | 00000000:09:00.0  On |                  N/A |
| 30%   43C    P8    16W / 200W |    809MiB /  8192MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      7176    C+G   ...perience\NVIDIA Share.exe    N/A      |
|    0   N/A  N/A      9240    C+G   C:\Windows\explorer.exe         N/A      |
|    0   N/A  N/A     12936    C+G   ...cw5n1h2txyewy\LockApp.exe    N/A      |
|    0   N/A  N/A     13652    C+G   ...e\PhoneExperienceHost.exe    N/A      |
|    0   N/A  N/A     14020    C+G   ...2txyewy\TextInputHost.exe    N/A      |
|    0   N/A  N/A     14888    C+G   ...ser\Application\brave.exe    N/A      |
|    0   N/A  N/A     15112    C+G   ...5n1h2txyewy\SearchApp.exe    N/A      |
|    0   N/A  N/A     16516    C+G   ...oft OneDrive\OneDrive.exe    N/A      |
|    0   N/A  N/A     18296    C+G   ...aming\Spotify\Spotify.exe    N/A      |
|    0   N/A  N/A     18624    C+G   ...in7x64\steamwebhelper.exe    N/A      |
|    0   N/A  N/A     18672    C+G   ...\app-1.0.9010\Discord.exe    N/A      |
|    0   N/A  N/A     18828    C+G   ...lPanel\SystemSettings.exe    N/A      |
|    0   N/A  N/A     19284    C+G   ...Central\Razer Central.exe    N/A      |
|    0   N/A  N/A     20020    C+G   ...arp.BrowserSubprocess.exe    N/A      |
|    0   N/A  N/A     22912    C+G   ...8wekyb3d8bbwe\Cortana.exe    N/A      |
|    0   N/A  N/A     24848    C+G   ...ontend\Docker Desktop.exe    N/A      |
|    0   N/A  N/A     25804    C+G   ...y\ShellExperienceHost.exe    N/A      |
|    0   N/A  N/A     27064    C+G   ...8bbwe\WindowsTerminal.exe    N/A      |
+-----------------------------------------------------------------------------+
Output of nvcc -V:
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_22:08:44_Pacific_Standard_Time_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
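A note on reading the two outputs together: the "CUDA Version: 12.0" reported by nvidia-smi is the highest CUDA version the driver supports, while nvcc reports the toolkit that is actually installed (11.2 here). As a rough sketch, the toolkit release can be pulled out of the nvcc text and compared with the version mentioned earlier in this post. The function name parse_nvcc_release and the constant EXPECTED_CUDA are made up for this illustration.

```python
import re
from typing import Optional

# Toolkit version this post says it installed (an assumption taken from the text above).
EXPECTED_CUDA = "11.2"


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Return the X.Y value that follows 'release' in `nvcc -V` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


# One line copied from the nvcc output shown above.
sample = "Cuda compilation tools, release 11.2, V11.2.152"
found = parse_nvcc_release(sample)
print(f"toolkit release: {found}, matches expected: {found == EXPECTED_CUDA}")
```

A matching toolkit only rules out one cause. It says nothing about whether the installed TensorFlow package was itself built with CUDA support.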
I ran some dummy code like this:
import tensorflow as tf
import numpy as np


def make_nn():
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Dense(1, input_shape=(1,)))
    model.compile(loss='mean_squared_error', optimizer='sgd')
    return model


def dataset():
    x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
    return tf.data.Dataset.from_tensor_slices((x, y)).batch(1)


def main():
    model = make_nn()
    model.fit(dataset(), epochs=1, steps_per_epoch=9)


if __name__ == '__main__':
    print(f"GPUs available: {tf.config.list_physical_devices('GPU')}")
    print(f"Built with cuda: {tf.test.is_built_with_cuda()}")
    main()
It gave me the following log:
GPUs available: []
Built with cuda: False
2023-02-06 09:47:32.744450: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-06 09:47:32.779280: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
It looks like it is using a CPU-only build.
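One way to confirm this suspicion might be to look at the build metadata that TensorFlow exposes through tf.sysconfig.get_build_info(). The sketch below is only an illustration: describe_build is a hypothetical helper, and it assumes the returned dictionary carries the is_cuda_build, cuda_version and cudnn_version keys (the latter two are only expected on CUDA builds).

```python
def describe_build(build_info):
    """Summarize a dict shaped like the result of tf.sysconfig.get_build_info()."""
    if not build_info.get("is_cuda_build", False):
        return "CPU-only build: this package cannot see a GPU"
    # These keys are assumed to be present only when the package was built with CUDA.
    cuda = build_info.get("cuda_version", "unknown")
    cudnn = build_info.get("cudnn_version", "unknown")
    return f"CUDA build (cuda_version={cuda}, cudnn_version={cudnn})"


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        print(describe_build(dict(tf.sysconfig.get_build_info())))
```

If this reports a CPU-only build, installing CUDA or cuDNN on the system will not help by itself. A different TensorFlow package would have to be installed.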
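This is probably not the best solution, but I downgraded TensorFlow back to 2.6.0, the version I had installed before, and it works. That is a pity, because I wanted to try some of the newer features, but it is good enough for now. If anyone faces the same problem, this is the conda environment I am currently using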
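Thanks to @Corralien. I had the same problem as Areias, but I solved it by downloading the correct version of cuDNN from the NVIDIA website. I did not know before that cuDNN has to be installed both on Win11 itself and in the conda virtual environment.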
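If you use conda-forge, you may need to set the environment variable CONDA_OVERRIDE_CUDA to force installation of a GPU-enabled TensorFlow build, as described at https://conda-forge.org/docs/user/tipsandtricks.html#installing-cuda-enabled-packages-like-tensorflow-and-pytorch. Under bash, it would look like this: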
CONDA_OVERRIDE_CUDA="11.2" conda install "tensorflow==2.8" -c conda-forge
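The inline VAR=value prefix above is bash syntax, and the question is about Windows. A shell-independent alternative could be to set the variable from Python and hand it to a subprocess. This is a minimal sketch under a few assumptions: conda_install_with_cuda_override is a hypothetical helper, conda is assumed to be on PATH, and the sketch only builds and prints the command rather than running it.

```python
import os
import shutil


def conda_install_with_cuda_override(cuda_version, package_spec, channel="conda-forge"):
    """Build the command and environment for a conda install with CONDA_OVERRIDE_CUDA set."""
    env = dict(os.environ)                    # copy, so the current process is left unchanged
    env["CONDA_OVERRIDE_CUDA"] = cuda_version
    conda = shutil.which("conda") or "conda"  # resolves conda.exe / conda.bat on Windows
    cmd = [conda, "install", package_spec, "-c", channel]
    return cmd, env


cmd, env = conda_install_with_cuda_override("11.2", "tensorflow==2.8")
print("would run:", cmd, "with CONDA_OVERRIDE_CUDA =", env["CONDA_OVERRIDE_CUDA"])
# To actually perform the install:
#   import subprocess
#   subprocess.run(cmd, env=env, check=True)
```

The values "11.2" and "tensorflow==2.8" are taken from the bash command above. Whether a GPU-enabled build exists for a given platform and version depends on what conda-forge publishes.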