I can successfully transcribe an audio WAV file with the whisper CLI. I use the command:
whisper --language en --model tiny --device cpu .tmp/audio/chunk1.wav
The binary is located here, running under Python 3.11:
dev@host ~/Development $ whereis whisper
whisper: /home/dev/Development/whispervm/.direnv/python-3.11/bin/whisper
I then wrote a script that should, in theory, do exactly the same thing, but it detects my NVIDIA card and tries to use CUDA, and it fails even though I explicitly state that I want the "cpu" device.
#!/usr/bin/env python
import whisper
# whisper has multiple models that you can load as per size and requirements
model = whisper.load_model("tiny").to("cpu")
# path to the audio file you want to transcribe
PATH = ".tmp/audio/chunk1.wav"
result = model.transcribe(PATH, fp16=False)
print(result["text"])
The output looks like this:
Found GPU0 Quadro K4000 which is of cuda capability 3.0.
PyTorch no longer supports this GPU because it is too old.
The minimum cuda capability supported by this library is 3.7.
warnings.warn(old_gpu_warn % (d, name, major, minor, min_arch // 10, min_arch % 10))
Traceback (most recent call last):
File "/home/dev/Development/whisper/test.py", line 2, in <module>
model = whisper.load_model("tiny").to("cpu")
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dev/Development/whispervm/.direnv/python-3.11/lib/python3.11/site-packages/whisper/__init__.py", line 149, in load_model
model.load_state_dict(checkpoint["model_state_dict"])
File "/home/dev/Development/whispervm/.direnv/python-3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Whisper:
While copying the parameter named "encoder.blocks.0.attn.query.weight", whose dimensions in the model are torch.Size([384, 384]) and whose dimensions in the checkpoint are torch.Size([384, 384]), an exception occurred : ('CUDA error: no kernel image is available for execution on the device\nCUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.\nCompile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.\n',).
While copying the parameter named "encoder.blocks.0.attn.key.weight", whose dimensions in the model are torch.Size([384, 384]) and whose dimensions in the checkpoint are torch.Size([384, 384]), an exception occurred : ('CUDA error: no kernel image is available for execution on the device\nCUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.\nCompile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.\n',).
followed by the same error for many more parameters. It does not transcribe. I think this might be a bug.
tl;dr: Whisper transcribes on the CPU from the CLI, but not from a Python script.
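An aside: one general workaround in this situation is to hide the GPU from PyTorch entirely before any CUDA-aware import, so that `torch.cuda.is_available()` returns False and everything falls back to the CPU. This is a standard PyTorch/CUDA mechanism, not anything whisper-specific, and the whisper lines below are an untested sketch:

```python
import os

# Hide all GPUs from CUDA-aware libraries. This must happen before
# torch (and therefore whisper) is first imported; once the CUDA
# runtime has enumerated the devices, changing this has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# import whisper                       # would now see no GPU
# model = whisper.load_model("tiny")   # so this would default to CPU

print(repr(os.environ["CUDA_VISIBLE_DEVICES"]))  # ''
```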
Edit: list of installed pip packages
Package Version
------------------------ ----------
bcrypt 4.0.1
certifi 2023.7.22
cffi 1.16.0
charset-normalizer 3.3.0
cmake 3.27.6
cryptography 41.0.4
decorator 5.1.1
Deprecated 1.2.14
fabric 3.2.2
filelock 3.12.4
idna 3.4
invoke 2.2.0
Jinja2 3.1.2
lit 17.0.2
llvmlite 0.41.0
MarkupSafe 2.1.3
more-itertools 10.1.0
mpmath 1.3.0
networkx 3.1
numba 0.58.0
numpy 1.25.2
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
openai-whisper 20230918
paramiko 3.3.1
pip 23.2.1
pycparser 2.21
pydub 0.25.1
PyNaCl 1.5.0
regex 2023.10.3
requests 2.31.0
setuptools 68.1.2
sympy 1.12
tiktoken 0.3.3
torch 2.0.1
tqdm 4.66.1
triton 2.0.0
typing_extensions 4.8.0
urllib3 2.0.6
wheel 0.41.2
wrapt 1.15.0
Found the error.
I looked at the source code, and it seems I need to pass the device in the load_model() function call, rather than doing what I had read on a blog.
So the correct script looks like this:
import whisper
audio_file = "/home/dev/Development/whispervm/.tmp/audio/chunk1.wav"
audio = whisper.load_audio(audio_file)
model = whisper.load_model("tiny", device='cpu')
result = model.transcribe(audio)
print(result["text"])
I had read that if you don't specify a device, it should default to the CPU. In fact it defaults to CUDA whenever CUDA is available and detected, and then fails when your card is too old for the current PyTorch build.
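To make that fallback concrete, the default device selection amounts to the check below. `default_device` and `cuda_available` are names of my own: this mirrors (but does not copy) the logic I saw in whisper's `load_model`, with the CUDA probe made injectable so the behavior is visible without a GPU (`cuda_available` stands in for `torch.cuda.is_available()`):

```python
def default_device(requested=None, cuda_available=False):
    """Mirror whisper.load_model's device default (a sketch, not the real code).

    `requested` plays the role of the explicit `device=` argument; when it
    is None, the model goes to CUDA whenever it is available, else to CPU.
    """
    if requested is not None:
        return requested
    return "cuda" if cuda_available else "cpu"

# My Quadro is detected, so CUDA counts as "available" even though it is
# too old for this PyTorch build -- leaving device unset therefore means CUDA:
print(default_device(cuda_available=True))          # cuda  -> the failure above
print(default_device("cpu", cuda_available=True))   # cpu   -> the working script
```

This is why `.to("cpu")` in the first script never helped: the checkpoint was already being loaded onto the CUDA device inside `load_model`, before `.to("cpu")` could run.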