I can successfully transcribe an audio WAV file with the whisper CLI. I use the command:
whisper --language en --model tiny --device cpu .tmp/audio/chunk1.wav
The binary is located here, running under Python 3.11:
dev@host ~/Development $ whereis whisper
whisper: /home/dev/Development/whispervm/.direnv/python-3.11/bin/whisper
I then wrote a script that should, in theory, do exactly the same thing, but it detects my NVIDIA card and tries to use CUDA, and it fails even though I explicitly state that I want the "cpu" device.
#!/usr/bin/env python
import whisper
# whisper has multiple models that you can load as per size and requirements
model = whisper.load_model("tiny").to("cpu")
# path to the audio file you want to transcribe
PATH = ".tmp/audio/chunk1.wav"
result = model.transcribe(PATH, fp16=False)
print(result["text"])
The output looks like this:
Found GPU0 Quadro K4000 which is of cuda capability 3.0.
PyTorch no longer supports this GPU because it is too old.
The minimum cuda capability supported by this library is 3.7.
warnings.warn(old_gpu_warn % (d, name, major, minor, min_arch // 10, min_arch % 10))
Traceback (most recent call last):
File "/home/dev/Development/whisper/test.py", line 2, in <module>
model = whisper.load_model("tiny").to("cpu")
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dev/Development/whispervm/.direnv/python-3.11/lib/python3.11/site-packages/whisper/__init__.py", line 149, in load_model
model.load_state_dict(checkpoint["model_state_dict"])
File "/home/dev/Development/whispervm/.direnv/python-3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Whisper:
While copying the parameter named "encoder.blocks.0.attn.query.weight", whose dimensions in the model are torch.Size([384, 384]) and whose dimensions in the checkpoint are torch.Size([384, 384]), an exception occurred : ('CUDA error: no kernel image is available for execution on the device\nCUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.\nCompile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.\n',).
While copying the parameter named "encoder.blocks.0.attn.key.weight", whose dimensions in the model are torch.Size([384, 384]) and whose dimensions in the checkpoint are torch.Size([384, 384]), an exception occurred : ('CUDA error: no kernel image is available for execution on the device\nCUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.\nCompile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.\n',).
followed by the same error for many more parameters. It does not transcribe. I think this might be a bug.
tl;dr: Whisper transcribes on the CPU from the CLI, but not from a Python script.
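An aside: one general workaround in this situation is to hide the GPU from PyTorch entirely before any CUDA-aware import, so that `torch.cuda.is_available()` returns False and everything falls back to the CPU. This is a standard PyTorch/CUDA mechanism, not anything whisper-specific, and the whisper lines below are an untested sketch:

```python
import os

# Hide all GPUs from CUDA-aware libraries. This must happen before
# torch (and therefore whisper) is first imported; once the CUDA
# runtime has enumerated the devices, changing this has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# import whisper                       # would now see no GPU
# model = whisper.load_model("tiny")   # so this would default to CPU

print(repr(os.environ["CUDA_VISIBLE_DEVICES"]))  # ''
```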
Edit: list of installed pip packages
Package Version
------------------------ ----------
bcrypt 4.0.1
certifi 2023.7.22
cffi 1.16.0
charset-normalizer 3.3.0
cmake 3.27.6
cryptography 41.0.4
decorator 5.1.1
Deprecated 1.2.14
fabric 3.2.2
filelock 3.12.4
idna 3.4
invoke 2.2.0
Jinja2 3.1.2
lit 17.0.2
llvmlite 0.41.0
MarkupSafe 2.1.3
more-itertools 10.1.0
mpmath 1.3.0
networkx 3.1
numba 0.58.0
numpy 1.25.2
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
openai-whisper 20230918
paramiko 3.3.1
pip 23.2.1
pycparser 2.21
pydub 0.25.1
PyNaCl 1.5.0
regex 2023.10.3
requests 2.31.0
setuptools 68.1.2
sympy 1.12
tiktoken 0.3.3
torch 2.0.1
tqdm 4.66.1
triton 2.0.0
typing_extensions 4.8.0
urllib3 2.0.6
wheel 0.41.2
wrapt 1.15.0
Found the error.
I looked at the source code, and it seems I need to pass the device in the load_model() function call, rather than doing what I had read on a blog.
So the correct script looks like this:
import whisper
audio_file = "/home/dev/Development/whispervm/.tmp/audio/chunk1.wav"
audio = whisper.load_audio(audio_file)
model = whisper.load_model("tiny", device='cpu')
result = model.transcribe(audio)
print(result["text"])
I had read that if you don't specify a device, it should default to the CPU. In fact it defaults to CUDA whenever CUDA is available and detected, and then fails when your card is too old for the current PyTorch build.
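To make that fallback concrete, the default device selection amounts to the check below. `default_device` and `cuda_available` are names of my own: this mirrors (but does not copy) the logic I saw in whisper's `load_model`, with the CUDA probe made injectable so the behavior is visible without a GPU (`cuda_available` stands in for `torch.cuda.is_available()`):

```python
def default_device(requested=None, cuda_available=False):
    """Mirror whisper.load_model's device default (a sketch, not the real code).

    `requested` plays the role of the explicit `device=` argument; when it
    is None, the model goes to CUDA whenever it is available, else to CPU.
    """
    if requested is not None:
        return requested
    return "cuda" if cuda_available else "cpu"

# My Quadro is detected, so CUDA counts as "available" even though it is
# too old for this PyTorch build -- leaving device unset therefore means CUDA:
print(default_device(cuda_available=True))          # cuda  -> the failure above
print(default_device("cpu", cuda_available=True))   # cpu   -> the working script
```

This is why `.to("cpu")` in the first script never helped: the checkpoint was already being loaded onto the CUDA device inside `load_model`, before `.to("cpu")` could run.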