Transcribing with OpenAI's Whisper: AssertionError: incorrect audio shape


I'm trying to transcribe an audio file with OpenAI's open-source Whisper library.

Here is the source code of my script:

import whisper

model = whisper.load_model("large-v2")

# load the entire audio file
audio = whisper.load_audio("/content/file.mp3")
# If I add "audio = whisper.pad_or_trim(audio)" here, the first 30 seconds are transcribed without any problem.

# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# decode the audio
options = whisper.DecodingOptions(fp16=False)
result = whisper.decode(model, mel, options)

# print the recognized text if available
try:
    if hasattr(result, "text"):
        print(result.text)
except Exception as e:
    print(f"Error while printing transcription: {e}")

# write the recognized text to a file
try:
    with open("output_of_file.txt", "w") as f:
        f.write(result.text)
        print("Transcription saved to file.")
except Exception as e:
    print(f"Error while saving transcription: {e}")

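Here:

# load the entire audio file
audio = whisper.load_audio("/content/file.mp3")

When I add "audio = whisper.pad_or_trim(audio)" right below that line, the first 30 seconds of the audio file are transcribed without any problem, and language detection works as well.

But when I remove it so that the whole file gets transcribed, I get the following error:

AssertionError: incorrect audio shape

What should I do? Do I need to change the structure of the audio file? If so, which library should I use, and what kind of script should I write?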

python python-3.x ffmpeg openai-api openai-whisper
1 answer (4 votes)

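I ran into the same problem, and after some digging I found that whisper.decode is meant to extract metadata about the input, such as the language, and is therefore limited to a single 30-second window. (See the source code of the decode function.)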
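
One way to see why the full file fails is to look at the frame arithmetic: the model expects exactly one window's worth of spectrogram frames. The sketch below is plain Python with no Whisper import. The constants are the values I understand whisper.audio to define (16 kHz sample rate, hop length 160, 30-second chunks); mel_frame_count and fits_decode_window are hypothetical helper names, and pad_or_trim here is only a simplified stand-in for the library function, not its actual code.

```python
# Constants assumed to match whisper.audio (SAMPLE_RATE, HOP_LENGTH, CHUNK_LENGTH).
SAMPLE_RATE = 16000                       # samples per second after load_audio
HOP_LENGTH = 160                          # samples between spectrogram frames
CHUNK_LENGTH = 30                         # seconds per decoding window
N_SAMPLES = CHUNK_LENGTH * SAMPLE_RATE    # 480000 samples in one window
N_FRAMES = N_SAMPLES // HOP_LENGTH        # 3000 frames in one window


def mel_frame_count(n_samples: int) -> int:
    """Number of spectrogram frames for n_samples of audio (hypothetical helper)."""
    return n_samples // HOP_LENGTH


def fits_decode_window(n_samples: int) -> bool:
    """True only when the audio yields exactly one window's worth of frames."""
    return mel_frame_count(n_samples) == N_FRAMES


def pad_or_trim(samples: list, length: int = N_SAMPLES) -> list:
    """Simplified stand-in for whisper.pad_or_trim: cut or zero-pad to length."""
    if len(samples) >= length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))


if __name__ == "__main__":
    five_minutes = [0.0] * (5 * 60 * SAMPLE_RATE)
    print(fits_decode_window(len(five_minutes)))               # False
    print(fits_decode_window(len(pad_or_trim(five_minutes))))  # True
```

A five-minute file produces ten times as many frames as the window allows, which is consistent with the assertion firing once pad_or_trim is removed.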

To transcribe audio, including audio longer than 30 seconds, you can use whisper.transcribe, as shown in the following snippet:

import whisper

model = whisper.load_model("large-v2")

# load the entire audio file
audio = whisper.load_audio("/content/file.mp3")

options = {
    "language": "en", # input language, if omitted is auto detected
    "task": "translate" # or "transcribe" if you just want transcription
}
result = whisper.transcribe(model, audio, **options)
print(result["text"])
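
Conceptually, transcribe copes with long audio by working through it one 30-second window at a time. The sketch below only illustrates that idea with fixed, non-overlapping windows in plain Python. As far as I can tell, the real implementation moves its window forward based on predicted timestamps, so this is not what the library does internally. split_into_windows is a hypothetical helper, and the constants are assumed to match whisper.audio.

```python
SAMPLE_RATE = 16000             # assumed to match whisper.audio.SAMPLE_RATE
WINDOW_SECONDS = 30             # assumed to match whisper.audio.CHUNK_LENGTH
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS


def split_into_windows(samples, window=WINDOW_SAMPLES):
    """Yield consecutive fixed-size windows, zero-padding the last one."""
    for start in range(0, len(samples), window):
        chunk = list(samples[start:start + window])
        chunk += [0.0] * (window - len(chunk))  # no-op for full windows
        yield chunk


if __name__ == "__main__":
    audio = [0.1] * (70 * SAMPLE_RATE)   # 70 seconds of dummy samples
    windows = list(split_into_windows(audio))
    print(len(windows))                                      # 3
    print(all(len(w) == WINDOW_SAMPLES for w in windows))    # True
```

Each window has the length that decode accepts, but stitching the pieces together by hand loses context at the boundaries, so calling whisper.transcribe is the simpler choice in practice.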

You can find some documentation for the transcribe method in the source code, along with some documentation for the DecodingOptions structure.
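
Since the question also writes the result to a file: besides "text", the dictionary returned by transcribe contains a "segments" list whose entries carry start, end and text fields. Below is a minimal sketch for turning those into timestamped lines. format_timestamp and format_segments are hypothetical helpers, and fake_result is made-up data standing in for a real transcription result.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    minutes, ms = divmod(total_ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(result: dict) -> str:
    """Build one '[start -> end] text' line per segment (hypothetical helper)."""
    lines = []
    for seg in result.get("segments", []):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} -> {end}] {seg['text'].strip()}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up data shaped like a transcribe() result, for illustration only.
    fake_result = {
        "text": " Hello there. How are you?",
        "segments": [
            {"start": 0.0, "end": 2.5, "text": " Hello there."},
            {"start": 2.5, "end": 75.5, "text": " How are you?"},
        ],
    }
    print(format_segments(fake_result))
```

The returned string can be written to a file in the same way the question's script writes result.text.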