Transcription via OpenAI's Whisper: AssertionError: incorrect audio shape

Question

I am trying to use OpenAI's open-source Whisper library to transcribe an audio file.

Here is the source code of my script:

import whisper

model = whisper.load_model("large-v2")

# load the entire audio file
audio = whisper.load_audio("/content/file.mp3")
# When `audio = whisper.pad_or_trim(audio)` is added here, the first 30 seconds are transcribed without any problem.

# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# decode the audio
options = whisper.DecodingOptions(fp16=False)
result = whisper.decode(model, mel, options)

# print the recognized text if available
try:
    if hasattr(result, "text"):
        print(result.text)
except Exception as e:
    print(f"Error while printing transcription: {e}")

# write the recognized text to a file
try:
    with open("output_of_file.txt", "w") as f:
        f.write(result.text)
        print("Transcription saved to file.")
except Exception as e:
    print(f"Error while saving transcription: {e}")

Here:

# load the entire audio file
audio = whisper.load_audio("/content/file.mp3")

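When I add "audio = whisper.pad_or_trim(audio)" below it, the first 30 seconds of the audio file are transcribed without any problem, and language detection works as well,

but when I remove it and try to transcribe the whole file, I get the following error: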

AssertionError: incorrect audio shape

What should I do? Do I need to change the structure of the audio file? If so, which library should I use, and what kind of script should I write?

Answer 1

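I ran into the same problem, and after some digging I found that whisper.decode is meant to extract metadata about the input, such as the language, and is therefore limited to 30 seconds (see the source code of the decode function). The model's encoder expects a log-Mel spectrogram covering exactly one 30-second window, so passing the spectrogram of a longer file trips the "incorrect audio shape" assertion.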
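To make the 30-second limit concrete, here is a minimal sketch of the frame arithmetic. The constants mirror the values I believe Whisper's audio module uses (16 kHz sample rate, hop length of 160 samples, 30-second chunks); the helper names `mel_frames` and `fits_encoder` are hypothetical and exist only for illustration.

```python
# Values assumed from Whisper's audio module; check them against your installed version.
SAMPLE_RATE = 16000                      # samples per second after load_audio resampling
HOP_LENGTH = 160                         # samples between successive spectrogram frames
CHUNK_LENGTH = 30                        # seconds covered by one model input window
N_SAMPLES = CHUNK_LENGTH * SAMPLE_RATE   # samples in one 30-second window
N_FRAMES = N_SAMPLES // HOP_LENGTH       # spectrogram frames the encoder expects


def mel_frames(duration_seconds: float) -> int:
    """Number of spectrogram frames produced for a clip of the given length."""
    return int(duration_seconds * SAMPLE_RATE) // HOP_LENGTH


def fits_encoder(duration_seconds: float) -> bool:
    """True only when the clip yields exactly the frame count the encoder expects."""
    return mel_frames(duration_seconds) == N_FRAMES


if __name__ == "__main__":
    for seconds in (30, 95):
        print(seconds, mel_frames(seconds), fits_encoder(seconds))
```

A clip that has gone through pad_or_trim is always exactly 30 seconds long, which is why that version of the script works while the untrimmed one fails.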

To transcribe audio, including audio longer than 30 seconds, you can use whisper.transcribe, as shown in the following snippet:

import whisper

model = whisper.load_model("large-v2")

# load the entire audio file
audio = whisper.load_audio("/content/file.mp3")

options = {
    "language": "en", # input language, if omitted is auto detected
    "task": "translate" # or "transcribe" if you just want transcription
}
result = whisper.transcribe(model, audio, **options)
print(result["text"])

You can find some documentation for the transcribe method, as well as for the DecodingOptions structure, in the source code.
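If you would rather keep calling whisper.decode yourself, the audio has to be cut into 30-second pieces first. The sketch below shows only that splitting step in plain Python, zero-padding the final piece to full length. It is a simplification under stated assumptions: `split_into_windows` is a hypothetical helper, the 16 kHz sample rate is assumed, and as far as I can tell the real transcribe function advances its window using predicted timestamps rather than fixed steps, so fixed windows may cut words at the boundaries.

```python
from typing import Iterator, List, Sequence

SAMPLE_RATE = 16000                           # assumed sample rate of the loaded audio
WINDOW_SECONDS = 30                           # length of one model input window
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS


def split_into_windows(
    samples: Sequence[float], window: int = WINDOW_SAMPLES
) -> Iterator[List[float]]:
    """Yield consecutive fixed-size windows, zero-padding the last one."""
    for start in range(0, len(samples), window):
        chunk = list(samples[start:start + window])
        # Pad the final, shorter chunk with silence so every window has equal length.
        chunk.extend([0.0] * (window - len(chunk)))
        yield chunk


if __name__ == "__main__":
    # Tiny demonstration with a 10-sample "clip" and a 4-sample window.
    for piece in split_into_windows([1.0] * 10, window=4):
        print(piece)
```

Each yielded window could then go through log_mel_spectrogram and decode in turn, but for most cases whisper.transcribe is the simpler choice.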
