Unable to convert speech to text using the Azure Speech to Text service

Question (votes: 0, answers: 1)

I am using the following code to convert an audio file to text with the Azure Speech to Text service:

import os
import azure.cognitiveservices.speech as speechsdk


def recognize_from_microphone():
    # This example requires environment variables named "SPEECH_KEY" and "SPEECH_REGION"
    my_key = os.environ.get("SPEECH_KEY")
    my_region = os.environ.get("SPEECH_REGION")
    speech_config = speechsdk.SpeechConfig(subscription=my_key, region=my_region)
    speech_config.speech_recognition_language = "en-US"

    audio_config = speechsdk.audio.AudioConfig(filename="C:\\Users\\DELL\\Desktop\\flowlly.com\\demo\\003. Class 3 - Monolith, Microservices, gRPC, Webhooks.mp4")
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    speech_recognition_result = speech_recognizer.recognize_once_async().get()

    if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print("Recognized: {}".format(speech_recognition_result.text))
    elif speech_recognition_result.reason == speechsdk.ResultReason.NoMatch:
        print("No speech could be recognized: {}".format(speech_recognition_result.no_match_details))
    elif speech_recognition_result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = speech_recognition_result.cancellation_details
        print("Speech Recognition canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            print("Error details: {}".format(cancellation_details.error_details))
            print("Did you set the speech resource key and region values?")


recognize_from_microphone()

However, I get this error when I try to run the transcriber:

 File "C:\Users\DELL\Desktop\flowlly.com\demo\transcriber.py", line 48, in <module>
    recognize_from_microphone()
  File "C:\Users\DELL\Desktop\flowlly.com\demo\transcriber.py", line 18, in recognize_from_microphone
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\DELL\AppData\Local\Programs\Python\Python312\Lib\site-packages\azure\cognitiveservices\speech\speech.py", line 1006, in __init__
    _call_hr_fn(
  File "C:\Users\DELL\AppData\Local\Programs\Python\Python312\Lib\site-packages\azure\cognitiveservices\speech\interop.py", line 62, in _call_hr_fn
    _raise_if_failed(hr)
  File "C:\Users\DELL\AppData\Local\Programs\Python\Python312\Lib\site-packages\azure\cognitiveservices\speech\interop.py", line 55, in _raise_if_failed
    __try_get_error(_spx_handle(hr))
  File "C:\Users\DELL\AppData\Local\Programs\Python\Python312\Lib\site-packages\azure\cognitiveservices\speech\interop.py", line 50, in __try_get_error
    raise RuntimeError(message)
RuntimeError: Exception with error code:
[CALL STACK BEGIN]

    > pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - recognizer_create_speech_recognizer_from_config
    - recognizer_create_speech_recognizer_from_config

[CALL STACK END]

Exception with an error code: 0xa (SPXERR_INVALID_HEADER)

I have installed the SDK, but it still does not work. What should I do now?

python azure speech-recognition speech-to-text
1 Answer
0 votes

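The audio format that the Azure Speech to Text service accepts by default is WAV (16 kHz or 8 kHz, 16-bit, mono PCM). An .mp4 file has no WAV header, which is why creating the recognizer fails with SPXERR_INVALID_HEADER.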
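
Before handing a file to the SDK, it can help to confirm that it really is a WAV file in the format described above. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_wav_format` and the constant `SUPPORTED_SAMPLE_RATES` are hypothetical names chosen for illustration, not part of the Speech SDK.

```python
import wave

# Sample rates listed above as accepted for WAV input (assumption: 8 kHz and 16 kHz only).
SUPPORTED_SAMPLE_RATES = {8000, 16000}


def check_wav_format(path):
    """Return a list of problems found in the file (an empty list means it looks fine)."""
    problems = []
    try:
        with wave.open(path, "rb") as wav_file:
            channels = wav_file.getnchannels()
            sample_width = wav_file.getsampwidth()  # bytes per sample
            sample_rate = wav_file.getframerate()
    except (wave.Error, EOFError) as exc:
        # Raised when the file has no valid RIFF/WAVE header, e.g. an .mp4 file
        return ["not a readable PCM WAV file: {}".format(exc)]

    if channels != 1:
        problems.append("expected mono, found {} channels".format(channels))
    if sample_width != 2:
        problems.append("expected 16-bit samples, found {}-bit".format(sample_width * 8))
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        problems.append("expected 8 kHz or 16 kHz, found {} Hz".format(sample_rate))
    return problems
```

Calling `check_wav_format` on the .mp4 file from the question should report that it is not a readable WAV file, which mirrors the header error the SDK raises.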

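  • Convert the .mp4 file to WAV format. Make sure the converted WAV file meets these specifications: sample rate 16 kHz or 8 kHz, bit depth 16-bit, mono. Then pass the path of the converted file to AudioConfig:
    filename="path/to/your/converted_file.wav"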
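
The answer does not say which tool to use for the conversion. One common option is ffmpeg; the sketch below assumes ffmpeg is installed and available on PATH, and the helper names `build_ffmpeg_command` and `convert_to_wav` are hypothetical.

```python
import subprocess


def build_ffmpeg_command(source_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command that extracts the audio as mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it already exists
        "-i", source_path,         # input video or audio file
        "-vn",                     # drop the video stream
        "-ac", "1",                # downmix to a single (mono) channel
        "-ar", str(sample_rate),   # resample to the target rate in Hz
        "-acodec", "pcm_s16le",    # 16-bit little-endian PCM
        wav_path,
    ]


def convert_to_wav(source_path, wav_path, sample_rate=16000):
    """Run ffmpeg; raises subprocess.CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(source_path, wav_path, sample_rate), check=True)
    return wav_path
```

For example, `convert_to_wav("lecture.mp4", "lecture.wav")` (placeholder file names) would write a 16 kHz mono WAV file that can then be passed to `AudioConfig`.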
The complete script, with the file path pointing at a WAV file:

import azure.cognitiveservices.speech as speechsdk


def recognize_from_audio_file():
    # Replace these placeholders with your actual subscription key and region
    my_key = "YourSubscriptionKey"
    my_region = "YourRegion"

    speech_config = speechsdk.SpeechConfig(subscription=my_key, region=my_region)
    speech_config.speech_recognition_language = "en-US"

    # Provide the path to your WAV audio file
    audio_file_path = r"C:\Users\samplest 3.wav"
    audio_config = speechsdk.audio.AudioConfig(filename=audio_file_path)

    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    # Recognize a single utterance from the file and wait for the result
    speech_recognition_result = speech_recognizer.recognize_once_async().get()

    if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print("Recognized: {}".format(speech_recognition_result.text))
    elif speech_recognition_result.reason == speechsdk.ResultReason.NoMatch:
        print("No speech could be recognized: {}".format(speech_recognition_result.no_match_details))
    elif speech_recognition_result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = speech_recognition_result.cancellation_details
        print("Speech Recognition canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            print("Error details: {}".format(cancellation_details.error_details))
            print("Did you set the speech resource key and region values?")


recognize_from_audio_file()
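
Note that `recognize_once_async()` returns after a single utterance, so for a long recording such as a lecture it would likely transcribe only the beginning. A rough sketch of the continuous-recognition pattern follows; it is not part of the original answer. It assumes a `speech_recognizer` built exactly as in the code above, and the function name `transcribe_continuously` is hypothetical.

```python
import threading


def transcribe_continuously(speech_recognizer):
    """Collect the text of every recognized utterance until the session ends."""
    done = threading.Event()
    texts = []

    def on_recognized(evt):
        # Called once per finalized utterance; skip empty results
        if evt.result.text:
            texts.append(evt.result.text)

    def on_stopped(evt):
        # Called when the session stops (end of file) or recognition is canceled
        done.set()

    speech_recognizer.recognized.connect(on_recognized)
    speech_recognizer.session_stopped.connect(on_stopped)
    speech_recognizer.canceled.connect(on_stopped)

    speech_recognizer.start_continuous_recognition()
    done.wait()  # block until one of the stop callbacks fires
    speech_recognizer.stop_continuous_recognition()
    return texts
```

With a recognizer created from the WAV file, `print(" ".join(transcribe_continuously(speech_recognizer)))` would print the collected text of the whole file.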

