Reading the bytes of an Azure TTS AudioDataStream and playing them back


I don't understand how to read the byte stream returned by the Azure TTS service in Python and then play that stream back.

From the docs: https://learn.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.audiodatastream?view=azure-python

bool = can_read_data(requested_bytes: int, pos: int) and int = read_data(audio_buffer: bytes, pos: int | None = None)

So I tried this:

import azure.cognitiveservices.speech as speechsdk
speech_config = speechsdk.SpeechConfig(subscription='key', region='uksouth')
speech_config.set_speech_synthesis_output_format(speechsdk.SpeechSynthesisOutputFormat.Riff16Khz16BitMonoPcm)
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)


text = "Hello, world!"
# Synthesize the speech
result = speech_synthesizer.speak_text_async(text).get()

# Create an AudioDataStream from the synthesized result
if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Speech synthesized for text [{}]".format(text))
    audio_data_stream = speechsdk.AudioDataStream(result)
    audio_data_stream.save_to_wav_file("output.wav")
    # Reset the stream position to the beginning since saving to file puts the position to end.
    audio_data_stream.position = 0

    # Reads data from the stream
    audio_buffer = bytes(16000)
    total_size = 0
    filled_size = audio_data_stream.read_data(audio_buffer)
    while filled_size > 0:
        print("{} bytes received.".format(filled_size))
        total_size += filled_size
        filled_size = audio_data_stream.read_data(audio_buffer)
    print("Totally {} bytes received for text [{}].".format(total_size, text))
    # Initialize playing

    from pydub import AudioSegment
    import io

    audio_segment = AudioSegment(
        data=audio_buffer,  # The raw audio data you received
        sample_width=2,  # Bytes per sample
        frame_rate=16000,  # Sampling frequency
        channels=1  # Mono
    )
    
    from pydub.playback import play
    play(audio_segment)

elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech synthesis canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))
        

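It synthesizes and saves to the file correctly, but the audio I play back from the stream doesn't sound right. What am I doing wrong?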

azure text-to-speech azure-cognitive-services
1 Answer

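I changed the following lines in your code. With that change I could play the output stream, and the audio was also saved to output.wav successfully.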

from pydub.playback import play
audio_segment = AudioSegment(data=audio_buffer[:filled_size], sample_width=2, frame_rate=16000, channels=1)
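
Why the slice `audio_buffer[:filled_size]` matters can be shown with a small pure-Python illustration. This is not SDK code: it only assumes that a short final read overwrites the start of a reused fixed-size buffer and leaves the remaining bytes from the previous read in place. The buffer size and chunk contents are made up for the demonstration.

```python
# Simulate two reads into one reusable 8-byte buffer (sizes are illustrative).
audio_buffer = bytearray(8)

first_chunk = b"AAAAAAAA"   # a full read: 8 bytes
audio_buffer[:len(first_chunk)] = first_chunk

last_chunk = b"BBB"         # a short final read: only 3 bytes
filled_size = len(last_chunk)
audio_buffer[:filled_size] = last_chunk

whole_buffer = bytes(audio_buffer)               # includes stale bytes from the first read
valid_part = bytes(audio_buffer[:filled_size])   # only the bytes from the last read

print(whole_buffer)  # b'BBBAAAAA'
print(valid_part)    # b'BBB'
```

If the same thing happens in the question's code, the buffer that is played after the loop would contain only the last chunk followed by leftover bytes from the chunk before it, which would explain why it does not sound like the full sentence.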

Full code:

import azure.cognitiveservices.speech as speechsdk
from pydub import AudioSegment
from pydub.playback import play

speech_config = speechsdk.SpeechConfig(subscription='<speech_key>', region='<speech_region>')
speech_config.set_speech_synthesis_output_format(speechsdk.SpeechSynthesisOutputFormat.Riff16Khz16BitMonoPcm)
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)

text = "Hello, world!"
result = speech_synthesizer.speak_text_async(text).get()

if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Speech synthesized for text [{}]".format(text))
    audio_data_stream = speechsdk.AudioDataStream(result)
    audio_data_stream.save_to_wav_file("output.wav")
    audio_data_stream.position = 0

    # Reusable 16,000-byte read buffer (0.5 s of 16 kHz, 16-bit mono audio).
    audio_buffer = bytes(16000)
    total_size = 0
    filled_size = audio_data_stream.read_data(audio_buffer)
    while filled_size > 0:
        print("{} bytes received.".format(filled_size))
        total_size += filled_size
        # Play only the bytes filled by this read; the rest of the buffer may hold stale data.
        audio_segment = AudioSegment(data=audio_buffer[:filled_size], sample_width=2, frame_rate=16000, channels=1)
        play(audio_segment)
        filled_size = audio_data_stream.read_data(audio_buffer)
    print("Totally {} bytes received for text [{}].".format(total_size, text))

elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech synthesis canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))

Output:

The code above ran successfully, and I was able to hear the audio stream.


C:\Users\xxxxxxxx\Documents\xxxxxxxx>python app.py
Speech synthesized for text [Hello, world!]
16000 bytes received.
16000 bytes received.
12000 bytes received.
Totally 44000 bytes received for text [Hello, world!].
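
As the chunk sizes in the output show, the stream arrives in several reads. An alternative to playing each chunk separately is to collect every chunk first and play the result once. The following is a minimal sketch of that idea, not the SDK's own method: `FakeAudioStream` and `read_all` are hypothetical names, and the fake stream only stands in for `AudioDataStream` so the example runs without an Azure key. The stand-in takes a `bytearray` because pure-Python code cannot write into an immutable `bytes` object; the real SDK call in the code above is passed `bytes(16000)`.

```python
import io


class FakeAudioStream:
    """Hypothetical stand-in for speechsdk.AudioDataStream (illustration only)."""

    def __init__(self, data: bytes):
        self._source = io.BytesIO(data)

    def read_data(self, audio_buffer: bytearray) -> int:
        # Copy up to len(audio_buffer) bytes into the caller's buffer and
        # return how many bytes were actually written.
        chunk = self._source.read(len(audio_buffer))
        audio_buffer[:len(chunk)] = chunk
        return len(chunk)


def read_all(stream, chunk_size: int = 16000) -> bytes:
    """Drain a stream exposing read_data(buffer) -> int into one bytes object."""
    audio_buffer = bytearray(chunk_size)
    collected = bytearray()
    filled_size = stream.read_data(audio_buffer)
    while filled_size > 0:
        # Keep only the bytes filled by this read, then read the next chunk.
        collected.extend(audio_buffer[:filled_size])
        filled_size = stream.read_data(audio_buffer)
    return bytes(collected)


# 44000 zero bytes mirror the total size reported in the output above.
pcm = read_all(FakeAudioStream(bytes(44000)))

# Duration if every byte is treated as 16 kHz, 16-bit (2-byte), mono PCM.
duration_seconds = len(pcm) / (16000 * 2 * 1)

print(len(pcm))           # 44000
print(duration_seconds)   # 1.375
```

The collected bytes could then be passed once to `AudioSegment(data=pcm, sample_width=2, frame_rate=16000, channels=1)`, as in the answer's code. The duration figure treats all bytes as samples; whether the stream also includes a WAV header is not stated here, so take it as approximate.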