Azure Cognitive Services Speech SDK (Python): producing audio with the synthesizing callback


Using the synthesizing callback, how do we correctly stream audio data to a file? I want to write the audio data to the file as soon as it arrives. That is not my final goal, but if this works I can build more functionality on top of it later.

I must use the synthesizing callback.

In the code below, server_bad_audio.wav contains choppy audio, while server_audio.wav sounds fine.

What is going wrong here? Any hints?

audio_queue = asyncio.Queue()

async def send_audio(self, queue):
    with wave.open("server_bad_audio.wav", "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setframerate(FRAME_RATE)
        while True:
            audio_data = await queue.get()
            if audio_data is None:
                break
            self.logger.info('Sending audio chunk of length {}'.format(len(audio_data)))
            wav_file.writeframes(audio_data)

def synthesize_callback(evt: SpeechSynthesisEventArgs):
    audio = evt.result.audio_data
    self.logger.info('Audio chunk received of length {}, duration {}'.format(len(audio), evt.result.audio_duration))
    audio_queue.put_nowait(audio)
...
audio_config = AudioOutputConfig(filename="server_audio.wav")
synthesizer = SpeechSynthesizer(speech_config=self.speech_config, audio_config=audio_config)

synthesizer.synthesizing.connect(synthesize_callback)
result = synthesizer.speak_ssml_async(ssml_text).get()
...
audio_queue.put_nowait(None)
await send_audio_task
python azure azure-cognitive-services azure-python-sdk azure-speech
1 Answer

The problem is that the WAV file format requires a header with the correct audio properties to be written before the audio data itself.
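For context (this sketch is illustrative, not part of the original answer): a WAV file is a RIFF container, i.e. a 44-byte header describing channels, sample width, and frame rate, followed by raw PCM frames. Python's wave module writes that header when the file is opened for writing and patches the frame count on close; dumping raw chunks to disk without it yields an unplayable file.

```python
import io
import wave

# 100 ms of 16-bit silence at 16 kHz mono (raw PCM, no header).
pcm_chunk = b"\x00\x00" * 1600

buf = io.BytesIO()
with wave.open(buf, "wb") as wav_file:
    wav_file.setnchannels(1)       # mono
    wav_file.setsampwidth(2)       # 16-bit samples
    wav_file.setframerate(16000)   # 16 kHz
    wav_file.writeframes(pcm_chunk)

data = buf.getvalue()
# The wave module prepends the RIFF/WAVE header before the PCM payload.
print(data[:4], data[8:12])        # b'RIFF' b'WAVE'
print(len(data) - len(pcm_chunk))  # 44 (header bytes)
```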

  • Modify the send_audio function so that the WAV file header is written (by wave.open) before any audio data. Have send_audio consume chunks from audio_queue; the audio data is delivered to the queue through the callback.
import asyncio
import wave
import logging
import azure.cognitiveservices.speech as speechsdk

# Replace these with your Azure Speech Service credentials
SUBSCRIPTION_KEY = "YOUR_SUBSCRIPTION_KEY"
REGION = "YOUR_REGION"

# Global variables for audio properties
SAMPLE_WIDTH = 2  # 2 bytes per sample (16-bit audio)
FRAME_RATE = 16000  # 16 kHz sample rate

# Create a logger
logger = logging.getLogger("audio_logger")
logger.setLevel(logging.INFO)
handler = logging.StreamHandler()
formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
handler.setFormatter(formatter)
logger.addHandler(handler)

# Audio queue to hold audio data
audio_queue = asyncio.Queue()

async def send_audio(queue):
    with wave.open("generated_audio.wav", "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setframerate(FRAME_RATE)

        while True:
            audio_data = await queue.get()
            if audio_data is None:
                # Break the loop when None is received to stop writing to the file.
                break
            logger.info('Writing audio chunk of length {}'.format(len(audio_data)))

            # Write the audio data to the file.
            wav_file.writeframes(audio_data)

# Note: the SDK invokes this callback from its own worker thread, so it must be
# a plain (synchronous) function; a coroutine connected here would never be awaited.
def synthesize_callback(evt: speechsdk.SpeechSynthesisEventArgs):
    audio = evt.result.audio_data
    logger.info('Audio chunk received of length {}, duration {}'.format(len(audio), evt.result.audio_duration))
    audio_queue.put_nowait(audio)

async def main():
    # Create an instance of the SpeechConfig with your subscription key and region
    speech_config = speechsdk.SpeechConfig(subscription=SUBSCRIPTION_KEY, region=REGION)

    # Create an instance of the SpeechSynthesizer with the SpeechConfig
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

    # Connect the callback
    synthesizer.synthesizing.connect(synthesize_callback)

    # SSML text to be synthesized
    ssml_text = "<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'> \
                    <voice name='en-US-JennyNeural'> \
                        Butta bomma, Butta bomma, nannu suttukuntiveyyy, Zindagi ke atta bommaiey. \
                        Janta kattu kuntiveyyy. \
                    </voice> \
                </speak>"

    # Create a task to run the send_audio() coroutine concurrently with the main() function.
    audio_task = asyncio.create_task(send_audio(audio_queue))

    # Start the synthesis process. .get() blocks until synthesis completes;
    # the chunks queued by the callback are drained by send_audio afterwards.
    result = synthesizer.speak_ssml_async(ssml_text).get()

    # Signal the audio_queue to stop writing to the file
    audio_queue.put_nowait(None)

    # Wait for the send_audio() task to complete
    await audio_task

if __name__ == "__main__":
    asyncio.run(main())
  • The speech synthesizer connects the synthesizing callback and starts synthesizing the SSML text.

  • The synthesize_callback() function receives the audio chunks, and the send_audio() function streams the audio data to the WAV file.
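One caveat the answer does not mention: the Speech SDK fires synthesizing on its own worker thread, and asyncio.Queue.put_nowait is not thread-safe. A more defensive handoff schedules the put on the event loop with loop.call_soon_threadsafe. A minimal sketch, with a plain thread standing in for the SDK's callback thread (make_callback is a hypothetical helper name):

```python
import asyncio
import threading

def make_callback(loop: asyncio.AbstractEventLoop, queue: asyncio.Queue):
    # Returns a plain (synchronous) callback that is safe to run on a
    # non-asyncio thread: the actual put is scheduled onto the event loop.
    def callback(chunk: bytes) -> None:
        loop.call_soon_threadsafe(queue.put_nowait, chunk)
    return callback

async def main() -> list:
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()
    callback = make_callback(loop, queue)

    # Simulate the SDK invoking the callback from a worker thread.
    producer = threading.Thread(
        target=lambda: [callback(b"chunk%d" % i) for i in range(3)]
    )
    producer.start()
    producer.join()
    loop.call_soon_threadsafe(queue.put_nowait, None)  # end-of-stream marker

    chunks = []
    while (item := await queue.get()) is not None:
        chunks.append(item)
    return chunks

print(asyncio.run(main()))  # [b'chunk0', b'chunk1', b'chunk2']
```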

The following debug statement will help you determine whether the problem lies in the received audio data or in the WAV file creation.

def synthesize_callback(evt: speechsdk.SpeechSynthesisEventArgs):
    audio = evt.result.audio_data
    logger.info('Audio chunk received of length {}, duration {}'.format(len(audio), evt.result.audio_duration))
    # Debug statement: append the received chunks to a file for inspection (optional).
    # Note: despite the .wav extension this is headerless raw PCM, so media
    # players will not open it directly.
    with open("received_audio.wav", "ab") as f:
        f.write(audio)
    audio_queue.put_nowait(audio)

Check that the WAV files are generated in the same application directory.
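To verify the result, you can open the generated file read-only with the wave module, which fails immediately on a missing or malformed header. A small sketch (describe_wav is a hypothetical helper name; the file name matches the script above):

```python
import wave

def describe_wav(path: str) -> dict:
    # wave.open in "rb" mode raises wave.Error if the RIFF/WAVE header is invalid.
    with wave.open(path, "rb") as wav_file:
        return {
            "channels": wav_file.getnchannels(),
            "sample_width": wav_file.getsampwidth(),
            "frame_rate": wav_file.getframerate(),
            "duration_s": wav_file.getnframes() / wav_file.getframerate(),
        }

# Example usage: print(describe_wav("generated_audio.wav"))
```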

