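Using the synthesizing callback, how do we correctly stream audio data to a file? I want to write the audio data to a file as soon as it arrives. This is not my final goal, but if it works I can build more functionality on top of it later.
I have to use the synthesizing callback.
In the code below, server_bad_audio has choppy, jumping sound, while server_audio is fine.
What is wrong here? Any hints?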
audio_queue = asyncio.Queue()

async def send_audio(self, queue):
    with wave.open("server_bad_audio.wav", "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setframerate(FRAME_RATE)
        while True:
            audio_data = await queue.get()
            if audio_data is None:
                break
            self.logger.info('Sending audio chunk of length {}'.format(len(audio_data)))
            wav_file.writeframes(audio_data)

def synthesize_callback(evt: SpeechSynthesisEventArgs):
    audio = evt.result.audio_data
    self.logger.info('Audio chunk received of length {}, duration {}'.format(len(audio), evt.result.audio_duration))
    audio_queue.put_nowait(audio)
...
audio_config = AudioOutputConfig(filename="server_audio.wav")
synthesizer = SpeechSynthesizer(speech_config=self.speech_config, audio_config=audio_config)
synthesizer.synthesizing.connect(synthesize_callback)
result = synthesizer.speak_ssml_async(ssml_text).get()
...
audio_queue.put_nowait(None)
await send_audio_task
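The problem is that the WAV file format requires a header with the correct audio properties to be written before the audio data itself.
Modify the send_audio function so that it writes the WAV file header before writing any audio data, then call send_audio with audio_queue. The audio data will now be received through the callback.
import asyncio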
import wave
import logging
import azure.cognitiveservices.speech as speechsdk

# Replace these with your Azure Speech Service credentials
SUBSCRIPTION_KEY = "YOUR_SUBSCRIPTION_KEY"
REGION = "YOUR_REGION"

# Global variables for audio properties
SAMPLE_WIDTH = 2  # 2 bytes per sample (16-bit audio)
FRAME_RATE = 16000  # 16 kHz sample rate

# Create a logger
logger = logging.getLogger("audio_logger")
logger.setLevel(logging.INFO)
handler = logging.StreamHandler()
formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
handler.setFormatter(formatter)
logger.addHandler(handler)

# Audio queue to hold audio data
audio_queue = asyncio.Queue()

async def send_audio(queue):
    with wave.open("generated_audio.wav", "wb") as wav_file:
        # These properties go into the WAV header, which is written before the first frames.
        wav_file.setnchannels(1)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setframerate(FRAME_RATE)
        while True:
            audio_data = await queue.get()
            if audio_data is None:
                # Break the loop when None is received to stop writing to the file.
                break
            logger.info('Writing audio chunk of length {}'.format(len(audio_data)))
            # Write the audio data to the file.
            wav_file.writeframes(audio_data)

# The SDK calls this handler as a plain function. Declaring it with "async def"
# would only create a coroutine that is never awaited, so nothing would be queued.
def synthesize_callback(evt: speechsdk.SpeechSynthesisEventArgs):
    audio = evt.result.audio_data
    logger.info('Audio chunk received of length {}, duration {}'.format(len(audio), evt.result.audio_duration))
    audio_queue.put_nowait(audio)

async def main():
    # Create an instance of the SpeechConfig with your subscription key and region
    speech_config = speechsdk.SpeechConfig(subscription=SUBSCRIPTION_KEY, region=REGION)
    # Create an instance of the SpeechSynthesizer with the SpeechConfig
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
    # Connect the callback
    synthesizer.synthesizing.connect(synthesize_callback)
    # SSML text to be synthesized
    ssml_text = "<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'> \
        <voice name='en-US-JennyNeural'> \
        Butta bomma, Butta bomma, nannu suttukuntiveyyy, Zindagi ke atta bommaiey. \
        Janta kattu kuntiveyyy. \
        </voice> \
        </speak>"
    # Create a task to run the send_audio() coroutine concurrently with the main() function.
    audio_task = asyncio.create_task(send_audio(audio_queue))
    # Start the synthesis process (.get() blocks until synthesis has finished)
    result = synthesizer.speak_ssml_async(ssml_text).get()
    # Signal the audio_queue to stop writing to the file
    audio_queue.put_nowait(None)
    # Wait for the send_audio() task to complete
    await audio_task

if __name__ == "__main__":
    asyncio.run(main())
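The speech synthesizer connects the synthesizing callback and starts synthesizing the SSML text.
The synthesize_callback() function receives the audio chunks, and the send_audio() function streams the audio data to the WAV file.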
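One detail worth checking in this hand-off: asyncio.Queue is not thread-safe, and SDK event handlers are generally invoked from a worker thread rather than from the event loop thread (an assumption about the SDK's threading, not something confirmed in this thread). A minimal sketch of one way to pass chunks across safely is to schedule the put with loop.call_soon_threadsafe and to move the blocking call off the loop with run_in_executor. Here fake_synthesis is a hypothetical stand-in for the SDK so the example runs without Azure credentials, and write_chunks, stream_to_file and run_demo are illustrative names only.

```python
import asyncio
import os
import tempfile
import threading
import wave

SAMPLE_WIDTH = 2    # bytes per sample (16-bit PCM)
FRAME_RATE = 16000  # samples per second


async def write_chunks(queue: asyncio.Queue, path: str) -> int:
    """Write PCM chunks from the queue to a WAV file until None arrives."""
    total = 0
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setframerate(FRAME_RATE)
        while True:
            chunk = await queue.get()
            if chunk is None:
                break
            wav_file.writeframes(chunk)
            total += len(chunk)
    return total


def fake_synthesis(on_chunk) -> None:
    """Hypothetical stand-in for the SDK: calls on_chunk from a worker thread."""
    def worker():
        for _ in range(5):
            on_chunk(b"\x00\x01" * 160)  # 160 samples of dummy PCM

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()


async def stream_to_file(path: str) -> int:
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    def on_chunk(chunk: bytes) -> None:
        # Runs on a foreign thread: schedule the put on the event loop.
        loop.call_soon_threadsafe(queue.put_nowait, chunk)

    writer = asyncio.create_task(write_chunks(queue, path))
    # Run the blocking call in a thread so the loop can keep writing chunks.
    await loop.run_in_executor(None, fake_synthesis, on_chunk)
    # End-of-stream marker, queued behind any chunk puts still pending.
    loop.call_soon_threadsafe(queue.put_nowait, None)
    return await writer


def run_demo() -> int:
    """Stream the dummy chunks into a temporary WAV file; return bytes written."""
    path = os.path.join(tempfile.mkdtemp(), "streamed.wav")
    return asyncio.run(stream_to_file(path))
```

With the real SDK, the same pattern would replace fake_synthesis with the blocking speak call and on_chunk with the synthesizing handler; that substitution is untested here.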
The following debug statements help determine whether the problem lies in the received audio data or in the creation of the WAV file.
def synthesize_callback(evt: speechsdk.SpeechSynthesisEventArgs):
    audio = evt.result.audio_data
    logger.info('Audio chunk received of length {}, duration {}'.format(len(audio), evt.result.audio_duration))
    # Debug statement: Save the received audio to a file for inspection (optional).
    # Append mode keeps every chunk; "wb" would overwrite the file on each callback.
    with open("received_audio.wav", "ab") as f:
        f.write(audio)
    audio_queue.put_nowait(audio)
Check that the WAV files are generated in the application's directory.
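To narrow things down further, it can help to look at what each chunk contains and at what a finished file reports about itself. The sketch below uses only the standard library; looks_like_riff, wav_summary and make_wav are hypothetical helper names. Whether the SDK's chunks carry a RIFF header depends on the configured output format, which is not confirmed in this thread. If a chunk does begin with RIFF, passing it to writeframes would store those header bytes as audio samples.

```python
import io
import wave


def looks_like_riff(chunk: bytes) -> bool:
    """Return True if the chunk begins with a RIFF/WAVE container header."""
    return len(chunk) >= 12 and chunk[:4] == b"RIFF" and chunk[8:12] == b"WAVE"


def wav_summary(source) -> dict:
    """Report the audio properties stored in a WAV header.

    source may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav_file:
        return {
            "channels": wav_file.getnchannels(),
            "sample_width": wav_file.getsampwidth(),
            "frame_rate": wav_file.getframerate(),
            "frames": wav_file.getnframes(),
        }


def make_wav(pcm: bytes, frame_rate: int = 16000) -> bytes:
    """Build a small mono 16-bit WAV in memory (used only for the demo)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(frame_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()
```

Calling looks_like_riff(audio) inside the callback and wav_summary("generated_audio.wav") after the run shows whether the chunks are raw PCM and whether the header matches SAMPLE_WIDTH and FRAME_RATE.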