My Azure text-to-speech application no longer produces output after adding an SSML string


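I created a text-to-speech script using Microsoft Azure. Today I decided to add pitch control, speaking-rate control, and possibly some inserted silences. To do that, I need to replace my speak_text_async(text) call with speak_ssml_async(ssml_string). Since I made that change, the TTS has stopped playing and no .wav file is generated. All I did was add a constant 50% pitch increase as a test, build the ssml_string, and switch the synthesizer call from text to SSML (otherwise it would just read the markup in the SSML aloud).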

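I then changed only speak_ssml_async back to speak_text_async while still passing ssml_string. That confirmed the problem is in ssml_string, but I can't work out what it is because I don't get any error.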

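I'll leave the relevant part of the code here. Keep in mind that I have a custom output filename and directory selector, as well as an input label for the TTS text input, before this def.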

        #Directory selector
        output_label = ttk.Label(self, text="Choose your output folder:", 
                                 font=platformfont, 
                                 style="Output.TLabel")
        output_label.pack(pady=2)
        self.output_dir_button = ttk.Button(self, text="Browse", command=self.choose_output_dir, 
                                            takefocus=False, 
                                            style="Custom.TButton")
        self.output_dir_button.pack()
        self.output_dir_path = tk.StringVar()
        self.output_dir_path.set("")
        self.output_dir_entry = tk.Entry(self, textvariable=self.output_dir_path, font=inputfont, 
                                        width=55, 
                                        foreground="#395578", 
                                        state='readonly', 
                                        background="light gray", 
                                        readonlybackground="#Eed9c9", 
                                        borderwidth=0, 
                                        cursor="X_cursor",
                                        relief="flat")
        self.output_dir_entry.pack(pady=5)

        #Output filename
        output_filename_label = ttk.Label(self, text="Enter output filename (without extension):", 
                                          font=platformfont, 
                                          style="Output.TLabel")
        output_filename_label.pack(pady=5)

        #Listen button
        speak_button = ttk.Button(self, text="Listen & Generate", command=self.speak_text, 
                                  takefocus=False, 
                                  style="Custom.TButton")
        speak_button.pack(pady=15)

    def choose_output_dir(self):
        dir_path = filedialog.askdirectory()
        if dir_path:
            self.output_dir_path.set(dir_path)

    def speak_text(self):
        text = self.input_text.get("1.0", "end")
        output_dir = self.output_dir_path.get()
        output_filename = self.output_filename_text.get()
        if output_filename == "":
            output_filename = "tcnoutput"
        output_file = os.path.join(output_dir, output_filename + ".wav")

        if os.path.exists(output_file):
            response = messagebox.askyesnocancel("File Exists", "A file with the same name already exists. Do you want to overwrite it?",
                                            icon='warning')
            if response == True:
                os.remove(output_file)
            elif response == False:
                i = 1
                while os.path.exists(os.path.join(output_dir, output_filename + f"({i})" + ".wav")):
                    i += 1
                output_filename = output_filename + f"({i})"
                output_file = os.path.join(output_dir, output_filename + ".wav")
            else:
                raise KeyboardInterrupt
            
        pitch = "+50.0%"
        ssml_string = f"<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='ro-RO'>" \
                      f"<prosody pitch='{pitch}'>{text}</prosody></speak>"
        
        speech_synthesis_result = self.speech_synthesizer.speak_ssml_async(ssml_string).get()
        if speech_synthesis_result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
            with open(output_file, "wb") as f:
                f.write(speech_synthesis_result.audio_data)
            if output_dir == "":
                output_final = os.getcwd() + "\\" + output_filename + ".wav"
            else:
                output_final = output_dir + "/" + output_filename + ".wav"
            messagebox.showinfo("Success", f"Audio file successfully saved at: {output_final}")
        else:
            messagebox.showerror("Error", "Speech synthesis failed.")
python azure text-to-speech azure-cognitive-services ssml
1 Answer

I tried the Python code below to set up text-to-speech and configure the voice settings with SSML, and I got the expected audio output, as shown below:

Code:

import azure.cognitiveservices.speech as speechsdk

# Replace "key" and "region" with your own Speech resource key and region.
speech_config = speechsdk.SpeechConfig(subscription="key", region="region")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
ssml_string = "<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'><voice name='en-US-JennyNeural'><prosody pitch='+50%'>Hello, my friend! How are you?</prosody></voice></speak>"
result = synthesizer.speak_ssml_async(ssml_string).get()
if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("SSML string is correct")
    # With the default RIFF output format, audio_data already includes the WAV header,
    # so it can be written to disk as-is.
    with open("test.wav", "wb") as wav_file:
        wav_file.write(result.audio_data)
elif result.reason == speechsdk.ResultReason.Canceled:
    # A rejected SSML string ends up here; the details say why.
    details = result.cancellation_details
    print("SSML string is incorrect: {}".format(details.error_details))
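
The SSML string in the code above wraps the prosody element in a voice element, which the string in the question does not. The question also interpolates raw user text into the markup. As a minimal sketch of building such a string from typed text, the helper below escapes XML special characters and adds the voice element. The function name build_ssml, its parameters, and the default voice ro-RO-AlinaNeural are assumptions for illustration, not part of the original code; substitute a voice available to your Speech resource.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="ro-RO-AlinaNeural", lang="ro-RO",
               pitch="+50%", rate="default", pause_ms=0):
    """Return an SSML document wrapping user text in <voice> and <prosody>.

    Hypothetical helper: the name, parameters and default voice are
    illustrative assumptions, not taken from the question's code.
    """
    # Escape &, < and > so that typed text cannot break the XML.
    body = escape(text.strip())
    # Optional leading silence, expressed with the SSML <break> element.
    pause = ""
    if pause_ms > 0:
        pause_attr = quoteattr(f"{pause_ms}ms")
        pause = f"<break time={pause_attr}/>"
    return (
        "<speak version='1.0' "
        "xmlns='http://www.w3.org/2001/10/synthesis' "
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>"
        f"{pause}"
        f"<prosody pitch={quoteattr(pitch)} rate={quoteattr(rate)}>"
        f"{body}</prosody>"
        "</voice></speak>"
    )
```

In the question's speak_text method, this could replace the hand-built f-string, for example ssml_string = build_ssml(text, pitch=pitch). Stripping the text also drops the trailing newline that a Tk Text widget returns from get("1.0", "end").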

Output:
The audio for the input text is generated in the wav file.
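
When synthesis fails without any visible error, as described in the question, the result object returned by speak_ssml_async(...).get() still carries the reason. The sketch below turns it into a message that could be shown in place of the generic "Speech synthesis failed." text. The helper name describe_synthesis_failure is hypothetical; it only reads the reason, cancellation_details.reason, and cancellation_details.error_details attributes of the result, and it is meant to be called only when the result reason is not SynthesizingAudioCompleted.

```python
def describe_synthesis_failure(result):
    """Return a human-readable explanation for a failed synthesis result.

    Hypothetical helper: `result` is expected to look like the object
    returned by speak_ssml_async(...).get() after a failed request.
    """
    details = getattr(result, "cancellation_details", None)
    if details is None:
        # No cancellation information available; report the bare reason.
        return f"Synthesis ended with reason {result.reason}."
    message = f"Synthesis canceled: {details.reason}."
    # error_details usually holds the service's explanation, e.g. an SSML error.
    if getattr(details, "error_details", None):
        message += f" Error details: {details.error_details}"
    return message


# Possible use in the question's else branch (names taken from the question):
#     messagebox.showerror("Error",
#                          describe_synthesis_failure(speech_synthesis_result))
```

Showing this message instead of a fixed string should make it clear whether the service rejected the SSML, the key, or the region.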

  • See these two Microsoft documents on configuring SSML: Doc1 and Doc2