I am very new to cognitive text-to-speech (TTS) services. I have successfully used Microsoft Azure's TTS service to convert a given text into an audio file. It works well when my SSML document contains a single voice element. A working SSML example is:

<speak version="1.0" xml:lang="en-US">
<voice xml:lang="en-US" xml:gender="Male" name="en-US-Jessa24kRUS">
Hello, this is my sample text to convert into audio?
</voice>
</speak>

However, when I have multiple voice tags (based on gender), it results in an error. The SSML is:

<speak version="1.0" xml:lang="en-US">
<voice xml:lang="en-US" xml:gender="Male" name="en-US-Guy24kRUS"> What’s your name? </voice>
<voice xml:lang="en-US" xml:gender="Female" name="en-US-Jessa24kRUS"> My name is Cindy Smith. Do you know John Silver?</voice>
<voice xml:lang="en-US" xml:gender="Male" name="en-US-Guy24kRUS"> John and I are old friends. </voice>
<voice xml:lang="en-US" xml:gender="Female" name="en-US-Jessa24kRUS"> John just joined our company as a salesperson. </voice>
<voice xml:lang="en-US" xml:gender="Male" name="en-US-Guy24kRUS"> That’s good news. John has been a salesperson for chemical products for many years. </voice>
<voice xml:lang="en-US" xml:gender="Female" name="en-US-Jessa24kRUS"> I heard he really likes his new job.</voice>
</speak>

The error is:

Response status code does not indicate success: 400 (SSML must contain at most 5 voice elements. Actual 6.).

It would be a great help if someone could explain why I am limited to five voice tags, when the documentation does not mention any such limit on voice tags. For example: 32 audio tags and 13 voice tags.
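Whatever the reason for the limit, the 400 error itself states it: at most 5 voice elements per SSML request. One possible workaround, sketched here in Python as an assumption rather than an official Azure approach, is to split the dialogue into several `<speak>` documents with at most five `<voice>` elements each and synthesize them one after another. The names `split_ssml` and `MAX_VOICES` are hypothetical, and the sketch assumes, as in the SSML above, that every child of `<speak>` is a `<voice>` element and that no default `xmlns` is declared on `<speak>`.

```python
import xml.etree.ElementTree as ET

# Hypothetical constant: the value 5 is taken from the service's 400 error message.
MAX_VOICES = 5


def split_ssml(ssml: str, max_voices: int = MAX_VOICES) -> list[str]:
    """Split one <speak> document into several, each with at most max_voices children.

    Assumes every direct child of <speak> is a <voice> element and that the
    document declares no default namespace (as in the question's SSML).
    """
    root = ET.fromstring(ssml)
    voices = list(root)  # direct children of <speak>, assumed to be <voice> elements
    docs = []
    for start in range(0, len(voices), max_voices):
        # Reuse the original <speak> attributes (version, xml:lang) for each chunk.
        speak = ET.Element(root.tag, root.attrib)
        speak.extend(voices[start:start + max_voices])
        # ElementTree serializes the xml: namespace back to the "xml:lang" prefix.
        docs.append(ET.tostring(speak, encoding="unicode"))
    return docs
```

Each string in the returned list would then be sent as a separate synthesis request, with the resulting audio clips played or joined in order; joining them correctly depends on the requested audio output format, which this sketch does not handle.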