Unable to generate non-Latin characters using Google TTS

Question

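I created a Python script that generates audio using a .csv file as its data source. The script has been verified for English and Spanish audio, but I cannot generate words in Telugu.

My .csv file is in UTF-8 format, and my script specifies UTF-8, but when I try to run the script I keep getting the following error message: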

SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

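My script is shown below. Can anyone suggest what I might be doing wrong?

I also tried inserting Unicode escape sequences for the Telugu characters into my script, but that did not work either.

My code is as follows: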

import os

# Set the GOOGLE_APPLICATION_CREDENTIALS environment variable
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = r"C:\MYJSON.json"


import csv
from google.cloud import texttospeech_v1
from pydub import AudioSegment
import io

# Initialize Google Cloud TTS client
client = texttospeech_v1.TextToSpeechClient()

# Initialize an empty audio segment
final_audio = AudioSegment.empty()

# Read CSV and Generate Audio
with open("C:\\testscript.csv", 'r', newline='', encoding='utf-8') as csvfile:
    reader = csv.reader(csvfile)  # Read the CSV as a regular reader
    next(reader)  # Skip the header row
    
    for row in reader:
        phrase = row[0]  # Access the first column (Column A) for the phrase
        lang = row[1]    # Access the second column (Column B) for the language

        if phrase and lang:
            # Determine the language code based on the language specified in the CSV
            if lang == 'English':
                lang_code = 'en-US'
                voice_name = 'en-US-Wavenet-D'
            elif lang == 'Telugu':
                lang_code = 'te-IN'
                voice_name = 'te-IN-Standard-B'
            else:
                continue  # Skip rows with unrecognized languages

            # Generate and append audio
            synthesis_input = texttospeech_v1.SynthesisInput(text=phrase)
            voice = texttospeech_v1.VoiceSelectionParams(language_code=lang_code, name=voice_name)
            audio_config = texttospeech_v1.AudioConfig(audio_encoding=texttospeech_v1.AudioEncoding.MP3)
            response = client.synthesize_speech(input=synthesis_input, voice=voice, audio_config=audio_config)
            audio = AudioSegment.from_mp3(io.BytesIO(response.audio_content))
            final_audio += audio + AudioSegment.silent(duration=3000)

# Save Final Audio
final_audio.export("C:\\audio2.wav", format="wav")

Answer 1

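The error message you are seeing, "SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape", is a common Python error caused by how backslashes are interpreted in string literals. It typically appears when a normal (non-raw) string literal contains \U, as in a Windows path beginning with "C:\U...", because Python reads \U as the start of an eight-digit Unicode escape. Positions 2-3 point at exactly that backslash and the letter after it, so the problem lies in a path literal rather than in the Telugu text or the CSV encoding.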
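To check that the error comes from a path literal and not from the data, the following minimal sketch reproduces it with Python's built-in compile(). The path C:\Users\test.json and the helper name try_compile are hypothetical and chosen only for illustration; they do not appear in the question.

```python
def try_compile(source):
    """Compile a snippet of source text; return the error text, or None on success."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return str(exc)
    return None


# Source text with a normal string literal: "\U" starts a Unicode escape.
bad_source = r'path = "C:\Users\test.json"'

# Same path written as a raw string literal: backslashes are kept as-is.
good_source = r'path = r"C:\Users\test.json"'

error_message = try_compile(bad_source)
ok_message = try_compile(good_source)

print(error_message)  # the 'unicodeescape' SyntaxError text
print(ok_message)     # None, the raw-string version compiles
```

Because the failure happens while Python parses the script, it occurs before any CSV row is read or any request is sent to the text-to-speech service.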

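To resolve this, you can use double backslashes (\\) instead of single backslashes (\) to escape the backslashes in file paths. Here is the modified line: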

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "C:\\MYJSON.json"
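As a rough illustration of the options, the sketch below compares several ways of writing the same Windows path in Python. It reuses the C:\MYJSON.json placeholder from the question; the variable names are made up for this example.

```python
from pathlib import PureWindowsPath

# Three spellings that all refer to the same file.
raw_path = r"C:\MYJSON.json"       # raw string: backslashes are literal
escaped_path = "C:\\MYJSON.json"   # normal string: each backslash is doubled
forward_path = "C:/MYJSON.json"    # forward slashes are accepted for Windows paths

# The raw and escaped literals produce exactly the same string.
same_string = raw_path == escaped_path

# PureWindowsPath normalizes separators, so the forward-slash form matches too.
same_path = PureWindowsPath(forward_path) == PureWindowsPath(raw_path)

# Combining the r prefix with doubled backslashes keeps both backslashes.
doubled_raw = r"C:\\MYJSON.json"
extra_chars = len(doubled_raw) - len(raw_path)

print(same_string, same_path, extra_chars)
```

Choosing one style (raw strings or doubled backslashes) and applying it to every path in the script avoids accidental escapes such as \U, \t, or \n.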
