Unable to generate non-Latin characters with Google TTS

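I created a Python script that generates audio from a .csv file used as the data source. The script works for English and Spanish audio, but I can't get it to generate words in Telugu.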

My .csv file is UTF-8 encoded and my script specifies UTF-8, but when I try to run the script I keep getting this error message:

SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

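My script is shown below. Can anyone suggest what I might be doing wrong?

I also tried putting Unicode escape sequences for the Telugu characters directly into my script, but that didn't work either.

My code is as follows: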
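Telugu text on its own is fine in Python 3 string literals, both typed directly and as escapes. The sketch below (standard library only; the sample word is chosen purely for illustration) shows that the short \u form takes exactly 4 hex digits, while the long \U form needs exactly 8. A \U with fewer than 8 valid hex digits after it is what the "truncated \UXXXXXXXX escape" error refers to.

```python
import unicodedata

# The word "Telugu" in Telugu script, typed directly (Python 3 source is UTF-8 by default)
direct = "తెలుగు"

# The same word built from \u escapes: exactly 4 hex digits per code point
short_escapes = "\u0c24\u0c46\u0c32\u0c41\u0c17\u0c41"

# The long \U form needs exactly 8 hex digits; fewer is a compile-time SyntaxError
long_escape = "\U00000C24"

assert direct == short_escapes
assert long_escape == "\u0c24"

# Print each code point with its official Unicode name
for ch in short_escapes:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```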

import os

# Set the GOOGLE_APPLICATION_CREDENTIALS environment variable
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = r"C:\MYJSON.json"


import csv
from google.cloud import texttospeech_v1
from pydub import AudioSegment
import io

# Initialize Google Cloud TTS client
client = texttospeech_v1.TextToSpeechClient()

# Initialize an empty audio segment
final_audio = AudioSegment.empty()

# Read CSV and Generate Audio
with open("C:\\testscript.csv", 'r', newline='', encoding='utf-8') as csvfile:
    reader = csv.reader(csvfile)  # Read the CSV as a regular reader
    next(reader)  # Skip the header row
    
    for row in reader:
        phrase = row[0]  # Access the first column (Column A) for the phrase
        lang = row[1]    # Access the second column (Column B) for the language

        if phrase and lang:
            # Determine the language code based on the language specified in the CSV
            if lang == 'English':
                lang_code = 'en-US'
                voice_name = 'en-US-Wavenet-D'
            elif lang == 'Telugu':
                lang_code = 'te-IN'
                voice_name = 'te-IN-Standard-B'
            else:
                continue  # Skip rows with unrecognized languages

            # Generate and append audio
            synthesis_input = texttospeech_v1.SynthesisInput(text=phrase)
            voice = texttospeech_v1.VoiceSelectionParams(language_code=lang_code, name=voice_name)
            audio_config = texttospeech_v1.AudioConfig(audio_encoding=texttospeech_v1.AudioEncoding.MP3)
            response = client.synthesize_speech(input=synthesis_input, voice=voice, audio_config=audio_config)
            audio = AudioSegment.from_mp3(io.BytesIO(response.audio_content))
            final_audio += audio + AudioSegment.silent(duration=3000)

# Save Final Audio
final_audio.export("C:\\audio2.wav", format="wav")
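The CSV-reading and voice-selection part of the script can be checked offline, without calling Google Cloud. The sketch below writes a hypothetical two-column sample file containing Telugu, reads it back with encoding="utf-8", and applies the same language-to-voice mapping as the if/elif chain above. The helper name read_requests and the sample rows are illustrative assumptions. Unlike the original loop, it also skips rows with fewer than two columns instead of raising IndexError.

```python
import csv
import os
import tempfile

# Hypothetical sample data in the question's layout: column A = phrase, column B = language
ROWS = [
    ["phrase", "language"],
    ["Hello", "English"],
    ["నమస్కారం", "Telugu"],
]

# Same language-to-voice mapping as the script's if/elif chain
VOICES = {
    "English": ("en-US", "en-US-Wavenet-D"),
    "Telugu": ("te-IN", "te-IN-Standard-B"),
}


def read_requests(path):
    """Return (phrase, language_code, voice_name) for every row the script would synthesize."""
    requests = []
    with open(path, "r", newline="", encoding="utf-8") as csvfile:
        reader = csv.reader(csvfile)
        next(reader)  # skip the header row
        for row in reader:
            if len(row) < 2 or not row[0] or row[1] not in VOICES:
                continue  # skip short rows, empty phrases and unrecognized languages
            lang_code, voice_name = VOICES[row[1]]
            requests.append((row[0], lang_code, voice_name))
    return requests


with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.csv")
    # Write the sample file as UTF-8, matching how the question's CSV is described
    with open(sample_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(ROWS)
    parsed = read_requests(sample_path)

for phrase, lang_code, voice_name in parsed:
    print(lang_code, voice_name, phrase)
```

If the Telugu row comes back intact here, the CSV encoding is not the cause of the error.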

1 Answer

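The error message you are getting, "SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape", is a common Python error. It comes from how string literals are interpreted and how backslashes inside them are handled. In an ordinary (non-raw) string literal, \U starts an escape that must be followed by exactly eight hex digits. A Windows path such as "C:\Users\..." therefore fails to compile: positions 2-3 are the \U right after "C:". The Telugu text in the CSV is not involved, because a SyntaxError is raised before any of the script runs.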
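To reproduce the error without running the whole script, you can pass small snippets of source code to the built-in compile(). In the sketch below, the helper try_compile and the sample path are hypothetical. The doubled backslashes in this file exist only so that the file itself compiles. The snippets being tested contain the single or double backslashes shown in the comments.

```python
def try_compile(source):
    """Compile a snippet of Python source; return the SyntaxError message, or None if it compiles."""
    try:
        compile(source, "<snippet>", "exec")
        return None
    except SyntaxError as exc:
        return exc.msg


# Snippet text: p = "C:\Users\key.json"    -> \U starts an 8-digit escape, so it fails
plain = 'p = "C:\\Users\\key.json"'
# Snippet text: p = r"C:\Users\key.json"   -> raw string, backslashes kept as-is
raw = 'p = r"C:\\Users\\key.json"'
# Snippet text: p = "C:\\Users\\key.json"  -> each \\ is an escaped backslash
escaped = 'p = "C:\\\\Users\\\\key.json"'

for label, src in [("plain", plain), ("raw", raw), ("escaped", escaped)]:
    print(label, "->", try_compile(src) or "compiles")
```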

To fix this, escape each backslash in the file path by writing a double backslash (\\) instead of a single backslash (\). Here is the modified line:

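os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "C:\\MYJSON.json"

Alternatively, keep single backslashes inside a raw string, r"C:\MYJSON.json". Do not combine the two styles: a raw string does not process escapes, so r"C:\\MYJSON.json" would contain two literal backslashes.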
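A quick standard-library check of how the path spellings compare (the path is the placeholder used in the question):

```python
# The escaped string and the raw string produce the same path
escaped = "C:\\MYJSON.json"   # in a normal string, each \\ becomes one backslash
raw = r"C:\MYJSON.json"       # in a raw string, the backslash is kept as-is

print(escaped == raw)  # True

# Mixing the two styles keeps both backslashes, since raw strings process no escapes
raw_and_doubled = r"C:\\MYJSON.json"
print(raw_and_doubled == escaped)  # False
print(len(escaped), len(raw_and_doubled))  # 14 15
```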
© www.soinside.com 2019 - 2024. All rights reserved.