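I'm writing a program in which lists sometimes show up that cause problems during speech synthesis. For example, the text passed to the synthesizer looks like this: "Restaurant suggestions: 1.Pizza2.Burger3.Sushi4.Noodles...". The synthesizer interprets the numbers as part of the adjacent words, which leads to awkward pronunciation. To fix this, spaces should be inserted between the numbers and the words. The output also shouldn't be too long; ideally the list is limited to the first three suggestions.
I have already tried this code: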
import re

def post_processing(text):
    """
    Post-processes a text string to address formatting issues for voice synthesis.

    Args:
        text: The input text string.

    Returns:
        The processed text string.
    """
    # Process lists with improved handling
    parts = text.split(":")
    if len(parts) > 1:
        # Split based on newlines, limiting to 3 items
        items = parts[1].strip().split("\n")[:3]
        # Remove trailing spaces, handle punctuation, and add spaces correctly
        items = [
            f"{item.strip()[:-1].rstrip('.')}{' ' if item.strip()[-1].isdigit() or item.strip()[-1] == '.' else ''}{item.strip()[-1:]}"
            for item in items
        ]
        text = ": ".join(items)
    else:
        text = text.strip()  # Remove leading/trailing whitespace
    # Remove URLs completely
    text = re.sub(r"https?://\S+", "", text)
    return text
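So when I pass in the following input:
text = "Restaurant suggestions: 1.Pizza2.Burger3.Sushi4.Noodles......"
text = post_processing(text)
I expect the following output:
print(text)  # Output: 1. Pizza 2. Burger 3. Sushi
But what I actually get is:
1.Pizza2.Burger3. Sushi4. Noodles.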
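If you want to make sure there is a space before and after the list numbering, you can adjust the formatting in the list comprehension. Try this: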
import re

def post_processing(text):
    """
    Post-processes a text string to address formatting issues for voice synthesis.

    Args:
        text: The input text string.

    Returns:
        The processed text string.
    """
    # Remove URLs first so the colon in "https://" is not mistaken for a list intro
    text = re.sub(r"https?://\S+", "", text).strip()

    # Process lists: look at everything after the first colon
    parts = re.split(r":\s*", text, maxsplit=1)
    if len(parts) > 1:
        # The items sit on one line, so match "number . label" pairs instead of
        # splitting on newlines; a label runs until the next digit or period.
        # Keep only the first three recommendations.
        items = re.findall(r"(\d+)\s*\.\s*([^\d.]+)", parts[1])[:3]
        # Add a space after the list numbering and between items
        items = [f"{number}. {label.strip()}" for number, label in items]
        if items:
            text = " ".join(items)
    return text

text = "Suggestions for restaurants: 1 . Pizza2. Burger3. Sushi4. Noodles...."
text = post_processing(text)
print(text)  # Output: 1. Pizza 2. Burger 3. Sushi
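One thing worth noting about inputs like the sample: the list entries all sit on a single line, so splitting on "\n" cannot separate them and a [:3] slice on the result has nothing to trim. The short sketch below illustrates this with an English rendering of the question's sample string, and compares it with matching "number. label" pairs via a regular expression. The names sample, chunks and pairs are illustrative only, and the pattern assumes labels contain no digits or periods.

```python
import re

# English rendering of the sample input; note that it contains no "\n" characters.
sample = "Restaurant suggestions: 1.Pizza2.Burger3.Sushi4.Noodles......"

# Take everything after the first colon.
after_colon = sample.split(":", 1)[1].strip()

# Splitting on newlines yields a single chunk, so the [:3] slice keeps everything.
chunks = after_colon.split("\n")[:3]
print(len(chunks))  # 1

# Matching "number. label" pairs finds each entry separately
# (assumption: labels contain neither digits nor periods).
pairs = re.findall(r"(\d+)\s*\.\s*([^\d.]+)", after_colon)
print(pairs[:3])  # [('1', 'Pizza'), ('2', 'Burger'), ('3', 'Sushi')]
```

If the real lists do arrive with one entry per line, splitting on newlines is fine; the regular expression is only needed when the entries are run together as in the sample.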