Unable to add spaces before and after list numbering

Question

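I'm writing a program in which lists sometimes appear that cause problems during speech synthesis. For example, the text handed to the synthesizer looks like this: "Suggestions for restaurants: 1.Pizza2.Burger3.Sushi4.Noodles...". The speech synthesizer interprets the numbers as part of the words, which leads to awkward pronunciation. To fix this, spaces should be inserted between the numbers and the words. The output also shouldn't be too long; ideally the list is limited to the first three suggestions.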

I have already tried this code:

import re

def post_processing(text):
  """
  Post-processes a text string to address formatting issues for voice synthesis.

  Args:
    text: The input text string.

  Returns:
    The processed text string.
  """
  # Process lists with improved handling
  parts = text.split(":")
  if len(parts) > 1:
    # Split based on newlines, limiting to 3 items
    items = parts[1].strip().split("\n")[:3]
    # Remove trailing spaces, handle punctuation, and add spaces correctly
    items = [
        f"{item.strip()[:-1].rstrip('.')}{' ' if item.strip()[-1].isdigit() or item.strip()[-1] == '.' else ''}{item.strip()[-1:]}"
        for item in items
    ]
    text = ": ".join(items)
  else:
    text = text.strip()  # Remove leading/trailing whitespace

  # Remove URLs completely
  text = re.sub(r"https?://\S+", "", text)

  return text

So when I pass in the following as input:

text = "Suggestions for restaurants: 1.Pizza2.Burger3.Sushi4.Noodles...."
text = post_processing(text)

the output should be the following:

print(text)  # Output: 1. Pizza 2. Burger 3. Sushi

But the result I get is the following:

1.Pizza2.Burger3.Sushi4.Noodles .
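A minimal sketch of why this happens, assuming the input arrives as a single line with no newline characters (as in the example above). The names `list_part`, `item` and `result` are only illustrative; the expression itself is copied from the function in the question.

```python
# The list part of the example input, as it looks after splitting on ":".
list_part = "1.Pizza2.Burger3.Sushi4.Noodles...."

# There is no "\n" in the string, so split("\n") yields exactly one item
# and the [:3] limit never gets a chance to drop anything.
items = list_part.strip().split("\n")[:3]
print(len(items))  # 1

# The formatting step therefore only touches the last character of the whole string.
item = items[0].strip()
result = f"{item[:-1].rstrip('.')}{' ' if item[-1].isdigit() or item[-1] == '.' else ''}{item[-1:]}"
print(result)  # 1.Pizza2.Burger3.Sushi4.Noodles .
```

In other words, the items have to be separated on the numbering itself rather than on line breaks before any per-item formatting or truncation can work.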

python list split removing-whitespace trailing-whitespace
1 Answer

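If you want to make sure there are spaces before and after the list numbering, you can adjust the formatting in the list comprehension. Try this: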

import re

def post_processing(text):
    """
    Post-processes a text string to address formatting issues for voice synthesis.

    Args:
      text: The input text string.

    Returns:
      The processed text string.
    """
    # Remove URLs first so their colons and dots are not mistaken for list markers
    text = re.sub(r"https?://\S+", "", text).strip()

    # Split off the introduction at the first colon (and optional spaces)
    parts = re.split(r":\s*", text, maxsplit=1)
    if len(parts) > 1:
        # The input has no newlines, so find "<number>.<item>" pairs instead;
        # an item runs until the next digit or period (item names must not contain digits)
        found = re.findall(r"(\d+)\s*\.\s*([^\d.]+)", parts[1])
        # Add a space after each number, limit to first three recommendations
        items = [f"{number}. {name.strip()}" for number, name in found[:3]]
        if items:
            text = " ".join(items)

    return text

text = "Suggestions for restaurants: 1 . Pizza2. Burger3. Sushi4. Noodles...."
text = post_processing(text)
print(text)  # Output: 1. Pizza 2. Burger 3. Sushi
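
If the introductory phrase should stay in the spoken output, a substitution-based variant may be enough. The following is only a sketch under a few assumptions: the helper name `space_list_numbers` and its `limit` parameter are hypothetical, the list is numbered 1, 2, 3, ... in order, and the text contains no decimal numbers such as "3.5" (they would be treated as list markers).

```python
import re

def space_list_numbers(text, limit=3):
    """Hypothetical helper: space out "N." markers and keep the first `limit` items."""
    # Normalise every "<digits>." marker so it has one space before and after it
    spaced = re.sub(r"\s*(\d+)\s*\.\s*", r" \1. ", text)
    # Cut the text where the marker of item number limit + 1 begins
    match = re.search(rf"\s{limit + 1}\.\s", spaced)
    if match:
        spaced = spaced[:match.start()]
    return spaced.strip()

text = "Suggestions for restaurants: 1.Pizza2.Burger3.Sushi4.Noodles...."
print(space_list_numbers(text))
# Output: Suggestions for restaurants: 1. Pizza 2. Burger 3. Sushi
```

Unlike `post_processing`, this keeps everything before the first list marker and only rewrites the numbering.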