Python regex split while keeping some of the characters used for splitting

I have the following text:

text = "Perennials. Stolons slender. Perianth bristles 6 or 7, ca. 2 × as long as nutlet"

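I want to split it into separate segments using a delimiter defined as "\.\s[A-Z]". However, I still want to keep the [A-Z] character in the resulting sentences, so that the output is: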

['Perennials',
 'Stolons slender',
 'Perianth bristles 6 or 7, ca. 2 × as long as nutlet']

What I have tried so far is:

re.split(r'\.\s[A-Z]', text)

But it removes the first letter of each piece after the first:

['Perennials',
 'tolons slender',
 'erianth bristles 6 or 7, ca. 2 × as long as nutlet']

Can anyone help? Thanks~

regex python-3.x split
1 Answer (2 votes)

Split using a lookahead:

import re

# text is the string defined in the question
result = re.split(r'\.\s(?=[A-Z])', text)
print(result)

['Perennials', 'Stolons slender', 'Perianth bristles 6 or 7, ca. 2 × as long as nutlet']

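The lookahead (?=[A-Z]) asserts, without consuming, that what follows the dot and the whitespace is an uppercase letter. Because the letter is not part of the match, re.split does not remove it, and it stays at the start of the next piece. "ca. 2" is not split because the dot and space there are followed by a digit.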
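
To make the difference concrete, here is a small self-contained sketch that runs both patterns from this thread on the same text. The variable names consumed and kept are only illustrative and do not come from the question.

```python
import re

text = "Perennials. Stolons slender. Perianth bristles 6 or 7, ca. 2 × as long as nutlet"

# Consuming pattern: the capital letter is part of the match,
# so re.split removes it along with the dot and the whitespace.
consumed = re.split(r'\.\s[A-Z]', text)

# Lookahead pattern: only the dot and the whitespace are matched;
# the capital letter is checked but left in place for the next piece.
kept = re.split(r'\.\s(?=[A-Z])', text)

print(consumed)
print(kept)
```

The first list should show the truncated pieces from the question, and the second should show each piece with its leading capital intact.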
