Regular expression to split a string

Question (votes: 1, answers: 2)

I want to split a string in Python.

Example string:

Hi this is ACT I. SCENE 1 and SCENE 2 and this is ACT II. SCENE 1 and SCENE 2 and more

...into a list:

['Hi this is', 'ACT I. SCENE 1', 'and', 'SCENE 2', 'and this is', 'ACT II. SCENE 1', 'and', 'SCENE 2', 'and more']

Can someone help me build the regular expression? What I have so far is

(ACT [A-Z]+.\sSCENE\s[0-9]+)]?(.*)(SCENE [0-9]+)

But this does not work correctly.

python regex nsregularexpression
2 Answers
1
vote

Here is a script that works, although it is a bit hacky:

import re

inp = "Hi this is ACT I. SCENE 1 and SCENE 2 and this is ACT II. SCENE 1 and SCENE 2 and more"
parts = re.findall(r'[A-Z]{2,}(?: [A-Z0-9.]+)*|(?![A-Z]{2})\w+(?: (?![A-Z]{2})\w+)*', inp)
print(parts)

This prints:

['Hi this is', 'ACT I. SCENE 1', 'and', 'SCENE 2', 'and this is', 'ACT II. SCENE 1',
 'and', 'SCENE 2', 'and more']

Here is an explanation of the regex logic, which uses an alternation to match one of two cases:

[A-Z]{2,}              match TWO or more capital letters
(?: [A-Z0-9.]+)*       followed by zero or more words, consisting only of
                       capital letters, numbers, or period
|                      OR
(?![A-Z]{2})\w+        match a word which does NOT start with two capital letters
(?: (?![A-Z]{2})\w+)*  then match zero or more similar terms
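
The same breakdown can be kept next to the pattern itself by compiling it with re.VERBOSE. This is only a sketch: the names SEGMENT_RE and split_segments are made up for illustration, and each literal space is written as [ ] because verbose mode ignores unescaped whitespace in the pattern.

```python
import re

# Hypothetical name; the pattern is the one from this answer, spread over
# several lines so each piece carries its own comment.
SEGMENT_RE = re.compile(r"""
    [A-Z]{2,}                  # a word that starts with two or more capital letters
    (?:[ ][A-Z0-9.]+)*         # then zero or more words made of capitals, digits or periods
    |                          # OR
    (?![A-Z]{2})\w+            # a word that does NOT start with two capital letters
    (?:[ ](?![A-Z]{2})\w+)*    # then zero or more similar words
""", re.VERBOSE)


def split_segments(text):
    """Return the alternating heading and plain-text segments of text."""
    return SEGMENT_RE.findall(text)


sample = "Hi this is ACT I. SCENE 1 and SCENE 2 and this is ACT II. SCENE 1 and SCENE 2 and more"
print(split_segments(sample))
```

Note that any word beginning with two capital letters is treated as part of a heading, so a string such as "Hi NASA is" would be split around "NASA" as well.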

0
votes

If I understood your requirement correctly, you can use the following pattern:

(?:ACT|SCENE).+?\d+|\S.*?(?=\s?(?:ACT|SCENE|$))
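
A minimal way to try this pattern in Python, assuming the sample string from the question and re.findall as the matching call (the answer itself only gives the pattern, so the variable names here are illustrative):

```python
import re

# Pattern from this answer, unchanged.
pattern = r'(?:ACT|SCENE).+?\d+|\S.*?(?=\s?(?:ACT|SCENE|$))'

# Sample string taken from the question.
inp = "Hi this is ACT I. SCENE 1 and SCENE 2 and this is ACT II. SCENE 1 and SCENE 2 and more"

# The pattern has no capturing groups, so findall returns the full matches.
parts = re.findall(pattern, inp)
print(parts)
```

The first alternative lazily consumes from ACT or SCENE up to the next run of digits, which is why "ACT I. SCENE 1" stays in one piece. The second alternative takes everything else up to the next ACT, SCENE, or the end of the string.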

