Regular expression for splitting a string

Question

I want to split a string in Python.

Example string:

Hi this is ACT I. SCENE 1 and SCENE 2 and this is ACT II. SCENE 1 and SCENE 2 and more

...into a list:

['Hi this is', 'ACT I. SCENE 1', 'and', 'SCENE 2', 'and this is', 'ACT II. SCENE 1', 'and', 'SCENE 2', 'and more']

Can someone help me build the regular expression? What I came up with is

(ACT [A-Z]+.\sSCENE\s[0-9]+)]?(.*)(SCENE [0-9]+)

but it does not work properly.

Answer 1

Here is a working script, although it is a bit hacky:

import re

inp = "Hi this is ACT I. SCENE 1 and SCENE 2 and this is ACT II. SCENE 1 and SCENE 2 and more"
parts = re.findall(r'[A-Z]{2,}(?: [A-Z0-9.]+)*|(?![A-Z]{2})\w+(?: (?![A-Z]{2})\w+)*', inp)
print(parts)

This prints:

['Hi this is', 'ACT I. SCENE 1', 'and', 'SCENE 2', 'and this is', 'ACT II. SCENE 1',
 'and', 'SCENE 2', 'and more']

Here is an explanation of the regex logic, which uses an alternation to match one of two cases:

[A-Z]{2,}              match TWO or more capital letters
(?: [A-Z0-9.]+)*       followed by zero or more words, consisting only of
                       capital letters, numbers, or period
|                      OR
(?![A-Z]{2})\w+        match a word which does NOT start with two capital letters
(?: (?![A-Z]{2})\w+)*  then match zero or more similar terms
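
The breakdown above can also be kept next to the pattern itself by compiling it with re.VERBOSE. This is only a re-layout of the same regex, offered as a sketch: the name PATTERN is chosen here for illustration, and the literal spaces are written as [ ] because verbose mode ignores unescaped whitespace.

```python
import re

# Same alternation as the one-line pattern, laid out with re.VERBOSE.
# Unescaped whitespace is ignored in verbose mode, so literal spaces are "[ ]".
PATTERN = re.compile(r"""
    [A-Z]{2,}                  # a word starting with two or more capital letters
    (?:[ ][A-Z0-9.]+)*         # then zero or more words of capitals, digits, or periods
    |                          # OR
    (?![A-Z]{2})\w+            # a word that does NOT start with two capital letters
    (?:[ ](?![A-Z]{2})\w+)*    # then zero or more further words of that kind
""", re.VERBOSE)

inp = "Hi this is ACT I. SCENE 1 and SCENE 2 and this is ACT II. SCENE 1 and SCENE 2 and more"
parts = PATTERN.findall(inp)
print(parts)
```

Note that this approach treats any word beginning with two capital letters as the start of a heading, so it fits the sample string but may over-match on text that contains other all-caps words.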

Answer 2

If I understand your requirements correctly, you can use the following pattern:

(?:ACT|SCENE).+?\d+|\S.*?(?=\s?(?:ACT|SCENE|$))

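As a quick check, the pattern above can be applied to the question's sample string with re.findall. This is a minimal sketch; the variable names are illustrative, and it has only been reasoned through for this one input.

```python
import re

# First alternative: "ACT" or "SCENE", then lazily up to the next run of digits.
# Second alternative: other text, lazily up to the next ACT/SCENE or end of string.
pattern = r'(?:ACT|SCENE).+?\d+|\S.*?(?=\s?(?:ACT|SCENE|$))'

inp = "Hi this is ACT I. SCENE 1 and SCENE 2 and this is ACT II. SCENE 1 and SCENE 2 and more"
parts = re.findall(pattern, inp)
print(parts)
```

On the sample string this yields the same nine-element list as requested in the question. Unlike the first answer, it keys on the literal words ACT and SCENE rather than on capitalization.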
