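I am trying to use a regular expression to capture Shakespeare dialogue for text matching. For example, I want to capture everything spoken by the character named CALIBAN in this particular scene: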
PROSPERO. Thou most lying slave,
Whom stripes may move, not kindness! I have us'd thee,
Filth as thou art, with human care, and lodg'd thee
In mine own cell, till thou didst seek to violate
The honour of my child.
CALIBAN. O ho, O ho! Would't had been done.
Thou didst prevent me. I had peopl'd else
This isle with Calibans.
PROSPERO. Thou most lying slave,
Whom stripes may move, not kindness! I have us'd thee,
Filth as thou art, with human care, and lodg'd thee
In mine own cell, till thou didst seek to violate
The honour of my child.
CALIBAN. O ho, O ho! Would't had been done.
Thou didst prevent me. I had peopl'd else
This isle with Calibans.
I want to capture:
O ho, O ho! Would't had been done.
Thou didst prevent me. I had peopl'd else
This isle with Calibans.
How can I achieve this with a regular expression? I tried this particular regex:
(?<=\n CALIBAN\. )[A-Za-z ',\.\n\!-]+(?=\n PROSPERO\. |$)
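Note: in the actual text there are always 2 space characters before the name of a new character, and every line ends with a carriage return. My regex looks for CALIBAN. at the start, then matches some text, and requires that the match be followed by PROSPERO. However, when I plug it into regexp.com, my entire text is matched.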
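The greedy + in your pattern consumes as much as it can, and because $ is an accepted ending, it runs to the end of the text. You can use this regex with a lazy quantifier, which stops at the first point where the lookahead succeeds: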
(?<=\n CALIBAN\. )[A-Za-z\s',.!-]+?(?=\n PROSPERO\. |$)
Usage in PHP:
$re = '/(?<=\n CALIBAN\. )[A-Za-z\s\',.!-]+?(?=\n PROSPERO\. |$)/';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
// Print every match (with PREG_SET_ORDER, each entry of $matches is one match)
print_r($matches);
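To see the difference between the greedy and lazy quantifiers side by side, here is a minimal Python sketch. It is only an illustration, not part of the PHP usage above. The variable names (scene, CHAR_CLASS, greedy, lazy) are made up for the example, and the sample text assumes two spaces before each speaker name, as the question's note describes, with continuation lines left unindented.

```python
import re

# Assumed sample: two spaces precede each speaker name (per the question's
# note); continuation lines are unindented. Lines are taken from the scene.
scene = (
    "  PROSPERO. Thou most lying slave,\n"
    "The honour of my child.\n"
    "  CALIBAN. O ho, O ho! Would't had been done.\n"
    "Thou didst prevent me. I had peopl'd else\n"
    "This isle with Calibans.\n"
    "  PROSPERO. Thou most lying slave,\n"
    "The honour of my child.\n"
    "  CALIBAN. O ho, O ho! Would't had been done.\n"
    "Thou didst prevent me. I had peopl'd else\n"
    "This isle with Calibans."
)

CHAR_CLASS = r"[A-Za-z\s',.!-]"
LOOKBEHIND = r"(?<=\n  CALIBAN\. )"
LOOKAHEAD = r"(?=\n  PROSPERO\. |$)"

# Greedy: "+" keeps consuming until the end of the string, where "$" succeeds.
greedy = re.findall(LOOKBEHIND + CHAR_CLASS + "+" + LOOKAHEAD, scene)

# Lazy: "+?" stops at the first position where the lookahead succeeds.
lazy = re.findall(LOOKBEHIND + CHAR_CLASS + "+?" + LOOKAHEAD, scene)

print("greedy matches:", len(greedy))
print("lazy matches:", len(lazy))
for speech in lazy:
    print(speech)
    print("---")
```

On this sample the greedy pattern returns a single match that swallows the following PROSPERO speech, while the lazy pattern returns one match per CALIBAN speech.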
Try the following regex:
CALIBAN. ((.*\n .*)*)
The first capturing group (group 1) matches the text spoken by Caliban, without his name. Based on the example provided, this regex should work.
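The capture-group idea can also be sketched in Python. Note that this is not the pattern above verbatim: it escapes the dot after the name and adds a negative lookahead so that the group stops at the next speaker line. The two-space indent and the all-caps "NAME. " convention for speaker lines are assumptions about the real text, and the names scene, SPEECH and speeches are made up for the example.

```python
import re

# Assumed sample: speaker lines start with two spaces and an all-caps name
# followed by ". "; continuation lines do not match that shape.
scene = (
    "  PROSPERO. Thou most lying slave,\n"
    "The honour of my child.\n"
    "  CALIBAN. O ho, O ho! Would't had been done.\n"
    "Thou didst prevent me. I had peopl'd else\n"
    "This isle with Calibans.\n"
    "  PROSPERO. Thou most lying slave,\n"
    "The honour of my child.\n"
    "  CALIBAN. O ho, O ho! Would't had been done.\n"
    "Thou didst prevent me. I had peopl'd else\n"
    "This isle with Calibans."
)

# Group 1: the rest of the CALIBAN line, plus every following line that does
# not itself look like a new speaker line ("  NAME. ").
SPEECH = re.compile(
    r"^  CALIBAN\. (.*(?:\n(?!  [A-Z]+\. ).*)*)",
    re.MULTILINE,
)

# With exactly one capturing group, findall returns the group-1 strings.
speeches = SPEECH.findall(scene)
for speech in speeches:
    print(speech)
    print("---")
```

Because the stop condition is "any new speaker" rather than PROSPERO specifically, this variant should also cope with scenes where a third character replies to Caliban, provided the naming convention assumed here holds.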