Capturing a Shakespearean character's dialogue with a regular expression

Question (votes: 1, answers: 2)

I'm trying to capture Shakespearean dialogue with a regular expression. For example, in this particular scene I want to capture everything said by the character CALIBAN:

  PROSPERO. Thou most lying slave,
    Whom stripes may move, not kindness! I have us'd thee,
    Filth as thou art, with human care, and lodg'd thee
    In mine own cell, till thou didst seek to violate
    The honour of my child.

  CALIBAN. O ho, O ho! Would't had been done.
    Thou didst prevent me. I had peopl'd else
    This isle with Calibans.

  PROSPERO. Thou most lying slave,
    Whom stripes may move, not kindness! I have us'd thee,
    Filth as thou art, with human care, and lodg'd thee
    In mine own cell, till thou didst seek to violate
    The honour of my child.

  CALIBAN. O ho, O ho! Would't had been done.
    Thou didst prevent me. I had peopl'd else
    This isle with Calibans.

I want to capture:

O ho, O ho! Would't had been done.
    Thou didst prevent me. I had peopl'd else
    This isle with Calibans.

How can I achieve this with a regular expression? I tried this particular regex:

(?<=\n  CALIBAN\. )[A-Za-z ',\.\n\!-]+(?=\n  PROSPERO\. |$)

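Note: in the actual text, each new character's name is always preceded by exactly 2 spaces, and every line ends with a carriage return. My regex looks for CALIBAN. at the start, matches some text, and requires the match to end right before PROSPERO. (or at the end of the text). However, when I test it on regexp.com, my entire text matches.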
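To see why the whole text matches, here is a minimal PHP sketch (PHP because the first answer uses it) that runs the question's pattern through preg_match_all on an abridged, hypothetical two-exchange excerpt with "\n" line endings. The greedy + consumes everything up to the end of the string, where $ satisfies the lookahead, so a single match comes back and it includes PROSPERO's second speech.

```php
<?php
// Hypothetical, abridged excerpt: two exchanges, "\n" line endings,
// two spaces before each speaker's name and four before continuation lines
$text = "  PROSPERO. Thou most lying slave,\n"
    . "    The honour of my child.\n"
    . "\n"
    . "  CALIBAN. O ho, O ho! Would't had been done.\n"
    . "    Thou didst prevent me. I had peopl'd else\n"
    . "    This isle with Calibans.\n"
    . "\n"
    . "  PROSPERO. Thou most lying slave,\n"
    . "    The honour of my child.\n"
    . "\n"
    . "  CALIBAN. O ho, O ho! Would't had been done.\n"
    . "    Thou didst prevent me. I had peopl'd else\n"
    . "    This isle with Calibans.\n";

// The question's pattern: greedy + over a class that includes letters, '.', and "\n"
$greedy = '/(?<=\n  CALIBAN\. )[A-Za-z \',\.\n\!-]+(?=\n  PROSPERO\. |$)/';
preg_match_all($greedy, $text, $m);

// Only one match comes back, and it runs through PROSPERO's second speech
echo count($m[0]), "\n";
var_dump(strpos($m[0][0], 'PROSPERO') !== false);
```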

regex regex-lookarounds
2 Answers
3 votes

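The greedy + in your pattern consumes as much as it can. Its character class also covers the letters and the period of PROSPERO., and since $ satisfies the lookahead at the very end of the string, the match runs from the first CALIBAN. to the end of the text. Use a lazy quantifier instead, so that each match stops at the first following "\n  PROSPERO. ". Replacing the literal space and \n with \s also covers the carriage returns at the ends of the lines: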

(?<=\n  CALIBAN\. )[A-Za-z\s',.!-]+?(?=\n  PROSPERO\. |$)


In PHP:

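<?php
// $str must hold the play text; 'tempest.txt' is a hypothetical file name
$str = file_get_contents('tempest.txt');

$re = '/(?<=\n  CALIBAN\. )[A-Za-z\s\',.!-]+?(?=\n  PROSPERO\. |$)/';

// With the default PREG_PATTERN_ORDER, $matches[0] lists every full match
preg_match_all($re, $str, $matches);

// Print all of Caliban's speeches
print_r($matches[0]);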
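A self-contained sketch of the lazy pattern, again on a hypothetical abridged excerpt instead of a file. It shows two separate matches, trims the trailing whitespace some matches keep, and checks that the same pattern still finds both speeches after converting the line endings to "\r\n", because \s also matches "\r".

```php
<?php
// Hypothetical, abridged excerpt with the question's layout and "\n" line endings
$text = "  PROSPERO. Thou most lying slave,\n"
    . "    The honour of my child.\n"
    . "\n"
    . "  CALIBAN. O ho, O ho! Would't had been done.\n"
    . "    Thou didst prevent me. I had peopl'd else\n"
    . "    This isle with Calibans.\n"
    . "\n"
    . "  PROSPERO. Thou most lying slave,\n"
    . "    The honour of my child.\n"
    . "\n"
    . "  CALIBAN. O ho, O ho! Would't had been done.\n"
    . "    Thou didst prevent me. I had peopl'd else\n"
    . "    This isle with Calibans.\n";

// Lazy +? stops at the first "\n  PROSPERO. " (or where $ matches at the end)
$lazy = '/(?<=\n  CALIBAN\. )[A-Za-z\s\',.!-]+?(?=\n  PROSPERO\. |$)/';
preg_match_all($lazy, $text, $m);

// A speech followed by a blank line keeps one trailing "\n", so trim it
$speeches = array_map('rtrim', $m[0]);
print_r($speeches);

// \s also matches "\r", so CRLF line endings still yield both speeches
preg_match_all($lazy, str_replace("\n", "\r\n", $text), $crlf);
echo count($crlf[0]), "\n";
```

The trailing whitespace differs between matches: a speech followed by a blank line and "  PROSPERO." keeps one newline, while the final speech stops just before the last newline because $ also matches there. That is why the sketch applies rtrim.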

1 vote

Try the following regex:

CALIBAN\. ((.*\n .*)*)

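The first capture group (group 1) matches the text Caliban speaks, without his name. Each repetition of (.*\n .*) crosses one line break into a following line that starts with a space, so the repetition stops at the empty line before the next speaker. Based on the sample provided, this regex should work.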
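A sketch of this second approach in PHP, assuming the same hypothetical abridged excerpt with "\n" line endings and truly empty separator lines. The dot after CALIBAN is escaped here so it only matches a literal period, and the speech is read from capture group 1, which preg_match_all returns in $m[1] with its default ordering.

```php
<?php
// Hypothetical, abridged excerpt with the question's layout and "\n" line endings
$text = "  PROSPERO. Thou most lying slave,\n"
    . "    The honour of my child.\n"
    . "\n"
    . "  CALIBAN. O ho, O ho! Would't had been done.\n"
    . "    Thou didst prevent me. I had peopl'd else\n"
    . "    This isle with Calibans.\n"
    . "\n"
    . "  PROSPERO. Thou most lying slave,\n"
    . "    The honour of my child.\n"
    . "\n"
    . "  CALIBAN. O ho, O ho! Would't had been done.\n"
    . "    Thou didst prevent me. I had peopl'd else\n"
    . "    This isle with Calibans.\n";

// Second answer's pattern, with the dot after CALIBAN escaped
$re = '/CALIBAN\. ((.*\n .*)*)/';
preg_match_all($re, $text, $m);

// $m[1] holds capture group 1 for every match: the speech without the name
print_r($m[1]);
```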

© www.soinside.com 2019 - 2024. All rights reserved.