How can I make a regex match individual words only when the line starts with a specific string?


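I'm trying to get a regex to match every word in the dialogue spoken by a specific character. Every line has the format "[NAME]: [DIALOGUE]", so there is a consistent marker at the start of each line to check for, but I can't figure out how to use it. For example, if I'm looking at Romeo's lines in Romeo and Juliet, it should match every word in "Romeo: I love you Juliet" but none of the words in "Juliet: I love you Romeo".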

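The only possible solution I came up with was a lookbehind, for which I had

(?<=NAME:[.*])\w+
, but it returned no matches. After some debugging and reading other answers, I found that the problem was the added
[.*]
, specifically the square brackets. That led me to
(?<=^NAME:).*\w+
, which almost worked, but it matches the entire line of dialogue instead of the individual words.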
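To make the two failed attempts concrete, here is a minimal sketch using Python's built-in `re` module, with `Romeo` substituted for `NAME` and a sample line taken from the example above. The variable names are illustrative only. It also shows why simply moving `.*` inside the lookbehind is not an option in `re`, which accepts only fixed-width lookbehinds.

```python
import re

line = "Romeo: I love you Juliet"

# [.*] is a character class: exactly one literal '.' or '*', not "any text".
# No word here is directly preceded by "Romeo:." or "Romeo:*", so nothing matches.
bracket_attempt = re.findall(r"(?<=Romeo:[.*])\w+", line)
print(bracket_attempt)  # → []

# Outside the lookbehind, the greedy .* is part of the match itself,
# so the single match is the rest of the line rather than one word.
greedy_attempt = re.findall(r"(?<=^Romeo:).*\w+", line)
print(greedy_attempt)  # → [' I love you Juliet']

# Putting .* inside the lookbehind makes it variable-width, which re rejects.
try:
    re.compile(r"(?<=^Romeo:.*)\w+")
    variable_width_ok = True
except re.error:
    variable_width_ok = False
print(variable_width_ok)  # → False
```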

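While posting, I looked through the suggested similar questions and found one that has the code

\Aframe.*width\s(?<width>\d+)\sheight\s(?<height>\d+)\z
. I tried adapting it to
\ANAME:.*\w+\s(?<\w+>\d+)\s\z
, and then to
\ANAME:.*\w+\s(?\w+\d+)\s\z
, but both raise an error about the second
\w+
, citing "bad escape". I then looked at this question, which has the code
(^@property|(?!^)\G)(.*? )\K([^-\n]\w+)
, but even that code, without any modification, raised the same "bad escape" error.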
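A likely reason for these errors is that the borrowed patterns were written for other regex flavors: Python's built-in `re` module has no `\G` or `\K`, and it spells named groups `(?P<name>...)` rather than `(?<name>...)`. The sketch below is only a way to check what `re` accepts; `try_compile` is a hypothetical helper, and the exact error text may differ between Python versions, so it is not reproduced here.

```python
import re

def try_compile(pattern):
    """Return 'ok' if re accepts the pattern, otherwise the error message."""
    try:
        re.compile(pattern)
        return "ok"
    except re.error as exc:
        return f"error: {exc}"

# The pattern from the linked question uses \G and \K, which re does not support.
pcre_style = try_compile(r"(^@property|(?!^)\G)(.*? )\K([^-\n]\w+)")
print(pcre_style)

# The same idea as the first borrowed pattern, rewritten in Python's syntax:
# (?P<name>...) for named groups and \Z for end of string.
python_style = try_compile(r"\Aframe.*width\s(?P<width>\d+)\sheight\s(?P<height>\d+)\Z")
print(python_style)
```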

python regex regex-lookarounds
2 answers
0 votes

Simply iterate over the lines and, whenever a line starts with the given character name, match all the words in it:

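import re

def get_character_words(character_name, dialogue):
    # Lines look like "NAME: DIALOGUE". Including the colon in the prefix
    # keeps a name such as "Rome" from matching lines spoken by "Romeo".
    prefix = f'{character_name}:'
    result = []

    for line in dialogue.splitlines():
        if line.startswith(prefix):
            # Search for words only in the text after the "NAME:" prefix
            result += re.findall(r'\w+', line[len(prefix):])

    return result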

Try it:

text = '''
Romeo: Shall I hear more, or shall I speak at this?
Juliet: ’Tis but thy name that is my enemy. Thou art thyself, though not a Montague. What’s Montague? It is nor hand, nor foot, Nor arm, nor face. O, be some other name Belonging to a man. What’s in a name? That which we call a rose
Romeo: Call me but love, and I’ll be new baptized. Henceforth I never will be Romeo.
'''

print(get_character_words('Romeo', text))

'''
[
  'Shall', 'I', 'hear', 'more', 'or', 'shall', 'I', 'speak', 'at', 'this',
  'Call', 'me', 'but', 'love', 'and', 'I', 'll', 'be', 'new', 'baptized',
  'Henceforth', 'I', 'never', 'will', 'be', 'Romeo'
]
'''

0 votes

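A pure regex solution using the third-party regex module (the built-in re module only accepts fixed-width lookbehinds, so this pattern does not compile there):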

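import regex

def get_character_words(character_name, dialogue):
    # Escape the name so any regex metacharacters in it are matched literally
    name = regex.escape(character_name)
    # Variable-width lookbehind: each word must follow "NAME: " on the same line
    return regex.findall(fr'(?<=^{name}: .*)\b\w+', dialogue, flags=regex.M)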

Try it:

text = '''
Romeo: Shall I hear more, or shall I speak at this?
Juliet: ’Tis but thy name that is my enemy. Thou art thyself, though not a Montague. What’s Montague? It is nor hand, nor foot, Nor arm, nor face. O, be some other name Belonging to a man. What’s in a name? That which we call a rose
Romeo: Call me but love, and I’ll be new baptized. Henceforth I never will be Romeo.
'''

print(get_character_words('Juliet', text))

'''
[
  'Tis', 'but', 'thy', 'name', 'that', 'is', 'my', 'enemy',
  'Thou', 'art', 'thyself', 'though', 'not', 'a', 'Montague',
  ...
]
'''
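
If installing a third-party module is not an option, a similar result can be had with the built-in `re` module by splitting the work into two passes: one pattern selects the character's lines, a second pulls the words out of each. This is only a sketch under the "[NAME]: [DIALOGUE]" format described in the question; the name `get_character_words_stdlib` is made up for illustration.

```python
import re

def get_character_words_stdlib(character_name, dialogue):
    # Pass 1: capture the dialogue part of every line that starts with "NAME: ".
    line_pattern = re.compile(
        rf"^{re.escape(character_name)}: (.*)$", re.MULTILINE
    )
    words = []
    for match in line_pattern.finditer(dialogue):
        # Pass 2: collect the individual words from that line's dialogue.
        words.extend(re.findall(r"\w+", match.group(1)))
    return words

sample = "Romeo: I love you Juliet\nJuliet: I love you Romeo"
print(get_character_words_stdlib("Romeo", sample))
# → ['I', 'love', 'you', 'Juliet']
```

The trade-off is two patterns instead of one, in exchange for depending only on the standard library.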