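I need a regular expression to extract building names from a list. I pass the text and the regex to a framework that does the parsing, so I'd really like to solve this with a regex rather than with code.
The building name is always all uppercase, is preceded by "Building:", and is followed by (a) a number, (b) the all-uppercase word "UNIT", or (c) any mixed-case word. So I want BUILDING ONE as the result for every line below except the last, which should return nothing: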
Building: BUILDING ONE 15 [building name followed by unit number]
Optional preceding text Building: BUILDING ONE 15 [preceding text, then building name followed unit number]
Building: BUILDING ONE UNIT 15 [building name followed by word UNIT and unit number]
Building: BUILDING ONE Floor 2 [building name followed by mixed case word]
Grounds: OPEN SPACE WEST Section 3 [not a building - return nothing]
I feel like I know this, but I'm having a mental block. The closest I have so far is
^.*Building:\s([A-Z+\s*]*).*
which, for the examples above, returns
BUILDING ONE
BUILDING ONE
BUILDING ONE UNIT
BUILDING ONE F
You can use this regex:
(?<=Building: )[A-Z]+(?: (?!UNIT\b)[A-Z]+\b)*
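This matches:
(?<=Building: ) : a position preceded by "Building: " (a lookbehind, so the prefix itself is not part of the match)
[A-Z]+ : an uppercase word
(?: (?!UNIT\b)[A-Z]+\b)* : zero or more further uppercase words, each preceded by a space, that are not UNIT
The trailing \b is what keeps a mixed-case word such as Floor from contributing its leading capital letter to the match.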
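To check the pattern against the sample lines from the question, here is a minimal sketch in Python. It assumes the parsing framework's regex engine supports lookbehind the way Python's re module does, which the question does not confirm. The names SAMPLES and extract_building are made up for this illustration, and the bracketed notes are dropped from the sample lines.

```python
import re

# Pattern copied from the answer. The lookbehind "Building: " is fixed-width,
# which Python's re module requires for lookbehind assertions.
PATTERN = re.compile(r"(?<=Building: )[A-Z]+(?: (?!UNIT\b)[A-Z]+\b)*")

# Sample lines from the question, without the bracketed notes.
SAMPLES = [
    "Building: BUILDING ONE 15",
    "Optional preceding text Building: BUILDING ONE 15",
    "Building: BUILDING ONE UNIT 15",
    "Building: BUILDING ONE Floor 2",
    "Grounds: OPEN SPACE WEST Section 3",
]


def extract_building(line):
    """Return the building name, or None if the line has no 'Building: ' prefix."""
    match = PATTERN.search(line)
    return match.group(0) if match else None


results = [extract_building(line) for line in SAMPLES]
for line, name in zip(SAMPLES, results):
    print(f"{line!r} -> {name!r}")
```

The first four lines yield BUILDING ONE and the last yields no match. If the target engine turns out not to support lookbehind, a capturing group after a literal "Building: " would be the usual fallback, at the cost of reading group 1 instead of the whole match.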