Regex to extract words except an optional final word


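I need a regex to extract building names from a list. I pass the text and the regex to a framework that does the parsing, so I would really like to solve this with a regex rather than code.

Building names are always all uppercase, preceded by "Building:", and followed by (a) a number, (b) the all-uppercase word "UNIT", or (c) any mixed-case word. So I want to get BUILDING ONE as the result for all of the following except the last line, which should return nothing: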

Building: BUILDING ONE 15 [building name followed by unit number]
Optional preceding text Building: BUILDING ONE 15 [preceding text, then building name followed by unit number]
Building: BUILDING ONE UNIT 15 [building name followed by word UNIT and unit number]
Building: BUILDING ONE Floor 2 [building name followed by mixed case word]
Grounds: OPEN SPACE WEST Section 3 [not a building - return nothing]

I feel like I know this, but I'm having a mental block. The closest I have come so far is

^.*Building:\s([A-Z+\s*]*).*

which, for the examples above, returns

BUILDING ONE
BUILDING ONE
BUILDING ONE UNIT
BUILDING ONE F
regex regex-group

1 Answer
0 votes

You can use this regex:

(?<=Building: )[A-Z]+(?: (?!UNIT\b)[A-Z]+\b)*

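This matches:

  • (?<=Building: ) : a positive lookbehind for "Building: "
  • [A-Z]+ : an uppercase word
  • (?: (?!UNIT\b)[A-Z]+\b)* : zero or more further uppercase words, each preceded by a single space, that are not UNIT

The trailing \b inside the group is what stops a mixed-case word such as Floor from contributing its leading capital F to the match.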
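
To check the pattern against the sample lines from the question, here is a minimal sketch in Python. The choice of Python's `re` module is an assumption, since the question does not name the framework or its regex engine. The helper name `extract_building` and the capture-group variant `GROUP_PATTERN` are illustrative additions and not part of the answer. The bracketed annotations are left out of the sample lines.

```python
import re

# Pattern copied verbatim from the answer.
PATTERN = re.compile(r"(?<=Building: )[A-Z]+(?: (?!UNIT\b)[A-Z]+\b)*")

# Hypothetical variant for engines without lookbehind support:
# the same expression, with the name placed in capture group 1.
GROUP_PATTERN = re.compile(r"Building: ([A-Z]+(?: (?!UNIT\b)[A-Z]+\b)*)")

# Sample lines from the question, without the bracketed annotations.
SAMPLES = [
    "Building: BUILDING ONE 15",
    "Optional preceding text Building: BUILDING ONE 15",
    "Building: BUILDING ONE UNIT 15",
    "Building: BUILDING ONE Floor 2",
    "Grounds: OPEN SPACE WEST Section 3",
]


def extract_building(line):
    """Return the building name found in line, or None if there is none."""
    match = PATTERN.search(line)
    return match.group(0) if match else None


results = [extract_building(line) for line in SAMPLES]

# Same extraction through the capture-group variant.
group_results = [
    m.group(1) if (m := GROUP_PATTERN.search(line)) else None
    for line in SAMPLES
]

for line, name in zip(SAMPLES, results):
    print(f"{line!r} -> {name!r}")
```

The first four lines yield BUILDING ONE and the last yields None, for both forms of the pattern. The capture-group form may be the easier fit if the framework expects the result in group 1, as the attempt in the question does, but that depends on the framework.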

Demo on regex101