I have spent an hour on regex101 trying to work this out, without success. Here is a small list of strings that I run through my regex:
list = ["This.is.Test.Nr.One.C01B01.42U.Rack.08-Datacenter1",
"Is.this.Nr.Two.C03B03.London.48U.Rack.04-Datacenter4",
"This.Number.Random.C02.Frankfurt.42U.Rack.12-Datacenter1",
"This.is.Random.Number.C08B01.Zuerich.Rack.01-Datacenter2"]
Now I want to capture 5 groups. I tried the following regex: \A(.+)\.(C\d{1,2})(B\d{1,2})?.?(42U|48U)?.+-(.+)
Group1:
This.is.Test.Nr.One
Is.this.Nr.Two
This.Number.Random
This.is.Random.Number
Group2:
C01
C03
C02
C08
Group3:
B01
B03
**missing but should still work for all the other groups**
B01
Group4:
42U
48U
42U
**missing but should still work for all the other groups**
Group5:
Datacenter1
Datacenter4
Datacenter1
Datacenter2
Edit: I think this case can also occur:
Is.this.Nr.Two.B03.London.48U.Rack.04-Datacenter4
One option is to make the C, B and U parts optional and put a capture group inside each of them:
^(?:(.+?)\.(C\d{1,2}))?(?:.*?(B\d{1,2}))?\.(?:.*?(42U|48U))?.*-(.+)$
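To see how this pattern behaves, it can be run over the sample strings. The snippet below is a minimal sketch assuming Python's re module; the names PATTERN, SAMPLES, capture and RESULTS are illustrative and not part of the original answer.

```python
import re

# The pattern above, compiled once.
PATTERN = re.compile(
    r'^(?:(.+?)\.(C\d{1,2}))?(?:.*?(B\d{1,2}))?\.(?:.*?(42U|48U))?.*-(.+)$'
)

SAMPLES = [
    "This.is.Test.Nr.One.C01B01.42U.Rack.08-Datacenter1",
    "Is.this.Nr.Two.C03B03.London.48U.Rack.04-Datacenter4",
    "This.Number.Random.C02.Frankfurt.42U.Rack.12-Datacenter1",
    "This.is.Random.Number.C08B01.Zuerich.Rack.01-Datacenter2",
    "Is.this.Nr.Two.B03.London.48U.Rack.04-Datacenter4",  # case from the edit: no C part
]


def capture(text):
    """Return the five groups as a tuple, or None if the pattern does not match."""
    m = PATTERN.match(text)
    return m.groups() if m else None


RESULTS = [capture(s) for s in SAMPLES]
for groups in RESULTS:
    print(groups)
```

For the four original strings all groups come out as listed in the question. For the last string, which has no C part, the first optional group is skipped as a whole, so group 1 is None together with group 2.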
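For the updated question, you can match the first part as repeated dot-separated segments, asserting at each dot that what follows is not a B or C followed by 2 digits.
Then match the optional C part first, then the optional B part, and finally the optional U part.
Then match to the end of the string, backtrack to the last occurrence of -, and capture what follows in the last capture group.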
^([^.]+(?:\.(?![BC]\d{2})[^.]+)*)\.(?:(C\d{2})(?:\.(?![BC]\d{2})[^.\n]+)*?)?(?:(B\d{2})(?:\.(?![BC]\d{2})[^.\n]+)*?)?(?:.*?\.(42U|48U))?.*-(.+)$
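The long pattern is easier to follow when split up. Below is a sketch in Python using re.VERBOSE, with one comment per part. The names PATTERN, SAMPLES, capture and RESULTS are illustrative. The repetition after the C part is written lazily (*?) here, so that the U part is still captured when the B part is absent.

```python
import re

PATTERN = re.compile(r'''
    ^
    ([^.]+(?:\.(?![BC]\d{2})[^.]+)*)            # group 1: segments up to the first .Cnn or .Bnn
    \.
    (?:(C\d{2})(?:\.(?![BC]\d{2})[^.\n]+)*?)?   # group 2: optional C part
    (?:(B\d{2})(?:\.(?![BC]\d{2})[^.\n]+)*?)?   # group 3: optional B part
    (?:.*?\.(42U|48U))?                         # group 4: optional U part
    .*-(.+)                                     # group 5: everything after the last hyphen
    $
''', re.VERBOSE)

SAMPLES = [
    "This.is.Test.Nr.One.C01B01.42U.Rack.08-Datacenter1",
    "Is.this.Nr.Two.C03B03.London.48U.Rack.04-Datacenter4",
    "This.Number.Random.C02.Frankfurt.42U.Rack.12-Datacenter1",
    "This.is.Random.Number.C08B01.Zuerich.Rack.01-Datacenter2",
    "Is.this.Nr.Two.B03.London.48U.Rack.04-Datacenter4",
]


def capture(text):
    """Return the five groups as a tuple, or None if the pattern does not match."""
    m = PATTERN.match(text)
    return m.groups() if m else None


RESULTS = [capture(s) for s in SAMPLES]
for groups in RESULTS:
    print(groups)
```

Unlike the shorter pattern, this one keeps group 1 ('Is.this.Nr.Two') for the string that only has a B part.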
(.+)\.(C\d{1,2})?(B\d{1,2})?(?:\.[A-Za-z]+)*\.(42U|48U)?.+-(.+)
will work. Here is the test code:
import re
list_ = [
"This.is.Test.Nr.One.C01B01.42U.Rack.08-Datacenter1",
"Is.this.Nr.Two.C03B03.London.48U.Rack.04-Datacenter4",
"This.Number.Random.C02.Frankfurt.42U.Rack.12-Datacenter1",
"This.is.Random.Number.C08B01.Zuerich.Rack.01-Datacenter2",
"Is.this.Nr.Two.B03.London.48U.Rack.04-Datacenter4",
]
pattern = re.compile(
    r'(.+)\.'                # group 1, then a literal "."
    r'(C\d{1,2})?'           # group 2 (optional)
    r'(B\d{1,2})?'           # group 3 (optional)
    r'(?:\.[A-Za-z]+)*\.'    # skip alphabetic segments such as ".London", then a literal "."
    r'(42U|48U)?.+-'         # group 4 (optional), then everything up to the last "-"
    r'(.+)'                  # group 5
)
for text in list_:
    match_object = pattern.search(text)
    if match_object:
        print(match_object.groups())
    else:
        print('Not matched')
Output:
('This.is.Test.Nr.One', 'C01', 'B01', '42U', 'Datacenter1')
('Is.this.Nr.Two', 'C03', 'B03', '48U', 'Datacenter4')
('This.Number.Random', 'C02', None, '42U', 'Datacenter1')
('This.is.Random.Number', 'C08', 'B01', None, 'Datacenter2')
('Is.this.Nr.Two', None, 'B03', '48U', 'Datacenter4')
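Since groups 2 to 4 can come back as None, it may be more convenient to read the result by name rather than by position. The following is a sketch of the same pattern with named groups; the group names (name, c, b, u, site) and the helper parse are made up for illustration.

```python
import re

NAMED = re.compile(
    r'(?P<name>.+)\.'          # leading name, then a literal "."
    r'(?P<c>C\d{1,2})?'        # optional C part
    r'(?P<b>B\d{1,2})?'        # optional B part
    r'(?:\.[A-Za-z]+)*\.'      # skip alphabetic segments, then a literal "."
    r'(?P<u>42U|48U)?.+-'      # optional U part, then everything up to the last "-"
    r'(?P<site>.+)'            # text after the last "-"
)


def parse(text):
    """Return a dict of the named groups, or None if the text does not match."""
    m = NAMED.search(text)
    return m.groupdict() if m else None


record = parse("This.Number.Random.C02.Frankfurt.42U.Rack.12-Datacenter1")
print(record)
```

A missing part then shows up as a None value under its key, so record['b'] is None for this string while record['u'] is '42U'.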