Extract integers of a specific length between delimiters


Given a list of strings such as:

L = ['1759@1@83@0#[email protected]@[email protected]#1094@[email protected]@14.4', 
     '[email protected]@[email protected]', 
     '[email protected]@[email protected]#1101@2@40@0#1108@2@30@0',
     '1430@[email protected]@2.15#1431@[email protected]@60.29#1074@[email protected]@58.8#1109',
     '1809@[email protected]@292.66#1816@[email protected]@95.44#1076@[email protected]@1110.61']

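I need to extract all integers of length 4 that sit between the delimiters # or @. This includes the first and last integers, which have a delimiter on only one side. No floats.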

My solution is somewhat overcomplicated. It replaces the delimiters with spaces and then applies this solution:

import re

# 4 digits bounded by whitespace or string ends (after # and @ become spaces)
pat = r'(?<!\S)\d{4}(?!\S)'
out = [re.findall(pat, re.sub('[#@]', ' ', x)) for x in L]
print(out)
"""
[['1759', '1362', '1094'], 
 ['1356'], 
 ['1354', '1101', '1108'], 
 ['1430', '1431', '1074', '1109'], 
 ['1809', '1816', '1076']]
"""

Is it possible to change the regex so that the re.sub replacement is not needed? Or is there another solution with better performance?

3 Answers
5 votes

To also allow the first and last occurrences, which have no leading or trailing delimiter, you can use negative lookarounds:

(?<![^#])\d{4}(?![^@])

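(?<![^#]) is equivalent to (?:^|#), except that it does not consume the #. It succeeds at the start of the string or right after a #. Likewise, the negative lookahead (?![^@]) succeeds right before an @ or at the end of the string.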
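The claim above can be checked with a small sketch. The SAMPLE list is hypothetical: it is a set of hand-made stand-ins, because the question's real strings are partly obfuscated on this page. The sketch runs the lookaround pattern next to a consuming (?:^|#) version. The consuming version needs a capture group so the # stays out of the result. The sketch confirms both return the same matches, and it shows that a 5-digit run such as #12345@ is rejected.

```python
import re

# Hypothetical sample strings shaped like the question's data
SAMPLE = [
    '1759@1@83@0#1362@0.26@25.74@2.86#1094@1@129.6@14.4',
    '1356@0.495@19.8@0.6',
    '1354@1.0@42.0@4.0#1101@2@40@0#1108@2@30@0',
    '1430@1@19.5@2.15#1431@3@245.7@60.29#1074@12@385.2@58.8#1109',
    '1809@8.0@40.0@292.66#1816@4.0@30.0@95.44#1076@47.0@1110.61',
]

# Non-consuming form from the answer:
# (?<![^#]) = "start of string or right after #"
# (?![^@])  = "end of string or right before @"
LOOKAROUND = re.compile(r'(?<![^#])\d{4}(?![^@])')

# Consuming equivalent: (?:^|#) eats the '#', so a capture group keeps it out
CONSUMING = re.compile(r'(?:^|#)(\d{4})(?![^@])')

for s in SAMPLE:
    found = LOOKAROUND.findall(s)
    assert found == CONSUMING.findall(s)
    print(found)

print(LOOKAROUND.findall('#12345@'))  # [] because the 5th digit fails (?![^@])
```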

See the live demo here.
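The question also asks about performance. One rough way to compare the question's re.sub approach with the single-pass lookaround pattern is timeit. This is only a sketch under assumptions:

- The SAMPLE strings are hypothetical.
- The two functions agree only on inputs shaped like the question's. The lookaround pattern requires # (or the start) before the digits and @ (or the end) after them, not just any delimiter.
- Timings depend on the machine and Python version, so no numbers are claimed here.

```python
import re
import timeit

# Hypothetical sample strings shaped like the question's data
SAMPLE = [
    '1759@1@83@0#1362@0.26@25.74@2.86#1094@1@129.6@14.4',
    '1356@0.495@19.8@0.6',
    '1354@1.0@42.0@4.0#1101@2@40@0#1108@2@30@0',
    '1430@1@19.5@2.15#1431@3@245.7@60.29#1074@12@385.2@58.8#1109',
    '1809@8.0@40.0@292.66#1816@4.0@30.0@95.44#1076@47.0@1110.61',
]

DELIMS = re.compile(r'[#@]')
SPACE_BOUNDED = re.compile(r'(?<!\S)\d{4}(?!\S)')
LOOKAROUND = re.compile(r'(?<![^#])\d{4}(?![^@])')


def with_sub(strings):
    # Question's approach: two passes (replace delimiters, then findall)
    return [SPACE_BOUNDED.findall(DELIMS.sub(' ', s)) for s in strings]


def with_lookarounds(strings):
    # Answer's approach: one findall pass, no intermediate string
    return [LOOKAROUND.findall(s) for s in strings]


if __name__ == '__main__':
    assert with_sub(SAMPLE) == with_lookarounds(SAMPLE)
    data = SAMPLE * 2_000
    for fn in (with_sub, with_lookarounds):
        seconds = timeit.timeit(lambda: fn(data), number=10)
        print(f'{fn.__name__}: {seconds:.3f}s')
```

Precompiling with re.compile avoids repeated cache lookups. The bigger saving, if any, is likely to come from skipping the intermediate string that re.sub builds.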


3 votes

Interesting question!

This can be solved easily with lookahead and lookbehind assertions.

INPUT

import re

pattern = r"(?<!\.)(?<=[#@])\d{4}|(?<!\.)\d{4}(?=[@#])"
out = [re.findall(pattern, x) for x in L]
print(out)

OUTPUT

[['1759', '1362', '1094', '1234'],
 ['1356'],
 ['1354', '1101', '1108'],
 ['1430', '1431', '1074', '1109'],
 ['1809', '1816', '1076', '1110']]

EXPLANATION

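The pattern above combines two separate patterns with | (the OR operator). Note that pattern_1 does not check what follows the four digits. As a result, it also matches the integer part of a float that comes right after a delimiter, such as @1110.61. That is why '1110' appears in the output above even though the question rules out floats.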

pattern_1 = r"(?<!\.)(?<=[#@])\d{4}"
\d{4}     --- Match exactly 4 digits
(?<!\.)   --- The 4 digits must not be preceded by a period (.)   NEGATIVE LOOKBEHIND
(?<=[#@]) --- The 4 digits must be preceded by a hash (#) or an at sign (@)   POSITIVE LOOKBEHIND

pattern_2 = r"(?<!\.)\d{4}(?=[@#])"
\d{4}     --- Match exactly 4 digits
(?<!\.)   --- The 4 digits must not be preceded by a period (.)   NEGATIVE LOOKBEHIND
(?=[@#])  --- The 4 digits must be followed by a hash (#) or an at sign (@)   POSITIVE LOOKAHEAD
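The pattern_1 alternative does not look at what follows the digits. A float placed right after a delimiter (for example @1110.61) therefore produces a partial match. The sketch below shows this on hypothetical sample strings. SAMPLE is made up, because the original strings are partly obfuscated on this page. STRICT_PAT is not part of this answer. It is one possible tightening that requires a delimiter or a string boundary on both sides of the four digits.

```python
import re

# Hypothetical sample strings shaped like the question's data
SAMPLE = [
    '1759@1@83@0#1362@0.26@25.74@2.86#1094@1@129.6@14.4',
    '1356@0.495@19.8@0.6',
    '1354@1.0@42.0@4.0#1101@2@40@0#1108@2@30@0',
    '1430@1@19.5@2.15#1431@3@245.7@60.29#1074@12@385.2@58.8#1109',
    '1809@8.0@40.0@292.66#1816@4.0@30.0@95.44#1076@47.0@1110.61',
]

# This answer's pattern: a delimiter is required on one side only
ANSWER_PAT = re.compile(r"(?<!\.)(?<=[#@])\d{4}|(?<!\.)\d{4}(?=[@#])")

# Hypothetical tightening: the digits must be bounded on BOTH sides by
# # or @ or the start/end of the string, so '@1110.61' is rejected
STRICT_PAT = re.compile(r"(?<![^#@])\d{4}(?![^#@])")

answer_out = [ANSWER_PAT.findall(s) for s in SAMPLE]
strict_out = [STRICT_PAT.findall(s) for s in SAMPLE]
print(answer_out[-1])  # ['1809', '1816', '1076', '1110']
print(strict_out[-1])  # ['1809', '1816', '1076']
```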

To better understand these concepts, click here.


1 vote

If you consider integers of length 4 without a leading # or trailing @, here is a convoluted list comprehension that does not use regex:

[[n for o in p for n in o] for p in [[[m for m in k.split("@") if m.isdigit() and str(int(m))==m and len(m) ==4] for k in j.split("#")] for j in L]]

Output:

[['1759', '1362', '1094'], ['1356'], ['1354', '1101', '1108'], ['1430', '1431', '1074', '1109'], ['1809', '1816', '1076']]
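The nested comprehension above can also be written as a plain function, which may be easier to read. This is a sketch, not the answer's own code. The function name four_digit_fields and the SAMPLE data are hypothetical. It also adds an isascii() check (Python 3.7+), because str.isdigit() accepts characters such as '²' that int() cannot parse.

```python
# Hypothetical sample strings shaped like the question's data
SAMPLE = [
    '1759@1@83@0#1362@0.26@25.74@2.86#1094@1@129.6@14.4',
    '1356@0.495@19.8@0.6',
    '1354@1.0@42.0@4.0#1101@2@40@0#1108@2@30@0',
    '1430@1@19.5@2.15#1431@3@245.7@60.29#1074@12@385.2@58.8#1109',
    '1809@8.0@40.0@292.66#1816@4.0@30.0@95.44#1076@47.0@1110.61',
]


def four_digit_fields(s):
    """Return every @/#-delimited field that is a 4-digit integer."""
    out = []
    for record in s.split('#'):          # records are separated by '#'
        for field in record.split('@'):  # fields inside a record by '@'
            # isascii() + isdigit(): plain 0-9 only, so floats like '0.26' fail;
            # str(int(...)) == field rejects leading zeros, as in the answer
            if (len(field) == 4 and field.isascii() and field.isdigit()
                    and str(int(field)) == field):
                out.append(field)
    return out


print([four_digit_fields(s) for s in SAMPLE])
```

Unlike the lookaround regex in the first answer, this version keeps any 4-digit field, not only the first field of each #-separated record.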