Capturing all strings in a Python script with a regular expression


This question was prompted by my failed attempts to adapt this answer: RegEx: Grabbing values between quotation marks

Consider the following Python script (t.py):

print("This is also an NL test")
variable = "!\n"
print('And this has an escaped quote "don\'t"  in it ', variable,
      "This has a single quote ' but doesn\'t end the quote as it" + \
      " started with double quotes")
if "Foo Bar" != '''Another Value''':
    """
    This is just nonsense
    """
    aux = '?'
    print("Did I \"failed\"?", f"{aux}")

I want to capture every string in it, namely:

  • This is also an NL test
  • !\n
  • And this has an escaped quote "don\'t" in it
  • This has a single quote ' but doesn\'t end the quote as it
  • started with double quotes
  • Foo Bar
  • Another Value
  • This is just nonsense
  • ?
  • Did I \"failed\"?
  • {aux}

I wrote another Python script using the re module. Of the regular expressions I tried, the one that finds the most strings is:

import re
pattern = re.compile(r"""(?<=(["']\b))(?:(?=(\\?))\2.)*?(?=\1)""")
with open('t.py', 'r') as f:
    msg = f.read()
x = pattern.finditer(msg, re.DOTALL)
for i, s in enumerate(x):
    print(f'[{i}]',s.group(0))

It produces the following result:

  • [0] And this has an escaped quote "don\'t" in it
  • [1] This has a single quote ' but doesn\'t end the quote as it started with double quotes
  • [2] Foo Bar
  • [3] Another Value
  • [4] Did I \"failed\"?

To add to my failure, I also could not fully reproduce in Python what I get on regex101.com:

[screenshot of the regex101.com result]

By the way, I am using Python 3.6.9, and I would appreciate more regex insight to crack this one.

python regex string double-quotes single-quotes
1 answer
Since you want to match ''', """, ' or " as the delimiter, put all of them in the first group:

('''|"""|["'])
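The order of the alternatives in that group matters, because the regex engine tries them left to right and takes the first one that fits. A small sketch of the difference (the reordered pattern `single_first` is hypothetical and shown only for contrast):

```python
import re

# Delimiter group exactly as written above: triple quotes come first.
triple_first = re.compile(r"""('''|\"""|["'])""")
# Hypothetical reordering, shown only for contrast.
single_first = re.compile(r"""(["']|'''|\""")""")

text = "'''Another Value'''"
print(triple_first.match(text).group(1))  # '''
print(single_first.match(text).group(1))  # '
```

With the single-character class first, a triple-quoted string would be read as an empty string followed by more quotes.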

Do not put \b after it, because then the pattern will not match strings that start with a non-word character.
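To see the effect of the \b, here is a minimal check of just the opening quote against a few literals taken from t.py (the variable names are mine):

```python
import re

# Opening quote followed by \b, as in the question's lookbehind.
quote_then_boundary = re.compile(r"""["']\b""")

# A few literals from t.py, written as they appear in the file.
literals = ['"Foo Bar"', '"!\\n"', "'?'", '" started with double quotes"']

opens_ok = {lit: quote_then_boundary.match(lit) is not None for lit in literals}
for lit, ok in opens_ok.items():
    print(f"{lit!r:35} opening quote accepted: {ok}")
```

Only "Foo Bar" passes, since \b needs a word character on one side of the position, and !, ? and a space are not word characters.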

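Because you want to make sure that the final delimiter is not treated as an opening delimiter when the engine starts its next iteration, you need to actually match it (not just look ahead for it).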
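A short sketch of what goes wrong with a lookahead, using a simplified single-quote-or-double-quote delimiter and an input of my own choosing:

```python
import re

text = 'print("a", "b")'

# Closing delimiter only asserted with a lookahead, so it is not consumed.
lookahead_close = re.compile(r"""(["'])((?:\\.|.)*?)(?=\1)""")
# Closing delimiter consumed as part of the match.
consumed_close = re.compile(r"""(["'])((?:\\.|.)*?)\1""")

lookahead_result = [m.group(2) for m in lookahead_close.finditer(text)]
consumed_result = [m.group(2) for m in consumed_close.finditer(text)]

print(lookahead_result)  # ['a', ', ', 'b']
print(consumed_result)   # ['a', 'b']
```

In the lookahead version the closing quote of "a" is still available, so the next match starts there and captures the text between the two strings.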

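The middle part, which can match anything up to the closing delimiter, can be written as follows. The \\. alternative comes first so that a backslash and the character after it are consumed as a pair, which keeps an escaped quote from ending the string: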

((?:\\.|.)*?)
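As an illustration, compare that middle part with a plain lazy `.*?` (the `naive` pattern is hypothetical, for contrast only) on one line from t.py:

```python
import re

# One line from t.py, kept raw so the backslashes stay in the text.
text = r'print("Did I \"failed\"?")'

# Escaped characters are consumed as a pair before a bare "." is tried.
escape_aware = re.compile(r'''(["'])((?:\\.|.)*?)\1''')
# Hypothetical naive variant that ignores backslash escapes.
naive = re.compile(r'''(["'])(.*?)\1''')

escape_aware_result = [m.group(2) for m in escape_aware.finditer(text)]
naive_result = [m.group(2) for m in naive.finditer(text)]

print(escape_aware_result)  # ['Did I \\"failed\\"?']
print(naive_result)         # ['Did I \\', '?']
```

The naive variant stops at the first escaped quote and then pairs up the remaining quotes incorrectly.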

Putting it all together:

('''|"""|["'])((?:\\.|.)*?)\1

The result you want will then be in the second capture group:

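import re

# (?s) makes "." match newlines too, so triple-quoted strings can span lines.
pattern = re.compile(r"""(?s)('''|\"""|["'])((?:\\.|.)*?)\1""")
with open('t.py', 'r') as f:
    msg = f.read()
# finditer's second positional argument is a start position, not flags,
# so no flag is passed here; (?s) in the pattern already covers DOTALL.
for i, s in enumerate(pattern.finditer(msg)):
    print(f'[{i}]', s.group(2))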

https://regex101.com/r/dvw0Bc/1
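One more detail that may explain part of the mismatch with regex101.com: on a compiled pattern, finditer takes (string, pos, endpos), so passing re.DOTALL as the second argument, as the script in the question does, sets the start position to 16 instead of setting a flag. A minimal sketch, using only the first two lines of t.py as input:

```python
import re

# Flags belong in re.compile (or inline as (?s)), not in finditer.
pattern = re.compile(r"""('''|\"""|["'])((?:\\.|.)*?)\1""", re.DOTALL)

# First two lines of t.py; "\\n" keeps the backslash-n as it is in the file.
source = 'print("This is also an NL test")\nvariable = "!\\n"\n'

all_strings = [m.group(2) for m in pattern.finditer(source)]
# re.DOTALL has the integer value 16, so this call means pos=16.
skewed = [m.group(2) for m in pattern.finditer(source, re.DOTALL)]

print(all_strings)  # ['This is also an NL test', '!\\n']
print(skewed)       # [')\nvariable = ']
```

Starting at position 16 lands inside the first string, so its closing quote is taken as an opening one and everything after it is paired up wrongly.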