Capturing all strings in a Python script with a regular expression

Question

This question was inspired by my failed attempts to adapt this answer: RegEx: Grabbing values between quotation marks

Consider the following Python script (t.py):

print("This is also an NL test")
variable = "!\n"
print('And this has an escaped quote "don\'t"  in it ', variable,
      "This has a single quote ' but doesn\'t end the quote as it" + \
      " started with double quotes")
if "Foo Bar" != '''Another Value''':
    """
    This is just nonsense
    """
    aux = '?'
    print("Did I \"failed\"?", f"{aux}")

I want to capture all the strings in it, which would be:

This is also an NL test
!\n
And this has an escaped quote "don\'t" in it
This has a single quote ' but doesn\'t end the quote as it
started with double quotes
Foo Bar
Another Value
This is just nonsense
?
Did I \"failed\"?
{aux}

I wrote another Python script using the re module. Of the regular expressions I tried, the one that finds the most strings is:

import re
pattern = re.compile(r"""(?<=(["']\b))(?:(?=(\\?))\2.)*?(?=\1)""")
with open('t.py', 'r') as f:
    msg = f.read()
x = pattern.finditer(msg, re.DOTALL)
for i, s in enumerate(x):
    print(f'[{i}]',s.group(0))

which gives the following results:

[0] And this has an escaped quote "don\'t" in it
[1] This has a single quote ' but doesn\'t end the quote as it started with double quotes
[2] Foo Bar
[3] Another Value
[4] Did I \"failed\"?

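To make matters worse, I also cannot fully reproduce the matches that regex101.com shows.

By the way, I am using Python 3.6.9, and I would appreciate more insight into regular expressions to crack this one.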

Answer 1

Since you want to match ''' or """ or ' or " as the delimiter, put all of them in the first group:

('''|"""|["'])

Do not put \b after it, because then the pattern will not match strings that start with a non-word character.
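To see the effect of the \b, here is a small sketch comparing the opening delimiter with and without the word boundary. The variable names and the sample strings are made up for illustration; the samples mirror strings from t.py.

```python
import re

# Opening-delimiter patterns: the question's version (with \b) and one without.
with_boundary = re.compile(r"""["']\b""")
without_boundary = re.compile(r"""["']""")

# Illustrative samples: one starts with a letter, two with non-word characters.
samples = ['"Foo Bar"', '"!\\n"', "'?'"]

for s in samples:
    # Columns: sample, matches with \b, matches without \b
    print(s, bool(with_boundary.match(s)), bool(without_boundary.match(s)))
```

Only the sample that starts with a letter passes the \b version, which would explain why strings such as "!\n" and '?' are missing from the question's output.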

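Because you want to make sure the final delimiter is not treated as a starting delimiter when the engine begins its next iteration, you need to match it fully (not just look ahead for it).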
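A minimal sketch of that difference, using a made-up one-line input rather than t.py. The lookaround pattern below is a simplified stand-in for the question's expression, not a copy of it.

```python
import re

text = '"x" + "y"'

# Lookaround version: the quotes are only asserted, never consumed.
lookaround = re.compile(r"""(?<=(["']))(?:\\.|.)*?(?=\1)""")
# Consuming version: both delimiters are part of the match.
consuming = re.compile(r"""(["'])((?:\\.|.)*?)\1""")

lookaround_hits = [m.group(0) for m in lookaround.finditer(text)]
consuming_hits = [m.group(2) for m in consuming.finditer(text)]

print(lookaround_hits)  # ['x', ' + ', 'y']
print(consuming_hits)   # ['x', 'y']
```

With lookarounds, the closing quote of "x" is still available to act as an opener, so the text between the two strings is reported as a string too. Consuming the closing delimiter prevents that.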

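The middle part, which matches anything up to the closing delimiter (with \\. consuming a backslash together with the character it escapes), can be: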
((?:\\.|.)*?)
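The following sketch shows why the \\. alternative matters, comparing it against a plain .*? on one line taken from t.py. The names naive and escape_aware are hypothetical.

```python
import re

# One line from t.py, containing escaped double quotes.
text = r'print("Did I \"failed\"?")'

# Plain lazy match: stops at the first quote, even an escaped one.
naive = re.compile(r"""(["'])(.*?)\1""")
# Escape-aware match: a backslash and the character after it are consumed as a pair.
escape_aware = re.compile(r"""(["'])((?:\\.|.)*?)\1""")

naive_body = naive.search(text).group(2)
aware_body = escape_aware.search(text).group(2)

print(naive_body)
print(aware_body)
```

The plain version ends the string at the first escaped quote, while the escape-aware version returns the whole string body.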

Putting it all together: ('''|"""|["'])((?:\\.|.)*?)\1

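The result you want is then in the second capture group:

import re

# (?s) makes "." match newlines too, so triple-quoted strings can span lines.
pattern = re.compile(r"""(?s)('''|\"""|["'])((?:\\.|.)*?)\1""")
with open('t.py', 'r') as f:
    msg = f.read()
# Do not pass re.DOTALL here: the second argument of finditer() is a start
# position, not a flags value, so it would skip the beginning of the file.
x = pattern.finditer(msg)
for i, s in enumerate(x):
    print(f'[{i}]', s.group(2))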

https://regex101.com/r/dvw0Bc/1
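For a quick check without a separate t.py file, the same pattern can be wrapped in a small helper. This is only a sketch: the name extract_strings and the inline sample (a few lines adapted from t.py) are assumptions made for illustration.

```python
import re

# Same pattern as above; (?s) lets "." match newlines inside
# triple-quoted strings.
STRING_RE = re.compile(r"""(?s)('''|\"""|["'])((?:\\.|.)*?)\1""")


def extract_strings(source):
    # Hypothetical helper: return the body of every quoted string in source.
    return [m.group(2) for m in STRING_RE.finditer(source)]


# A few lines adapted from t.py, embedded as a string.
sample = (
    'variable = "!\\n"\n'
    "if \"Foo Bar\" != '''Another Value''':\n"
    "    aux = '?'\n"
)

for body in extract_strings(sample):
    print(body)
```

Because ''' and """ come first in the alternation, a triple-quoted string is matched as one unit instead of as an empty string followed by a single-quoted one.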
