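I'm looking for a way to apply a regular expression to a text and extract the matched values as a dictionary.
The groups in the regex can be named, unnamed, or a mix of both.
Ideally, I would use fuzzy matching (tolerating a few errors in the text).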
Example text:
Name: foo BaR; Age: 42
Example regex:
Name: (?<name>[a-z]+) (?<lastname>[A-Z]+); Age: (\d+)
Desired output:
{name: foo, lastname: BaR, gr0: 42}
Along with the question, I'm also posting my own answer below.
If there is a better way, I'd be happy to adopt it ;)
Cheers :)
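This is what I have been using so far:
fuzzy matching with the regex module, via the {e<=3} constraint (at most 3 errors)
regex.search(...).capturesdict() to extract the dictionary of named groups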
import re

import regex  # third-party package: pip install regex

text = "Name: foo BaR; Age: 42"
pattern = r'Name: (?<name>[a-z]+) (?<lastname>[A-Z]+); Age: (\d+)'


def name_groups_in_regex(pattern):
    '''Make sure that all the groups in the regex are named groups.

    Replace each unnamed group with a group named "gr<idx>".
    '''
    # Pattern to find unnamed groups: an unescaped "(" not followed by "?"
    get_parenthesis_pattern = r"(?<!\\)\((?!\?)"
    # Count matches
    n_parenthesis = len(re.findall(get_parenthesis_pattern, pattern))
    # Substitute each unnamed group with a named group; "%d" is filled in below
    pattern = re.sub(get_parenthesis_pattern, "(?<gr%d>", pattern)
    pattern = pattern % tuple(range(n_parenthesis))
    return pattern


# Name the groups in the regex
pattern = name_groups_in_regex(pattern)
# Perform fuzzy matching with an overall maximum of 3 'errors'
fuzzy_pattern = f'({pattern}){{e<=3}}'
match = regex.search(fuzzy_pattern, text, regex.BESTMATCH)
print(match.capturesdict())
Output:
{'name': ['foo'], 'lastname': ['BaR'], 'gr0': ['42']}
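Two possible refinements, offered only as a sketch. First, filling in the group indices with the % operator means that a literal % anywhere else in the pattern would be interpreted as a format specifier; passing a callable to re.sub avoids that. Second, capturesdict() returns a list of captures per group, whereas the desired output in the question has one value per group. The names name_groups_with_counter and last_captures below are hypothetical helpers, not part of re or regex. The detection rule is the same as above, so it still has the same limits: it would misread a "(" inside a character class, or a group that directly follows an escaped backslash.

```python
import itertools
import re

# Same detection rule as above: an unescaped "(" that is not followed by "?".
UNNAMED_GROUP = re.compile(r"(?<!\\)\((?!\?)")


def name_groups_with_counter(pattern, prefix="gr"):
    """Name every unnamed group <prefix>0, <prefix>1, ... (hypothetical helper)."""
    counter = itertools.count()
    # A callable replacement builds each group name directly, so a literal "%"
    # elsewhere in the pattern is never treated as a format specifier.
    return UNNAMED_GROUP.sub(lambda m: f"(?<{prefix}{next(counter)}>", pattern)


def last_captures(captures):
    """Keep only the last capture of each group from a capturesdict()-style dict."""
    return {name: values[-1] for name, values in captures.items() if values}


# Only the pattern string is rewritten here; nothing is compiled.
print(name_groups_with_counter(r"Rate: (\d+)%; Name: (?<name>[a-z]+); Id: (\w+)"))
# Rate: (?<gr0>\d+)%; Name: (?<name>[a-z]+); Id: (?<gr1>\w+)

print(last_captures({'name': ['foo'], 'lastname': ['BaR'], 'gr0': ['42']}))
# {'name': 'foo', 'lastname': 'BaR', 'gr0': '42'}
```

The rewritten pattern can then be wrapped in the fuzzy constraint and passed to regex.search exactly as before, and last_captures can be applied to the result of capturesdict().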