Python regex does not find certain patterns

Question (votes: 0, answers: 2)

I am trying to parse LaTeX code out of HTML, like this:

string = " your answer is wrong! Solution: based on \((\vec{n_E},\vec{g})= 0 \) and \(d(g,E)=0\) beeing ... "

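I want to replace every piece of LaTeX code with the output of a function that takes the LaTeX code as its argument. (Because finding the right pattern is the problem, the function extract currently returns an empty string.)

I tried: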

latex_end = "\)"
latex_start = "\("    
string = re.sub(r'{}.*?{}'.format(latex_start, latex_end), extract, string)

Result:

your answer is wrong! Solution: based on \= 0 \) and \=0\) beeing ...

Expected:

your answer is wrong! Solution: based on and beeing ...

Any idea why the pattern is not found? Is there a way to make this work?

regex python-3.x latex string-parsing python-regex
2 Answers
0 votes

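This happens because the backslash is an escape character, both in Python string literals and in regular expressions, so a literal backslash in the text has to be escaped at both levels. That makes cases like this tricky to handle. Here are two quick ways to get it working: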

import re

# Placeholder for the asker's function; re.sub calls it with a Match object
extract = lambda match: ""

# Raw literal, so "\v" stays a backslash plus "v" instead of becoming a vertical tab
string = r" your answer is wrong! Solution: based on \((\vec{n_E},\vec{g})= 0 \) and \(d(g,E)=0\) beeing ... "

# Pattern without raw literals: each regex backslash is doubled again for Python
latex_bounds = ("\\\\\\(", "\\\\\\)")
print(re.sub('{}.*?{}'.format(*latex_bounds), extract, string))

# Pattern with raw literals: only the regex-level escaping is left
latex_bounds = (r"\\\(", r"\\\)")
print(re.sub(r'{}.*?{}'.format(*latex_bounds), extract, string))
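
As a hedged variation on the approach above, the delimiters can be written as they literally appear in the text and escaped with re.escape, and a capture group hands the LaTeX source to the callback. The body of extract here is a hypothetical stand-in (it only wraps the source in a marker), since the question does not show what the real function does.

```python
import re

def extract(match):
    # re.sub passes a Match object; group(1) is the text between the delimiters.
    latex = match.group(1)
    # Hypothetical stand-in for the real processing: wrap the source in a marker.
    return "[LATEX: {}]".format(latex.strip())

string = r" your answer is wrong! Solution: based on \((\vec{n_E},\vec{g})= 0 \) and \(d(g,E)=0\) beeing ... "

# The delimiters exactly as they appear in the text.
latex_start = r"\("
latex_end = r"\)"

# re.escape adds the regex-level escaping for the backslash and the parenthesis.
pattern = re.compile("{}(.*?){}".format(re.escape(latex_start), re.escape(latex_end)))

result = pattern.sub(extract, string)
print(result)
```

With this layout only the input literal needs the r prefix; the pattern itself never has to be escaped by hand.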

0 votes

Because \v is interpreted as a special character (a vertical tab), you should use a raw string in the definition of string.

import re

string = r" your answer is wrong! Solution: based on \((\vec{n_E},\vec{g})= 0 \) and \(d(g,E)=0\) beeing ... "


string = re.sub(r'\\\(.*?\\\)', '', string)
print(string)

Prints:

 your answer is wrong! Solution: based on  and  beeing ...
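
To make the two effects visible, here is a small sketch (the variable names plain, raw, text, naive and fixed are only illustrative). It first compares a normal and a raw literal containing \v, then runs the question's pattern in the form the regex engine actually receives it, next to the doubled-backslash version.

```python
import re

# In a normal literal "\v" is a single character (vertical tab, 0x0b);
# in a raw literal it stays a backslash followed by "v".
plain = "\vec{g}"
raw = r"\vec{g}"
print(len(plain), len(raw))

text = r"based on \((\vec{n_E},\vec{g})= 0 \) and"

# The question's pattern reaches the regex engine as \(.*?\), which means
# "literal (, anything, literal )"; the backslashes in the text are never matched.
naive = re.search(r"\(.*?\)", text)
print(naive.group(0))

# Escaping the backslash at the regex level matches the real delimiters.
fixed = re.search(r"\\\(.*?\\\)", text)
print(fixed.group(0))
```

The first search stops at the first closing parenthesis inside the formula, which is why the question's output still contains the stray backslash and the tail of each formula.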