I am trying to parse LaTeX code out of HTML, like this:
string = " your answer is wrong! Solution: based on \((\vec{n_E},\vec{g})= 0 \) and \(d(g,E)=0\) beeing ... "
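I want to replace every LaTeX snippet with the output of a function that takes the LaTeX code as its argument. (Because I can't get the pattern right yet, the function extract currently just returns an empty string.)
I tried: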
latex_end = "\)"
latex_start = "\("
string = re.sub(r'{}.*?{}'.format(latex_start, latex_end), extract, string)
Result:
your answer is wrong! Solution: based on \= 0 \) and \=0\) beeing ...
Expected:
your answer is wrong! Solution: based on and beeing ...
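Any idea why the pattern isn't found? Is there a way to do this?
This happens because the backslash is an escape character, both in Python string literals and in regular expressions, so the pattern \( matches only a literal parenthesis and never the backslash in front of it. That makes cases like this tricky to handle. Here are two quick ways to get the job done: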
import re
extract = lambda a: ""
# Using no raw components
string = " your answer is wrong! Solution: based on \((\vec{n_E},\vec{g})= 0 \) and \(d(g,E)=0\) beeing ... "
# Fully escaped for a non-raw literal: the regex engine must receive \\\( and \\\)
latex_bounds = ("\\\\\\(", "\\\\\\)")
print(re.sub('{}.*?{}'.format(*latex_bounds), extract, string))
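# Using raw literals throughout (r"%s" % string would not make an existing string raw)
string = r" your answer is wrong! Solution: based on \((\vec{n_E},\vec{g})= 0 \) and \(d(g,E)=0\) beeing ... "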
latex_bounds = (r"\\\(", r"\\\)")
print(re.sub(r'{}.*?{}'.format(*latex_bounds), extract, string))
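If counting backslashes feels error-prone, one alternative is to let re.escape build the delimiter patterns and to capture the LaTeX body in a group, so the callback can actually use it. The following is only a sketch: the name render and the <latex:...> output format are made up for illustration and stand in for whatever extract is eventually supposed to do.

```python
import re

# Raw string literal, so \( \) and \v reach the regex engine unchanged.
text = r" your answer is wrong! Solution: based on \((\vec{n_E},\vec{g})= 0 \) and \(d(g,E)=0\) beeing ... "

# re.escape turns the two-character delimiters \( and \) into patterns
# that match them literally, so no manual backslash counting is needed.
pattern = re.compile(re.escape(r"\(") + r"(.*?)" + re.escape(r"\)"))


def render(match):
    """Hypothetical callback: re.sub passes a match object, not a string."""
    latex = match.group(1).strip()
    # Placeholder output format, chosen only for this example.
    return "<latex:{}>".format(latex)


result = pattern.sub(render, text)
print(result)
```

Note that the callback receives a match object, so the LaTeX source is read with match.group(1). If a formula may span several lines, passing re.DOTALL to re.compile would let . match newlines as well.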
Since \v is interpreted as a special character (a vertical tab), you should use a raw string in the definition of string.
import re
string = r" your answer is wrong! Solution: based on \((\vec{n_E},\vec{g})= 0 \) and \(d(g,E)=0\) beeing ... "
string = re.sub(r'\\\(.*?\\\)', '', string)
print(string)
Prints:
your answer is wrong! Solution: based on and beeing ...
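To make the two effects visible, here is a small check of what a raw literal changes and of what the pattern from the question really matched. The variable names sample, unescaped and escaped are introduced only for this illustration, and the sample text is a shortened version of the original string.

```python
import re

# In a normal literal, \v is the vertical-tab escape: one character.
assert len("\v") == 1
# In a raw literal it stays as two characters: a backslash and "v".
assert len(r"\v") == 2

sample = r"based on \(d(g,E)=0\) beeing"

# The pattern from the question, \(.*?\), means "literal ( ... literal )",
# so it never looks at the backslashes in the text.
unescaped = re.findall(r"\(.*?\)", sample)
print(unescaped)  # ['(d(g,E)']

# Escaping the backslash as well matches the full \( ... \) delimiters.
escaped = re.findall(r"\\\(.*?\\\)", sample)
print(escaped)  # ['\\(d(g,E)=0\\)']
```

The first result shows why a stray backslash was left behind in the question's output: only the parenthesised part was removed, not the \( and \) delimiters around it.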