Python regex fails to match certain patterns

Question

I am trying to parse LaTeX code out of HTML, which looks like this:

string = " your answer is wrong! Solution: based on \((\vec{n_E},\vec{g})= 0 \) and \(d(g,E)=0\) beeing ... "

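I want to replace every LaTeX fragment with the output of a function that takes the LaTeX code as its argument. (Because I am having trouble finding the right pattern, the function extract currently just returns an empty string.)

I tried: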

latex_end = "\)"
latex_start = "\("    
string = re.sub(r'{}.*?{}'.format(latex_start, latex_end), extract, string)

Result:

your answer is wrong! Solution: based on \= 0 \) and \=0\) beeing ...

Expected:

your answer is wrong! Solution: based on and beeing ...

Any idea why the pattern is not found? Is there a way to make this work?

Answer 1

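This happens because the backslash is an escape character, both in Python string literals and in regular expressions. The pattern built from "\(" and "\)" is \(.*?\), which the regex engine reads as "a literal ( up to the next literal )", so the backslashes in the text are never matched. That makes cases like this quite tricky to handle. Here are two quick ways to get the job done: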

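import re

# Placeholder callback: receives the match object and returns the replacement text
extract = lambda a: ""

# Using no raw strings: each literal backslash must be doubled for Python,
# and doubled again for the regex engine
string = " your answer is wrong! Solution: based on \\((\\vec{n_E},\\vec{g})= 0 \\) and \\(d(g,E)=0\\) beeing ... "
latex_bounds = ("\\\\\\(", "\\\\\\)")
print(re.sub('{}.*?{}'.format(*latex_bounds), extract, string))

# Using raw strings throughout (Python leaves the backslashes alone,
# but the regex engine still treats them as escapes)
string = r" your answer is wrong! Solution: based on \((\vec{n_E},\vec{g})= 0 \) and \(d(g,E)=0\) beeing ... "
latex_bounds = (r"\\\(", r"\\\)")
print(re.sub(r'{}.*?{}'.format(*latex_bounds), extract, string))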
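
A variant of the same idea, as a minimal sketch: re.escape can build the escaped delimiters so the backslashes do not have to be counted by hand. It assumes the delimiters are always \( and \). The extract function here is a hypothetical stand-in that wraps the captured LaTeX in square brackets, since the real extract is not shown in the question.

```python
import re

# Raw string, so that \v in \vec is not turned into a vertical tab
string = r" your answer is wrong! Solution: based on \((\vec{n_E},\vec{g})= 0 \) and \(d(g,E)=0\) beeing ... "

# re.escape produces patterns that match the two-character delimiters literally
latex_start = re.escape(r"\(")
latex_end = re.escape(r"\)")
pattern = re.compile("{}(.*?){}".format(latex_start, latex_end))


def extract(match):
    # Hypothetical stand-in: group(1) holds the LaTeX between the delimiters
    return "[" + match.group(1).strip() + "]"


result = pattern.sub(extract, string)
print(result)
```

The capture group is only needed if the callback wants the LaTeX without its delimiters; match.group(0) would give the whole fragment including \( and \).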

Answer 2

Because \v is interpreted as a special character (a vertical tab), you should use a raw string in the definition of string.

import re

string = r" your answer is wrong! Solution: based on \((\vec{n_E},\vec{g})= 0 \) and \(d(g,E)=0\) beeing ... "


string = re.sub(r'\\\(.*?\\\)', '', string)
print(string)

Prints:

 your answer is wrong! Solution: based on  and  beeing ...
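
The effect of the raw string prefix can be checked directly. The short sketch below compares the same literal with and without the r prefix; the variable names plain and raw are only illustrative.

```python
plain = "\vec{g}"   # \v is processed as an escape: a single vertical-tab character
raw = r"\vec{g}"    # raw string: the backslash and the v stay as two characters

print(len(plain), len(raw))
print(repr(plain))
print(repr(raw))
```

In the non-raw version the text no longer contains the characters \vec at all, which matters as soon as the matched LaTeX is passed on to another function instead of being deleted.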
