Python regex to remove text between certain patterns

Question

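I have text in the following format.

|start| this is first para to remove |end|.
this is another text.
|start| this is another para to remove |end|. Again some free text

I want to remove all text between |start| and |end|.

I tried the following regex.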

regex = r'(?<=\|start\|).+(?=\|end\|)'
re.sub(regex, '', text)

It returns

"Again some free text"

But I expect it to return

this is another text. Again some free text

Answer 1

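Note that the start/end delimiters in your pattern sit inside lookaround constructs, so they are not consumed and remain in the string that `re.sub` returns. You should convert the lookbehind and lookahead into consuming patterns.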
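The difference between lookarounds and consuming patterns can be sketched on a short string. The sample text and variable names below are made up for illustration and are not taken from the question.

```python
import re

# Hypothetical one-line sample, chosen only to make the difference visible.
text = "keep |start| drop this |end| keep"

# Lookarounds only assert that the delimiters are there; they are not part
# of the match, so they survive the substitution.
with_lookarounds = re.sub(r'(?<=\|start\|).+?(?=\|end\|)', '', text)
print(with_lookarounds)  # keep |start||end| keep

# Consuming patterns make the delimiters part of the match, so they are
# removed together with the text between them.
consuming = re.sub(r'\|start\|.+?\|end\|', '', text)
print(consuming)  # keep  keep
```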

Also, you seem to want to remove the punctuation that follows the right-hand delimiter, so you need to add `[^\w\s]*` at the end of the regex.

You can use

import re
text = """|start| this is first para to remove |end|.
this is another text.
|start| this is another para to remove |end|. Again some free text"""
print( re.sub(r'(?s)\|start\|.*?\|end\|[^\w\s]*', '', text).replace('\n', '') )
# => this is another text. Again some free text


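Regex details

- `(?s)` - inline DOTALL modifier, so `.` also matches line breaks
- `\|start\|` - the literal text `|start|`
- `.*?` - any 0 or more characters, as few as possible
- `\|end\|` - the literal text `|end|`
- `[^\w\s]*` - 0 or more characters other than word and whitespace characters.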
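As a rough sketch, the same pattern can be written with `re.DOTALL` and `re.VERBOSE` flags instead of the inline `(?s)`, which leaves room for a comment on each part. The `sample` string is a made-up input where the block spans two lines.

```python
import re

# Same pattern as in the answer, with flags instead of the inline (?s).
pattern = re.compile(
    r"""
    \|start\|   # literal |start|
    .*?         # any characters, as few as possible (DOTALL: newlines too)
    \|end\|     # literal |end|
    [^\w\s]*    # trailing punctuation, if any
    """,
    re.DOTALL | re.VERBOSE,
)

# Hypothetical sample: the block to remove crosses a line break.
sample = "A |start| spans\ntwo lines |end|!! B"

result = pattern.sub('', sample)
print(result)  # A  B

# Without DOTALL, "." stops at the newline, so nothing matches here.
no_dotall = re.sub(r'\|start\|.*?\|end\|', '', sample)
print(no_dotall == sample)  # True
```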

Answer 2

Try this:

import re

your_string = """|start| this is first para to remove |end|.
this is another text.
|start| this is another para to remove |end|. Again some free text"""

regex = r'(\|start\|).+(\|end\|\.)'

result = re.sub(regex, '', your_string).replace('\n', '')

print(result)

Output:

this is another text. Again some free text
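One caveat, shown here as a small sketch: this pattern uses a greedy `.+` and no DOTALL flag, so it gives the expected result on the sample because each block sits on its own line. The `one_line` input below is hypothetical and shows what could happen if two blocks shared a line.

```python
import re

# Hypothetical input with two blocks on the same line.
one_line = "|start| a |end|. keep me |start| b |end|. tail"

# Greedy ".+" runs from the first |start| to the last |end|.
greedy = re.sub(r'\|start\|.+\|end\|\.', '', one_line)
print(repr(greedy))  # ' tail'

# Lazy ".+?" stops at the nearest |end|, so the text in between is kept.
lazy = re.sub(r'\|start\|.+?\|end\|\.', '', one_line)
print(repr(lazy))  # ' keep me  tail'
```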
