I am trying to create a regular expression in Python that captures the headers and their corresponding text in a multi-line string. Example string:
.Main Header
This is the main paragraph in the text. Also this is another sentence.
.Sub-Header
This is secondary header and text.
.Last Header
And this is the last header in the text.
Here .Main Header, .Sub-Header and .Last Header are the paragraph headers, and the lines that follow (the text up to the next ".Header" string) are the body text. So my expected output is:
Header1 - .Main Header, Text1 - This is the main paragraph in the text. Also this is another sentence.
Header2 - .Sub-Header, Text2 - This is secondary header and text.
Header3 - .Last Header, Text3 - And this is the last header in the text.
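I have put together a regex to meet this expectation, and it almost works. The only challenge I am facing is capturing text that has a dot (.) between sentences (e.g. Text1). My regex's stopping criterion is a newline and a dot (.), because the next header starts with a dot (.), so I am looking for help distinguishing a regular in-line dot from a dot at the start of a new line as my stopping criterion.
My current regex is: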
^(.\w+[^\n]+)\n([^\.]+)
For Text1, it captures:
This is the main paragraph in the text
But it should capture:
This is the main paragraph in the text. Also this is another sentence.
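A minimal reproduction of the behaviour described above. It assumes the pattern is compiled with re.MULTILINE (the question does not say which flags are used), and the variable names are only illustrative. The character class [^\.]+ stops at the first dot it meets, whether or not that dot starts a line, which is why the second sentence is lost.

```python
import re

text = """.Main Header
This is the main paragraph in the text. Also this is another sentence.
.Sub-Header
This is secondary header and text.
.Last Header
And this is the last header in the text."""

# The pattern from the question. re.MULTILINE is an assumption here,
# so that ^ can match at the start of every line.
current = re.compile(r"^(.\w+[^\n]+)\n([^\.]+)", re.MULTILINE)

current_sections = current.findall(text)

# Each item is a (header, body) tuple; the body ends before the first dot.
for header, body in current_sections:
    print(repr(header), repr(body))
```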
Maybe try the following regex...
^(.\w+[^\n]+)\n(.*?)\.$
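A rough sketch of how this could be tried on the sample string, assuming re.MULTILINE so that ^ and $ work per line (the flag is not stated above). With that flag the suggested pattern captures the whole line but leaves out the final dot, and it only handles a one-line body. For comparison, the second pattern is an alternative that is not part of the suggestion above: it escapes the leading dot and uses a lookahead to stop at a newline followed by a dot, or at the end of the string. The variable names are illustrative.

```python
import re

text = """.Main Header
This is the main paragraph in the text. Also this is another sentence.
.Sub-Header
This is secondary header and text.
.Last Header
And this is the last header in the text."""

# Suggested pattern; re.MULTILINE is assumed so ^ and $ match per line.
suggested = re.compile(r"^(.\w+[^\n]+)\n(.*?)\.$", re.MULTILINE)

# Alternative sketch: the header is a line starting with a literal dot,
# and the body runs lazily until a newline followed by a dot (the next
# header) or the end of the string. re.DOTALL lets the body span lines.
lookahead = re.compile(
    r"^(\.\w[^\n]*)\n(.*?)(?=\n\.|\Z)",
    re.MULTILINE | re.DOTALL,
)

suggested_sections = suggested.findall(text)
lookahead_sections = lookahead.findall(text)

# Print in the format used for the expected output in the question.
for i, (header, body) in enumerate(lookahead_sections, start=1):
    print(f"Header{i} - {header}, Text{i} - {body}")
```

The lookahead version keeps the trailing dot and in-line dots in the body, since only a dot that directly follows a newline ends the match. If the input ends with a trailing newline, the last body would include it, so stripping the body may be worthwhile.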