Python: Find the complete text after a specific word in a string using only RegEx

Question

The text is as follows:

text = list of documents check 01 original invoice in favour of company z 02 cjpc abstract sheet weighment 
slip goods receipt note iz checklist creator id name 30009460 [email protected]
checklist creation date 31 03 2018 checklist print date time 31 03 2018 10 45 57 note anything 
written manually on the checklist will not be considered invoice parth enterprise â invoice no dated 
kashish aarcade baroi road 18 25 mar 2018 village baroi delivery note mode terms of payment taluka 
mundra kutch supplierâ s ref other reference s gst no 24acypt3861 c1 z 7 dated buyer i buyer s order 
no 21 jun 2017 abc corporation 5700214006 â dated 40 mwp solar power plant i despatch document no 
vill bitta ta naliya abadasa despatched through destination march 18 terms of

Goal: I want to extract the text that follows the word "invoice", in particular the text after its second occurrence.

My approach:

txt = re.findall('invoice (.*)',text)
With the approach above, I expect a list of strings like this:

txt = ['in favour of company z 02 cjpc abstract sheet weighment slip goods receipt note iz checklist creator id name 30009460 [email protected] checklist creation date 31 03 2018 checklist print date time 31 03 2018 10 45 57 note anything written manually on the checklist will not be considered','parth enterprise â invoice no dated kashish aarcade baroi road 18 25 mar 2018 village baroi delivery note mode terms of payment taluka ..... #rest of the string]

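But what I get is the entire string given in text, i.e. the original string. If I use text.partition('invoice'), I do not get the strings shown in txt either.

Any help would be appreciated.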

Answer 1

If you want to get 2 matches as in the question, you can use 2 capture groups.
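The answer's original pattern did not survive on this page, so the following is only a sketch of the two-capture-group idea, not the answerer's exact regex. The shortened `sample` string and the names `pattern` and `parts` are made up for illustration. The first group is lazy so that it stops at the second "invoice"; the second group is greedy and takes the rest of the string.

```python
import re

# Shortened, hypothetical stand-in for the text in the question.
sample = (
    "list of documents check 01 original invoice in favour of company z "
    "will not be considered invoice parth enterprise invoice no dated 25 mar 2018"
)

# Group 1 (lazy): text between the first and the second "invoice".
# Group 2 (greedy): everything after the second "invoice".
# re.DOTALL lets "." cross line breaks in case the real text contains them.
pattern = re.compile(r"\binvoice\b\s+(.*?)\s+\binvoice\b\s+(.*)", re.DOTALL)

m = pattern.search(sample)
parts = list(m.groups()) if m else []
print(parts)
# ['in favour of company z will not be considered',
#  'parth enterprise invoice no dated 25 mar 2018']
```

If the text contains fewer than two occurrences of "invoice", `search` returns `None` and `parts` stays empty.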

Answer 2

This can be done easily with the split() method, for example:
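The example that belonged to this answer is missing from the page. One way the split() idea might look, assuming the same shortened, hypothetical `sample` string as a stand-in for the question's text:

```python
# Shortened, hypothetical stand-in for the text in the question.
sample = (
    "list of documents check 01 original invoice in favour of company z "
    "will not be considered invoice parth enterprise invoice no dated 25 mar 2018"
)

# maxsplit=2 cuts at the first two occurrences only, so the third piece
# keeps any later "invoice" intact, as in the expected output.
pieces = sample.split("invoice", 2)

# pieces[0] is the text before the first "invoice"; drop it and trim spaces.
parts = [piece.strip() for piece in pieces[1:]]
print(parts)
# ['in favour of company z will not be considered',
#  'parth enterprise invoice no dated 25 mar 2018']
```

Note that str.split matches plain substrings, so it would also cut inside a longer word such as "invoices"; a regex with `\b` avoids that.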

Answer 3

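Your regex invoice (.*) matches the first literal invoice followed by a space, and then (.*) greedily captures all of the remaining text into group 1. That is the expected and correct behavior.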
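A small demonstration of that greedy behavior, using a shortened, hypothetical `sample` string without line breaks (the variable names are illustrative only):

```python
import re

# Shortened, hypothetical stand-in for the text in the question.
sample = (
    "list of documents check 01 original invoice in favour of company z "
    "will not be considered invoice parth enterprise invoice no dated 25 mar 2018"
)

# (.*) is greedy: after the first "invoice " it consumes the rest of the line,
# including the later occurrences of "invoice", so findall finds one match.
greedy = re.findall(r"invoice (.*)", sample)
print(len(greedy))  # 1
print(greedy[0])
# in favour of company z will not be considered invoice parth enterprise invoice no dated 25 mar 2018
```

Because the single match already swallows the later occurrences, findall has nothing left to match, which is why the result is one long string rather than one item per "invoice".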

Answer 5

This can be solved more efficiently with a simpler regex that splits the input:

import re

text = r"""list of documents check 01 original invoice in favour of company z 02 cjpc abstract sheet weighment slip goods receipt note iz checklist creator id name 30009460 [email protected] checklist creation date 31 03 2018 checklist print date time 31 03 2018 10 45 57 note anything written manually on the checklist will not be considered invoice parth enterprise â invoice no dated kashish aarcade baroi road 18 25 mar 2018 village baroi delivery note mode terms of payment taluka mundra kutch supplierâ s ref other reference s gst no 24acypt3861 c1 z 7 dated buyer i buyer s order no 21 jun 2017 abc corporation 5700214006 â dated 40 mwp solar power plant i despatch document no vill bitta ta naliya abadasa despatched through destination march 18 terms of"""

# Use this variant if arbitrary white space can come before and after "invoice":
# matches = re.split(r'\b\s*invoice\s*\b', text)[1:-1]

# [1:-1] drops the piece before the first "invoice" and the piece after the last one.
matches = re.split(r'\b ?invoice ?\b', text)[1:-1]

for i, match in enumerate(matches):
    print(f'\nMatch {i + 1}:\n', match, sep='')