Python regex: finding multiple substrings of specific lengths (finding 5- and 6-digit numbers in a string)

Question

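I am building a .pdf data scraper for technical reports. The raw data mostly comes as multi-page .pdf files in which only the first page holds useful data.

Using the PyPDF2 module, I merged all the first pages into one large .pdf file that contains the first page of every technical report.

I use PdfReader to append the text of each page, as a string, to a list. For illustration, the list looks like this =>
list_o_text = ['Random string 1 2 3 45 6789 999999 22222', 'Example tech report 444444']

Every string in list_o_text is guaranteed to contain one or more 5- or 6-digit numbers.

I recently discovered the re module, but I am having trouble finding the right function to search for these numbers.

I would really appreciate some help.

############################################################################
Attempt with findall()

IDLE input: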

import re

list_o_text = ['Random string 1 2 3 45 6789 999999 22222', 'Example tech report 444444']

for n in range(len(list_o_text)):
    # findall() returns every non-overlapping match as a list of strings
    find = re.findall(r'\d{5}+', list_o_text[n])
    print(find)

IDLE shell output:

['99999', '22222']
['44444']

Note: the six-digit number '999999' is not found in full.

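Attempt with search()

IDLE input:

import re

list_o_text = ['Random string 1 2 3 45 6789 999999 22222', 'Example tech report 444444']

for n in range(len(list_o_text)):
    # search() returns a match object for the first match only
    find = re.search(r'\d{5}+', list_o_text[n])
    print(find)

IDLE shell output:

Note: this gives the position of the match, and the span does not cover the full 6-digit number.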

Attempt with search().group()

import re

list_o_text = ['Random string 1 2 3 45 6789 999999 22222', 'Example tech report 444444']

for n in range(len(list_o_text)):
    # group() extracts the matched text from the match object
    find = re.search(r'\d{5}+', list_o_text[n]).group()
    print(find)

IDLE shell output:

99999
44444

Note: the six-digit number '999999' is not found in full.

##################################################################
Complicated solution

I combined all three methods, but I can't shake the feeling that this could be simpler.

IDLE input:

import re

list_o_text = ['Random string 1 2 3 45 6789 999999 22222', 'Example tech report 444444']

for n in range(len(list_o_text)):
    find_all = re.findall(r'\d{5}+', list_o_text[n])
    # 1st loop result is ['99999', '22222']

    for five_d_num in find_all:
        # locate where the 5-digit match starts in the original string
        find_start = re.search(five_d_num, list_o_text[n]).start()

        # from that position, grab the whole run of digits
        find = re.search(r'\d+', list_o_text[n][find_start:]).group()

        print(find)

IDLE shell output:

999999
22222
444444

That's it.

Answer 1

The pattern \d{5}+ is not what you need; you want \d{5,6}.
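As a minimal sketch of this fix, here is the \d{5,6} pattern applied to the sample list from the question. The names `pattern` and `results` are introduced here for illustration only. For background: in Python 3.11 and later, \d{5}+ is parsed as a possessive quantifier, so it still matches exactly five digits, which would explain the truncated '99999' in the question.

```python
import re

list_o_text = [
    'Random string 1 2 3 45 6789 999999 22222',
    'Example tech report 444444',
]

# \d{5,6} matches a run of five or six digits. The quantifier is greedy,
# so it takes six digits whenever six are available.
pattern = re.compile(r'\d{5,6}')

# One findall() call per string replaces the nested search() workaround.
results = [pattern.findall(text) for text in list_o_text]

for found in results:
    print(found)
# ['999999', '22222']
# ['444444']
```

A single findall() call per string is enough here, because the quantifier itself expresses "five or six digits".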

I highly recommend regex101.com for building and testing regex patterns. The site gives a detailed breakdown of each component of the pattern.
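One caveat with \d{5,6} that may or may not matter for these reports: on its own it also matches the first six digits of a longer number. If longer digit runs should be ignored (an assumption about the data, not something the question states), lookarounds can require that the match is not touched by another digit. The sample string and variable names below are hypothetical.

```python
import re

# Hypothetical sample containing a 7-digit number and a 5-digit number.
sample = 'ID 1234567 and 54321'

# Plain pattern: also picks up the first six digits of the 7-digit number.
loose = re.findall(r'\d{5,6}', sample)

# (?<!\d) requires that no digit precedes the match,
# (?!\d) requires that no digit follows it.
strict = re.findall(r'(?<!\d)\d{5,6}(?!\d)', sample)

print(loose)   # ['123456', '54321']
print(strict)  # ['54321']
```

Which variant is appropriate depends on whether the extracted text can contain longer numbers next to the 5- and 6-digit ones.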
