Searching for multiple words in a PDF

Question

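I am trying to write a Python script that looks for specific words in a PDF file. Right now I have to scroll through the output to find the lines where a word was found.

I would like only the lines that contain the words to be printed or saved to a separate file.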

# import packages
import PyPDF2
import re

# open the pdf file (legacy PyPDF2 < 3.0 API)
reader = PyPDF2.PdfFileReader("Filename.pdf")

# get number of pages
NumPages = reader.getNumPages()

# define keyterms
Strings = "House|Property|street"

# extract text and do the search
for i in range(0, NumPages):
    PageObj = reader.getPage(i)
    print("this is page " + str(i))
    Text = PageObj.extractText()
    # print(Text)
    # re.search returns only the first match on the page (or None)
    ResSearch = re.search(Strings, Text)
    print(ResSearch)

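When I run the code above, I have to scroll through the output to find the lines that contain the words. I would like the matching lines to be printed or saved to a separate file, or the pages containing those lines to be saved to a separate PDF or txt file. Thanks in advance for your help.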

Answer 1

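You can split the text of each page into lines and then apply re.match to each line. Note that re.match only matches at the start of a line; to find a keyword anywhere in the line, use re.search instead.

For example: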

re.match
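A minimal sketch of the line-by-line approach, not the answerer's original code. It uses re.search rather than re.match so that a keyword is found anywhere in a line. The helper names matching_lines, search_pdf and save_matches and the output format are hypothetical. It keeps the legacy PyPDF2 (< 3.0) calls from the question; PyPDF2 3.0 and later rename these to PdfReader, reader.pages and extract_text(). It also assumes extractText() preserves line breaks, which is not guaranteed for every PDF.

```python
import re

# Same key terms as in the question, compiled once (case-sensitive).
KEYWORDS = re.compile(r"House|Property|street")


def matching_lines(text, pattern=KEYWORDS):
    """Return the lines of `text` that contain a match for `pattern`."""
    return [line for line in text.splitlines() if pattern.search(line)]


def search_pdf(path, pattern=KEYWORDS):
    """Yield (page_number, line) for every matching line in the PDF."""
    # Imported here so matching_lines can be used without PyPDF2 installed.
    import PyPDF2

    reader = PyPDF2.PdfFileReader(path)  # legacy PyPDF2 < 3.0 API
    for i in range(reader.getNumPages()):
        # Assumption: the extracted text keeps the PDF's line breaks.
        text = reader.getPage(i).extractText()
        for line in matching_lines(text, pattern):
            yield i + 1, line  # 1-based page number


def save_matches(pdf_path, out_path):
    """Write every matching line, prefixed by its page number, to a text file."""
    with open(out_path, "w", encoding="utf-8") as out:
        for page_number, line in search_pdf(pdf_path):
            out.write(f"page {page_number}: {line}\n")
```

Calling save_matches("Filename.pdf", "matches.txt") would write only the matching lines to a text file, and iterating over search_pdf("Filename.pdf") lets you print them instead.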
