PyPDF2: reading a PDF from a zip file

Question

I'm trying to get PyPDF2 to read a small .pdf file that sits inside a simple zip file. Here is what I have so far:

import PyPDF2,zipfile

with zipfile.ZipFile("TEST.zip") as z:
    filename = z.namelist()[0]
    a = z.filelist[0]
    b = z.open(filename)
    c = z.read(filename)
    PyPDF2.PdfFileReader(b)

Error message:

PdfReadWarning: PdfFileReader stream/file object is not in binary mode. It may not be read correctly. [pdf.py:1079]
io.UnsupportedOperation: seek

Any ideas are appreciated! Thanks.

Answer 1

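Because the file has not been extracted, z.open() does not give you a regular file object. It returns a stream over the compressed member, and on Python versions before 3.7 that stream does not support seek(), which PdfFileReader needs.

That's fine, though, because PdfFileReader only needs a seekable stream, and we can provide one with BytesIO. The example below takes the decompressed bytes and passes them to BytesIO, which turns them into a stream for PdfFileReader. If you omit BytesIO, you get: AttributeError: 'bytes' object has no attribute 'seek'.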

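import zipfile
from io import BytesIO

import PyPDF2

with zipfile.ZipFile('sample.zip', 'r') as z:
    filename = z.namelist()[0]  # first member of the archive
    # z.read() returns the decompressed bytes; BytesIO makes them a seekable stream
    pdf_file = PyPDF2.PdfFileReader(BytesIO(z.read(filename)))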

Result:

In [20]: pdf_file
Out[20]: <PyPDF2.pdf.PdfFileReader at 0x7f01b61db2b0>

In [21]: pdf_file.getPage(0)
Out[21]: 
{'/Type': '/Page',
 '/Parent': {'/Type': '/Pages',
  '/Count': 2,
  '/Kids': [IndirectObject(4, 0), IndirectObject(6, 0)]},
 '/Resources': {'/Font': {'/F1': {'/Type': '/Font',
    '/Subtype': '/Type1',
    '/Name': '/F1',
    '/BaseFont': '/Helvetica',
    '/Encoding': '/WinAnsiEncoding'}},
  '/ProcSet': ['/PDF', '/Text']},
 '/MediaBox': [0, 0, 612, 792],
 '/Contents': {}}
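
If the archive may contain more than one file, or the PDF is not guaranteed to be the first member, the same BytesIO idea can be wrapped in a small helper. This is only a sketch using the standard library: the function name pdf_streams_from_zip is hypothetical, and picking members by a ".pdf" suffix is an assumption, not something the original answer does.

```python
import zipfile
from io import BytesIO


def pdf_streams_from_zip(zip_path):
    """Yield (member_name, stream) for every .pdf member of a zip archive.

    Hypothetical helper: members are selected by file extension only.
    """
    with zipfile.ZipFile(zip_path) as z:
        for name in z.namelist():
            if name.lower().endswith(".pdf"):
                # z.read() returns the member's decompressed bytes;
                # BytesIO wraps them in an in-memory, seekable binary stream.
                yield name, BytesIO(z.read(name))
```

Each yielded stream can then be passed to PyPDF2.PdfFileReader(stream) exactly as in the answer above. Note that every PDF is loaded fully into memory, which is fine for small files but may not suit very large archives.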
