ImageMagick & PyPDF2 crash Python when used together

Question

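I have PDF files of roughly 20-25 pages each. The tool is meant to split a PDF file into pages (using PyPDF2), save each page to a directory (using PyPDF2), convert the pages to images (using ImageMagick), and then run OCR on them with Tesseract (using PIL and PyOCR) to extract data. The tool will eventually get a Tkinter GUI, so users will be able to repeat the same operation many times by clicking a button. During heavy testing I noticed that if the whole process is repeated around 6-7 times, the tool/Python script crashes and Windows shows it as "not responding". I did some debugging, but unfortunately no error is thrown. Memory and CPU usage look fine, so the problem does not seem to be there. I was able to narrow the problem down: the failure happens before the Tesseract step is reached, when PyPDF2 and ImageMagick run together. I can reproduce it with the following simplified Python code: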

from wand.image import Image as Img
from PIL import Image as PIL
import pyocr
import pyocr.builders
import io, sys, os 
from PyPDF2 import PdfFileWriter, PdfFileReader


def splitPDF (pdfPath):
    #Read the PDF file that needs to be parsed.
    pdfNumPages =0
    with open(pdfPath, "rb") as pdfFile:
        inputpdf = PdfFileReader(pdfFile)

        #Iterate on every page of the PDF.
        for i in range(inputpdf.numPages):
            #Create the PDF Writer Object
            output = PdfFileWriter()
            output.addPage(inputpdf.getPage(i))
            with open("tempPdf%s.pdf" %i, "wb") as outputStream:
                output.write(outputStream)

        #Get the number of pages that have been split.
        pdfNumPages = inputpdf.numPages

    return pdfNumPages

pdfPath = "Test.pdf"
for i in range(1,20):
    print ("Run %s\n--------" %i)
    #Split the PDF into Pages & Get PDF number of pages.
    pdfNumPages = splitPDF (pdfPath)
    print(pdfNumPages)
    # Use a separate loop variable so the outer run counter `i` is not overwritten.
    for page in range(pdfNumPages):
        #Convert the split pdf page to image to run tesseract on it.
        with Img(filename="tempPdf%s.pdf" %page, resolution=300) as pdfImg:
            print("Processing Page %s" %page)

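I used the with statement to handle opening and closing the files properly, so there should be no memory leak there. I tried running the splitting part and the image conversion part separately, and each works fine on its own. However, when the two are combined, the script fails after around 5-6 iterations. I wrapped the code in try/except blocks, but no error is ever caught. I am also using the latest version of every library. Any help or guidance is appreciated.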

Thanks.

Answer 1

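For future reference: the problem was caused by the 32-bit version of ImageMagick, as pointed out in one of the comments (thanks to emcconville). Uninstalling the 32-bit versions of Python and ImageMagick and installing the 64-bit versions of both fixed the problem. Hope this helps.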
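
If you are not sure which build of Python you are running, a quick check along these lines may help. This is only a minimal sketch using the standard library; the helper name `interpreter_bitness` is my own and not part of any of the libraries above. It reports the bitness of the Python interpreter only, so the ImageMagick installation still has to be checked separately (for example, from its installer name).

```python
import platform
import struct


def interpreter_bitness():
    """Return 32 or 64 depending on the pointer size of the running interpreter."""
    # struct.calcsize("P") is the size of a C pointer in bytes:
    # 4 on a 32-bit build, 8 on a 64-bit build.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("Python build: %s-bit" % interpreter_bitness())
    # platform.architecture() gives a similar answer as a string, e.g. "64bit".
    print("platform.architecture():", platform.architecture()[0])
```

If this prints 32-bit, the interpreter has to be replaced with a 64-bit build before a 64-bit ImageMagick can be loaded by Wand, since both must have the same architecture.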
