Passing a cv2 image directly to pytesseract without saving

Question

import pytesseract
from pdf2image import convert_from_path, convert_from_bytes
import cv2,numpy
def pil_to_cv2(image):
    open_cv_image = numpy.array(image)
    return open_cv_image[:, :, ::-1].copy() 


path='OriginalsFile.pdf'
images = convert_from_path(path)
cv_h=[pil_to_cv2(i) for i in images]
img_header = cv_h[0][:160,:]
#print(pytesseract.image_to_string(Image.open('test.png'))) I only found this in tesseract docs

Hello, is there any way to read img_header directly with pytesseract, without saving it to disk first?

pytesseract docs

Answer 1

pytesseract.image_to_string() input format

As the documentation explains, pytesseract.image_to_string() takes a PIL image as input, so you can easily convert your cv2 image to a PIL image, like this:

from PIL import Image
# ... (your code)
print(pytesseract.image_to_string(Image.fromarray(img_header)))
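One detail worth keeping in mind: the question's pil_to_cv2 reverses the channel axis, so img_header is in BGR order, while Image.fromarray treats a 3-channel array as RGB. That rarely matters for OCR, but if the colours should be preserved, the channels can be flipped back first. The following is a minimal sketch using only NumPy and PIL; the helper name cv2_to_pil and the synthetic array are illustrative assumptions, not part of the original code.

```python
import numpy as np
from PIL import Image


def cv2_to_pil(bgr_image):
    """Convert an OpenCV-style BGR array into an RGB PIL image (hypothetical helper)."""
    rgb = bgr_image[:, :, ::-1]  # reverse the channel axis: BGR -> RGB
    # Make the reversed view contiguous before handing it to PIL.
    return Image.fromarray(np.ascontiguousarray(rgb))


# A tiny synthetic stand-in for img_header: pure blue in BGR order.
bgr = np.zeros((2, 3, 3), dtype=np.uint8)
bgr[:, :, 0] = 255

pil_image = cv2_to_pil(bgr)
print(pil_image.size, pil_image.getpixel((0, 0)))
```

With the question's variables, this would be used as pytesseract.image_to_string(cv2_to_pil(img_header)).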

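If you really don't want to use PIL!

See: https://github.com/madmaze/pytesseract/blob/master/src/pytesseract.py

pytesseract is a simple wrapper that runs the tesseract command. At the def run_and_get_output() line you will see that it saves your image to a temporary file and then gives tesseract the path of that file to run on.

So you could do the same thing with opencv by rewriting the pytesseract .py file, although I don't see any performance improvement in doing so.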
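The temporary-file approach described above can be sketched with the standard library alone. This is an illustration of the idea, not pytesseract's actual implementation; the function name ocr_via_temp_file is an assumption, and it expects already-encoded image bytes. It relies on the tesseract CLI accepting stdout as the output base, which makes it print the recognised text instead of writing a file.

```python
import os
import shutil
import subprocess
import tempfile


def ocr_via_temp_file(image_bytes, suffix=".png"):
    """Write encoded image bytes to a temp file and run the tesseract CLI on it.

    Returns (command, text). text is None when tesseract is not installed.
    Hypothetical helper for illustration only.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        input_path = os.path.join(tmp_dir, "input" + suffix)
        with open(input_path, "wb") as f:
            f.write(image_bytes)

        # "stdout" as the output base tells tesseract to print the text.
        command = ["tesseract", input_path, "stdout"]
        if shutil.which("tesseract") is None:
            return command, None

        result = subprocess.run(command, capture_output=True, text=True, check=False)
        return command, result.stdout


# Placeholder bytes just to show the call shape; a real call needs a valid image.
command, text = ocr_via_temp_file(b"not a real image")
print(command[0], command[-1])
```

With opencv, the bytes could come from cv2.imencode(".png", img_header)[1].tobytes(). The image still goes through disk this way, which is consistent with the answer's remark that there is no performance gain to expect.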

Answer 2

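The fromarray function lets you pass the image to tesseract as a PIL image without saving it to disk, but you should also make sure you are not sending a list of PIL images to tesseract. If a PDF document contains multiple pages, convert_from_path returns a list of PIL images, so you need to send each page to tesseract separately.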

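import pytesseract
from pdf2image import convert_from_path
from PIL import Image  # needed for Image.fromarray below
import cv2, numpy

def pil_to_cv2(image):
    # PIL images are RGB; reversing the last axis gives OpenCV's BGR order
    open_cv_image = numpy.array(image)
    return open_cv_image[:, :, ::-1].copy()

path = 'OriginalsFile.pdf'
doc = convert_from_path(path)  # one PIL image per PDF page

for page_number, page_data in enumerate(doc):
    cv_h = pil_to_cv2(page_data)
    img_header = cv_h[:160, :]  # top 160 pixel rows of the page
    print(f"{page_number} - {pytesseract.image_to_string(Image.fromarray(img_header))}")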
