When I run
import cv2
import numpy as np
import pytesseract
pytesseract.pytesseract.tesseract_cmd = '/usr/local/bin/tesseract'
img = cv2.imread('some.png')
h, w, c = img.shape
boxes = pytesseract.image_to_boxes(img)
I get the following stack trace:
File "/Users/thomaskilian/Documents/pytess.py", line 9, in <module>
boxes = pytesseract.image_to_boxes(img)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pytesseract/pytesseract.py", line 491, in image_to_boxes
}[output_type]()
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pytesseract/pytesseract.py", line 490, in <lambda>
Output.STRING: lambda: run_and_get_output(*args),
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pytesseract/pytesseract.py", line 290, in run_and_get_output
with open(filename, 'rb') as output_file:
builtins.FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/m3/26h8sdk11p7731577hpllh900000gn/T/tess_wayjir39.box'
I traced this in pytesseract.py to
run_tesseract(**kwargs)
filename = f"{kwargs['output_filename_base']}{extsep}{extension}"
with open(filename, 'rb') as output_file:
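Right after the run (run_tesseract), the output is located in the temp folder under /var. But right at the open, the file is gone. It looks like the temp file is a bit too temporary. Is there any way around this?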
It looks like the problem may be that Pytesseract's temporary file is removed too early, before open() can access it.
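Before changing anything, it may help to check whether Tesseract writes the .box file at all. The following is a minimal sketch, not a definitive diagnosis. It assumes the binary path from the question and assumes that pytesseract calls Tesseract with the batch.nochop and makebox configs for image_to_boxes. The helper names build_box_command, list_outputs and check_box_output are made up for illustration.

```python
import glob
import os
import subprocess
import tempfile


def build_box_command(tesseract_cmd, image_path, output_base):
    """Return the Tesseract command line for box output (hypothetical helper)."""
    # Assumption: these are the configs pytesseract passes for image_to_boxes.
    return [tesseract_cmd, image_path, output_base, 'batch.nochop', 'makebox']


def list_outputs(output_base):
    """List every file whose path starts with output_base."""
    return sorted(glob.glob(glob.escape(output_base) + '*'))


def check_box_output(tesseract_cmd, image_path):
    """Run Tesseract once and report which output files it left behind."""
    out_dir = tempfile.mkdtemp()  # not deleted automatically, so it can be inspected
    base = os.path.join(out_dir, 'tess_check')
    cmd = build_box_command(tesseract_cmd, image_path, base)
    result = subprocess.run(cmd, capture_output=True, text=True)
    print('exit code:', result.returncode)
    print('stderr:', result.stderr.strip())
    print('files written:', list_outputs(base))
    return list_outputs(base)
```

Calling check_box_output('/usr/local/bin/tesseract', 'some.png') prints what Tesseract reported and which files exist afterwards. If the list is empty, or the file has a different extension, the file was probably never written under the expected name, which would point at the Tesseract installation rather than at the cleanup.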
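One option to try is setting the output_filename_base parameter to a specific file path instead of letting Pytesseract generate the temporary file itself. For example, you could change your code as follows: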
import cv2
import numpy as np
import pytesseract
import tempfile
pytesseract.pytesseract.tesseract_cmd = '/usr/local/bin/tesseract'
img = cv2.imread('some.png')
h, w, c = img.shape
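with tempfile.NamedTemporaryFile(suffix='.box') as tf:
    # Untested assumption: image_to_boxes forwards output_filename_base.
    # In released pytesseract versions this name appears to be set internally
    # by run_and_get_output, so this call may raise a TypeError.
    boxes = pytesseract.image_to_boxes(img,
        output_type=pytesseract.Output.BYTES, output_filename_base=tf.name)
    print(boxes.decode())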
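In this code, NamedTemporaryFile from the tempfile module creates a temporary file with the given file extension (.box in this case). The with statement ensures the file is deleted automatically once it is no longer needed.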
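The file lifetime described here can be observed with the standard library alone. This small sketch does not involve pytesseract, and the variable names are only illustrative.

```python
import os
import tempfile

with tempfile.NamedTemporaryFile(suffix='.box') as tf:
    path = tf.name
    exists_inside = os.path.exists(path)  # the file is present while the block is open
    has_suffix = path.endswith('.box')    # the requested extension is applied

exists_after = os.path.exists(path)       # the file is removed when the block exits

print(exists_inside, has_suffix, exists_after)  # prints: True True False
```

Anything that needs the file therefore has to read it before the with block ends.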
Calling decode() on the bytes object returned by image_to_boxes() then gives you the contents of the temporary file as text.
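As a rough illustration of the decode step, the sketch below turns box output into tuples. It assumes the usual Tesseract box layout of one symbol per line, followed by left, bottom, right, top and a page number. The function parse_boxes and the sample bytes are hypothetical and were not produced by a real run.

```python
def parse_boxes(raw):
    """Parse box-format bytes into (symbol, left, bottom, right, top) tuples."""
    boxes = []
    for line in raw.decode('utf-8').splitlines():
        parts = line.split(' ')
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        symbol = parts[0]
        left, bottom, right, top = (int(p) for p in parts[1:5])
        boxes.append((symbol, left, bottom, right, top))
    return boxes


# Made-up sample in the assumed layout: symbol left bottom right top page
sample = b"H 10 20 30 60 0\ni 32 20 40 58 0\n"
for box in parse_boxes(sample):
    print(box)
```

Box coordinates have their origin at the bottom-left corner of the image, so when drawing with OpenCV the y values are typically flipped using the image height h from the code above.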