Using Tesseract in Python and getting inconsistent OCR results: it converts a row of digits correctly, but fails to convert each individual digit.
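For example, Tesseract gives the correct result for image 1 (p0), but returns an empty string for image 2 (p1, the single digit 4, i.e. the first digit):
pytesseract.image_to_string(p0, config=options)  # p0 is image 1 -> '431659 '
pytesseract.image_to_string(p1, config=options)  # p1 is image 2 (the first digit, 4) -> ''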
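I don't know how to get the correct result. It is a very clear, simple image of a digit, yet Tesseract fails to OCR it, even though it has no trouble with an image of a whole row of digits.
I tried rescaling the individual cells, but Tesseract just doesn't seem to like single digits.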
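I tested your images with different Page segmentation modes (PSM); both images are read correctly with 8, 9, 10 and 13 (image 2 also works with 6 and 7).
Full working code: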
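For reference, the modes that worked for both images mean the following (descriptions as printed by `tesseract --help-psm`). The sketch below only builds the config string to pass to `pytesseract.image_to_string`; the helper name `digit_config` and its defaults are hypothetical, and the digits-only whitelist is an assumption (the test code uses `0123456789.,-`).

```python
# Meanings of the PSM values that gave correct output for both images,
# as listed by `tesseract --help-psm`.
WORKING_PSMS = {
    8: "Treat the image as a single word.",
    9: "Treat the image as a single word in a circle.",
    10: "Treat the image as a single character.",
    13: "Raw line. Treat the image as a single text line, "
        "bypassing hacks that are Tesseract-specific.",
}


def digit_config(psm: int = 10, whitelist: str = "0123456789") -> str:
    """Build a pytesseract config string for digit OCR (hypothetical helper).

    psm=10 ("single character") is the natural choice for one cropped digit.
    """
    if not 0 <= psm <= 13:
        raise ValueError(f"invalid page segmentation mode: {psm}")
    return f"--oem 3 --psm {psm} -c tessedit_char_whitelist={whitelist}"


if __name__ == "__main__":
    for psm, meaning in WORKING_PSMS.items():
        print(f"{psm:3} | {digit_config(psm)} | {meaning}")
```

With Tesseract installed, a single cell would then be read with something like `pytesseract.image_to_string(cell, config=digit_config(10))`, where `cell` is a hypothetical PIL image of one digit.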
from PIL import Image
import pytesseract

# The test images are named 01.png and 02.png
for number in range(1, 3):
    image_path = f'0{number}.png'
    print(f'--- {image_path} ---')
    img = Image.open(image_path)

    # Run Tesseract OCR with every page segmentation mode (0-13)
    for psm in range(0, 14):
        try:
            # Restrict recognition to digits and a few separators
            custom_config = fr'--oem 3 --psm {psm} -c tessedit_char_whitelist=0123456789.,-'
            text = pytesseract.image_to_string(img, lang='eng', config=custom_config)
            text = text.strip()  # remove the trailing newline
            # Print the extracted text
            print(f"{psm:3} | Extracted Text:", text)
        except Exception as ex:
            # Some modes (e.g. 0, orientation detection only) end with an error
            print(f"{psm:3} | Exception:", ex)
Result:
--- 01.png ---
0 | Exception: (1, 'Warning, detects only orientation with -l eng Tesseract Open Source OCR Engine v4.1.1 with Leptonica Warning: Invalid resolution 0 dpi. Using 70 instead. Warning. Invalid resolution 0 dpi. Using 70 instead. Too few characters. Skipping this page Error during processing.')
1 | Extracted Text:
2 | Exception: (1, 'Warning: Invalid resolution 0 dpi. Using 70 instead.')
3 | Extracted Text:
4 | Extracted Text:
5 | Extracted Text:
6 | Extracted Text:
7 | Extracted Text:
8 | Extracted Text: 431659
9 | Extracted Text: 431659
10 | Extracted Text: 431659
11 | Extracted Text:
12 | Extracted Text:
13 | Extracted Text: 431659
--- 02.png ---
0 | Exception: (1, 'Warning, detects only orientation with -l eng Tesseract Open Source OCR Engine v4.1.1 with Leptonica Warning: Invalid resolution 0 dpi. Using 70 instead. Estimating resolution as 888 Too few characters. Skipping this page Warning. Invalid resolution 0 dpi. Using 70 instead. Too few characters. Skipping this page Error during processing.')
1 | Extracted Text:
2 | Exception: (1, 'Warning: Invalid resolution 0 dpi. Using 70 instead. Estimating resolution as 888 Empty page!!')
3 | Extracted Text:
4 | Extracted Text:
5 | Extracted Text:
6 | Extracted Text: 4
7 | Extracted Text: 4
8 | Extracted Text: 4
9 | Extracted Text: 4
10 | Extracted Text: 4
11 | Extracted Text:
12 | Extracted Text:
13 | Extracted Text: 4
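To pick the modes that work for every image instead of reading the output by eye, the results can be collected into a dictionary and intersected. The sketch below hard-codes only the non-empty lines of the output above (modes that printed nothing or raised an exception are left out); `common_psms` is a hypothetical helper, and in a real run the dictionary would be filled by the loop rather than printed.

```python
from __future__ import annotations

# Non-empty results copied from the output above: image -> {PSM: text}.
results = {
    "01.png": {8: "431659", 9: "431659", 10: "431659", 13: "431659"},
    "02.png": {6: "4", 7: "4", 8: "4", 9: "4", 10: "4", 13: "4"},
}


def common_psms(results: dict[str, dict[int, str]]) -> list[int]:
    """Return the PSM values that produced non-empty text for every image."""
    per_image = [
        {psm for psm, text in by_psm.items() if text.strip()}
        for by_psm in results.values()
    ]
    return sorted(set.intersection(*per_image)) if per_image else []


print(common_psms(results))  # [8, 9, 10, 13]
```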