Tesseract in Python - odd OCR results: it converts a row of digits but fails on each individual digit


I am using Tesseract from Python and getting inconsistent OCR results: it converts a row of digits correctly, but fails to convert each individual digit.

For example, for image 1 Tesseract gives the correct result (p0 is image 1, https://i.stack.imgur.com/hAn9q.png):

pytesseract.image_to_string(p0, config=options)  # p0 is image 1: https://i.stack.imgur.com/hAn9q.png
'431659 '

But for each sub-cell that contains a single digit (image 2, https://i.stack.imgur.com/MxBaK.png) it returns an empty result:

pytesseract.image_to_string(p1, config=options)  # p1 is image 2, the cell with the first digit, 4
''

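I don't know how to get the correct result. This is a very clear and simple image of a digit, yet Tesseract cannot OCR it, even though it handles the image of the whole row of digits easily.

I tried rescaling the individual cells, but Tesseract just doesn't like single digits.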

python ocr tesseract
1 Answer

I tested your images with different page segmentation modes (the --psm option), and both images work with modes 8, 9, 10, and 13.
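As a minimal sketch of how a mode could be chosen per image: Tesseract documents --psm 7 as "single text line", 8 as "single word", and 10 as "single character". The helper names `build_config` and `ocr_digits` below are hypothetical, not part of pytesseract, and the digits-only whitelist is an assumption for this use case.

```python
# Tesseract's documented --psm codes for inputs that hold one small unit of text.
PSM_SINGLE_LINE = 7
PSM_SINGLE_WORD = 8
PSM_SINGLE_CHAR = 10


def build_config(psm, whitelist="0123456789"):
    """Return a Tesseract config string restricted to the given characters."""
    return f"--oem 3 --psm {psm} -c tessedit_char_whitelist={whitelist}"


def ocr_digits(img, single_char=False):
    """OCR a PIL image of digits (hypothetical helper, not a pytesseract API)."""
    # Imported lazily so build_config can be used without pytesseract installed.
    import pytesseract

    psm = PSM_SINGLE_CHAR if single_char else PSM_SINGLE_WORD
    text = pytesseract.image_to_string(img, lang="eng", config=build_config(psm))
    return text.strip()
```

With the question's variables, this would be called as `ocr_digits(p0)` for the full row and `ocr_digits(p1, single_char=True)` for a cropped cell. Whether a given mode succeeds can depend on the Tesseract version and the image, so it is worth checking against your own crops.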


Full working code:

from PIL import Image
import pytesseract

for number in range(1, 3):
    image_path = f'0{number}.png'

    print(f'--- {image_path} ---') 
    
    img = Image.open(image_path)

    # Use Tesseract OCR to extract text
    for psm in range(0, 14):
        try:
            custom_config = f'--oem 3 --psm {psm} -c tessedit_char_whitelist=0123456789.,-'
            text = pytesseract.image_to_string(img, lang='eng', config=custom_config)
            text = text.strip()  # strip the trailing newline from the OCR output
            # Print the extracted text
            print(f"{psm:3} | Extracted Text:", text)
        except Exception as ex:
            print(f"{psm:3} | Exception:", ex)

Results:

--- 01.png ---

  0 | Exception: (1, 'Warning, detects only orientation with -l eng Tesseract Open Source OCR Engine v4.1.1 with Leptonica Warning: Invalid resolution 0 dpi. Using 70 instead. Warning. Invalid resolution 0 dpi. Using 70 instead. Too few characters. Skipping this page Error during processing.')
  1 | Extracted Text: 
  2 | Exception: (1, 'Warning: Invalid resolution 0 dpi. Using 70 instead.')
  3 | Extracted Text: 
  4 | Extracted Text: 
  5 | Extracted Text: 
  6 | Extracted Text: 
  7 | Extracted Text: 
  8 | Extracted Text: 431659
  9 | Extracted Text: 431659
 10 | Extracted Text: 431659
 11 | Extracted Text: 
 12 | Extracted Text: 
 13 | Extracted Text: 431659

--- 02.png ---

  0 | Exception: (1, 'Warning, detects only orientation with -l eng Tesseract Open Source OCR Engine v4.1.1 with Leptonica Warning: Invalid resolution 0 dpi. Using 70 instead. Estimating resolution as 888 Too few characters. Skipping this page Warning. Invalid resolution 0 dpi. Using 70 instead. Too few characters. Skipping this page Error during processing.')
  1 | Extracted Text: 
  2 | Exception: (1, 'Warning: Invalid resolution 0 dpi. Using 70 instead. Estimating resolution as 888 Empty page!!')
  3 | Extracted Text: 
  4 | Extracted Text: 
  5 | Extracted Text: 
  6 | Extracted Text: 4
  7 | Extracted Text: 4
  8 | Extracted Text: 4
  9 | Extracted Text: 4
 10 | Extracted Text: 4
 11 | Extracted Text: 
 12 | Extracted Text: 
 13 | Extracted Text: 4
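
To make the comparison explicit, here is a small sketch that intersects the modes that produced text for each image. The dictionary is copied by hand from the non-empty rows of the results above; the function name `modes_working_for_all` is hypothetical.

```python
# Non-empty outputs copied from the results above, keyed by --psm value.
results = {
    "01.png": {8: "431659", 9: "431659", 10: "431659", 13: "431659"},
    "02.png": {6: "4", 7: "4", 8: "4", 9: "4", 10: "4", 13: "4"},
}


def modes_working_for_all(results_by_image):
    """Return the sorted --psm values that produced text for every image."""
    mode_sets = [set(modes) for modes in results_by_image.values()]
    if not mode_sets:
        return []
    return sorted(set.intersection(*mode_sets))


common_modes = modes_working_for_all(results)
print(common_modes)  # [8, 9, 10, 13]
```

Modes 6 and 7 read the single digit but not the full row in this run, which is why only 8, 9, 10, and 13 are listed as working for both.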