pytesseract fails to correctly recognize an image of a negative number

Question

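I have the image below, which shows the number -1.49. The problem is that it is already very small on screen, so it is pixelated.

Any suggestions on how to improve this? Reading this number correctly is very important.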

from PIL import Image
import pytesseract

# Point pytesseract at the Tesseract executable (Windows install path)
pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'

# Open an image file
image_path = 'large_scale_numbers.png'
img = Image.open(image_path)

# Use Tesseract OCR to extract text
custom_config = r'--oem 3 --psm 8 -c tessedit_char_whitelist=0123456789.,-'
text = pytesseract.image_to_string(img, lang='eng', config=custom_config)

# Print the extracted text
print("Extracted Text:", text)

This returns: Extracted Text: 41.49

Answer 1

Your code gives me an empty string for this image.

But when I resize the image x2, I get the correct result, though with a different psm.

img = Image.open(image_path)
w, h = img.size
img = img.resize((w*2, h*2), Image.Resampling.NEAREST)

Full working code I used for testing:

from PIL import Image
import pytesseract
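
# Uncomment and adjust this line if tesseract is not on PATH
#pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'

# Open an image file
image_path = 'image.png'
img = Image.open(image_path)
w, h = img.size
print(w, h)

# Upscale x2 so the small, pixelated digits are large enough for Tesseract
img = img.resize((w*2, h*2), Image.Resampling.NEAREST)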

# Use Tesseract OCR to extract text
for psm in range(0, 14):
    try:
        custom_config = fr'--oem 3 --psm {psm} -c tessedit_char_whitelist=0123456789.,-'
        text = pytesseract.image_to_string(img,  lang='eng', config=custom_config)
        text = text.strip()
        # Print the extracted text
        print(f"{psm:3} | Extracted Text:", text)
    except Exception as ex:
        print(f"{psm:3} | Exception:", ex)
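
Since the loop above prints one result per psm and the modes can disagree, one possible follow-up is to collect those strings, discard anything that is not a well-formed number, and keep the value most modes agree on. This is only a sketch and not part of the tested code above: `pick_number` and `NUMBER_RE` are hypothetical names, the accepted format (optional minus sign, digits, optional decimal part) is an assumption, and the sample list is illustration data rather than real Tesseract output.

```python
import re
from collections import Counter

# Assumed format: optional minus sign, digits, optional decimal part (e.g. -1.49).
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def pick_number(candidates):
    """Return the most frequent well-formed number among OCR strings, or None."""
    # Keep only strings that are entirely a number in the assumed format.
    valid = [c.strip() for c in candidates if NUMBER_RE.fullmatch(c.strip())]
    if not valid:
        return None
    # most_common(1) gives the string that the largest number of modes produced.
    text, _count = Counter(valid).most_common(1)[0]
    return float(text)


# Hypothetical strings as they might be collected in the psm loop (illustration only).
results = ["", "-1.49", "41.49", "-1.49", "-1.49."]
print(pick_number(results))
```

In the loop, this would mean appending each `text` to a list instead of only printing it, then calling `pick_number` on that list once the loop finishes. A vote like this cannot fix a case where every mode misreads the sign, so it complements the upscaling rather than replacing it.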
