pytesseract does not correctly recognize an image of a negative number


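I have the image below, which shows the number -1.49. The problem is that the number is already very small on screen, so it is pixelated.

Any suggestions on how to improve this? It is very important to read this number correctly. (Image: the image I am using)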

from PIL import Image
import pytesseract

# Point pytesseract at the Tesseract executable (Windows install path)
pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'

# Open an image file
image_path = 'large_scale_numbers.png'
img = Image.open(image_path)

# Use Tesseract OCR to extract text
custom_config = r'--oem 3 --psm 8 -c tessedit_char_whitelist=0123456789.,-'
text = pytesseract.image_to_string(img, lang='eng', config=custom_config)

# Print the extracted text
print("Extracted Text:", text)

This returns Extracted Text: 41.49

python python-tesseract
1 Answer

Your code gives me an empty string for the image.

But when I resize the image x2, I get the correct result, although only for some psm values.

img = Image.open(image_path)

w, h = img.size

img = img.resize((w*2, h*2), Image.Resampling.NEAREST)

Full working code I used for testing:

from PIL import Image
import pytesseract
# Open an image file
#pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
image_path = 'image.png'

img = Image.open(image_path)
w, h = img.size
print(w, h)
img = img.resize((w*2, h*2), Image.Resampling.NEAREST)

# Use Tesseract OCR to extract text
for psm in range(0, 14):
    try:
        custom_config = fr'--oem 3 --psm {psm} -c tessedit_char_whitelist=0123456789.,-'
        text = pytesseract.image_to_string(img,  lang='eng', config=custom_config)
        text = text.strip()
        # Print the extracted text
        print(f"{psm:3} | Extracted Text:", text)
    except Exception as ex:
        print(f"{psm:3} | Exception:", ex)
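Since the loop above produces one string per psm value and those strings can disagree, one possible follow-up is to keep only the outputs that look like a signed decimal and take the most frequent one. The sketch below is an assumption on my part, not something I tested against the image: `NUMBER_RE` and `pick_number` are hypothetical names, and the sample list is made up for illustration rather than real Tesseract output.

```python
import re
from collections import Counter

# Hypothetical pattern: optional leading minus, digits, optional decimal part.
NUMBER_RE = re.compile(r"^-?\d+(?:[.,]\d+)?$")


def pick_number(candidates):
    """Return the most frequent candidate that looks like a signed decimal, or None."""
    valid = [c.strip() for c in candidates if NUMBER_RE.match(c.strip())]
    if not valid:
        return None
    return Counter(valid).most_common(1)[0][0]


# Made-up example strings, standing in for the per-psm outputs of the loop.
results = ["", "-1.49", "41.49", "-1.49", "-1.49\n"]
print(pick_number(results))
```

Note that a misread such as 41.49 is also a well-formed number, so this only filters out empty or garbled strings and favours the majority reading; it does not guarantee the value is correct.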