I have the image below, which shows the number -1.49. The problem is that the number is already very small on screen, so it is pixelated.
from PIL import Image
import pytesseract
# Path to the Tesseract executable on Windows
pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
# Open an image file
image_path = 'large_scale_numbers.png'
img = Image.open(image_path)
# Use Tesseract OCR to extract text
custom_config = r'--oem 3 --psm 8 -c tessedit_char_whitelist=0123456789.,-'
text = pytesseract.image_to_string(img, lang='eng', config=custom_config)
# Print the extracted text
print("Extracted Text:", text)
It returns: Extracted Text: 41.49 (instead of -1.49)
Your code gives me an empty string for this image.
But when I resize it by a factor of 2, I get the correct result, though with a different psm.
img = Image.open(image_path)
w, h = img.size
img = img.resize((w*2, h*2), Image.Resampling.NEAREST)
The full working code I used for testing:
from PIL import Image
import pytesseract
# Open an image file
#pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
image_path = 'image.png'
img = Image.open(image_path)
w, h = img.size
print(w, h)
img = img.resize((w*2, h*2), Image.Resampling.NEAREST)
# Use Tesseract OCR to extract text
for psm in range(0, 14):
    try:
        custom_config = fr'--oem 3 --psm {psm} -c tessedit_char_whitelist=0123456789.,-'
        text = pytesseract.image_to_string(img, lang='eng', config=custom_config)
        text = text.strip()
        # Print the extracted text
        print(f"{psm:3} | Extracted Text:", text)
    except Exception as ex:
        # Some psm modes may fail on this image; report the error and continue
        print(f"{psm:3} | Exception:", ex)
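The testing loop prints one line per psm value and leaves the choice to the reader. If you want to pick a value programmatically, one option (an assumption, not something the answer tested) is a simple majority vote over the outputs that look like a number. `pick_consensus` and `NUMBER_RE` are hypothetical names, and the `sample` dictionary is made-up data, not real Tesseract output. Note that agreement between modes is not proof of correctness, since several modes could make the same misread.

```python
from __future__ import annotations

import re
from collections import Counter

# Optional minus sign, digits, optional decimal part using '.' or ','
NUMBER_RE = re.compile(r"-?\d+(?:[.,]\d+)?")


def pick_consensus(results: dict[int, str]) -> str | None:
    """Return the number string that most psm modes agree on, or None.

    Empty strings and anything that does not look like a number are ignored.
    On a tie, the value seen first (lowest psm in insertion order) wins.
    """
    candidates = [
        text.strip()
        for text in results.values()
        if NUMBER_RE.fullmatch(text.strip())
    ]
    if not candidates:
        return None
    value, _count = Counter(candidates).most_common(1)[0]
    return value


# Hypothetical psm -> text results, NOT real Tesseract output
sample = {6: "-1.49", 7: "-1.49", 8: "41.49", 13: ""}
print(pick_consensus(sample))  # -1.49
```

To use it with the loop, store `results[psm] = text` in the `try` branch (starting from `results = {}`) and call `pick_consensus(results)` after the loop.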