How can I teach pytesseract what it should recognize, using examples, to make it more accurate?

Question

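I need to use pytesseract to read text from images, and most of the time I can do that with the help of one of my older questions. The problem now is that pytesseract struggles to read large numbers, zeros, and special characters such as €, $ or ₽.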

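Pytesseract often drops 0s (so 4 000 000 is read as 4 or 4 000), drops characters that repeat too many times (5 311 111 is read as 531 111), reads $ as 8 or 5, and reads ₽ as 2, even with different thresholds. Is there a way to teach pytesseract, using example images, what should be read as 0 or 8 and what should not?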

My code:

import os

import cv2
import pytesseract as pyt  # assumed alias, matching the pyt name used below


def read_image(thresh, path, img, show=False):  # path and img must be strings
    original_image = cv2.imread(os.path.join(path, img))
    gray_image = cv2.cvtColor(original_image, cv2.COLOR_BGR2GRAY)
    # thresh ranges from 110 to 125 to test different thresholds; 121 to 125 often works better
    _, black_and_white_image = cv2.threshold(gray_image, thresh, 255, cv2.THRESH_BINARY_INV)
    text = pyt.image_to_string(black_and_white_image, config="--psm 7 --oem 3 -c tessedit_char_whitelist=0123456789")
    return str(text).replace("\n", "")
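
Since I already try thresholds from 110 to 125, one stopgap I can imagine is to run the reader once per threshold and keep the most frequent non-empty result. The following is only a rough sketch of that idea, not a fix for the underlying recognition problem. The helper name `vote_over_thresholds` and its `read` parameter are hypothetical and are not part of pytesseract or OpenCV.

```python
from collections import Counter
from typing import Callable, Iterable


def vote_over_thresholds(
    read: Callable[[int], str],
    thresholds: Iterable[int] = range(110, 126),
) -> str:
    """Call `read` once per threshold and return the most frequent non-empty result.

    `read` is a hypothetical callable that maps a threshold to the OCR text,
    for example: lambda t: read_image(t, path, img)
    """
    readings = [read(t).strip() for t in thresholds]
    # Ignore empty readings so a run of failed reads cannot win the vote.
    counts = Counter(r for r in readings if r)
    if not counts:
        return ""
    return counts.most_common(1)[0][0]
```

It would be called as `vote_over_thresholds(lambda t: read_image(t, path, img))`. Voting can only help when most thresholds already give the correct reading, so it does not replace actually teaching pytesseract the characters.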
