Tesseract fails to recognize a single character in an image

Question

I am trying to extract the number from the attached image.

[Image containing the number]

However, I do not get the digit 8 as output. I have tried different PSM values, such as 6 and 10.

This is what I have so far:

import cv2
import pytesseract

image = cv2.imread(image_path)
if image is not None:
    # Convert the image to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Use OCR to extract text from the image
    extracted_text = pytesseract.image_to_string(gray, config='--psm 10 -c tessedit_char_whitelist=0123456789')

Answer 1

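Although the image looks fine for OCR, the vertical strokes have some shading, which hurts detection. I applied some thresholding and ended up with this image:

I fed it to Tesseract and got "8":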

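import cv2
import pytesseract

# Path to the Tesseract executable (Windows install location)
pytesseract.pytesseract.tesseract_cmd = "C:/Program Files/Tesseract-OCR/tesseract.exe"
im = cv2.imread("8.png")  # read the image (BGR channel order)
b, g, r = cv2.split(im)  # split into blue, green and red channels
mask = (b > 200) & (r < 200) & (g < 200)  # keep only strongly blue pixels
text = pytesseract.image_to_string(mask, config='-l eng --psm 10')  # OCR as a single character
print(text)  # the result is "8"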

Of course, this will fail if other colors are involved. If that is your case, could you post more images so I can adjust the code?
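
For reference, the color threshold from this answer can be wrapped in a small helper. This is only a sketch under a few assumptions: the function name `blue_text_mask` and its `high`/`low` parameters are made up for illustration, the default of 200 is simply the value used above, and the output is inverted to black text on a white background because Tesseract generally tends to do better with dark text on a light background. It uses NumPy only, so it works on any array in OpenCV's BGR layout.

```python
import numpy as np


def blue_text_mask(bgr, high=200, low=200):
    """Return a uint8 image with black text on a white background.

    bgr is an H x W x 3 array in OpenCV's BGR channel order.
    A pixel counts as text when its blue channel is above `high`
    and its green and red channels are both below `low`.
    (Hypothetical helper; the thresholds are assumptions to tune.)
    """
    b, g, r = bgr[..., 0], bgr[..., 1], bgr[..., 2]
    is_text = (b > high) & (g < low) & (r < low)
    # Text pixels become 0 (black), everything else 255 (white).
    return np.where(is_text, 0, 255).astype(np.uint8)
```

The returned array could then be passed to `pytesseract.image_to_string(..., config='--psm 10')` in place of the boolean mask. For text in other colors the thresholds, or the whole channel test, would need to change.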
