使用 Pytesseract 提高 Python 中预处理 ROI 的 OCR 准确性

Question

我正在开发一个项目，需要对屏幕上的感兴趣区域 (ROI) 进行预处理，以便使用 Pytesseract 更准确地检索文本。我已经成功捕获屏幕并根据与 OpenCV 匹配的模板定义 ROI，但我正在努力有效地预处理这些 ROI 以进行文本识别。

这是我用来预处理图像并打印 pytesseract.image_to_string() 的结果的主要函数：

### images is a dict and the frame is a numpy array containing the frames captured with pyautogui.screenshot() ###
def draw_boxes(images, frame):
    rectangle_color = (0, 255, 0)

    ### SEARCH FOR TEMPLATE ON SCREEN USING THE IMAGES FROM images.items() ###
    for name, img in images.items():

        gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

        result = cv2.matchTemplate(gray_frame, gray_img, cv2.TM_CCOEFF_NORMED)
        min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)

        ### IF TEMPLATE IS FOUND IN SCREEN DEFINE A REGION OF INTEREST ###
        if max_val > 0.5:
            

            ### MAKES A RECTANGLE ON CV2 OUTPUT FOR DEBUG ###
            top_left = max_loc
            bottom_right = (top_left[0] + img.shape[1], top_left[1] + img.shape[0])

            cv2.rectangle(frame, top_left, bottom_right, rectangle_color, 2)


            ### DEFINING THE REGION OF INTEREST ###
            x, y, w, h = max_loc[0], max_loc[1], img.shape[1], img.shape[0]
            roi = frame[y:y+h, x:x+w]

            ### PRE PROCESSING THE REGION OF INTEREST ###
            roi_gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
            blurred_image = cv2.medianBlur(roi_gray, 1, 5)
            denoised_image = cv2.fastNlMeansDenoising(roi_gray, None, 12, 7, 21)
            roi_processed = denoised_image
         

            ### APPLYING BINARY THRESHOLD FOR PYTESSERACT TO READ FROM ###
            #inverted_image = cv2.bitwise_not(roi_processed)
            otsu_threshold_value, thresh_image = cv2.threshold(roi_processed, 1, 255,   cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    

            ### SHOW OCR VIEW FOR DEBUG AND DEVELOPMENT ###
            cv2.imshow('OCR View', thresh_image)
            cv2.waitKey(1)

            custom_config = r'--oem 3 --psm 11'
            text = pytesseract.image_to_string(thresh_image, lang='eng', config=custom_config)

            
    try:
        pass
        print(text)
    except UnboundLocalError:
        print('unbound')

这是我从 Pytesseract 获得的控制台输出的示例：

Buy Offer

Ectoplasm

Gx

:

A substance from thase passing through the Underworld.

2

2

3379

Buy limit per 4 hours: 25,000

Quantity

Your price per item:

25,000

**bais**

85,375,000 coins

You have bought | so far for 3,415 coins.

125,000

我注意到有时像“3,415”这样的数字会被 Pytesseract 误读为“bais”。下面的图像显示了预处理之前和之后的 ROI。我唯一能想到的就是让处理后的图像更清晰，但不知道如何实现。

-

我正在开发一个项目，需要对屏幕上的感兴趣区域 (ROI) 进行预处理，以便使用 Pytesseract 更准确地检索文本。我已经成功捕获屏幕并根据与 OpenCV 匹配的模板定义 ROI，但我正在努力有效地预处理这些 ROI 以进行文本识别。

Answer 1

好吧，在尝试了去噪、扩张、锐化之后，你能想到的都可以。我尝试调整传入图像的大小：

pytesseract.image_to_string()

我添加了一些模糊，它似乎可以非常准确地读取 ROI，这是新的控制台输出：

我觉得这使它更加可靠，但我还有更多测试要做。

总结一下。如果图像中的文本非常小，请在这种情况下调整图像或 ROI 的大小。

我是这样做的：

        scale_factor = 3
        width = int(roi.shape[1] * scale_factor)
        height = int(roi.shape[0] * scale_factor)
        dim = (width, height)

        resized_roi = cv2.resize(thresh_image, dim, interpolation=cv2.INTER_LINEAR)
        blurred_roi = cv2.medianBlur(resized_roi, 3)

        custom_config = r'--oem 3 --psm 6'
        text = pytesseract.image_to_string(blurred_roi, lang='eng', config=custom_config)

这使得整个图像更大，同时也使得上面的文本更大。模糊是可选的。

使用 Pytesseract 提高 Python 中预处理 ROI 的 OCR 准确性

问题描述投票：0回答：1

1个回答

最新问题

使用 Pytesseract 提高 Python 中预处理 ROI 的 OCR 准确性

问题描述 投票：0回答：1

1个回答

最新问题

问题描述投票：0回答：1