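I'm having a problem with pytesseract in my code. I have tried to make the code as readable as possible, but please let me know if anything doesn't make sense. The problem seems to lie in one small piece of it, specifically the extract_gravnumbers_from_red function.
Surprisingly, the extract_gravnumbers_from_yellow function works perfectly well and reads the text correctly. More information about what I have already tried is given in the next section.
The full code is below: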
import cv2
import pytesseract
# Enter the path to the Tesseract OCR executable
pytesseract.pytesseract.tesseract_cmd = r'path_to_teseract\Tesseract\tesseract.exe'
# Function to extract the grave numbers from a selected region
def extract_gravnumbers_from_yellow(region):
    # Use Tesseract OCR to get the text from the region
    grave_numbers_text = pytesseract.image_to_string(region, config='--psm 6')
    # Transforming the OCR result into a list of grave numbers
    grave_numbers_list = [int(num) for num in grave_numbers_text.split() if num.isdigit()]
    try:
        # If there is more than one grave number, return the smallest one
        if len(grave_numbers_list) > 1:
            min_grave_number = min(grave_numbers_list)
            print(min_grave_number)
        else:
            print("No Value")
            min_grave_number = None
    except ValueError:
        # If the list is empty, set None.
        min_grave_number = None
        print("List Empty")
    # Return min_grav_number
    return min_grave_number
def extract_gravnumbers_from_red(region):
    # Use Tesseract OCR to get the text from the region
    grave_numbers_text = pytesseract.image_to_string(region, config='--psm 6')
    # Transforming the OCR result into a list of grave numbers
    grave_numbers_list = [int(num) for num in grave_numbers_text.split() if num.isdigit()]
    try:
        # If there is more than one grave number, return the smallest one
        if len(grave_numbers_list) > 1:
            min_grave_number = min(grave_numbers_list)
            print(min_grave_number)
        else:
            print("No Value")
            min_grave_number = None
    except ValueError:
        # If the list is empty, set None.
        min_grave_number = None
        print("List Empty")
    # Return min_grave_number
    return min_grave_number
def get_red_grave_numbers(red_color, hsv_image, image, grav_numbers_list):
    # Define the color range of the red markings in the HSV space.
    lower_red, upper_red = red_color
    mask_red = cv2.inRange(hsv_image, lower_red, upper_red)
    # Find the contours of the red markings
    red_contours, _ = cv2.findContours(mask_red, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Iterate over the found contours of the red markings
    for red_contour in red_contours:
        # Draw a rectangle around the red mark (only to show the result).
        x, y, w, h = cv2.boundingRect(red_contour)
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 255), 2)
        # Extract the grave number from the red marker
        red_region = image[y:y + h, x:x + w]
        # Extract the grave number from the red marker and add it to the list.
        min_grav_number = extract_gravnumbers_from_red(red_region)
        if min_grav_number is not None:
            grav_numbers_list.append(min_grav_number)
    return grav_numbers_list
def get_yellow_grave_numbers(yellow_color, hsv_image, image, grav_numbers_list):
    # Define the color range of the yellow markings in the HSV space.
    lower_yellow, upper_yellow = yellow_color
    mask_yellow = cv2.inRange(hsv_image, lower_yellow, upper_yellow)
    # Find the contours of the yellow markings
    yellow_contours, _ = cv2.findContours(mask_yellow, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Iterate over the found contours of the yellow markings
    for yellow_contour in yellow_contours:
        # Draw a rectangle around the yellow mark (only to show the result).
        x, y, w, h = cv2.boundingRect(yellow_contour)
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 255), 2)
        # Extract the grave number from the yellow marker
        yellow_region = image[y:y + h, x:x + w]
        # Extract the grave number from the yellow marker and add it to the list.
        min_grav_number = extract_gravnumbers_from_yellow(yellow_region)
        if min_grav_number is not None:
            grav_numbers_list.append(min_grav_number)
    return grav_numbers_list
def check_double_number(filename, kvarter):
    # Load the image
    image = cv2.imread(filename)
    # Print image dimensions to check if the image is loaded correctly
    print("Image shape:", image.shape)
    print(f"Kvarter: {kvarter}")
    # Yellow color HSV Color Code
    yellow_color = ((35, 249, 245), (35, 249, 245))
    # Red color HSV Color Code
    red_color = ((7, 208, 190), (7, 208, 190))
    # Convert the image to the correct color space (BGR to HSV)
    hsv_image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    # Create a list to store the grave numbers
    grav_numbers_list = []
    # Quick Debug print
    print("\nYellow mark is being processed:\n")
    grav_numbers_list = get_yellow_grave_numbers(yellow_color, hsv_image, image, grav_numbers_list)
    # Quick Debug print
    print("\nRed mark is being processed:\n")
    grav_numbers_list = get_red_grave_numbers(red_color, hsv_image, image, grav_numbers_list)
    print("\n")
    # Display Result Window
    cv2.imshow('Result', image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    return grav_numbers_list
def create_doublegrave_list():
    kvarter = 2
    doublegraveList = []
    for kv in range(1, kvarter + 1):
        kv = check_double_number(f"kvarter_image{kv}.png", kv)
        doublegraveList.append(kv)
    return doublegraveList

dubbelgravLista = create_doublegrave_list()
print(dubbelgravLista)
I'm using block quotes here not as quotations but to improve the structure of the question.
It looks like the line in the extract_gravnumbers_from_red function that fails to recognize the text may be this one:
grave_numbers_text = pytesseract.image_to_string(region, config='--psm 6')
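When trying:
min_grave_number = min(grave_numbers_list)
print(min_grave_number)
in the else statement, a ValueError is raised because the list is empty, since no integers were found. The problem seems specific to reading two "lines" of text on the red regions, because it works on the yellow regions.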
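For reference, the parsing step described above can be sketched so that an empty OCR result yields None instead of an exception. This is only a minimal sketch: the helper name smallest_number is hypothetical and not part of the posted code, and it assumes that a region containing exactly one number should also count (the posted functions require more than one).

```python
def smallest_number(ocr_text):
    """Return the smallest integer found in OCR text, or None if there is none."""
    # Keep only whitespace-separated tokens made purely of decimal digits,
    # which int() is guaranteed to parse.
    numbers = [int(token) for token in ocr_text.split() if token.isdecimal()]
    # An empty list means OCR produced no usable digits, so min() is never
    # called on it and no ValueError can occur.
    if not numbers:
        return None
    return min(numbers)


print(smallest_number("123\n124"))  # two lines of text, as in the red regions
print(smallest_number("146"))       # a single number is accepted here (assumption)
print(smallest_number("ize"))       # garbage OCR output gives None
```

With this shape, the try/except around min() is no longer needed, and garbage output such as "ize" is reported as None rather than as an error.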
When I print grave_numbers_text in the else statement for the first image, I get the following output:
Red mark is being processed:
No Value
ize
No Value
ro
No Value
nz
So it is clearly reading something, but what it gets is wrong. On the second image, however, it does get it right:
Red mark is being processed:
No Value
146
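The region that is read correctly contains only one line of text. So my theory is that it cannot read two "lines" of text because I'm missing some setting, combined with the different contrast, since it does read the yellow regions correctly. I know it is not about the quality of the image, because I tried running it with a larger kvarter_image1.png image.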
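I have experimented with the config parameter and tried various options, such as --psm 11, --psm 12, --psm 7 and --psm 4, but unfortunately none of them made any difference. Since the config parameter probably does not affect contrast, contrast may be the main challenge.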
I have displayed the red region, and this is the result:
It looks as it should.
I also tried an alternative approach, using color filtering and histogram equalization, as follows:
red_text_mask = cv2.inRange(region, (63, 161, 76), (7, 208, 190))
red_text = cv2.bitwise_and(region, region, mask=red_text_mask)
gray_text = cv2.cvtColor(red_text, cv2.COLOR_BGR2GRAY)
equalized_text = cv2.equalizeHist(gray_text)
grave_numbers_text = pytesseract.image_to_string(equalized_text, config='--psm 6')
The color codes used are HSV, the same as in the red_color variable. I got the code for the black color by picking it with this program:
import cv2
import numpy as np

def get_hsv_color(event, x, y, flags, param):
    if event == cv2.EVENT_LBUTTONDOWN:
        pixel_value = image[y, x]
        hsv_pixel_value = cv2.cvtColor(np.uint8([[pixel_value]]), cv2.COLOR_BGR2HSV)
        print("HSV Value:", hsv_pixel_value)

# Read the image
image = cv2.imread('kvarter_image1.png')
# Create a window to display the image
cv2.namedWindow('Image')
cv2.imshow('Image', image)
# Set the mouse callback function
cv2.setMouseCallback('Image', get_hsv_color)
# Wait for a key event to exit
cv2.waitKey(0)
cv2.destroyAllWindows()
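However, even with this modification the result stays the same: when I try to print it in the else statement, the grave_numbers_text variable is still empty, so the ValueError persists.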
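Reading black text on a red background seems to be a challenge for the OCR process. The lack of contrast may be hindering accurate text recognition, or there may be something wrong with my configuration. Any insights or suggestions for improving OCR performance on black text in red regions would be highly appreciated. Thank you for your help!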
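tesseract is built around a document-reading mode. I suggest you use EasyOCR instead, which helps with reading many kinds of text found in real-world images. It is very simple to use, and I also tried it on your example input image. First, you need to install EasyOCR: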
pip install easyocr
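Here is the code and the result for the image. The result also includes the coordinates of each piece of detected text. You can check the output and see how accurate it is: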
import easyocr
reader = easyocr.Reader(['ch_sim','en'])
result = reader.readtext('/home/ubuntu/ab.png')
print(result)
Result:
[([[633, 157], [729, 157], [729, 201], [633, 201]], '123', 0.970589095852502),
 ([[635, 200], [735, 200], [735, 248], [635, 248]], '124', 0.9784687161445618),
 ([[514, 234], [612, 234], [612, 282], [514, 282]], '125', 0.9959679843082824),
 ([[514, 276], [614, 276], [614, 324], [514, 324]], '120', 0.9427849510848981),
 ([[378, 302], [476, 302], [476, 350], [378, 350]], '127', 0.8409048914909363),
 ([[723, 351], [821, 351], [821, 395], [723, 395]], '166', 0.9964757966703823),
 ([[260, 368], [360, 368], [360, 416], [260, 416]], '128', 0.9026410027274797),
 ([[161, 443], [257, 443], [257, 487], [161, 487]], '129', 0.9528453507814674),
 ([[595, 439], [689, 439], [689, 483], [595, 483]], '164', 0.9960849252898073),
 ([[6, 490], [104, 490], [104, 538], [6, 538]], '130', 0.948180079460144),
 ([[595, 483], [691, 483], [691, 527], [595, 527]], '465', 0.5173705816268921),
 ([[11, 533], [105, 533], [105, 577], [11, 577]], '131', 0.9920455440102758),
 ([[438, 526], [540, 526], [540, 574], [438, 574]], '-0', 0.002514158801917607),
 ([[333, 587], [431, 587], [431, 631], [333, 631]], '162', 0.9630756538371861),
 ([[186, 664], [295, 664], [295, 737], [186, 737]], '4', 0.01427749452419752)]
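Each entry in the result above is a (box, text, confidence) tuple, so the grave numbers can be filtered by confidence before use. The sketch below is only an illustration: numbers_from_easyocr is a hypothetical helper, the 0.5 threshold is an assumed value that would need tuning, and the sample holds four entries copied from the output above.

```python
def numbers_from_easyocr(result, min_confidence=0.5):
    """Collect integer readings whose confidence reaches min_confidence."""
    numbers = []
    for _box, text, confidence in result:
        # Skip weak detections and anything that is not purely digits.
        if confidence >= min_confidence and text.isdecimal():
            numbers.append(int(text))
    return numbers


# Four entries taken from the EasyOCR output shown above.
sample = [
    ([[633, 157], [729, 157], [729, 201], [633, 201]], '123', 0.970589095852502),
    ([[595, 483], [691, 483], [691, 527], [595, 527]], '465', 0.5173705816268921),
    ([[438, 526], [540, 526], [540, 574], [438, 574]], '-0', 0.002514158801917607),
    ([[186, 664], [295, 664], [295, 737], [186, 737]], '4', 0.01427749452419752),
]

print(numbers_from_easyocr(sample))  # [123, 465]
```

The two very low-confidence readings ('-0' and '4') are dropped, while '465' at about 0.52 still passes. A stricter threshold would drop it as well.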