Extracting text from an image

Question

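I am working on extracting text from an image.

The original image is in color and the text is white. After further processing, the text is black and all other pixels are white (with some noise). Here is an example:

Now, when I run OCR on it with pytesseract (Tesseract), I still get no text back.

Is there any way to extract the text from the color image?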

Answer 1

import argparse

import cv2
import pytesseract
from PIL import Image

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image (OpenCV returns it in BGR channel order)
image = cv2.imread(args["image"])
cv2.imshow("Original", image)

# apply an "average" blur with a 3x3 kernel to smooth out the noise
blurred = cv2.blur(image, (3, 3))
cv2.imshow("Blurred_image", blurred)

# convert BGR to RGB for PIL, then run Tesseract on the blurred image
img = Image.fromarray(cv2.cvtColor(blurred, cv2.COLOR_BGR2RGB))
text = pytesseract.image_to_string(img, lang="eng")
print(text)

# keep the preview windows open until a key is pressed
cv2.waitKey(0)

The result I got was "Stay: in an Overwoter Bungalow $3»".

How about using contours to find the unwanted blobs and remove them? That might help.

Answer 2

Try this:

import uuid

import cv2
import ftfy
import pytesseract
from PIL import Image

filename = 'uTGi5.png'
image = cv2.imread(filename)

# keep only bright values (applied per channel), upscale 3x, then smooth away speckles
processed = cv2.threshold(image, 200, 255, cv2.THRESH_BINARY)[1]
processed = cv2.resize(processed, (0, 0), fx=3, fy=3)
processed = cv2.medianBlur(processed, 9)

# save the preprocessed image under a unique name
out_path = str(uuid.uuid4()) + ".jpg"
cv2.imwrite(out_path, processed)

# --oem 3: default engine; --psm 11: sparse text, in no particular order
config = "-l eng --oem 3 --psm 11"
text = pytesseract.image_to_string(Image.open(out_path), config=config)

# repair mis-encoded characters and rejoin words hyphenated across lines
text = ftfy.fix_text(text)
text = ftfy.fix_encoding(text)
text = text.replace('-\n', '')
print(text)

Answer 3

Tesseract is an open-source OCR engine, and its accuracy has limits. If you cannot get the accuracy you need, consider trying a commercial OCR engine such as Amazon Textract.

To use Amazon Textract, I suggest installing the amazon-textract-textractor package:

pip install amazon-textract-textractor

You can call the DetectDocumentText API like this:

from textractor import Textractor
from textractor.data.constants import TextractFeatures
extractor = Textractor(profile_name="default")
document = extractor.detect_document_text(
    file_source="./uTGi5.png"
)
document.visualize()

You can then get the text like this, for example:

print('\n'.join([l.text for l in document.lines]))

Stay in
an Overwater
Bungalow
