Extract strikethrough text from an image as a Python list

Question

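My task is to identify and extract the text in an image that has a strikethrough mark. I only want to select the words that carry this mark and put each instance into a list.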

The code I tried:

from PIL import Image
import pytesseract

# Open the image file
img_path = 'path/to/image.png'
img = Image.open(img_path)

# Use tesseract to do OCR on the image
text = pytesseract.image_to_string(img)

text

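The problem is that the output contains every word, with nothing to show which ones are struck through. If the string contained an indicator for a struck-out word or phrase, such as "-", I could process it further; however, plain pytesseract does not detect the strikethroughs in this image.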

I need a better approach.

Example output:

['Once upon a time', 'Jack', 'village']

Answer 1

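I had partial success extracting the words by looking at the per-word confidence values that Tesseract reports, although the strikethrough also makes the recognized text itself inaccurate. This could be improved by looking at the bounding boxes and using a tool such as OpenCV to clean out the strikethrough.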

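from PIL import Image
import pytesseract
from pytesseract import Output

# Open the image file
img_path = 'path/KrDdO.png'
img = Image.open(img_path)

# Run OCR and get per-word data (text, confidence, bounding box)
data = pytesseract.image_to_data(img, output_type=Output.DICT)

# Struck-through words tend to get a lower confidence.
# conf is -1 for non-word entries; 93 is an empirical cutoff for this image.
for word, conf in zip(data['text'], data['conf']):
    if 0 < float(conf) < 93:
        print(word, conf)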

Output:

Onceupon-atime, 72
Jaek 91
viage 31
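
The bounding-box idea mentioned above could be sketched roughly as follows. This is only a minimal sketch under several assumptions, not a tested solution for the image in the question: the image is assumed to be already binarized into a 2D array `ink` where a truthy value means a dark pixel, `data` is assumed to be the dict returned by `pytesseract.image_to_data` with `Output.DICT` (its `left`, `top`, `width` and `height` keys hold each word's box), and the helper names `has_strikethrough` and `struck_words` as well as the `min_fill` threshold are hypothetical.

```python
def has_strikethrough(ink, left, top, width, height, min_fill=0.9):
    """Return True if some pixel row in the middle third of the box is
    almost entirely ink, which is how a strikethrough line would look.

    ink: 2D sequence (list of rows or a NumPy array), truthy = dark pixel.
    min_fill: assumed fraction of the box width the line must cover.
    """
    if width <= 0 or height <= 0:
        return False
    y_start = top + height // 3
    y_end = top + (2 * height) // 3
    for y in range(y_start, y_end + 1):
        row = ink[y][left:left + width]
        filled = sum(1 for px in row if px)
        if filled / width >= min_fill:
            return True
    return False


def struck_words(data, ink, min_fill=0.9):
    """Collect the OCR text of every word box that looks struck through."""
    words = []
    for text, left, top, width, height in zip(
        data["text"], data["left"], data["top"], data["width"], data["height"]
    ):
        # Skip the empty entries Tesseract emits for blocks, paragraphs and lines
        if text.strip() and has_strikethrough(ink, left, top, width, height, min_fill):
            words.append(text)
    return words
```

The `ink` array could come from OpenCV, for instance an inverted Otsu threshold of the grayscale image via `cv2.threshold`. The returned strings would still be the garbled OCR readings (such as "Jaek"), so erasing the detected line and running OCR on the word box again would likely be needed to approach the list in the question, and `min_fill` would probably need tuning per image.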
