I am using pytesseract to recognize text, as shown below:
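    import pytesseract
    from pytesseract import Output

    # img is the page image, already loaded (e.g. with OpenCV or PIL)
    td = pytesseract.image_to_data(img, output_type=Output.DICT)
    tn_boxes = len(td['level'])
    for o in range(tn_boxes):
        text = td['text'][o]
        print(text)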
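Since `image_to_data` returns one entry per word, a keyword such as 'Example 1.' arrives as two separate tokens. A minimal sketch of joining the words back into text lines is shown here. It assumes a single page and uses the `block_num`, `par_num` and `line_num` keys of the returned dict; `words_to_lines` and the hand-made `sample` dict are hypothetical names used only for illustration, not real OCR output.

```python
from collections import defaultdict

def words_to_lines(td):
    """Group the word-level entries of an image_to_data dict into text lines."""
    lines = defaultdict(list)
    for i, word in enumerate(td['text']):
        if not word.strip():
            continue  # skip empty entries (block/paragraph/line-level rows)
        key = (td['block_num'][i], td['par_num'][i], td['line_num'][i])
        lines[key].append(word)
    # Tuple keys sort in block, paragraph, line order
    return [' '.join(words) for _, words in sorted(lines.items())]

# Hand-made stand-in for the dict returned by image_to_data (assumption)
sample = {
    'text':      ['', 'Example', '1.', '', 'Sol.', '(a)'],
    'block_num': [1, 1, 1, 1, 1, 1],
    'par_num':   [1, 1, 1, 1, 1, 1],
    'line_num':  [0, 1, 1, 2, 2, 2],
}
print(words_to_lines(sample))
```

Printing whole lines instead of single words may make it easier to see which lines the OCR dropped.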
I am using simple logic to index the examples: detect the keyword 'Example no.', find its end-point keyword 'Sol.', put the piece of the image from 'Example no.' to 'Sol.' into the index, then find the next example, and so on.
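The indexing logic described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the keywords are matched as exact word tokens ('Example' and 'Sol.'), each span runs from the top of the 'Example' word to the top of the next 'Sol.' word, and `find_example_spans` and the `sample` dict are hypothetical names, not part of pytesseract.

```python
def find_example_spans(td, start_kw='Example', end_kw='Sol.'):
    """Return (top, bottom) pixel ranges from each start keyword to the next end keyword."""
    spans = []
    start_top = None
    for i, word in enumerate(td['text']):
        if word == start_kw and start_top is None:
            start_top = td['top'][i]           # remember where the example begins
        elif word == end_kw and start_top is not None:
            spans.append((start_top, td['top'][i]))  # close the span at 'Sol.'
            start_top = None
    return spans

# Hand-made stand-in for the dict returned by image_to_data (assumption)
sample = {
    'text': ['Example', '1.', 'Prove', 'Sol.', 'Let', 'Example', '2.', 'Sol.'],
    'top':  [10, 10, 40, 90, 90, 200, 200, 260],
}
print(find_example_spans(sample))
```

If `img` is a NumPy array (as returned by OpenCV), each span could then be cropped with `img[top:bottom, :]`. Note that this approach depends entirely on the OCR returning both keywords; if either one is missed, that example is skipped or merged with the next.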
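But when I try the following image, it shows this output: SET THEORY ae . . 5 (6) Let A = {x: x isa negative odd integer} = {-1,-3,-5,-7, ... etc.
See how it fails to recognize the first line: Sol. (a) Let A={x:x is a natural number .. etc.
When I try the following image, which has no horizontal line, it works fine.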
Sometimes, when an image or a piece of larger-sized text sits above the text, pytesseract fails to detect the text below that larger object.
For example, for the following image it shows this output: usually denoted by o(G). ors a a {= 7 Wave =e () oe that the set of ae | group usual ition of integers.
Notice that it does not detect the keyword Example 1.
But when I try the following image, it shows this output: usually denoted by o(G). Example 1. (2) Prove that th . group under usual addition of integers,
Now it detects the keyword Example 1.