Extracting a table from an image


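I have been trying to extract tables with img2table and Tesseract, but whatever parameters I use, I never get any extracted tables. Why? How can I successfully extract the table from an image like this one?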


from img2table.ocr import TesseractOCR
from img2table.document import Image

# Instantiation of OCR
ocr = TesseractOCR(n_threads=1, lang="eng")

# Instantiation of document, either an image or a PDF
src = "table.png"
doc = Image(src)

# Table extraction
extracted_tables = doc.extract_tables(ocr=ocr,
                                      implicit_rows=True,
                                      borderless_tables=True,
                                      min_confidence=50)

In [2]: extracted_tables
Out[2]: []

The image is:

[image: table]

ocr tesseract python-tesseract img2table
1 Answer

I get this error:

TypeError: Image.extract_tables() got an unexpected keyword argument 'borderless_tables'

This is the tooltip for extract_tables (as shown in VSCode), and it does seem to list only three parameters:

(method) def extract_tables(
    ocr: Any = None,
    implicit_rows: bool = True,
    min_confidence: int = 50
) -> List[ExtractedTable]
Extract tables from document
:param ocr: OCRInstance object used to extract table content
:param implicit_rows: boolean indicating if implicit rows are splitted
:param min_confidence: minimum confidence level from OCR in order to process text, from 0 (worst) to 99 (best)
:return: list of extracted tables
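
The same check can be made at runtime instead of reading the tooltip. Below is a minimal sketch that uses the standard-library inspect module to keep only the keyword arguments a function actually accepts. The helper filter_supported_kwargs and the stand-in function extract_tables_v0_0_12 are hypothetical names made up for this illustration; the stand-in only mirrors the three parameters shown in the tooltip and is not the library's code.

```python
import inspect


def filter_supported_kwargs(func, **kwargs):
    """Keep only the keyword arguments that func's signature accepts."""
    params = inspect.signature(func).parameters
    # A function declaring **kwargs accepts any keyword, so nothing is dropped.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    by_keyword = (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                  inspect.Parameter.KEYWORD_ONLY)
    return {k: v for k, v in kwargs.items()
            if k in params and params[k].kind in by_keyword}


# Hypothetical stand-in with the same parameters as the tooltip above.
def extract_tables_v0_0_12(ocr=None, implicit_rows=True, min_confidence=50):
    return []


wanted = dict(implicit_rows=True, borderless_tables=True, min_confidence=50)
accepted = filter_supported_kwargs(extract_tables_v0_0_12, **wanted)
dropped = sorted(set(wanted) - set(accepted))

print("accepted:", accepted)
print("dropped: ", dropped)
```

With the real library, the call would presumably look like doc.extract_tables(ocr=ocr, **filter_supported_kwargs(doc.extract_tables, **wanted)). Note that silently dropping borderless_tables only avoids the TypeError; it does not make the installed version detect borderless tables.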

Versions used (Python 3.12.1):

D:\TEMP\python\i2t>pip list | findstr "opencv tesseract img2table"
img2table       0.0.12
opencv-python   4.9.0.80
tesseract       0.1.3
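
The same version information can be read from Python itself, which avoids the Windows-only findstr filter. This is a small sketch using importlib.metadata from the standard library; installed_version is a hypothetical helper name. Since the code in the question passes borderless_tables, my assumption is that the parameter exists in a different img2table release than 0.0.12, so comparing the installed versions on both machines (and trying pip install --upgrade img2table) seems like a reasonable first step. I have not verified which release introduced it.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Package names taken from the pip list output above.
for name in ("img2table", "opencv-python", "tesseract"):
    print(f"{name:15} {installed_version(name)}")
```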