Extracting a table from an image

Question

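I have been trying to extract a table with img2table and Tesseract, but no matter which parameters I use, no table is ever extracted. Why? How can I successfully extract a table from an image like this one?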


from img2table.ocr import TesseractOCR
from img2table.document import Image

# Instantiation of OCR
ocr = TesseractOCR(n_threads=1, lang="eng")

# Instantiation of document, either an image or a PDF
src = "table.png"
doc = Image(src)

# Table extraction
extracted_tables = doc.extract_tables(ocr=ocr,
                                      implicit_rows=True,
                                      borderless_tables=True,
                                      min_confidence=50)

In [2]: extracted_tables
Out[2]: []

The image is:

[image: table]

Answer 1

When I run this code, I get the following error:

TypeError: Image.extract_tables() got an unexpected keyword argument 'borderless_tables'

Here is the tooltip for extract_tables as shown in VSCode. It does indeed list only three parameters, and borderless_tables is not one of them:

(method) def extract_tables(
    ocr: Any = None,
    implicit_rows: bool = True,
    min_confidence: int = 50
) -> List[ExtractedTable]
Extract tables from document
:param ocr: OCRInstance object used to extract table content
:param implicit_rows: boolean indicating if implicit rows are splitted
:param min_confidence: minimum confidence level from OCR in order to process text, from 0 (worst) to 99 (best)
:return: list of extracted tables
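Because the installed extract_tables only accepts ocr, implicit_rows and min_confidence, passing borderless_tables raises the TypeError above. A minimal sketch of a defensive workaround, assuming you want the same script to run against different img2table versions: inspect the method's signature with the standard-library inspect module and drop any keyword it does not accept. The helper name supported_kwargs is hypothetical, and the extract_tables function below is only a stand-in that mirrors the tooltip's signature, not the real library code.

```python
import inspect
import warnings
from typing import Any, Callable, Dict


def supported_kwargs(func: Callable[..., Any], **kwargs: Any) -> Dict[str, Any]:
    """Return only the keyword arguments that `func` accepts.

    Hypothetical helper: unsupported names are dropped with a warning
    instead of letting the call fail with TypeError.
    """
    params = inspect.signature(func).parameters
    # If func takes **kwargs, every keyword name is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    accepted = {
        k: v
        for k, v in kwargs.items()
        # Positional-only parameters cannot be passed by keyword.
        if k in params and params[k].kind is not inspect.Parameter.POSITIONAL_ONLY
    }
    dropped = sorted(set(kwargs) - set(accepted))
    if dropped:
        name = getattr(func, "__qualname__", repr(func))
        warnings.warn(f"{name} does not accept: {', '.join(dropped)}")
    return accepted


# Hypothetical stand-in mirroring the tooltip signature (img2table 0.0.12).
# It is NOT the real library method; it just returns an empty list.
def extract_tables(ocr=None, implicit_rows=True, min_confidence=50):
    return []


if __name__ == "__main__":
    # Passing the unknown keyword directly reproduces the TypeError.
    try:
        extract_tables(ocr=None, borderless_tables=True)
    except TypeError as exc:
        print("TypeError:", exc)

    # Filtering first keeps only the parameters this signature supports.
    kwargs = supported_kwargs(
        extract_tables,
        ocr=None,
        implicit_rows=True,
        borderless_tables=True,
        min_confidence=50,
    )
    print(kwargs)
    print(extract_tables(**kwargs))
```

With the real library you would pass the bound method instead, e.g. supported_kwargs(doc.extract_tables, ...); inspect.signature omits self for bound methods. Note that this only avoids the crash: on a version without borderless_tables, borderless detection is simply not requested.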

Versions used (Python 3.12.1):

D:\TEMP\python\i2t>pip list | findstr "opencv tesseract img2table"
img2table       0.0.12
opencv-python   4.9.0.80
tesseract       0.1.3
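The same version information can be read from inside the interpreter that actually runs the script, which helps when several Python installations are present and pip may belong to a different one. A minimal sketch using the standard-library importlib.metadata; the helper name installed_versions is hypothetical. Since pip only reports Python distributions, the sketch also uses shutil.which to check whether a tesseract executable is on PATH, assuming (as far as I know) that img2table's TesseractOCR relies on that executable rather than on any pip package.

```python
import shutil
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # Same packages as the `pip list | findstr` command above.
    for name, ver in installed_versions(["img2table", "opencv-python", "tesseract"]).items():
        print(f"{name:<15} {ver or 'not installed'}")

    # shutil.which returns the full path of the executable, or None if not found.
    print("tesseract executable:", shutil.which("tesseract") or "not found on PATH")
```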