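I have been trying to extract a table using img2table and Tesseract, but no matter which parameters I use, I never get any extracted tables. Why is that, and how can I successfully extract tables from images like this one?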
from img2table.ocr import TesseractOCR
from img2table.document import Image
# Instantiation of OCR
ocr = TesseractOCR(n_threads=1, lang="eng")
# Instantiation of document, either an image or a PDF
src = "table.png"
doc = Image(src)
# Table extraction
extracted_tables = doc.extract_tables(ocr=ocr,
                                      implicit_rows=True,
                                      borderless_tables=True,
                                      min_confidence=50)
In [2]: extracted_tables
Out[2]: []
The image is:
I get this error:
TypeError: Image.extract_tables() got an unexpected keyword argument 'borderless_tables'
This is the tooltip for extract_tables (as shown in VSCode), and it does seem to list only three parameters:
(method) def extract_tables(
    ocr: Any = None,
    implicit_rows: bool = True,
    min_confidence: int = 50
) -> List[ExtractedTable]
Extract tables from document
:param ocr: OCRInstance object used to extract table content
:param implicit_rows: boolean indicating if implicit rows are splitted
:param min_confidence: minimum confidence level from OCR in order to process text, from 0 (worst) to 99 (best)
:return: list of extracted tables
Versions used (Python 3.12.1):
D:\TEMP\python\i2t>pip list | findstr "opencv tesseract img2table"
img2table 0.0.12
opencv-python 4.9.0.80
tesseract 0.1.3
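One way to confirm which keyword arguments the installed version actually accepts is to inspect the method's signature at runtime instead of relying on the tooltip. The following is only a minimal sketch: `filter_supported_kwargs` and `extract_tables_stub` are hypothetical names introduced for illustration, and the stub merely mirrors the three-parameter signature from the tooltip above. It is not the real img2table method.

```python
import inspect


def filter_supported_kwargs(func, kwargs):
    """Return only the keyword arguments that `func` declares in its signature."""
    params = inspect.signature(func).parameters
    # If the function takes **kwargs, every keyword is accepted as-is.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {name: value for name, value in kwargs.items() if name in params}


# Hypothetical stand-in that copies the signature shown in the tooltip
# (assumption: this is what the installed version exposes).
def extract_tables_stub(ocr=None, implicit_rows=True, min_confidence=50):
    return []


wanted = {"implicit_rows": True, "borderless_tables": True, "min_confidence": 50}
accepted = filter_supported_kwargs(extract_tables_stub, wanted)
dropped = sorted(set(wanted) - set(accepted))

print("accepted:", accepted)
print("dropped:", dropped)
```

With img2table installed, passing `doc.extract_tables` in place of the stub should run the same check against the real method, which would show whether `borderless_tables` is missing from the installed version's signature.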