How to overlap contours to detect columns in a table using OpenCV and PyTorch?

Question

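I am working on a project that uses a PyTorch deep learning model together with OpenCV to detect tables in images and extract their columns. I can detect the tables and get the contours of individual cells, but I am having trouble defining the column boundaries accurately. I do have a suitable column mask; I need to use it together with the table mask so that OCR preserves the table's structure.

Here is a summary of my approach:

Preprocessing: resize and normalize the image.
Model inference: use a custom PyTorch model to generate the table and column masks.
Contour detection: use cv2.findContours to detect contours in the table mask.
Contour filtering: discard small or irrelevant contours based on area.
Bounding rectangles: draw bounding boxes around the detected contours.

I want to improve column detection by taking the spatial relationships between contours into account. Specifically, I want to overlap contours to determine column boundaries: if two contours overlap significantly along the x-axis, they should be treated as part of the same column.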
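
A minimal sketch of the grouping rule described above, assuming each contour has already been reduced to an (x, y, w, h) box with cv2.boundingRect. The names x_overlap_ratio and group_boxes_into_columns and the 0.5 overlap threshold are hypothetical choices for illustration, not part of OpenCV:

```python
def x_overlap_ratio(a, b):
    """Overlap of two (x, y, w, h) boxes along x, relative to the narrower box."""
    ax0, ax1 = a[0], a[0] + a[2]
    bx0, bx1 = b[0], b[0] + b[2]
    inter = min(ax1, bx1) - max(ax0, bx0)
    narrower = min(a[2], b[2])
    return max(inter, 0) / narrower if narrower > 0 else 0.0


def group_boxes_into_columns(boxes, min_overlap=0.5):
    """Greedily assign boxes to columns; columns come back ordered left to right."""
    columns = []  # each column keeps its merged x-span and its member boxes
    for box in sorted(boxes, key=lambda b: b[0]):
        x0, x1 = box[0], box[0] + box[2]
        for col in columns:
            # Compare against the column's merged x-span, expressed as a box.
            span_box = (col["span"][0], 0, col["span"][1] - col["span"][0], 0)
            if x_overlap_ratio(box, span_box) >= min_overlap:
                col["boxes"].append(box)
                col["span"] = [min(col["span"][0], x0), max(col["span"][1], x1)]
                break
        else:
            # No existing column overlaps enough, so this box starts a new one.
            columns.append({"span": [x0, x1], "boxes": [box]})
    # Within each column, order the cells top to bottom.
    return [sorted(col["boxes"], key=lambda b: b[1]) for col in columns]


if __name__ == "__main__":
    cells = [(10, 0, 50, 20), (12, 30, 48, 20), (100, 0, 40, 20),
             (102, 30, 40, 20), (11, 60, 50, 20)]
    for i, column in enumerate(group_boxes_into_columns(cells)):
        print(i, column)
```

Measuring overlap relative to the narrower box keeps a small cell inside a wide column from being rejected, but a cell that spans several columns will still be absorbed by the first column it overlaps, so merged cells would need separate handling.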

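What I have tried:

Sorting the contours by x-coordinate.
Filtering contours by area to remove noise.
Visualizing the detected tables with bounding boxes.

Problems:

It is difficult to determine accurate column boundaries.
The contours usually represent individual cells, so it is hard to group them into columns.

Code:

Here is the relevant part of my code: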

import cv2
import numpy as np
import torch
import torch.nn as nn
import pytesseract
from PIL import Image
from albumentations import Compose, Normalize
from albumentations.pytorch import ToTensorV2

# Model Definition (DenseNet and TableNet classes omitted for brevity)

TRANSFORM = Compose([
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], max_pixel_value=255),
    ToTensorV2()
])

def perform_ocr(image):
    # Perform OCR using Tesseract
    image = Image.fromarray(image)
    return pytesseract.image_to_string(image, lang='eng')

def predict(img_path):
    orig_image = Image.open(img_path).resize((1024, 1024))
    test_img = np.array(orig_image.convert('LA').convert("RGB"))
    image = TRANSFORM(image=test_img)["image"]

    with torch.no_grad():
        image = image.unsqueeze(0)
        table_out, column_out = model(image)
        table_out = torch.sigmoid(table_out).detach().numpy().squeeze(0).transpose(1, 2, 0) > 0.5
        column_out = torch.sigmoid(column_out).detach().numpy().squeeze(0).transpose(1, 2, 0) > 0.5

    contours, _ = cv2.findContours(table_out.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    contours = [c for c in contours if cv2.contourArea(c) > 3000]

    # Visualization and OCR steps omitted for brevity

    # cv2 drawing functions need a NumPy array (BGR), not a PIL image
    canvas = cv2.cvtColor(np.array(orig_image.convert("RGB")), cv2.COLOR_RGB2BGR)
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        cv2.rectangle(canvas, (x, y), (x+w, y+h), (0, 0, 255), 4)

    # How to overlap contours to determine columns?
    # ...

    cv2.imshow("Detected Tables", canvas)
    cv2.waitKey(0)

# Load model and process image (model loading code omitted for brevity)

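Questions:

How can I overlap the detected contours to determine column boundaries accurately?
What is the best way to handle overlapping contours for column detection in OpenCV?
Are there any recommended techniques or best practices for improving the accuracy of column detection in tables?

Any guidance or suggestions on how to overlap contours effectively to detect columns would be greatly appreciated! I am new to image processing, so please keep the answer simple.

Note that this project deals with scanned documents, in case that matters.

Edit: I should explain the reason for the above requirement.

Vanilla OCR often gets the order of the columns in a table wrong, so I am trying to use the column mask to preserve the correct structure.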
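
A rough sketch of how the two masks could be combined into ordered column extents, assuming both are 2-D boolean arrays of the same size (for example table_out[..., 0] and column_out[..., 0] from predict above). The function name column_spans_from_masks and the min_fill and min_width thresholds are hypothetical and would need tuning on real scans:

```python
import numpy as np


def column_spans_from_masks(table_mask, column_mask, min_fill=0.5, min_width=5):
    """Return [(x_start, x_end), ...] with x_end exclusive, ordered left to right."""
    table = np.asarray(table_mask, dtype=bool)
    # Ignore column-mask pixels that fall outside the detected table.
    cols = np.asarray(column_mask, dtype=bool) & table
    table_height = table.any(axis=1).sum()  # number of image rows the table touches
    if table_height == 0:
        return []
    # Fraction of the table height covered by column pixels at each x position.
    fill = cols.sum(axis=0) / table_height
    inside = fill >= min_fill
    # Pad with False so every run has both a rising and a falling edge.
    padded = np.concatenate(([False], inside, [False])).astype(np.int8)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    # Drop runs that are too narrow to be a real column.
    return [(int(s), int(e)) for s, e in zip(starts, ends) if e - s >= min_width]


if __name__ == "__main__":
    table = np.zeros((40, 100), dtype=bool)
    table[5:35, 10:90] = True
    column = np.zeros((40, 100), dtype=bool)
    column[5:35, 12:30] = True
    column[5:35, 40:60] = True
    print(column_spans_from_masks(table, column))
```

Each span could then be cropped from the image and passed to perform_ocr one column at a time, left to right, so the recognized text comes out in column order. This assumes a single table per mask; with several tables, the same function could be applied to each table's bounding box separately.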

Answer 1

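I'll post this as a possible solution to your problem, with the caveat that I can't fully help because I can't see your images. As I said, just check the X and Y coordinates of the contours and do some statistics. I would usually take column-wise/row-wise sums and look for peaks, but that fails when there are many tables in the same image. So I decided to go contour by contour.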
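
For reference, the column-wise/row-wise sum idea mentioned above can be sketched as follows, assuming a binarized image in which ruling lines are non-zero. The helper name line_positions and the 0.8 threshold are hypothetical:

```python
import numpy as np


def line_positions(binary, axis, min_fraction=0.8):
    """Positions whose ink coverage reaches min_fraction of the image extent.

    axis=0 sums down each pixel column and returns x positions of vertical lines;
    axis=1 sums across each pixel row and returns y positions of horizontal lines.
    """
    ink = np.asarray(binary) > 0
    profile = ink.sum(axis=axis) / ink.shape[axis]
    return np.flatnonzero(profile >= min_fraction).tolist()


if __name__ == "__main__":
    img = np.zeros((50, 80), dtype=np.uint8)
    img[:, [10, 40, 79]] = 255   # three vertical ruling lines
    img[[0, 25, 49], :] = 255    # three horizontal ruling lines
    print(line_positions(img, axis=0), line_positions(img, axis=1))
```

Because the profile is normalized by the full image size, the lines of a table that covers only part of the image fall below the threshold, which is one way the multi-table failure shows up. A line several pixels thick also yields several adjacent positions that would need merging.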

Alas, a dummy table (image not preserved):

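The approach is:

Detect the contours
Get X, Y
Count how often each value occurs in X and Y (repeated values indicate cell borders)
Threshold
Plot the unique values

Here is the code: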

import cv2
import numpy as np
import matplotlib.pyplot as plt

im = cv2.imread("FakeTable.png")  # read the image (BGR)
imGray = cv2.imread("FakeTable.png", 0)  # and again as grayscale
# inverted Otsu threshold, with the cells as positive
imOTSU = cv2.threshold(imGray, 0, 255, cv2.THRESH_OTSU + cv2.THRESH_BINARY_INV)[1]
contours, _ = cv2.findContours(imOTSU, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)  # find contours
plt.figure()
plt.imshow(cv2.cvtColor(im, cv2.COLOR_BGR2RGB))  # matplotlib expects RGB, not BGR
for c in contours:  # for every contour
    if cv2.contourArea(c) > 1000:  # filter out letters
        X, Y = c.T  # c has shape (N, 1, 2), so X and Y each have shape (1, N)
        counts = np.bincount(X[0])  # how often each x value occurs on the contour
        # threshold; this is not optimal, you can think about another approach
        for uniqueX in np.where(counts > 10)[0]:
            plt.axvline(uniqueX, color="b")  # candidate vertical border
        # same idea as for X
        counts = np.bincount(Y[0])
        for uniqueY in np.where(counts > 50)[0]:
            plt.axhline(uniqueY, color="r")  # candidate horizontal border
plt.axis("off")
plt.tight_layout()
plt.show()

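Result (image not preserved):

Two adjacent tables of different sizes (image not preserved):

Again, I can't come up with a better solution because I don't know what your data looks like. Hope this helps.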
