Is there a way to detect image orientation and rotate the image to the right angle?

Problem description · Votes: 0 · Answers: 4

I'm making a script that fixes up scanned documents, and I now need a way to detect the image orientation and rotate the image so that it is oriented correctly.

Right now my script is unreliable and not precise enough.

Currently it looks for a line and rotates the image based on the first line it finds, but that barely works, except on a few images:

import math

import cv2
import numpy as np
from scipy import ndimage

img_before = cv2.imread('rotated_377.jpg')

img_gray = cv2.cvtColor(img_before, cv2.COLOR_BGR2GRAY)
img_edges = cv2.Canny(img_gray, 100, 100, apertureSize=3)
lines = cv2.HoughLinesP(img_edges, 1, math.pi / 180.0, 100, minLineLength=100, maxLineGap=5)

angles = []

# note: lines[0] contains only the first detected segment,
# so the median below is effectively taken over a single angle
for x1, y1, x2, y2 in lines[0]:
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
    angles.append(angle)

median_angle = np.median(angles)
img_rotated = ndimage.rotate(img_before, median_angle)

print("Angle is {}".format(median_angle))
cv2.imwrite('rotated.jpg', img_rotated)

I want to make a script that takes an image like this (never mind that this particular image is just for testing purposes)

and rotates it the right way, so that I end up with a correctly oriented image.

python image-processing
4 Answers

21 votes

This is an interesting problem. I have tried many approaches to correct the orientation of document images, and all of them have different exceptions. I'm sharing one approach based on text orientation. For text region detection, I use the gradient map of the input image.

All the other implementation details are commented in the code.

Note that this only works when all the text present in the image has the same orientation.

#Document image orientation correction
#This approach is based on text orientation

#Assumption: Document image contains all text in same orientation

import cv2
import numpy as np

debug = True

#Display image
def display(img, frameName="OpenCV Image"):
    if not debug:
        return
    h, w = img.shape[0:2]
    neww = 800
    newh = int(neww*(h/w))
    img = cv2.resize(img, (neww, newh))
    cv2.imshow(frameName, img)
    cv2.waitKey(0)

#rotate the image with given theta value
def rotate(img, theta):
    rows, cols = img.shape[0], img.shape[1]
    image_center = (cols/2, rows/2)
    
    M = cv2.getRotationMatrix2D(image_center,theta,1)

    abs_cos = abs(M[0,0])
    abs_sin = abs(M[0,1])

    bound_w = int(rows * abs_sin + cols * abs_cos)
    bound_h = int(rows * abs_cos + cols * abs_sin)

    M[0, 2] += bound_w/2 - image_center[0]
    M[1, 2] += bound_h/2 - image_center[1]

    # rotate original image to show transformation
    rotated = cv2.warpAffine(img,M,(bound_w,bound_h),borderValue=(255,255,255))
    return rotated


def slope(x1, y1, x2, y2):
    if x1 == x2:
        #vertical edge; return 0 to avoid division by zero
        return 0
    slope = (y2-y1)/(x2-x1)
    theta = np.rad2deg(np.arctan(slope))
    return theta


def main(filePath):
    img = cv2.imread(filePath)
    textImg = img.copy()

    small = cv2.cvtColor(textImg, cv2.COLOR_BGR2GRAY)

    #find the gradient map
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    grad = cv2.morphologyEx(small, cv2.MORPH_GRADIENT, kernel)

    display(grad)

    #Binarize the gradient image
    _, bw = cv2.threshold(grad, 0.0, 255.0, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    display(bw)

    #connect horizontally oriented regions
    #kernel value (9, 1) can be changed to improve the text detection
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 1))
    connected = cv2.morphologyEx(bw, cv2.MORPH_CLOSE, kernel)
    display(connected)

    # using RETR_EXTERNAL instead of RETR_CCOMP
    # on OpenCV 3.x use:
    # _, contours, hierarchy = cv2.findContours(connected.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    contours, hierarchy = cv2.findContours(connected.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE) #opencv >= 4.0

    mask = np.zeros(bw.shape, dtype=np.uint8)
    #display(mask)
    #cumulative theta value
    cummTheta = 0
    #number of detected text regions
    ct = 0
    for idx in range(len(contours)):
        x, y, w, h = cv2.boundingRect(contours[idx])
        mask[y:y+h, x:x+w] = 0
        #fill the contour
        cv2.drawContours(mask, contours, idx, (255, 255, 255), -1)
        #display(mask)
        #ratio of non-zero pixels in the filled region
        r = float(cv2.countNonZero(mask[y:y+h, x:x+w])) / (w * h)

        #assume at least 45% of the area is filled if it contains text
        if r > 0.45 and w > 8 and h > 8:
            #cv2.rectangle(textImg, (x, y), (x+w-1, y+h-1), (0, 255, 0), 2)

            rect = cv2.minAreaRect(contours[idx])
            box = cv2.boxPoints(rect)
            box = np.intp(box) #np.int0 was removed in NumPy 2.0
            cv2.drawContours(textImg,[box],0,(0,0,255),2)

            #we could filter outlier theta values based on the other theta values;
            #this would help exclude the rare text region whose orientation differs from the usual value
            theta = slope(box[0][0], box[0][1], box[1][0], box[1][1])
            cummTheta += theta
            ct +=1 
            #print("Theta", theta)
            
    if ct == 0:
        print("No text region detected")
        return
    #find the average of all cumulative theta values
    orientation = cummTheta/ct
    print("Image orientation in degrees: ", orientation)
    finalImage = rotate(img, orientation)
    display(textImg, "Detected text minimum bounding box")
    display(finalImage, "Deskewed Image")

if __name__ == "__main__":
    filePath = r'D:\data\img6.jpg'
    main(filePath)

Here is the image with the detected text regions; you can see that some text regions are missed. Text orientation detection plays a key role in overall document orientation detection, so depending on the document type, the text detection algorithm should be tweaked slightly to make this approach work better.

Here is the final image with the correct orientation:

Please suggest modifications to this approach to make it more robust. A median-based variant of the angle aggregation is sketched below.
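The in-code comment above mentions filtering outlier theta values; as a minimal sketch of that idea (not part of the original answer), the per-box angles can be aggregated with the median instead of the running mean, so a single box with an odd orientation cannot drag the estimate:

import numpy as np

def robust_orientation(thetas):
    #the median is insensitive to the rare box whose orientation
    #differs from the usual value, unlike the mean cummTheta/ct
    thetas = np.asarray(thetas, dtype=float)
    if thetas.size == 0:
        return 0.0
    return float(np.median(thetas))

To use it, append the slope(...) of each accepted box to a list inside the contour loop, then call robust_orientation(...) on that list in place of cummTheta/ct.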


5 votes

When a document containing multiple lines of text is well aligned, the horizontal histogram of the image should produce a square-wave-like pattern that clearly shows where the lines of text are separated by the empty space between them. In contrast, if the image is even slightly rotated, the horizontal histogram becomes noticeably blurry.

This Python script aligns the image by measuring the sharpness of the horizontal histogram over a range of angles. It compares each angle against its immediate neighbors.

import cv2
import numpy as np

# Rotates an image
def rotate_image(image: np.ndarray, angle: float) -> np.ndarray:
    # per-channel median of the image, used as the fill color for the border
    mean_pixel = np.median(np.median(image, axis=0), axis=0)
    image_center = tuple(np.array(image.shape[1::-1]) / 2)
    rot_mat = cv2.getRotationMatrix2D(image_center, angle, 1.0)
    result = cv2.warpAffine(image, rot_mat, image.shape[1::-1], flags=cv2.INTER_LINEAR, borderMode=cv2.BORDER_CONSTANT, borderValue=mean_pixel)
    return result

# Returns a small value if the horizontal histogram is sharp.
# Returns a large value if the horizontal histogram is blurry.
def eval_image(image: np.ndarray) -> float:
    hist = np.sum(np.mean(image, axis=1), axis=1)
    bef = 0
    aft = 0
    err = 0.
    assert(hist.shape[0] > 0)
    for pos in range(hist.shape[0]):
        if pos == aft:
            bef = pos
            while aft + 1 < hist.shape[0] and abs(hist[aft + 1] - hist[pos]) >= abs(hist[aft] - hist[pos]):
                aft += 1
        err += min(abs(hist[bef] - hist[pos]), abs(hist[aft] - hist[pos]))
    assert(err > 0)
    return err

# Measures horizontal histogram sharpness across many angles
def sweep_angles(image: np.ndarray) -> np.ndarray:
    results = np.empty((81, 2))
    for i in range(81):
        angle = (i - results.shape[0] // 2) / 4.
        rotated = rotate_image(image, angle)
        err = eval_image(rotated)
        results[i, 0] = angle
        results[i, 1] = err
    return results

# Find an angle that is a lot better than its neighbors
def find_alignment_angle(image: np.ndarray) -> float:
    best_gain = 0
    best_angle = 0.
    results = sweep_angles(image)
    for i in range(2, results.shape[0] - 2):
        ave = np.mean(results[i-2:i+3, 1])
        gain = ave - results[i, 1]
        # print('angle=' + str(results[i, 0]) + ', gain=' + str(gain))
        if gain > best_gain:
            best_gain = gain
            best_angle = results[i, 0]
    return best_angle

# input: an image that needs aligning
# output: the aligned image
def align_image(image: np.ndarray) -> np.ndarray:
    angle = find_alignment_angle(image)
    return rotate_image(image, angle)

# Do it
fixme: np.ndarray = cv2.imread('fixme.png')
cv2.imwrite('fixed.png', align_image(fixme))

1 vote

This isn't really an answer, but rather a suggestion for a possible approach for images that mostly contain horizontal/vertical lines: try rotating the image in steps of (say) 0.5 degrees, and for each rotation sum up all the scanlines (producing, for each rotation value, a 1-D array of sums of size ydim). Then look at the statistics of the scanline sums and find the rotation value that maximizes the spread (max - min), in other words the "highest contrast" of the scanline sums. That should be the best orientation.

To improve speed, you can start with steps of 2 degrees on a half-resolution image, find the best angle, and then retry every 0.5 degrees in that neighborhood on the full-resolution image. A rough sketch of this idea follows.
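A minimal sketch of this suggestion, assuming a grayscale scan as input (the helper names, angle range, and step sizes are illustrative, not from the answer):

import cv2
import numpy as np
from scipy import ndimage

def scanline_contrast(gray, angle):
    #rotate, then sum every scanline (row) into a 1-D array of size ydim
    rotated = ndimage.rotate(gray, angle, reshape=False, mode='nearest')
    sums = rotated.sum(axis=1)
    #"highest contrast" of the scanline sums: max - min
    return float(sums.max() - sums.min())

def best_angle(gray):
    #coarse pass: every 2 degrees on a half-resolution image
    small = cv2.resize(gray, None, fx=0.5, fy=0.5)
    coarse = max(np.arange(-45.0, 45.5, 2.0), key=lambda a: scanline_contrast(small, a))
    #fine pass: every 0.5 degrees around the coarse winner, full resolution
    return max(np.arange(coarse - 2.0, coarse + 2.5, 0.5), key=lambda a: scanline_contrast(gray, a))

gray = cv2.imread('scan.jpg', cv2.IMREAD_GRAYSCALE)
print(best_angle(gray))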


0 votes

Another solution that hasn't been mentioned yet is to use the image_to_osd method from the pytesseract library. This method returns a result containing information about orientation and script detection (see the pytesseract documentation).

A solution using this method looks like this:

import cv2
import pytesseract
from scipy import ndimage

#convert numeric OSD values to float, leave other values as strings
def float_convertor(x):
    if x.isdigit():
        out= float(x)
    else:
        out= x
    return out 

img = cv2.imread(image)  #'image' holds the path to the input file
k = pytesseract.image_to_osd(img)
out = {i.split(":")[0]: float_convertor(i.split(":")[-1].strip()) for i in k.rstrip().split("\n")}
img_rotated = ndimage.rotate(img, 360-out["Rotate"])

(The second-to-last line parses the OSD information (k) and builds a dictionary (out) of key-value pairs. It splits the information by line and then by colon, using the left-hand part as the key and the right-hand part as the value.)
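For reference, image_to_osd returns a plain string of colon-separated lines, which is what the dictionary comprehension above parses; it looks roughly like this (the values shown are illustrative):

Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 9.51
Script: Latin
Script confidence: 4.76

out["Rotate"] is the clockwise rotation needed to correct the image, which the last line converts into the equivalent counterclockwise angle (360 - Rotate) for ndimage.rotate.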
