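I am trying to implement bicubic convolution interpolation for images in Python, following the paper "Cubic Convolution Interpolation for Digital Image Processing". My implementation produces what looks like a proper upscale, yet it still differs from the reference implementations and I cannot work out why. This is especially noticeable on smaller images, as in the image below:
This is the image generated by the MWE. It contains the original unscaled image, my bad bicubic result, the OpenCV and skimage bicubic results, and their differences from my scaled image.
Here is my code so far, converted to an MWE without multiprocessing: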
import math
import time
from functools import cache
import cv2 as cv
import matplotlib.pyplot as plt
import numpy as np
import skimage
def u(s: float):
    # bicubic convolution kernel aka catmull-rom spline
    # the value of a here is -0.5 as that was used in Keys' version
    a: float = -0.5
    s = abs(s)
    if 0 <= s < 1:
        return (a + 2) * s**3 - (a + 3) * s**2 + 1
    elif 1 <= s < 2:
        return a * s**3 - 5 * a * s**2 + 8 * a * s - 4 * a
    return 0
in_file = "test_sharpen.png"
ratio = 2.0
im_data = cv.imread(str(in_file))
# cv.imread returns BGR, but plt expects RGB
im_data = cv.cvtColor(im_data, cv.COLOR_BGR2RGB)
start = time.perf_counter()
print("Scaling image...")
H, W, C = im_data.shape
# pad by 2 px
image = cv.copyMakeBorder(im_data, 2, 2, 2, 2, cv.BORDER_REFLECT)
image = image.astype(np.float64) / 255
# create new image
new_H = math.floor(H * ratio)
new_W = math.floor(W * ratio)
big_image = np.zeros((new_H, new_W, C))
for c in range(C):
    for j in range(new_H):
        # scale new image's coordinate to be in old image
        y = j * (1 / ratio) + 2
        # we separate x and y to integer and fractional parts
        iy = int(y)
        # ix and iy are essentially the closest original pixels
        # as all the old pixels are in integer positions
        # decx and decy as the fractional parts are then the distances
        # to the original pixels on the left and above
        decy = iy - y
        for i in range(new_W):
            x = i * (1 / ratio) + 2
            ix = int(x)
            decx = ix - x
            pix = sum(
                sum(
                    image[iy + M, ix + L, c] * u(decx + L) * u(decy + M)
                    for L in range(-1, 2 + 1)
                )
                for M in range(-1, 2 + 1)
            )
            # we limit results to [0, 1] because bicubic interpolation
            # can produce pixel values outside the original range
            big_image[j, i, c] = max(min(1, pix), 0)
big_image = (big_image * 255).astype(np.uint8)
print(f"Finished scaling in {time.perf_counter() - start} seconds")
# generate proper bicubic scales with opencv and skimage
# and compare them to my scale with plt
proper_cv = cv.resize(im_data, None, None, ratio, ratio, cv.INTER_CUBIC)
proper_skimage = skimage.util.img_as_ubyte(
    skimage.transform.rescale(im_data, ratio, channel_axis=-1, order=3)
)
fig, ax = plt.subplots(nrows=4, ncols=2)
ax[0, 0].imshow(im_data)
ax[0, 0].set_title("Original")
ax[0, 1].imshow(big_image)
ax[0, 1].set_title("My scale")
ax[1, 0].set_title("Proper OpenCV")
ax[1, 0].imshow(proper_cv)
ax[1, 1].set_title("Proper Skimage")
ax[1, 1].imshow(proper_skimage)
print("my scale vs proper_cv psnr:", cv.PSNR(big_image, proper_cv))
ax[2, 0].set_title("Absdiff OpenCV vs My")
diffy_cv = cv.absdiff(big_image, proper_cv)
ax[2, 0].imshow(diffy_cv)
ax[2, 1].set_title("Absdiff Skimage vs My")
diffy_skimage = cv.absdiff(big_image, proper_skimage)
ax[2, 1].imshow(diffy_skimage)
ax[3, 1].set_title("Absdiff CV vs Skimage")
ax[3, 1].imshow(cv.absdiff(proper_cv, proper_skimage))
ax[3, 0].set_title("Absdiff CV vs Skimage")
ax[3, 0].imshow(cv.absdiff(proper_cv, proper_skimage))
print("diffy_cv", diffy_cv.min(), diffy_cv.max(), diffy_cv.dtype, diffy_cv.shape)
print(
    "diffy_skimage",
    diffy_skimage.min(),
    diffy_skimage.max(),
    diffy_skimage.dtype,
    diffy_skimage.shape,
)
print(
    "proper_skimage vs proper_opencv psnr:",
    cv.PSNR(proper_skimage, proper_cv),
    cv.absdiff(proper_cv, proper_skimage).max(),
)
plt.show()
It can be run with, for example,
python scaling.py
to scale test_sharpen.png by a factor of 2.
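What I have implemented so far appears to work, but the output still differs from the references. I also tried changing the value of a, but that is not the problem.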
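It looks like my method of scaling the coordinates was wrong. For example, with a ratio of 2, the new points on the y axis land at 2.0, 2.5, 3.0 and so on (including the 2 px padding offset).
This is wrong, because the new coordinates should lie between the old points rather than directly on top of them. I changed the scaling to: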
# scale new image's coordinate to be in old image based on its midpoint
y = ((j + 0.5) / ratio) - 0.5 + 2
x = ((i + 0.5) / ratio) - 0.5 + 2
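Now the new point coordinates are 1.75, 2.25, 2.75 and so on. My intuition says these points should start at 2.25, but in that case the image appears slightly shifted compared to the references.
With a = -0.75, this implementation now matches cv2 and the other implementations almost perfectly. The only exception is my reflected border, which the other implementations appear to replicate instead.
I have put the final code on GitHub together with a Rust version, which is roughly 200x faster and can be tested on larger images.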
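The two coordinate mappings discussed above can be compared in isolation. The following is a minimal sketch, not the code from the MWE: the function names `src_coord_aligned_origin` and `src_coord_pixel_centers` are made up for illustration, and the 2 px padding offset is left out so the values are relative to the unpadded source image.

```python
def src_coord_aligned_origin(j: int, ratio: float) -> float:
    # Original mapping: destination index divided by the ratio.
    # Every second sample lands exactly on a source pixel.
    return j / ratio


def src_coord_pixel_centers(j: int, ratio: float) -> float:
    # Midpoint mapping: treat each pixel as a unit square whose
    # centre sits at index + 0.5, scale, then shift back by 0.5.
    return (j + 0.5) / ratio - 0.5


ratio = 2.0
origin = [src_coord_aligned_origin(j, ratio) for j in range(4)]
centers = [src_coord_pixel_centers(j, ratio) for j in range(4)]
print(origin)   # [0.0, 0.5, 1.0, 1.5]
print(centers)  # [-0.25, 0.25, 0.75, 1.25]
```

Adding the padding offset of 2 to `centers` gives 1.75, 2.25, 2.75, which are the values quoted above. The first sample falls a quarter pixel before the first source pixel, so the padded border contributes to the output. If the padding were dropped, `int()` would truncate a negative coordinate toward zero instead of flooring it, so `math.floor` would be the safer choice there.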
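To see how much the choice of a changes the result, the kernel can be parameterised and evaluated at a single fractional offset. This is a sketch under assumptions: `cubic_kernel` is the same formula as `u` above with a exposed as an argument, and `tap_weights` is a hypothetical helper that is not part of the MWE.

```python
def cubic_kernel(s: float, a: float = -0.5) -> float:
    # Same piecewise cubic as u(), with the parameter a exposed.
    s = abs(s)
    if s < 1:
        return (a + 2) * s**3 - (a + 3) * s**2 + 1
    if s < 2:
        return a * s**3 - 5 * a * s**2 + 8 * a * s - 4 * a
    return 0.0


def tap_weights(frac: float, a: float) -> list[float]:
    # Weights of the four source pixels at offsets -1, 0, 1, 2
    # relative to the integer part of the sample position.
    return [cubic_kernel(offset - frac, a) for offset in (-1, 0, 1, 2)]


weights_keys = tap_weights(0.25, a=-0.5)    # value used in Keys' paper
weights_a075 = tap_weights(0.25, a=-0.75)   # value that matched cv2 above
print(weights_keys, sum(weights_keys))
print(weights_a075, sum(weights_a075))
```

Both sets of weights sum to 1, so either value of a gives a valid interpolator that preserves flat regions. The individual weights differ, though, which is enough to produce a visible per-pixel difference when comparing against a reference built with the other value.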