Optimizing array addition (y, x, RGBA)


I have two arrays A and B, both of shape (42, 28, 4), where:

42 : y_dim
28 : x_dim
4  : RGBA
## I'm on a MacBook Air M1 2020 with 16 GB of RAM, btw

I want to combine them with a procedure similar to this:

def add(A, B):
    X = A.shape[1]
    Y = A.shape[0]
    alpha = A[..., 3] / 255

    B[..., :3] = blend(B[..., :3], A[..., :3], alpha.reshape(Y, X, 1))    

    return B

def blend(c1, c2, alpha):
    return np.asarray((c1 + np.multiply(c2, alpha))/(np.ones(alpha.shape) + alpha), dtype='uint8')

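But this is currently a bit too slow for me (about 20 ms to overlay 250 images onto the base array [1]). If you know any way to improve it (preferably with 8-bit alpha support), I'd be glad to hear it.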

[1]:

start = time.time()
for obj in l: # len(l) == 250
    _slice = np.index_exp[obj.y * 42:(obj.y+1) * 42, obj.x * 28 : (obj.x+1) * 28, :]
    self.pixels[_slice] = add(obj.array, self.pixels[_slice])

stop = time.time()
>>> stop - start # ~20ms 
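
One way to read the loop above is as 250 small NumPy calls, where per-call overhead may dominate for tiles this small. The sketch below blends all tiles in a single vectorized pass, under assumptions that are not stated in the question: no two objects share a grid cell (so order does not matter), `pixels` is C-contiguous, and its height and width are exact multiples of 42 and 28. The names `add_batched`, `tiles`, `ys` and `xs` are hypothetical. Whether it is actually faster would need to be timed on the target machine.

```python
import numpy as np

TILE_H, TILE_W = 42, 28


def add_batched(pixels, tiles, ys, xs):
    """Blend N tiles onto their grid cells of `pixels` in one vectorized pass.

    pixels: uint8 array of shape (rows * 42, cols * 28, 4), modified in place
    tiles:  uint8 array of shape (N, 42, 28, 4)
    ys, xs: integer arrays of shape (N,) giving the grid cell of each tile
    Assumes that no two tiles share a grid cell and that pixels is C-contiguous.
    """
    rows = pixels.shape[0] // TILE_H
    cols = pixels.shape[1] // TILE_W
    # View the canvas as a grid of tiles: (rows, 42, cols, 28, 4).
    # For a C-contiguous array this reshape is a view, not a copy.
    grid = pixels.reshape(rows, TILE_H, cols, TILE_W, 4)

    alpha = tiles[..., 3:4] / 255                      # (N, 42, 28, 1)
    base = grid[ys, :, xs, :, :3]                      # fancy indexing copies: (N, 42, 28, 3)
    # Same formula as blend() in the question, applied to all tiles at once.
    blended = (base + tiles[..., :3] * alpha) / (1 + alpha)
    grid[ys, :, xs, :, :3] = blended.astype(np.uint8)  # writes through to pixels
    return pixels
```

Here `tiles` could be built once with `np.stack([obj.array for obj in l])`, and `ys`, `xs` with `np.array([obj.y for obj in l])` and `np.array([obj.x for obj in l])`.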

I have half-heartedly tried the following:

# cv2.addWeighted() in add()
## doesn't work because it has one alpha for the whole image,
## but I want to have individual alpha control for each pixel

B = cv.addWeighted(A, 0.5, B, 0.5, 0)

# np.vectorize blend() and use in add()
## way too slow because as the docs mention it's basically just a for-loop

B[..., :3] = np.vectorize(blend)(A[..., :3], B[..., :3], A[..., 3] / 255)

# changed blend() to the following
def blend(a, b, alpha):
    if alpha == 0:
        return b
    elif alpha == 1:
        return a
    
    return (b + a * alpha) / (1 + alpha)

# moved the blend()-stuff to add()
## doesn't combine properly; too dark with alpha

np.multiply(A, alpha.reshape(Y, X, 1)) + np.multiply(B, 1 - alpha.reshape(Y, X, 1))

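I also tried some bitwise stuff, but my monkey brain couldn't make sense of it. I'm on an M1 Mac, so if you have any experience with Metalcompute and Python, please share any thoughts on that!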

Any input is welcome, thanks in advance!

python numpy opencv optimization alphablending
1 Answer

Here is a Numba version that is about 2x faster than the original on my computer (AMD 5700x). I don't have an M1, so your results may differ:

@njit
def add_numba(A, B):
    alpha = A[..., 3] / 255

    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            B[i, j, :3] = (B[i, j, :3] + A[i, j, :3] * alpha[i, j]) / (1 + alpha[i, j])

    return B

Benchmark:

from statistics import median
from timeit import repeat

import numpy as np
from numba import njit


@njit
def add_numba(A, B):
    alpha = A[..., 3] / 255

    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            B[i, j, :3] = (B[i, j, :3] + A[i, j, :3] * alpha[i, j]) / (1 + alpha[i, j])

    return B


def setup_A_B():
    A = np.random.randint(0, 255, size=(42, 28, 4), dtype="uint8")
    B = np.random.randint(0, 255, size=(42, 28, 4), dtype="uint8")
    return A, B


def add(A, B):
    X = A.shape[1]
    Y = A.shape[0]
    alpha = A[..., 3] / 255

    B[..., :3] = blend(B[..., :3], A[..., :3], alpha.reshape(Y, X, 1))

    return B


def blend(c1, c2, alpha):
    return np.asarray(
        (c1 + np.multiply(c2, alpha)) / (np.ones(alpha.shape) + alpha), dtype="uint8"
    )


# assert the result is equal
np.random.seed(42)
A1, B1 = setup_A_B()
A2, B2 = A1.copy(), B1.copy()
assert np.allclose(add(A1, B1), add_numba(A2, B2))


repeats_normal = repeat(
    "add(A, B)", setup="A, B = setup_A_B()", globals=globals(), repeat=10, number=2500
)
repeats_numba = repeat(
    "add_numba(A, B)",
    setup="A, B = setup_A_B()",
    globals=globals(),
    repeat=10,
    number=2500,
)

print(f"2500 calls (original) = {median(repeats_normal):.4f}")
print(f"2500 calls (numba)    = {median(repeats_numba):.4f}")

Prints:

2500 calls (original) = 0.1501
2500 calls (numba)    = 0.0742
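
Since the question asks for 8-bit alpha support, a pure-NumPy variant that stays in integer arithmetic may also be worth timing. This is a minimal sketch, not part of the benchmark above: the name `add_int` is hypothetical, and it relies on the identity (B + A * a/255) / (1 + a/255) = (255 * B + A * a) / (255 + a), where a is the raw 0..255 alpha. Because it floors the exact quotient instead of truncating a float, individual pixels may differ from the original by 1.

```python
import numpy as np


def add_int(A, B):
    """Blend the RGB channels of A onto B in place, using integer arithmetic only.

    Assumes A and B are uint8 arrays of shape (H, W, 4).
    """
    # Widen to uint32: 255 * 255 + 255 * 255 does not fit in uint16.
    a = A[..., 3:4].astype(np.uint32)    # per-pixel alpha, 0..255, shape (H, W, 1)
    fg = A[..., :3].astype(np.uint32)
    bg = B[..., :3].astype(np.uint32)

    # (bg + fg * a/255) / (1 + a/255)  ==  (255 * bg + fg * a) / (255 + a)
    # The quotient is at most 255, so casting back to uint8 cannot overflow.
    B[..., :3] = ((255 * bg + fg * a) // (255 + a)).astype(np.uint8)
    return B
```

It can be called the same way as `add(A, B)`; the alpha channel of `B` is left untouched.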