Optimizing array addition (y, x, RGBA)

Question

I have two arrays, A and B, both of shape (42, 28, 4), where:

42 : y_dim
28 : x_dim
4  : RGBA
## I'm on a MacBook Air M1 2020 with 16 GB, btw

I would like to combine them with a process similar to this:

def add(A, B):
    X = A.shape[1]
    Y = A.shape[0]
    alpha = A[..., 3] / 255

    B[..., :3] = blend(B[..., :3], A[..., :3], alpha.reshape(Y, X, 1))    

    return B

def blend(c1, c2, alpha):
    return np.asarray((c1 + np.multiply(c2, alpha))/(np.ones(alpha.shape) + alpha), dtype='uint8')

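But currently this is a bit too slow for me (about 20 ms for 250 images overlaid on top of a base array [1]). If you have any way to improve it (preferably with 8-bit alpha support), I would be happy to know.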

[1]：

start = time.time()
for obj in l: # len(l) == 250
    _slice = np.index_exp[obj.y * 42:(obj.y+1) * 42, obj.x * 28 : (obj.x+1) * 28, :]
    self.pixels[_slice] = add(obj.array, self.pixels[_slice])

stop = time.time()
>>> stop - start # ~20ms

I have made half-hearted attempts at the following:

# cv2.addWeighted() in add()
## doesn't work because it has one alpha for the whole image,
## but I want to have individual alpha control for each pixel

B = cv.addWeighted(A, 0.5, B, 0.5, 0)

# np.vectorize blend() and use in add()
## way too slow because as the docs mention it's basically just a for-loop

B[..., :3] = np.vectorize(blend)(A[..., :3], B[..., :3], A[..., 3] / 255)

# changed blend() to the following
def blend(a, b, alpha):
    if alpha == 0:
        return b
    elif alpha == 1:
        return a
    
    return (b + a * alpha) / (1 + alpha)

# moved the blend()-stuff to add()
## doesn't combine properly; too dark with alpha

np.multiply(A, alpha.reshape(Y, X, 1)) + np.multiply(B, 1 - alpha.reshape(Y, X, 1))
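For reference, the last attempt is the conventional "over" blend, which is a different formula from blend() above, so the two are not expected to produce the same pixels. Here is a minimal sketch with made-up single-channel values (the function names blend_question and blend_over are only for illustration). For these values the "over" result is lower at every nonzero alpha, which may be part of why the output looks darker:

```python
def blend_question(c1, c2, alpha):
    # Formula used by blend() in the question: (c1 + c2 * alpha) / (1 + alpha)
    return (c1 + c2 * alpha) / (1 + alpha)


def blend_over(a, b, alpha):
    # Conventional "over" compositing, as in the last attempt:
    # a * alpha + b * (1 - alpha)
    return a * alpha + b * (1 - alpha)


# Made-up values: background b = 200, foreground a = 100
b, a = 200.0, 100.0
for alpha in (0.0, 0.5, 1.0):
    print(alpha, blend_question(b, a, alpha), blend_over(a, b, alpha))
```

At alpha = 1 the question's formula gives the average of the two values (150), while "over" gives the foreground alone (100).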

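I have also tried some bitwise stuff, but my monkey brain can't get it to work properly. I'm on an M1 Mac, so if you have any experience with Metalcompute and Python, please share any thoughts on that!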

Any input is welcome, thanks in advance!

Answer 1

Here is a numba version, which is about 2x faster than the original on my computer (AMD 5700x; I don't have an M1, so your results may differ):

@njit
def add_numba(A, B):
    alpha = A[..., 3] / 255

    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            B[i, j, :3] = (B[i, j, :3] + A[i, j, :3] * alpha[i, j]) / (1 + alpha[i, j])

    return B

Benchmark:

from statistics import median
from timeit import repeat

import numpy as np
from numba import njit


@njit
def add_numba(A, B):
    alpha = A[..., 3] / 255

    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            B[i, j, :3] = (B[i, j, :3] + A[i, j, :3] * alpha[i, j]) / (1 + alpha[i, j])

    return B


def setup_A_B():
    A = np.random.randint(0, 255, size=(42, 28, 4), dtype="uint8")
    B = np.random.randint(0, 255, size=(42, 28, 4), dtype="uint8")
    return A, B


def add(A, B):
    X = A.shape[1]
    Y = A.shape[0]
    alpha = A[..., 3] / 255

    B[..., :3] = blend(B[..., :3], A[..., :3], alpha.reshape(Y, X, 1))

    return B


def blend(c1, c2, alpha):
    return np.asarray(
        (c1 + np.multiply(c2, alpha)) / (np.ones(alpha.shape) + alpha), dtype="uint8"
    )


# assert the result is equal
np.random.seed(42)
A1, B1 = setup_A_B()
A2, B2 = A1.copy(), B1.copy()
assert np.allclose(add(A1, B1), add_numba(A2, B2))


repeats_normal = repeat(
    "add(A, B)", setup="A, B = setup_A_B()", globals=globals(), repeat=10, number=2500
)
repeats_numba = repeat(
    "add_numba(A, B)",
    setup="A, B = setup_A_B()",
    globals=globals(),
    repeat=10,
    number=2500,
)

print(f"2500 calls (original) = {median(repeats_normal):.4f}")
print(f"2500 calls (numba)    = {median(repeats_numba):.4f}")

Prints (times in seconds):

2500 calls (original) = 0.1501
2500 calls (numba)    = 0.0742
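
Since the question asks for 8-bit alpha support, one further option is to stay in integer arithmetic. The following is a minimal sketch that I have not benchmarked: add_int is a hypothetical name, and it assumes A and B are uint8 arrays of shape (Y, X, 4) as in the question. It multiplies the blend formula through by 255, so no float alpha array is needed. Because it uses exact integer floor division, a channel value may occasionally differ by one from the float version where float rounding falls just below a whole number.

```python
import numpy as np


def add_int(A, B):
    # Same formula as blend(), multiplied through by 255:
    # (B + A * a / 255) / (1 + a / 255) == (255 * B + A * a) / (255 + a)
    a = A[..., 3:4].astype(np.uint32)  # shape (Y, X, 1), broadcasts over RGB
    # The numerator can reach 2 * 255 * 255 = 130050, so uint16 would overflow
    num = 255 * B[..., :3].astype(np.uint32) + A[..., :3].astype(np.uint32) * a
    # The quotient is at most 255, so it fits back into uint8
    B[..., :3] = (num // (255 + a)).astype(np.uint8)
    return B


# Usage with random test data (alpha channel of B is left untouched)
rng = np.random.default_rng(0)
A = rng.integers(0, 256, size=(42, 28, 4), dtype=np.uint8)
B = rng.integers(0, 256, size=(42, 28, 4), dtype=np.uint8)
result = add_int(A, B.copy())
print(result.shape, result.dtype)
```

Whether this beats the float or numba versions on an M1 would need to be measured with the same timeit setup as above.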
