Remapping a Python numpy array in a "vectorized" way?

Question

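Recently I found that "for loops" in Python don't perform well in terms of speed once the dataset is even moderately large.

I have a task that performs a transformation. I have to "remap" data from one array into a new array, but where each piece of data ends up in the new array depends on a calculation.

Suppose I'm working with 3 numpy arrays:

A (holds the original data), B (the remapping of the points, no data), C (the result array).

Is there any way to do this "quickly"?

I know that if I wrote this in C it probably wouldn't be a problem at all, because it would run fast enough anyway. If I really needed the speed, I could probably tie A, B, and C together with an equation and move the data directly in memory by pointer position.

I don't want to make things that complicated yet, but I'd like to know whether there is a fast, efficient way to do this in Python.

Here is an example. Say I'm doing a triple-shear image rotation. In that case "A" is my original image with shape (x, y, 3) (the 3 being the RGB values), and "B" is my "transform", i.e. where each old pixel of "A" should now end up. "C" is then the step where I actually have to move everything over (i.e. x, y, and all 3 channels: moving all of the data along the third dimension to its new location).

As mentioned, this isn't hard with a for loop, but for loops in Python are very slow.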

import numpy as np
import math as m

# Define the angle in degrees (theta is the same angle in radians)
angle = 30
theta = m.radians(angle)

# Triple shear transformation

# first = np.array([[1, -np.tan(theta/2)], [0, 1]])
# second = np.array([[1, 0], [np.sin(theta), 1]])
# third = np.array([[1, -np.tan(theta/2)], [0, 1]])

side = np.array([[1, -np.tan(theta/2)], [0, 1]])
middle = np.array([[1, 0], [np.sin(theta), 1]])

# Define vector test set - X, Y
pixels = np.array([[10, 5], [20, 3], [4, 5]])

# Perform the matrix multiplication

# result = first @ second @ third @ pixels.T
result = side @ middle @ side @ pixels.T

# Print the result
print(np.round(result.T))
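As a quick sanity check (not part of the original post), the three shears multiply out to the ordinary 2-D rotation matrix. With t = tan(θ/2) one has sin θ = 2t/(1+t²) and cos θ = (1−t²)/(1+t²), so side @ middle @ side reduces to [[cos θ, −sin θ], [sin θ, cos θ]]. The sketch below rebuilds side and middle exactly as in the toy code and compares the product against np.cos/np.sin directly:

```python
import numpy as np

angle = 30
theta = np.radians(angle)

# Same shear matrices as in the toy code
side = np.array([[1, -np.tan(theta / 2)], [0, 1]])
middle = np.array([[1, 0], [np.sin(theta), 1]])

# Standard 2-D rotation matrix by theta (counter-clockwise with the y axis pointing up)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])

combined = side @ middle @ side
print(np.allclose(combined, rotation))  # True
```

Because the product is a single fixed 2x2 matrix, it only needs to be built once per angle. Applying it to N points is then one matrix product.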

The output of this "toy code" is the matrix

[[ 6.  9.], [16. 13.], [ 1.  6.]]
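The same product extends from three test points to every pixel of an image without a loop: build all (x, y) coordinates with np.indices, push them through the matrix in one @, and round to integer indices. This is a minimal sketch, assuming the image is indexed as A[x, y] (as the question's (x, y, 3) shape suggests) and that the rotation is about (0, 0). The helper names shear_rotation_matrix and destination_coords are hypothetical:

```python
import numpy as np

def shear_rotation_matrix(angle_deg):
    """Combined triple-shear matrix side @ middle @ side, as in the question."""
    theta = np.radians(angle_deg)
    side = np.array([[1.0, -np.tan(theta / 2)], [0.0, 1.0]])
    middle = np.array([[1.0, 0.0], [np.sin(theta), 1.0]])
    return side @ middle @ side

def destination_coords(nx, ny, angle_deg):
    """Integer destination (x, y) of every pixel in an nx-by-ny image.

    Returns two (nx, ny) arrays new_x, new_y: pixel A[x, y] should move to
    (new_x[x, y], new_y[x, y]). Rotation is about the origin; no recentering.
    """
    xs, ys = np.indices((nx, ny))                   # source coordinates, each (nx, ny)
    xy = np.stack([xs.ravel(), ys.ravel()])         # shape (2, nx * ny)
    new_xy = shear_rotation_matrix(angle_deg) @ xy  # one matrix product for all pixels
    new_xy = np.rint(new_xy).astype(np.intp)        # round to the nearest pixel
    return new_xy[0].reshape(nx, ny), new_xy[1].reshape(nx, ny)

new_x, new_y = destination_coords(4, 3, 30)
print(new_x.shape, new_y.shape)  # (4, 3) (4, 3)
```

Negative or out-of-range destinations are expected here (rotating about a corner moves part of the image off the grid), so they have to be filtered out, or the coordinates shifted, before they are used as indices.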

Is there a direct way to tell Python to send the data from its original positions ([[10, 5], [20, 3], [4, 5]]) straight to the new positions, without iterating?

That is, something "like" a copy command, where I just hand it a list of "what goes where"? Could I perhaps use np.roll() to do what I want?
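NumPy's integer-array ("fancy") indexing is close to that copy command: C[dest_x, dest_y] = A[src_x, src_y] moves every listed pixel, with all of its channels, in one assignment. This is a minimal sketch, assuming per-pixel integer destination arrays new_x and new_y of shape (nx, ny), for example the rounded output of the shear matrices. remap_forward is a hypothetical name, not a NumPy function:

```python
import numpy as np

def remap_forward(A, new_x, new_y):
    """Return C with C[new_x[x, y], new_y[x, y], :] = A[x, y, :], without a Python loop.

    A is (nx, ny, channels); new_x and new_y are integer arrays of shape (nx, ny).
    Destinations that fall outside the image are dropped. If two source pixels map
    to the same destination, NumPy does not guarantee which value ends up there.
    """
    nx, ny = A.shape[:2]
    C = np.zeros_like(A)
    xs, ys = np.indices((nx, ny))
    # Boolean mask of source pixels whose destination lies inside C
    inside = (new_x >= 0) & (new_x < nx) & (new_y >= 0) & (new_y < ny)
    # One advanced-indexing assignment copies all channels of all kept pixels
    C[new_x[inside], new_y[inside]] = A[xs[inside], ys[inside]]
    return C

# Usage: shift a tiny 2x3 RGB image one pixel along y
A = np.arange(2 * 3 * 3).reshape(2, 3, 3)
xs, ys = np.indices(A.shape[:2])
C = remap_forward(A, xs, ys + 1)
print(C[:, 0].sum(), np.array_equal(C[:, 1:], A[:, :-1]))  # 0 True
```

np.roll shifts every element by the same offset (with wrap-around), so it can express a pure translation but not a rotation. Also note that forward mapping with rounded coordinates can leave gaps, i.e. destination pixels that receive nothing. A common alternative is inverse mapping: for each destination pixel, compute its source position with the inverse matrix and gather with C = A[src_x, src_y].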

Answer 1

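What you are doing is already highly optimized. numpy is excellent at this kind of thing, and you are using it exactly as intended, so its performance should already be quite good and doing much better will be challenging. In my testing, your original algorithm ran in roughly 10 microseconds per run or less, even with vector lengths up to 1000, which is already quite fast.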

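For some vector lengths you can get better performance with numba, but the benefit shrinks as the vector length grows. That surprised me, but it's why we test these things. Below is some code showing how to optimize it. It brings length-3 vectors from 7.6 us/run down to 2.2 us/run and still gives a meaningful improvement at length 1000, going from 10.5 us/run to 4.2 us/run. If you are processing large vectors (say, length > 1000), it is probably not worth the extra work and the extra dependency. If you are processing many small vectors, optimizing the loop around these calls may be more effective than optimizing each individual operation. And if you always perform the same rotation, you can pull that part out of the function to speed it up further.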
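The last two suggestions (building a fixed rotation only once, and handling many small vector sets together) can also be applied in plain NumPy without numba. This is a minimal sketch, assuming every set has the same number of points so that they fit in one (batch, points, 2) array. The helper name make_rotation is hypothetical, and this variant was not part of the timings in this answer:

```python
import numpy as np

def make_rotation(angle_deg):
    """Combined triple-shear (rotation) matrix, built once per angle."""
    theta = np.radians(angle_deg)
    side = np.array([[1.0, -np.tan(theta / 2)], [0.0, 1.0]])
    middle = np.array([[1.0, 0.0], [np.sin(theta), 1.0]])
    return side @ middle @ side

R = make_rotation(30.0)  # hoisted out of the per-call work

rng = np.random.default_rng(0)
points = rng.random((1000, 3, 2))  # 1000 small sets of three [x, y] points

# (R @ p.T).T == p @ R.T, and matmul broadcasts R.T over the leading batch axis,
# so all 1000 sets are rotated in one call instead of 1000 Python-level calls
rotated = points @ R.T  # shape (1000, 3, 2)

# Same result as looping over the sets one by one
looped = np.stack([(R @ p.T).T for p in points])
print(rotated.shape, np.allclose(rotated, looped))  # (1000, 3, 2) True
```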

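To optimize it, I converted everything to floats so that numba can compile it, replaced math.radians with numpy.radians, and removed the print, since it dominated the run time. For comparison, I put your original code, slightly modified for easier testing, in the test function.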

import numpy as np
import math as m
import numba
# May require scipy


def test(angle, pixels):
    # Angle is given in degrees; theta is the same angle in radians
    # angle = 30
    theta = m.radians(angle)

    # Triple shear transformation
    side = np.array([[1, -np.tan(theta/2)], [0, 1]])
    middle = np.array([[1, 0], [np.sin(theta), 1]])

    # Define vector test set - X, Y
    # pixels = np.array([[10, 5], [20, 3], [4, 5]])

    # Perform the matrix multiplication

    result = side @ middle @ side @ pixels.T

    # Print the result
    # print(np.round(result.T))
    return result.T


@numba.njit
def test_optimized(angle, pixels):
    # Angle is given in degrees; theta is the same angle in radians
    theta = np.radians(angle)

    # Triple shear transformation
    side = np.array([[1.0, -np.tan(theta/2.0)], [0.0, 1.0]])
    middle = np.array([[1.0, 0.0], [np.sin(theta), 1.0]])

    # Perform the matrix multiplication
    result = side @ middle @ side @ pixels.T

    return result.T


def _main():
    import time

    def _gt(s=0.0) -> float:
        return time.perf_counter() - s

    N = int(1e3)

    angle = 30.0
    pixels = np.array([[10, 5], [20, 3], [4, 5]])

    # Get any compilation things out of the way
    test(angle, pixels)
    test_optimized(angle, pixels.astype(float))

    for L in (3, 10, 100, 1_000, 10_000, 100_000):
        pixels2 = np.random.random(size=(L, 2))
        for name, fun in (('orig', test), ('optm', test_optimized)):
            s = _gt()
            for i in range(N):
                fun(angle, pixels2)
            print(f'Run time {name}: {_gt(s) * 1e6 / N:6.1f} us / run  -  L = {L}')
        print()

    print(f'Valid: {np.allclose(test(angle, pixels), test_optimized(angle, pixels.astype(float)))}')


if __name__ == '__main__':
    _main()

Running it, I get the following output:

Run time orig:    7.6 us / run  -  L = 3
Run time optm:    2.2 us / run  -  L = 3

Run time orig:    7.7 us / run  -  L = 10
Run time optm:    2.3 us / run  -  L = 10

Run time orig:    8.0 us / run  -  L = 100
Run time optm:    2.6 us / run  -  L = 100

Run time orig:   10.5 us / run  -  L = 1000
Run time optm:    4.2 us / run  -  L = 1000

Run time orig:   26.5 us / run  -  L = 10000
Run time optm:   20.7 us / run  -  L = 10000

Run time orig:  442.1 us / run  -  L = 100000
Run time optm:  421.4 us / run  -  L = 100000

Valid: True

Tested on Windows 10, Python 3.11.8, i9-10900K
