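I have a numpy array of np.uint64 that contains only 0 or 1 values, and I have to map 0 to np.float64(1.0) and 1 to np.float64(-1.0).
Since the interpreter does not know that it only has to convert 0 and 1, it uses an expensive general-purpose algorithm. So I thought I would use an array holding the two results and use the uint64 array as an index into it, avoiding any conversion, but that turns out to be even slower.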
import numpy as np
import timeit

# 10,000 random bits (0 or 1) stored as uint64
random_bit = np.random.randint(0, 2, size=(10000), dtype=np.uint64)


def np_cast(random_bit):
    # convert with np.float64(...) first, then do the arithmetic
    vectorized_result = 1.0 - 2.0 * np.float64(random_bit)
    return vectorized_result


def product(random_bit):
    # let NumPy promote uint64 to float64 inside the multiplication
    mapped_result = 1.0 - 2.0 * random_bit
    return mapped_result


# lookup table: index 0 -> 1.0, index 1 -> -1.0
np_one_minus_one = np.array([1.0, -1.0]).astype(np.float64)


def _array(random_bit):
    # use the uint64 array as an index into the lookup table
    mapped_result = np_one_minus_one[random_bit]
    return mapped_result


one = np.float64(1)
minus_two = np.float64(-2)


def astype(random_bit):
    # explicit astype conversion with precomputed float64 constants
    mapped_result = one + minus_two * random_bit.astype(np.float64)
    return mapped_result


function_list = [np_cast, product, _array, astype]
print("start benchmark")
for function in function_list:
    _time = timeit.timeit(lambda: function(random_bit), number=100000)
    print(f"{function.__name__}: {_time:.3f} seconds")
I get these timings:
np_cast: 178.604 seconds
product: 172.939 seconds
_array: 239.305 seconds
astype: 186.031 seconds
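Before comparing speeds, it may be worth confirming that the variants really compute the same thing. The following is only a minimal sketch on a small hand-written input; the names bits and lookup are illustrative and do not appear in the code above.

```python
import numpy as np

# Small hand-written input covering both possible values.
bits = np.array([0, 1, 1, 0], dtype=np.uint64)

# Lookup table: index 0 -> 1.0, index 1 -> -1.0.
lookup = np.array([1.0, -1.0], dtype=np.float64)

by_cast = 1.0 - 2.0 * bits.astype(np.float64)  # explicit conversion, then arithmetic
by_product = 1.0 - 2.0 * bits                  # uint64 is promoted to float64 by the multiplication
by_index = lookup[bits]                        # integer-array indexing into the table

# All three approaches should agree in dtype and in values.
assert by_cast.dtype == by_product.dtype == by_index.dtype == np.float64
assert np.array_equal(by_cast, by_product)
assert np.array_equal(by_cast, by_index)
print(by_index)
```

Since the results are identical, the differences in the timings come only from how each variant gets there.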
Using numba gives a speedup of roughly 4x. For the general N-d case, that could be:
import numba as nb


@nb.vectorize
def numba_if(random_bit):
    return -1.0 if random_bit else 1.0


@nb.vectorize
def numba_product(random_bit):
    return 1.0 - 2.0 * random_bit
Or, for the specific 1-d case, you can use an explicit loop to make it a bit faster still:
import numpy as np


@nb.njit
def numba_if_loop(random_bit):
    assert random_bit.ndim == 1
    result = np.empty_like(random_bit, dtype=np.float64)
    for i in range(random_bit.size):
        result[i] = -1.0 if random_bit[i] else 1.0
    return result


@nb.njit
def numba_product_loop(random_bit):
    assert random_bit.ndim == 1
    result = np.empty_like(random_bit, dtype=np.float64)
    for i in range(random_bit.size):
        result[i] = 1.0 - 2.0 * random_bit[i]
    return result
Timings (mason is the lambda x: (1 - 2 * x.astype(np.int8)).astype(float) from the comments):
%timeit np_cast(random_bit)
%timeit product(random_bit)
%timeit _array(random_bit)
%timeit astype(random_bit)
%timeit mason(random_bit)
assert np.array_equal(np_cast(random_bit), numba_if(random_bit))
assert np.array_equal(np_cast(random_bit), numba_product(random_bit))
assert np.array_equal(np_cast(random_bit), numba_if_loop(random_bit))
assert np.array_equal(np_cast(random_bit), numba_product_loop(random_bit))
%timeit numba_if(random_bit)
%timeit numba_product(random_bit)
%timeit numba_if_loop(random_bit)
%timeit numba_product_loop(random_bit)
Output:
6.58 µs ± 218 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
7.58 µs ± 251 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
11 µs ± 9.34 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
7.32 µs ± 674 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
6.86 µs ± 153 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.89 µs ± 25.8 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
2.07 µs ± 13.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.6 µs ± 14.7 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
1.78 µs ± 5.31 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
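The %timeit magic is only available in IPython/Jupyter. Outside of it, something like the following sketch gives per-call numbers with the standard timeit module. Note that per_call_microseconds is a hypothetical helper name, and that it reports the best of several runs rather than the mean ± std. dev. that %timeit prints, so the figures are comparable but not identical.

```python
import timeit

import numpy as np


def per_call_microseconds(func, arg, number=1000, repeat=5):
    """Return the best per-call time of func(arg) in microseconds."""
    timer = timeit.Timer(lambda: func(arg))
    # Each entry is the total time in seconds for `number` calls.
    totals = timer.repeat(repeat=repeat, number=number)
    return min(totals) / number * 1e6


def product(random_bit):
    # Same expression as the product variant in the question.
    return 1.0 - 2.0 * random_bit


random_bit = np.random.randint(0, 2, size=10000, dtype=np.uint64)
print(f"product: {per_call_microseconds(product, random_bit):.2f} µs per call")
```

The same helper can be called with any of the other functions, including the numba ones, as long as each numba function is called once beforehand so that compilation time is not included in the measurement.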