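I was referring to this question, which already has very good answers; but I noticed unnecessary operations there (see the discussion in the post), and I was just curious whether they could be eliminated.
Meanwhile, I found a way to avoid the unnecessary multiplications (by indexing with masks) that gives the same result. The code is below.
Variant 1 is the original version.
In variant 2 I tried to combine Python slicing with masking, not only to write the two loops in a nicer, more compact way, but mainly in the hope that it would be faster. It turned out to be about 30% slower instead. To be honest, the original code is more readable, but I had hoped for a significant improvement over the double loop.
Why is this not the case?
Or, asking the other way round: in which situations are slicing operations faster than element-wise operations? Are they just syntactic sugar with a lot of internal overhead? I assumed they are implemented in C/C++ under the hood and must be faster than looping over i, j by hand in Python.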
Output:
D:\python\animation>python test.py
used time for variant 1: 1.0377624034881592
used time for variant 2: 1.30381441116333
D:\python\animation>python test.py
used time for variant 1: 0.8954949378967285
used time for variant 2: 1.251044750213623
D:\python\animation>python test.py
used time for variant 1: 0.9750621318817139
used time for variant 2: 1.3896379470825195
Code:
import time

import numpy as np


def test():
    f = np.array([
        [0, 0, 0, 0, 0, 0, 0],
        [0, 1, 3, 6, 4, 2, 0],
        [0, 2, 4, 7, 6, 4, 0],
        [0, 0, 0, 0, 0, 0, 0],
    ])
    u = np.array([
        [0, 0, 0, 0, 0, 0, 0],
        [0, 0.5, 1, 0, -1, -0.5, 0],
        [0, 0.7, 1.1, 0, -1, -0.4, 0],
        [0, 0, 0, 0, 0, 0, 0],
    ])

    # calculate : variant 1 (explicit loops over the interior points)
    # Note: f holds integers, so zeros_like(f) is an integer array and the
    # products are truncated on assignment (the same applies to y below).
    x = np.zeros_like(f)
    maxcount = 100000
    start = time.time()
    for count in range(maxcount):
        for i in range(1, u.shape[0] - 1):
            for j in range(1, u.shape[1] - 1):
                if u[i, j] > 0:
                    # backward difference where u is positive
                    x[i, j] = u[i, j] * (f[i, j] - f[i, j - 1])
                else:
                    # forward difference otherwise
                    x[i, j] = u[i, j] * (f[i, j + 1] - f[i, j])
    end = time.time()
    print("used time for variant 1:", end - start)

    # calculate : variant 2 (slicing combined with boolean masks)
    y = np.zeros_like(f)
    start = time.time()
    for count in range(maxcount):
        maskl = u[1:-1, 1:-1] > 0
        maskr = ~maskl
        # diff[:, k] is f[i, k + 1] - f[i, k] for the interior rows
        diff = f[1:-1, 1:] - f[1:-1, 0:-1]
        (y[1:-1, 1:-1])[maskl] = (u[1:-1, 1:-1])[maskl] * (diff[:, :-1])[maskl]
        (y[1:-1, 1:-1])[maskr] = (u[1:-1, 1:-1])[maskr] * (diff[:, 1:])[maskr]
    end = time.time()
    print("used time for variant 2:", end - start)

    np.testing.assert_array_equal(x, y)


test()
"Prefetching" the u and y slices makes it a little better, but not significantly:
for count in range(maxcount):
    maskl = (u[1:-1, 1:-1] > 0)
    maskr = ~maskl
    diff = f[1:-1, 1:] - f[1:-1, 0:-1]
    yy = (y[1:-1, 1:-1])  # <<--
    uu = (u[1:-1, 1:-1])  # <<--
    yy[maskl] = uu[maskl] * (diff[:, :-1])[maskl]
    yy[maskr] = uu[maskr] * (diff[:, 1:])[maskr]
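You can easily speed this up with numba. Also, as noted in the comments, it depends on how large your input arrays are: the larger the array, the faster the second variant becomes relative to the explicit loops.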
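To make the size dependence concrete, here is a minimal sketch, assuming only NumPy and the standard-library `timeit`. The helpers `loop_diff`, `slice_diff` and `time_both` are hypothetical names introduced for illustration, and they compute a plain backward difference rather than the masked update from the question. Each NumPy call carries a fixed cost that does not depend on the array size, so on a 4x7 array that cost is a large share of the runtime, while on a large array it is spread over many elements.

```python
import timeit

import numpy as np


def loop_diff(f):
    """Backward difference along axis 1 using explicit Python loops."""
    out = np.zeros_like(f)
    for i in range(f.shape[0]):
        for j in range(1, f.shape[1]):
            out[i, j] = f[i, j] - f[i, j - 1]
    return out


def slice_diff(f):
    """Same result computed with a single sliced subtraction."""
    out = np.zeros_like(f)
    out[:, 1:] = f[:, 1:] - f[:, :-1]
    return out


def time_both(shape, repeats=20):
    """Return (loop_seconds, slice_seconds) per call for one array shape."""
    f = np.arange(shape[0] * shape[1], dtype=float).reshape(shape)
    # Both helpers must agree before their timings are worth comparing.
    assert np.array_equal(loop_diff(f), slice_diff(f))
    t_loop = timeit.timeit(lambda: loop_diff(f), number=repeats) / repeats
    t_slice = timeit.timeit(lambda: slice_diff(f), number=repeats) / repeats
    return t_loop, t_slice


if __name__ == "__main__":
    # Illustrative shapes only: the first matches the arrays in the question.
    for shape in [(4, 7), (40, 70), (200, 350)]:
        t_loop, t_slice = time_both(shape)
        print(shape, f"loop: {t_loop:.2e} s", f"slice: {t_slice:.2e} s")
```

Exact timings depend on the machine and the NumPy version, so no numbers are quoted here; the quantity to watch is how the ratio between the loop time and the slice time changes as the shape grows.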
Here is a quick benchmark:
import perfplot
import numpy as np
from numba import njit


def variant_1(u, f):
    # plain Python loops over the interior points
    x = np.zeros_like(f)
    for i in range(1, u.shape[0] - 1):
        for j in range(1, u.shape[1] - 1):
            if u[i, j] > 0:
                x[i, j] = u[i, j] * (f[i, j] - f[i, j - 1])
            else:
                x[i, j] = u[i, j] * (f[i, j + 1] - f[i, j])
    return x


def variant_2(u, f):
    # slicing combined with boolean masks
    y = np.zeros_like(f)
    maskl = u[1:-1, 1:-1] > 0
    maskr = ~maskl
    diff = f[1:-1, 1:] - f[1:-1, 0:-1]
    (y[1:-1, 1:-1])[maskl] = (u[1:-1, 1:-1])[maskl] * (diff[:, :-1])[maskl]
    (y[1:-1, 1:-1])[maskr] = (u[1:-1, 1:-1])[maskr] * (diff[:, 1:])[maskr]
    return y


@njit
def variant_numba(u, f):
    # same loops as variant_1, compiled by numba
    x = np.zeros_like(f)
    for i in range(1, u.shape[0] - 1):
        for j in range(1, u.shape[1] - 1):
            if u[i, j] > 0:
                x[i, j] = u[i, j] * (f[i, j] - f[i, j - 1])
            else:
                x[i, j] = u[i, j] * (f[i, j + 1] - f[i, j])
    return x


f = np.array(
    [
        [0, 0, 0, 0, 0, 0, 0],
        [0, 1, 3, 6, 4, 2, 0],
        [0, 2, 4, 7, 6, 4, 0],
        [0, 0, 0, 0, 0, 0, 0],
    ]
)
u = np.array(
    [
        [0, 0, 0, 0, 0, 0, 0],
        [0, 0.5, 1, 0, -1, -0.5, 0],
        [0, 0.7, 1.1, 0, -1, -0.4, 0],
        [0, 0, 0, 0, 0, 0, 0],
    ]
)

# sanity check; this first call also triggers the numba compilation
x1 = variant_1(u, f)
x2 = variant_2(u, f)
x3 = variant_numba(u, f)
assert np.allclose(x1, x2)
assert np.allclose(x1, x3)


def setup_u_f(n):
    # enlarge the inputs by repeating the base arrays n times along each axis
    return np.tile(u, (n, n)), np.tile(f, (n, n))


perfplot.show(
    setup=setup_u_f,
    kernels=[
        lambda u, f: variant_1(u, f),
        lambda u, f: variant_2(u, f),
        lambda u, f: variant_numba(u, f),
    ],
    labels=["variant_1", "variant_2", "variant_numba"],
    n_range=[1, 2, 5, 10, 20, 50, 100],
    xlabel="np.tile(_, (n, n))",
    logx=True,
    logy=True,
)
This creates a chart of the runtime of the three variants against n, with both axes on a log scale.
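As a further variation that is not part of the benchmark above, the two masked assignments can be folded into one expression with `np.where`. This is only a sketch: the name `variant_where` is made up for illustration, and it has not been timed against the other variants here. Unlike the variants above, it allocates the result with the promoted dtype of `u` and `f`, so fractional products are not truncated when `f` is an integer array.

```python
import numpy as np


def variant_where(u, f):
    """Pick the backward or forward difference per element with np.where."""
    # Promote the dtype so float products survive an integer-valued f.
    y = np.zeros_like(f, dtype=np.result_type(u, f))
    uu = u[1:-1, 1:-1]
    # diff[:, k] is f[i, k + 1] - f[i, k] for the interior rows i.
    diff = f[1:-1, 1:] - f[1:-1, :-1]
    # diff[:, :-1] lines up with f[i, j] - f[i, j - 1] (used where u > 0),
    # diff[:, 1:] lines up with f[i, j + 1] - f[i, j] (used elsewhere).
    y[1:-1, 1:-1] = uu * np.where(uu > 0, diff[:, :-1], diff[:, 1:])
    return y


if __name__ == "__main__":
    # Same example arrays as in the question.
    f = np.array([
        [0, 0, 0, 0, 0, 0, 0],
        [0, 1, 3, 6, 4, 2, 0],
        [0, 2, 4, 7, 6, 4, 0],
        [0, 0, 0, 0, 0, 0, 0],
    ])
    u = np.array([
        [0, 0, 0, 0, 0, 0, 0],
        [0, 0.5, 1, 0, -1, -0.5, 0],
        [0, 0.7, 1.1, 0, -1, -0.4, 0],
        [0, 0, 0, 0, 0, 0, 0],
    ])
    print(variant_where(u, f))
```

The design trade-off is that `np.where` evaluates both differences for every element and then selects one, which is the kind of redundant work the masked version tries to avoid, but it needs fewer separate NumPy calls and no boolean-mask copies.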