我有一个很长的一维数组,我想计算它的累积和,然后在结果数组的开头添加一个零。
import numpy as np
def padded_cumsum(x):
cum_sum_x = np.cumsum(x) # Creating a new array here is fine
return np.pad(cum_sum_x, (1,0), mode='constant') # Prepend a single zero w/ pad
x = np.array(range(2**25))
print padded_cumsum(x)
函数
padded_cumsum
将被调用数十亿次,并且数组长度不同,因此我试图避免任何数组复制,因为这很昂贵。另外,我不能通过在开始/结束处用额外的值/NaN实例化原始数组x
来改变它。由于 cum_sum_x
无论如何都需要创建,我怀疑我可以通过做一些像黑客一样的事情来偷偷地输入零:
def padded_cumsum(x):
cum_sum_x = np.empty(x.shape[0]+1)
cum_sum_x[0] = 0
cum_sum_x[1:] = np.cumsum(x)
return np.pad(cum_sum_x, (1,0), mode='constant') # Prepend a single zero w/ pad
在手动分配的数组上使用
out
关键字。
out = np.empty(len(x)+pad, dtype=yourdtype)
np.cumsum(x, out=out[pad:])
out[:pad] = 0
你可以就地射精:
def padcumsum(x):
csum=np.hstack((0,x)) # 3x faster than pad.
csum.cumsum(out=csum)
return csum
对于性能问题,您可以安装 numba :
@numba.njit
def perf(x):
csum=np.empty(x.size+1,x.dtype)
csum[0]=0
for i in range(x.size):
csum[i+1]=csum[i]+x[i]
return csum
比 padcumsum 快两倍
我比较了几个选项,主要来自这里。
np.concatenate(([0], x)).cumsum()
是最快的。
x:问题大小,y:1000 次运行的计算时间
import timeit
import random
import numpy as np
import matplotlib.pyplot as plt
cmds = [
'np.r_[[0], x].cumsum()',
'np.hstack(([0], x)).cumsum()',
'np.concatenate(([0], x)).cumsum()',
'csp0 = np.zeros(shape=(len(x) + 1,)); np.cumsum(x, out=csp0[1:])',
]
test_range = [1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6]
# test_range = [1e0, 1e1, 1e2]
ts = np.empty((len(cmds), len(test_range)), dtype=float)
for tt, size_float in enumerate(test_range):
size = round(size_float)
print('array size:', size)
x = np.random.randint(low=0, high=100, size=size)
n_trials = 1000
for cc, cmd in enumerate(cmds):
t = timeit.Timer(cmd, globals={**globals(), **locals()})
t = t.timeit(n_trials)
ts[cc, tt] = t
print('time for {:d}x \"{:}\": {:.6f}'.format(n_trials, cmd, t))
fig, ax = plt.subplots(1, 1, figsize=(15, 10))
for cc, cmd in enumerate(cmds):
ax.plot(test_range, ts[cc, :], label=cmd)
print(cmd)
ax.legend()
ax.set_xscale('log')
ax.set_yscale('log')
array size: 1
time for 1000x "np.r_[[0], x].cumsum()": 0.015609
time for 1000x "np.hstack(([0], x)).cumsum()": 0.005469
time for 1000x "np.concatenate(([0], x)).cumsum()": 0.002997
time for 1000x "csp0 = np.zeros(shape=(len(x) + 1,)); np.cumsum(x, out=csp0[1:])": 0.003499
array size: 10
time for 1000x "np.r_[[0], x].cumsum()": 0.018170
time for 1000x "np.hstack(([0], x)).cumsum()": 0.005663
time for 1000x "np.concatenate(([0], x)).cumsum()": 0.002993
time for 1000x "csp0 = np.zeros(shape=(len(x) + 1,)); np.cumsum(x, out=csp0[1:])": 0.003511
array size: 100
time for 1000x "np.r_[[0], x].cumsum()": 0.018444
time for 1000x "np.hstack(([0], x)).cumsum()": 0.005621
time for 1000x "np.concatenate(([0], x)).cumsum()": 0.003145
time for 1000x "csp0 = np.zeros(shape=(len(x) + 1,)); np.cumsum(x, out=csp0[1:])": 0.003770
array size: 1000
time for 1000x "np.r_[[0], x].cumsum()": 0.018034
time for 1000x "np.hstack(([0], x)).cumsum()": 0.007816
time for 1000x "np.concatenate(([0], x)).cumsum()": 0.005275
time for 1000x "csp0 = np.zeros(shape=(len(x) + 1,)); np.cumsum(x, out=csp0[1:])": 0.006885
array size: 10000
time for 1000x "np.r_[[0], x].cumsum()": 0.036433
time for 1000x "np.hstack(([0], x)).cumsum()": 0.027001
time for 1000x "np.concatenate(([0], x)).cumsum()": 0.024336
time for 1000x "csp0 = np.zeros(shape=(len(x) + 1,)); np.cumsum(x, out=csp0[1:])": 0.034565
array size: 100000
time for 1000x "np.r_[[0], x].cumsum()": 0.228152
time for 1000x "np.hstack(([0], x)).cumsum()": 0.219081
time for 1000x "np.concatenate(([0], x)).cumsum()": 0.215639
time for 1000x "csp0 = np.zeros(shape=(len(x) + 1,)); np.cumsum(x, out=csp0[1:])": 0.311723
array size: 1000000
time for 1000x "np.r_[[0], x].cumsum()": 2.693319
time for 1000x "np.hstack(([0], x)).cumsum()": 2.656931
time for 1000x "np.concatenate(([0], x)).cumsum()": 2.634273
time for 1000x "csp0 = np.zeros(shape=(len(x) + 1,)); np.cumsum(x, out=csp0[1:])": 3.755322
np.r_[[0], x].cumsum()
np.hstack(([0], x)).cumsum()
np.concatenate(([0], x)).cumsum()
csp0 = np.zeros(shape=(len(x) + 1,)); np.cumsum(x, out=csp0[1:])
这是另一种选择:
x = np.roll(x.cumsum(), 1)
x[0] = 0
使用Markus答案中的基准脚本,我发现它的性能介于np.r_和np.hstack之间。但如果您使用 PyTorch 而不是 Numpy,这似乎是最快的选择。