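To give some context: I want to build a covariance matrix in which each element is defined by a kernel function k(x, y), and I want to do this for a single vector. It should look like this: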
# This is given
x = [x1, x2, x3, x4, ...]
# This is what I want to compute
result = [[k(x1, x1), k(x1, x2), k(x1, x3), ...],
          [k(x2, x1), k(x2, x2), ...],
          [k(x3, x1), k(x3, x2), ...],
          ...]
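Of course, this should be done with numpy arrays, ideally without any Python-level iteration, for performance reasons. If I didn't care about performance, I would probably just write: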
result = np.zeros((len(x), len(x)))
for i in range(len(x)):
    for j in range(len(x)):
        result[i, j] = k(x[i], x[j])
But I feel there must be a more idiomatic way to write this pattern.
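If k operates on 2D arrays, you could use np.meshgrid. However, that allocates two full 2D grids, which costs extra memory. An alternative is to create 2D views that have the same contents as the np.meshgrid grids, like so: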
def meshgrid1D_view(x):
    shp = (len(x), len(x))
    # Read-only views onto x; no 2D data is copied
    mesh1 = np.broadcast_to(x, shp)           # mesh1[i, j] == x[j]
    mesh2 = np.broadcast_to(x[:, None], shp)  # mesh2[i, j] == x[i]
    return mesh1, mesh2
Sample run:
In [140]: x
Out[140]: array([3, 5, 6, 8])
In [141]: np.meshgrid(x,x)
Out[141]:
[array([[3, 5, 6, 8],
[3, 5, 6, 8],
[3, 5, 6, 8],
[3, 5, 6, 8]]), array([[3, 3, 3, 3],
[5, 5, 5, 5],
[6, 6, 6, 6],
[8, 8, 8, 8]])]
In [142]: meshgrid1D_view(x)
Out[142]:
(array([[3, 5, 6, 8],
[3, 5, 6, 8],
[3, 5, 6, 8],
[3, 5, 6, 8]]), array([[3, 3, 3, 3],
[5, 5, 5, 5],
[6, 6, 6, 6],
[8, 8, 8, 8]]))
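What's the benefit?
The views avoid allocating the full grids, which saves memory and therefore time. Let's test on a large array to see the difference: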
In [143]: x = np.random.randint(0,10,(10000))
In [144]: %timeit np.meshgrid(x,x)
10 loops, best of 3: 171 ms per loop
In [145]: %timeit meshgrid1D_view(x)
100000 loops, best of 3: 6.91 µs per loop
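To illustrate why building the views is so cheap, here is a minimal sketch that inspects what np.broadcast_to returns. The small int64 array is an assumption chosen only so the strides are easy to read. Note that the timing above covers grid construction only, not the later evaluation of k.

```python
import numpy as np

def meshgrid1D_view(x):
    shp = (len(x), len(x))
    mesh1 = np.broadcast_to(x, shp)
    mesh2 = np.broadcast_to(x[:, None], shp)
    return mesh1, mesh2

# Small illustrative input (assumption: int64, so each item is 8 bytes)
x = np.arange(4, dtype=np.int64)
mesh1, mesh2 = meshgrid1D_view(x)

# The repeated axis has stride 0, so every row (or column) points at the same data
print(mesh1.strides, mesh2.strides)

# Both views share the buffer of x and are read-only
print(np.shares_memory(mesh1, x), np.shares_memory(mesh2, x))
print(mesh1.flags.writeable, mesh2.flags.writeable)

# The contents match what np.meshgrid would have allocated
full1, full2 = np.meshgrid(x, x)
print(np.array_equal(mesh1, full1), np.array_equal(mesh2, full2))
```

Because the views are read-only, take a copy (for example mesh1.copy()) before writing into them.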
Another solution is to let numpy do the broadcasting:
import numpy as np

def k(x, y):
    return x**2 + y

def meshgrid1D_view(x):
    shp = (len(x), len(x))
    mesh1 = np.broadcast_to(x, shp)
    mesh2 = np.broadcast_to(x[:, None], shp)
    return mesh1, mesh2

x = np.random.randint(0, 10, (10000))
b = k(x[:, None], x[None, :])

def sol0(x):
    # Plain broadcasting of a column against a row
    return k(x[:, None], x[None, :])

def sol1(x):
    # Full grids allocated by np.meshgrid
    gx, gy = np.meshgrid(x, x)
    return k(gx, gy)

def sol2(x):
    # Read-only broadcast views
    gx, gy = meshgrid1D_view(x)
    return k(gx, gy)
%timeit sol0(x)
165 ms ± 1.67 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit sol1(x)
655 ms ± 6.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit sol2(x)
341 ms ± 2.91 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In these timings plain broadcasting (sol0) is the fastest of the three, and it also takes the least code.
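As a sanity check, the broadcasting approach can be compared with the double loop from the question. The sketch below is only an illustration: rbf_kernel, its length_scale parameter, and the helper names covariance_broadcast and covariance_loop are hypothetical and do not come from the thread. It assumes k works elementwise on arrays.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Hypothetical example kernel: elementwise, so it broadcasts over arrays
    return np.exp(-0.5 * ((a - b) / length_scale) ** 2)

def covariance_broadcast(x, k):
    # Column (n, 1) against row (1, n) broadcasts to (n, n): result[i, j] = k(x[i], x[j])
    x = np.asarray(x)
    return k(x[:, None], x[None, :])

def covariance_loop(x, k):
    # Reference implementation from the question, kept for comparison
    n = len(x)
    result = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            result[i, j] = k(x[i], x[j])
    return result

x = np.linspace(0.0, 1.0, 5)
K_fast = covariance_broadcast(x, rbf_kernel)
K_slow = covariance_loop(x, rbf_kernel)
print(np.allclose(K_fast, K_slow))
```

One detail to watch: with the default indexing, the first grid returned by np.meshgrid(x, x) varies along columns, so k(*np.meshgrid(x, x)) gives the transpose of k(x[:, None], x[None, :]). That makes no difference for a symmetric kernel, but it does for one like x**2 + y.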