如何更快地大量计算 scipy.stats.multivariate_normal.cdf() ？

Question

我想做的是计算大量的累积正态分布函数。我只在小实例（2 个变量）上调用它，但速度慢得令人痛苦。这是我的代码到目前为止的样子：

from scipy.stats import multivariate_normal

res = np.zeros((horizon,nb_groups,nb_ratings))
mean = [0,0]
mu = np.zeros((horizon,nb_groups,nb_ratings))
sigma = np.ones((horizon,nb_groups,nb_ratings))
rho = np.ones((horizon,nb_groups,nb_ratings)) #in the actual code rho,mu, sigma do not contain only ones and zeros but it doesnt really matter here

horizon,nb_groups,nb_ratings = 50,3,8
for t in range(horizon):
    for g in range(nb_groups):
        for i in range(nb_ratings):
            if thresh[t,g,i]==float('-inf'):
                res[t,g,i]=0
            else:
                res[t,g,i] = multivariate_normal(mean=mean,cov=[[1,rho[t,g,i]],[rho[t,g,i],1]]).cdf([mu[t,g,i]/np.sqrt(1+sigma[t,g,i]**2),thresh[t,g,i]])

我已经做了一些测试，导致它如此慢的是使用嵌套的 for 循环，如果我可以向量化计算，整个过程会快得多，如下所示：

这是测试设置：

import random
import numpy as np
from scipy.stats import norm, multivariate_normal as mvn

n = 1000000

inputs1 = np.array([random.uniform(-100, 100) for _ in range(n)])
inputs2 = np.array([random.uniform(-100, 100) for _ in range(n)])
inputs3 = np.array([np.array([random.uniform(-100, 100),random.uniform(-100, 100)]) for _ in range(n)])

mean = [0,0]
cov = [[1,0.5],[0.5,1]]
bivariate = mvn(mean, cov)

for i in range(n):
    a = bivariate.cdf([inputs1[i],inputs2[i]])

运行时间约3分钟

但是

a = bivariate.cdf(inputs3)

运行时间不到9秒

我的问题是我没有设法应用测试中所示的相同矢量化过程，因为我的协方差矩阵在每次调用时都会发生变化。你有什么想法我可以如何让它更快/矢量化它吗？

Answer 1

在这种情况下，循环速度很慢，因为

cdf

 方法有很大的开销。您可以通过直接调用计算 CDF 的（私有）函数来避免开销。

import numpy as np
from scipy.stats import multivariate_normal as mvn, _mvn
rng = np.random.default_rng()

mean = np.zeros(2)
cov = np.asarray([[1, 0.5], [0.5, 1]])
n = 1000000
x = rng.uniform(size=(n, 2))

bivariate = mvn(mean, cov)
ref = bivariate.cdf(x)

# lower limit of integration
low = np.asarray([-np.inf, -np.inf])
# maxpts is the maximum number of points to evaluate the PDF
# abseps and releps try to control the absolute and relative error
options = dict(maxpts=None, abseps=1e-5, releps=1e-5)
# _mvn.mvnun returns a tuple; we want the 0th element
res = np.asarray([_mvn.mvnun(low, xi, mean, cov, **options)[0] 
                  for xi in x])

np.testing.assert_allclose(res, ref)

请注意，

_mvn.mvnun

是私人的，因此可以更改，恕不另行通知。 （它可能会在未来几年内发生变化，因为许多旧的 Fortran 代码正在被替换。）

如何更快地大量计算 scipy.stats.multivariate_normal.cdf() ？

问题描述投票：0回答：1

1个回答

最新问题

如何更快地大量计算 scipy.stats.multivariate_normal.cdf() ？

问题描述 投票：0回答：1

1个回答

最新问题

问题描述投票：0回答：1