我想做的是计算大量的累积正态分布函数。我只在小实例(2 个变量)上调用它,但速度慢得令人痛苦。这是我的代码到目前为止的样子:
from scipy.stats import multivariate_normal
res = np.zeros((horizon,nb_groups,nb_ratings))
mean = [0,0]
mu = np.zeros((horizon,nb_groups,nb_ratings))
sigma = np.ones((horizon,nb_groups,nb_ratings))
rho = np.ones((horizon,nb_groups,nb_ratings)) #in the actual code rho,mu, sigma do not contain only ones and zeros but it doesnt really matter here
horizon,nb_groups,nb_ratings = 50,3,8
for t in range(horizon):
for g in range(nb_groups):
for i in range(nb_ratings):
if thresh[t,g,i]==float('-inf'):
res[t,g,i]=0
else:
res[t,g,i] = multivariate_normal(mean=mean,cov=[[1,rho[t,g,i]],[rho[t,g,i],1]]).cdf([mu[t,g,i]/np.sqrt(1+sigma[t,g,i]**2),thresh[t,g,i]])
我已经做了一些测试,导致它如此慢的是使用嵌套的 for 循环,如果我可以向量化计算,整个过程会快得多,如下所示:
这是测试设置:
import random
import numpy as np
from scipy.stats import norm, multivariate_normal as mvn
n = 1000000
inputs1 = np.array([random.uniform(-100, 100) for _ in range(n)])
inputs2 = np.array([random.uniform(-100, 100) for _ in range(n)])
inputs3 = np.array([np.array([random.uniform(-100, 100),random.uniform(-100, 100)]) for _ in range(n)])
mean = [0,0]
cov = [[1,0.5],[0.5,1]]
bivariate = mvn(mean, cov)
for i in range(n):
a = bivariate.cdf([inputs1[i],inputs2[i]])
运行时间约3分钟
但是
a = bivariate.cdf(inputs3)
运行时间不到9秒
我的问题是我没有设法应用测试中所示的相同矢量化过程,因为我的协方差矩阵在每次调用时都会发生变化。你有什么想法我可以如何让它更快/矢量化它吗?
cdf
方法有很大的开销。您可以通过直接调用计算 CDF 的(私有)函数来避免开销。
import numpy as np
from scipy.stats import multivariate_normal as mvn, _mvn
rng = np.random.default_rng()
mean = np.zeros(2)
cov = np.asarray([[1, 0.5], [0.5, 1]])
n = 1000000
x = rng.uniform(size=(n, 2))
bivariate = mvn(mean, cov)
ref = bivariate.cdf(x)
# lower limit of integration
low = np.asarray([-np.inf, -np.inf])
# maxpts is the maximum number of points to evaluate the PDF
# abseps and releps try to control the absolute and relative error
options = dict(maxpts=None, abseps=1e-5, releps=1e-5)
# _mvn.mvnun returns a tuple; we want the 0th element
res = np.asarray([_mvn.mvnun(low, xi, mean, cov, **options)[0]
for xi in x])
np.testing.assert_allclose(res, ref)
请注意,_mvn.mvnun
是私人的,因此可以更改,恕不另行通知。 (它可能会在未来几年内发生变化,因为许多旧的 Fortran 代码正在被替换。)