Multivariate Student's t-distribution with Python

Question  Votes: 0  Answers: 5

To generate samples from a multivariate t-distribution, I use this function:

import numpy as np

def multivariatet(mu,Sigma,N,M):
    '''
    Output:
    Produce M samples of d-dimensional multivariate t distribution
    Input:
    mu = mean (d dimensional numpy array or scalar)
    Sigma = scale matrix (dxd numpy array)
    N = degrees of freedom
    M = # of samples to produce
    '''
    d = len(Sigma)
    g = np.tile(np.random.gamma(N/2.,2./N,M),(d,1)).T
    Z = np.random.multivariate_normal(np.zeros(d),Sigma,M)
    return mu + Z/np.sqrt(g)

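What I'm looking for now, though, is the multivariate Student's t-distribution itself, so that I can compute the density at points where dimension > 1.

This would be something like stats.t.pdf(x, df, loc, scale) from the scipy package, but in multi-dimensional space.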

python statistics scipy probability-density
5 Answers
9
votes

I coded the density myself:

import numpy as np
from math import gamma, pi, pow

def multivariate_t_distribution(x,mu,Sigma,df,d):
    '''
    Multivariate t-student density:
    output:
        the density of the given element
    input:
        x = parameter (d dimensional numpy array or scalar)
        mu = mean (d dimensional numpy array or scalar)
        Sigma = scale matrix (dxd numpy array)
        df = degrees of freedom
        d: dimension
    '''
    Num = gamma(1. * (d+df)/2)
    Denom = ( gamma(1.*df/2) * pow(df*pi,1.*d/2) * pow(np.linalg.det(Sigma),1./2) * pow(1 + (1./df)*np.dot(np.dot((x - mu),np.linalg.inv(Sigma)), (x - mu)),1.* (d+df)/2))
    return 1. * Num / Denom
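For reference, the function above appears to evaluate the standard multivariate t density. Written with the code's own names (d for the dimension, df for the degrees of freedom, mu for the location and Sigma for the scale matrix), that density is usually stated as:

```latex
f(x) = \frac{\Gamma\!\left(\frac{df + d}{2}\right)}
            {\Gamma\!\left(\frac{df}{2}\right)\,(df\,\pi)^{d/2}\,|\Sigma|^{1/2}}
       \left[\,1 + \frac{1}{df}\,(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right]^{-\frac{df + d}{2}}
```

The numerator is Num in the code, and the remaining factors, moved to the denominator with a positive exponent, make up Denom.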

2
votes

This evaluates the log pdf of the multivariate Student's t-distribution for an n × d data matrix X:

import numpy as np
from scipy.special import gammaln
from numpy.linalg import slogdet

def multivariate_student_t(X, mu, Sigma, df):
    # log pdf of the multivariate Student t distribution, one value per row of X

    [n,d] = X.shape
    Xm = X-mu
    V = df * Sigma
    V_inv = np.linalg.inv(V)
    (sign, logdet) = slogdet(np.pi * V)

    # log normalizing constant, computed with log-gamma (gammaln)
    logz = -gammaln(df/2.0 + d/2.0) + gammaln(df/2.0) + 0.5*logdet
    logp = -0.5*(df+d)*np.log(1+ np.sum(np.dot(Xm,V_inv)*Xm,axis=1))

    logp = logp - logz

    return logp
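One reason to work in log space is that the gamma function itself overflows for moderately large degrees of freedom, while its logarithm stays finite. The following is a minimal sketch using only the standard library; the helper name log_norm_const and its logdet_sigma argument are illustrative assumptions, not part of the answer above.

```python
import math

def log_norm_const(df, d, logdet_sigma):
    """Log of the normalising constant of a d-dimensional t density.

    logdet_sigma is assumed to be log(det(Sigma)) of the scale matrix.
    """
    return (math.lgamma((df + d) / 2.0) - math.lgamma(df / 2.0)
            - 0.5 * d * math.log(df * math.pi) - 0.5 * logdet_sigma)

# math.gamma raises OverflowError for large arguments; lgamma does not.
try:
    math.gamma(500.0)
    overflowed = False
except OverflowError:
    overflowed = True

large_df_value = log_norm_const(1000, 3, 0.0)  # still a finite float
```

For df = 1, d = 1 and a unit scale this reduces to the log of the Cauchy normalising constant, -log(pi), which makes for a quick sanity check.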

2
votes

I generalized @farhawa's code to allow multiple entries in x (I found that I wanted to query several points at once).

import numpy as np
from math import gamma

def multivariate_t_distribution(x, mu, Sigma, df):
    '''
    Multivariate t-student density. Returns the density
    of the function at points specified by x.

    input:
        x = parameter (n-d numpy array; will be forced to 2d)
        mu = mean (d dimensional numpy array)
        Sigma = scale matrix (dxd numpy array)
        df = degrees of freedom

    Edited from: http://stackoverflow.com/a/29804411/3521179
    '''

    x = np.atleast_2d(x) # requires x as 2d
    nD = Sigma.shape[0] # dimensionality

    numerator = gamma(1.0 * (nD + df) / 2.0)

    denominator = (
            gamma(1.0 * df / 2.0) * 
            np.power(df * np.pi, 1.0 * nD / 2.0) *  
            np.power(np.linalg.det(Sigma), 1.0 / 2.0) * 
            np.power(
                1.0 + (1.0 / df) *
                np.diagonal(
                    np.dot( np.dot(x - mu, np.linalg.inv(Sigma)), (x - mu).T)
                ), 
                1.0 * (nD + df) / 2.0
                )
            )

    return 1.0 * numerator / denominator 
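Taking the diagonal of the full product builds an n × n matrix just to read off n values, which can get expensive when many points are queried. A possible alternative, sketched here under the assumption that Sigma is invertible, computes the quadratic form row by row; the helper name mahalanobis_sq is hypothetical.

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Row-wise (x - mu)^T Sigma^{-1} (x - mu) without forming an n x n matrix."""
    diff = np.atleast_2d(x) - mu                 # shape (n, d)
    solved = np.linalg.solve(Sigma, diff.T).T    # Sigma^{-1} (x - mu), shape (n, d)
    return np.einsum('ij,ij->i', diff, solved)   # one value per row

# Illustrative inputs
x = np.array([[1.0, 2.0], [0.0, 0.0], [3.0, -1.0]])
mu = np.array([0.5, 0.5])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
quad = mahalanobis_sq(x, mu, Sigma)
```

The result could stand in for the np.diagonal(...) term in the denominator; the rest of the function would stay the same.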

0
votes

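I tried the answers above but got a different result from each of them, and I'm not sure why or what might have gone wrong. The following, which I based on the scikit-learn code for Gaussian mixtures, works as far as I can tell (for an input numpy array X of arbitrary size and c t-distributions, with the parameters given as lists of means and covariances):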

import numpy as np
from scipy import linalg
try:  # SciPy >= 0.19
    from scipy.special import gammaln as sp_gammaln
except ImportError:
    from scipy.misc import gammaln as sp_gammaln

def log_multivariate_t_density(X, means, covars, nu=1, min_covar=1.e-7):
    n_samples, n_dim = X.shape
    nmix = len(means)
    log_prob = np.empty((n_samples, nmix))
    for c, (mu, cv) in enumerate(zip(means, covars)):
        try:
            cv_chol = linalg.cholesky(cv, lower=True)
        except linalg.LinAlgError:

            try:
                cv_chol = linalg.cholesky(cv + min_covar * np.eye(n_dim),
                                  lower=True)
            except linalg.LinAlgError:
                raise ValueError("'covars' must be symmetric, "
                         "positive-definite")

        cv_log_det = 2 * np.sum(np.log(np.diagonal(cv_chol)))
        cv_sol = linalg.solve_triangular(cv_chol, (X - mu).T, lower=True).T

        norm = (sp_gammaln((nu + n_dim) / 2.) - sp_gammaln(nu / 2.)
                - 0.5 * n_dim * np.log(nu * np.pi))
        inner = - (nu + n_dim) / 2. * np.log1p(np.sum(cv_sol ** 2, axis=1) / nu)
        log_prob[:, c] = norm + inner - 0.5 * cv_log_det  # -0.5 * log|Sigma|

    return log_prob
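The function relies on two Cholesky identities: with cov = L Lᵀ, the log-determinant is twice the sum of the logs of the diagonal of L, and the quadratic form equals the squared norm of L⁻¹(x − mu). Since the log-density contains the term −0.5 · log|Σ|, the log-determinant enters with a factor of one half. A small numerical check of both identities, using made-up values for cov and diff:

```python
import numpy as np
from scipy import linalg

cov = np.array([[2.0, 0.3], [0.3, 1.0]])     # illustrative scale matrix
diff = np.array([[0.5, -1.5], [2.0, 1.0]])   # rows play the role of X - mu

L = linalg.cholesky(cov, lower=True)                    # cov = L @ L.T
log_det = 2.0 * np.sum(np.log(np.diagonal(L)))          # log(det(cov))
z = linalg.solve_triangular(L, diff.T, lower=True).T    # rows are L^{-1} (x - mu)
maha = np.sum(z ** 2, axis=1)                           # (x - mu)^T cov^{-1} (x - mu)
```

Neither step forms the inverse of cov explicitly, which is generally the more stable choice numerically.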

0
votes

This would be something like stats.t.pdf(x, df, loc, scale) from the scipy package, but in multi-dimensional space.

This question is fairly old, so it is worth adding an update:

Recent versions of scipy (e.g. 1.11.2) include a class for working with the multivariate t-distribution (= multivariate Student distribution; see the Wikipedia definition):
scipy.stats.multivariate_t

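It can generate random samples and compute the probability density (pdf), the cumulative distribution function (cdf), the log-likelihood, and the entropy. It lacks many of the features found in the corresponding classes for univariate distributions, but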

scipy.stats.multivariate_t.pdf(x, loc=None, shape=1, df=1, allow_singular=False)

fully answers the question.
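A short usage sketch, assuming a SciPy version that ships scipy.stats.multivariate_t; the parameter values are made up for illustration. Note that shape is the scale matrix, so in one dimension it corresponds to scale squared in scipy.stats.t.

```python
import numpy as np
from scipy.stats import multivariate_t, t

mu = np.array([1.0, -1.0])                    # location (illustrative)
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])    # scale matrix (illustrative)
dist = multivariate_t(loc=mu, shape=Sigma, df=5)   # frozen distribution

points = np.array([[1.0, -1.0], [0.0, 0.0]])
dens = dist.pdf(points)           # one density value per row
log_dens = dist.logpdf(points)    # same values in log space
samples = dist.rvs(size=1000, random_state=0)      # array of shape (1000, 2)

# In one dimension the density should agree with scipy.stats.t,
# with shape = scale**2 (here 4.0 and 2.0).
pdf_1d = multivariate_t(loc=[0.0], shape=[[4.0]], df=3).pdf([1.0])
pdf_1d_ref = t.pdf(1.0, df=3, loc=0.0, scale=2.0)
```

Freezing the distribution once and reusing it avoids passing loc, shape and df to every call.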
