Understanding the relationship between the log-normal and normal distributions using scipy.stats and numpy

Question

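I would be very grateful if someone could help me see where I am going wrong. I have some data that describes a probability distribution. The data gives me the P10, P50 and P90 values, and I also know that the distribution is log-normal.

I have read that for a log-normally distributed random variable X, Y = ln(X) is normally distributed (see, for example, Wikipedia: https://en.wikipedia.org/wiki/Log-normal_distribution).

However, when I try to confirm this with scipy.stats and numpy, I cannot make it hold. Since I know the statement is true, and I know there is nothing wrong with the simple functions I am using from these Python libraries, there must be a gap in my understanding. I just cannot see what I am missing.

The code I am using is: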

    import numpy as np
    import scipy.stats as ss

    # build a lognormal distribution with scipy.stats (ss):

    # set parameters (based on the standard normal distribution, mu=0 and sigma=1):
    s, mu, sd, size = 0.5, 0, 1, 100000

    # draw the samples (note: mu is passed as loc and sd as scale):
    X = ss.lognorm.rvs(s, loc=mu, scale=sd, size=size)

    # convert to a normal distribution (i.e. take the natural log of X):
    Y = np.log(X)

    # check whether Y is normal using the ratio (p90-p50)/(p50-p10), which should be 1:
    p10, p50, p90 = np.percentile(Y, [10, 50, 90])
    print((p90 - p50) / (p50 - p10))

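The code above returns 0.9932 or some other value close to 1. So far so good. I can change s and scale however I like (at least in everything I have tried so far) and the normality check always stays close to 1. When I also change loc, however, it no longer holds: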

    import numpy as np
    import scipy.stats as ss

    # build a lognormal distribution with scipy.stats (ss):

    # set parameters (intended as a normal distribution with mu=100 and sigma=10):
    s, mu, sd, size = 0.5, 100, 10, 100000

    # draw the samples (note: mu is passed as loc and sd as scale):
    X = ss.lognorm.rvs(s, loc=mu, scale=sd, size=size)

    # convert to a normal distribution (i.e. take the natural log of X):
    Y = np.log(X)

    # check whether Y is normal using the ratio (p90-p50)/(p50-p10), which should be 1:
    p10, p50, p90 = np.percentile(Y, [10, 50, 90])
    print((p90 - p50) / (p50 - p10))

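In this case I get an answer of around 1.8, i.e. Y is not normally distributed. As I said, I have clearly misunderstood something, but I cannot see what.

In summary: if I use ss.lognorm.rvs to generate a set of log-normally distributed random values with loc not equal to 0, and then take their natural log with np.log, the resulting distribution is not normal. On the face of it, this seems to contradict the rule stated at the top of the Wikipedia article linked above.

I would really appreciate any help anyone can give me. I just want to be sure I understand how to relate log-normal data to the normal curve.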

Answer 1

Take a look at these checks to see how things work in scipy.stats:

    In [95]: ss.lognorm(s=0.1).mean()
    Out[95]: 1.005012520859401

    In [96]: np.exp(0.1**2 / 2)
    Out[96]: 1.005012520859401

    In [97]: ss.lognorm(s=0.1).var()
    Out[97]: 0.010151172942587642

    In [98]: (np.exp(0.1**2) - 1) * np.exp(0.1**2)
    Out[98]: 0.010151172942587642
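
For reference, the checks above are the loc=0, scale=1 case of more general formulas. As a sketch, scipy's parameterization can be written as $X = \mathrm{loc} + \mathrm{scale}\cdot e^{sZ}$ with $Z$ standard normal (the symbol $Z$ is my own notation, not from the post), which gives:

$$
\begin{aligned}
\mathbb{E}[X] &= \mathrm{loc} + \mathrm{scale}\cdot e^{s^{2}/2},\\
\operatorname{Var}[X] &= \mathrm{scale}^{2}\,\bigl(e^{s^{2}} - 1\bigr)\,e^{s^{2}},\\
\ln(X - \mathrm{loc}) &\sim \mathcal{N}\bigl(\ln(\mathrm{scale}),\; s^{2}\bigr).
\end{aligned}
$$

So s plays the role of the sigma of the underlying normal, scale is the exponential of its mean, and loc only shifts X. With a non-zero loc it is ln(X - loc), not ln(X), that is normal.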

I find the scipy.stats conventions a bit confusing and have to look them up every time.
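
To connect this to the code in the question, here is a minimal sketch. It assumes the intent was a log-normal whose underlying normal has mean mu and standard deviation sigma. The helper `percentile_symmetry` and the names `ratio_raw`, `ratio_shifted`, `X2` and `Y2` are mine, introduced only for illustration, and the fixed `random_state` seeds are there just to make the run repeatable.

```python
import numpy as np
import scipy.stats as ss


def percentile_symmetry(y):
    """Return (p90 - p50) / (p50 - p10); close to 1 for a symmetric sample."""
    p10, p50, p90 = np.percentile(y, [10, 50, 90])
    return (p90 - p50) / (p50 - p10)


size = 100_000

# Same parameters as the second example in the question: loc shifts the
# whole distribution, it is NOT the mean of the underlying normal.
s, loc, scale = 0.5, 100, 10
X = ss.lognorm.rvs(s, loc=loc, scale=scale, size=size, random_state=0)

# log(X) is not normal because X includes the shift ...
ratio_raw = percentile_symmetry(np.log(X))
# ... but log(X - loc) is normal with mean log(scale) and std s.
ratio_shifted = percentile_symmetry(np.log(X - loc))
print(ratio_raw, ratio_shifted)

# If the goal is log(X) ~ Normal(mu, sigma), leave loc at 0 and pass
# s=sigma and scale=exp(mu). The value of mu here is an arbitrary choice.
mu, sigma = np.log(100), 0.5
X2 = ss.lognorm.rvs(sigma, loc=0, scale=np.exp(mu), size=size, random_state=1)
Y2 = np.log(X2)
print(Y2.mean(), Y2.std())  # should be close to mu and sigma
```

With the parameters from the question, the first ratio should come out well above 1 and the second close to 1, which matches the behaviour described there.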
