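I want to fit data to a 3-parameter Weibull distribution, but in my plot there is always a gap between the raw data and the fitted curve. What am I doing wrong? Why do I get such a poor fit when I use initial_params = [1,1,1]?
Here is my code: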
import numpy as np
from scipy.optimize import minimize
from scipy.special import gamma
import matplotlib.pyplot as plt
from scipy.stats import weibull_min
def weibull_log_likelihood(params, data):
    shape, scale, loc = params
    log_likelihood = -np.sum(weibull_min.logpdf(data, shape, loc=loc, scale=scale))
    return -log_likelihood

def estimate_weibull_params(data):
    initial_params = [shape, scale, loc]
    bounds=[(1, None), (1, None), (1, None)]
    result = minimize(weibull_log_likelihood, initial_params, args=(data,), method='nelder-mead', bounds=bounds)
    return result.x

def weibull_pdf(x, shape, scale, loc):
    return (shape / scale) * ((x - loc) / scale) ** (shape - 1) * np.exp(-((x - loc) / scale) ** shape)

shape = 7.5
scale = 150
loc = 350
size = 100
data = weibull_min.rvs(shape, loc=loc, scale=scale, size=size)
estimated_params = estimate_weibull_params(data)
shape, scale, loc = estimated_params
print(f"Estimated Parameters: Shape = {shape}, Scale = {scale}, Location = {loc}")
x = np.arange(1000)
pdf = weibull_pdf(x, shape, scale, loc)
plt.hist(data, bins=20, density=True, alpha=0.6, color='g')
plt.plot(x, pdf, 'r-', lw=2)
plt.xlim(0, 1000)
plt.show()
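The Weibull distribution is a pretty extreme distribution. It involves powers of powers, and the values can get out of hand very quickly.
If you change your log_likelihood function to what is effectively log(-log(pdf)), then you may get better results: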
def weibull_log_likelihood(params, data):
    shape, scale, loc = params
    log_minus_log_likelihood = np.sum(np.log( -weibull_min.logpdf(data, shape, loc=loc, scale=scale)))
    return log_minus_log_likelihood
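As a side check on the sign: the weibull_log_likelihood in the question negates the sum and then negates it again on return, so minimize is handed the log-likelihood itself rather than its negative. A conventional objective applies the minus sign only once. Below is a minimal sketch of that; the name weibull_neg_log_likelihood, the seed, and the two parameter sets being compared are my own assumptions for illustration.

```python
import numpy as np
from scipy.stats import weibull_min

def weibull_neg_log_likelihood(params, data):
    """Negative log-likelihood of a 3-parameter Weibull (lower is better)."""
    shape, scale, loc = params
    # A single minus sign: minimizing this value maximizes the likelihood.
    return -np.sum(weibull_min.logpdf(data, shape, loc=loc, scale=scale))

# Illustrative data drawn with the parameters from the question (arbitrary seed).
rng = np.random.default_rng(0)
data = weibull_min.rvs(7.5, loc=350, scale=150, size=100, random_state=rng)

# The generating parameters should score lower than a deliberately poor shape.
nll_true = weibull_neg_log_likelihood([7.5, 150, 350], data)
nll_poor = weibull_neg_log_likelihood([1.0, 150, 350], data)
print(nll_true < nll_poor)
```

With this sign convention a smaller value means a better fit, which is the direction minimize works in.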
I would also suggest improving your initial guess to something more in line with the data:
initial_params = [ 1, np.max( data ) - np.min( data ), np.min( data ) ]
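To see why the starting point matters, the sketch below evaluates the log(-log(pdf)) objective at [1, 1, 1] and at the data-driven guess. The helper name data_driven_guess and the seed are assumptions of mine; the sample is drawn with the parameters from the question.

```python
import numpy as np
from scipy.stats import weibull_min

def weibull_log_likelihood(params, data):
    # Objective suggested in this answer: sum of log(-log(pdf)).
    shape, scale, loc = params
    return np.sum(np.log(-weibull_min.logpdf(data, shape, loc=loc, scale=scale)))

def data_driven_guess(data):
    # Hypothetical helper: shape 1, scale = data range, loc = smallest observation.
    return [1, np.max(data) - np.min(data), np.min(data)]

rng = np.random.default_rng(0)  # arbitrary seed
data = weibull_min.rvs(7.5, loc=350, scale=150, size=100, random_state=rng)

start_ones = weibull_log_likelihood([1, 1, 1], data)
start_data = weibull_log_likelihood(data_driven_guess(data), data)

# A lower objective means the optimizer starts closer to a sensible fit.
print(start_data < start_ones)
```

A start at [1, 1, 1] puts loc and scale hundreds of units away from the data, so the simplex has a long way to travel before it reaches a reasonable region.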
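Note that I don't think this is a particularly good way of finding the parameter loc, because the Weibull distribution involves a term ((x-loc)/scale)^shape, and an unconstrained guess for loc can leave you computing non-integer powers of negative numbers.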
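The negative-base problem is easy to reproduce with the hand-written weibull_pdf from the question, which is evaluated at x values below loc when plotting. The sketch below shows the NaN and one possible guard; weibull_pdf_safe is a hypothetical name, and treating x == loc as outside the support is a simplification that assumes shape > 1.

```python
import numpy as np
from scipy.stats import weibull_min

def weibull_pdf_safe(x, shape, scale, loc):
    """Weibull pdf that returns 0 where x <= loc instead of NaN."""
    x = np.asarray(x, dtype=float)
    z = (x - loc) / scale
    pdf = np.zeros_like(z)
    inside = z > 0  # only raise positive bases to a non-integer power
    zi = z[inside]
    pdf[inside] = (shape / scale) * zi ** (shape - 1) * np.exp(-zi ** shape)
    return pdf

# A negative base with a non-integer exponent is NaN in NumPy.
with np.errstate(invalid="ignore"):
    raw = np.float64(-0.5) ** 7.5
print(np.isnan(raw))

# Below loc the guarded version gives 0; elsewhere it should agree with SciPy.
x = np.array([300.0, 450.0, 600.0])
safe = weibull_pdf_safe(x, 7.5, 150.0, 350.0)
matches_scipy = np.allclose(safe, weibull_min.pdf(x, 7.5, loc=350.0, scale=150.0))
print(safe[0], matches_scipy)
```

weibull_min.pdf and weibull_min.logpdf already handle points outside the support (0 and -inf respectively), so using them for the plot avoids the issue as well.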
Estimated Parameters: Shape = 5.383509295480495, Scale = 98.8928064025068, Location = 395.6676730395027
(Obviously, given your random data and relatively small sample size, you will get different numbers on every run.)
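If you want the numbers to be repeatable while you experiment, one option is to seed the generator that draws the sample. This is only a sketch: draw_sample is a hypothetical helper and the seed values are arbitrary.

```python
import numpy as np
from scipy.stats import weibull_min

def draw_sample(seed, size=100):
    # Same distribution parameters as in the question; the seed fixes the draw.
    rng = np.random.default_rng(seed)
    return weibull_min.rvs(7.5, loc=350, scale=150, size=size, random_state=rng)

first = draw_sample(12345)
second = draw_sample(12345)
other = draw_sample(54321)

print(np.array_equal(first, second))  # same seed, same sample
print(np.array_equal(first, other))   # different seed, different sample
```

Fixing the seed only makes a single run reproducible; the estimates will still vary from one seed to the next with a sample of 100 points.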