我是贝叶斯框架的新手,我已经开始阅读Kruschke的书和BDA课程来了解一些理论来解决我的问题。 我的问题很常见:我收集了一些数据 x,y,我需要根据 x 和 y 估计线性回归的斜率和截距。通常,我会使用“特征值”进行频率分析,置信度为 95%。但我有旧采样数据的音调,我想将其用作新采样数据的先验知识,因此我正在转向贝叶斯框架。 我所知道的是我的 x,y 数据遵循对数正态分布。 由此,我需要创建我的模型。 我的步骤是: 为了测试和建模,我创建了与真实数据非常相似的合成数据,并将其分为先验数据 (80%) 和新数据 (20%)。
import numpy as np
import pymc as pm
import matplotlib.pyplot as plt
import pandas as pd
N = 265
A = 0.438
B = 12.13
x = np.random.uniform(44, 2900, N)
sigma = 167
y = A * x + B + np.random.normal(0, sigma, size=N)
fig, ax = plt.subplots(1,1, figsize=(6,4))
ax.scatter(x,y, label='Data')
plt.xlabel("x")
plt.ylabel('y')
plt.legend()
# All data
data = pd.DataFrame({'x': x, 'y': y})
# x and y are vectors of collected data over the years
total_samples = data.shape[0]
prior_samples = int(0.8 * total_samples)
print(' 80% as priori: {:.0f} observations'.format(prior_samples))
# Randomly shuffle the data (
indices = np.arange(total_samples)
np.random.shuffle(indices)
# Split data into prior knowledge and new data
x_prior = data.x[indices[:prior_samples]]
y_prior = data.y[indices[:prior_samples]]
x_new = data.x[indices[prior_samples:]]
y_new = data.y[indices[prior_samples:]]
然后,我对我的数据拟合了对数正态分布:
prior_mu_x, prior_sigma_x = np.log(x_prior).mean(), np.log(x_prior).std()
print('prior x: μ = {:.2f}, σ = {:.2f}'.format(prior_mu_x, prior_sigma_x))
prior_mu_y, prior_sigma_y = np.log(y_prior).mean(), np.log(y_prior).std()
print('prior y: μ = {:.2f}, σ = {:.2f}'.format(prior_mu_y, prior_sigma_y))
最终创建了一个模型:
with pm.Model() as model:
prior_dist_x = pm.Lognormal('prior_dist_x', mu=prior_mu_x, sigma=prior_sigma_x)
prior_dist_y = pm.Lognormal('prior_dist_y', mu=prior_mu_y, sigma=prior_sigma_y)
slope = pm.Normal('slope', mu=0, sigma=10)
intercept = pm.Normal('intercept', mu=0, sigma=10)
sigma = np.std(y_new, ddof=1)
likelihood = pm.Normal('likelihood', mu=slope * x_new + intercept, sigma=sigma, observed=y_new)
trace = pm.sample(1000, tune=1000, cores=1)
# Plotting
fig,ax = plt.subplots(1,1)
# Create a scatter plot of the data points
ax.scatter(x,y, label='Data')
# Set labels and title
plt.xlabel("x")
plt.ylabel('y')
# Add a legend
plt.legend()
plt.show()
ax = pm.plot_trace(trace)
plt.tight_layout()
plt.show()
我这样做对吗?