We built a Bayesian logistic regression model in Python using Pyro and scikit-learn. The model is trained and tested on a dataset containing both categorical and continuous variables. The code first preprocesses the dataset (encoding the categorical variables with LabelEncoder; a minimal sketch of that step is included after the model code below). The code then defines the following functions:
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import MCMC, NUTS

# Model definition
def logistic_regression(X_categorical, X_continuous, y=None):
    # Global intercept
    alpha = pyro.sample("alpha", dist.Normal(alpha_prior_mean, alpha_prior_scale))
    # One coefficient per categorical predictor column
    beta_categorical = pyro.sample("beta_categorical",
                                   dist.Normal(beta_categorical_prior_means,
                                               beta_categorical_prior_scales))
    # One coefficient per continuous predictor column (kept 1-D so the two
    # matmul terms below have matching shapes)
    beta_continuous = pyro.sample("beta_continuous",
                                  dist.Normal(beta_continuous_prior_means,
                                              beta_continuous_prior_scales))
    # Linear predictor and success probability
    logits = alpha + torch.matmul(X_categorical, beta_categorical) \
                   + torch.matmul(X_continuous, beta_continuous)
    probs = torch.sigmoid(logits)
    # Bernoulli likelihood; y is passed in as an argument rather than read from a global
    with pyro.plate("data", X_categorical.shape[0]):
        pyro.sample("y", dist.Bernoulli(probs=probs), obs=y)
# Inference
def bayesian_logistic_regression(y, X_categorical, X_continuous):
    nuts_kernel = NUTS(logistic_regression)
    mcmc = MCMC(nuts_kernel, num_samples=50, warmup_steps=10, num_chains=1)
    # Pass y explicitly so the model conditions on the observed labels
    mcmc.run(X_categorical, X_continuous, y)
    # Get posterior samples
    posterior_samples = mcmc.get_samples()
    alpha_samples = posterior_samples["alpha"]
    beta_categorical_samples = posterior_samples["beta_categorical"]
    beta_continuous_samples = posterior_samples["beta_continuous"]
    return alpha_samples, beta_categorical_samples, beta_continuous_samples, posterior_samples
# Priors (defined at module level so the model function can see them)
alpha_prior_mean = torch.tensor(0.0)
alpha_prior_scale = torch.tensor(1.0)
# 10 categorical predictor columns
beta_categorical_prior_means = torch.zeros(10)
beta_categorical_prior_scales = torch.ones(10)
# 4 continuous predictor columns; kept 1-D float32 to match X_continuous
beta_continuous_prior_means = torch.zeros(4)
beta_continuous_prior_scales = torch.ones(4)
# Run model
alpha_samples, beta_categorical_samples, beta_continuous_samples, posterior_samples = bayesian_logistic_regression(y_train, X_categorical, X_continuous)
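For completeness, the preprocessing step mentioned above looks roughly like this. It is a minimal sketch: the file name, column names, and target column are placeholders, not taken from the original code; the only fixed assumptions are 10 categorical and 4 continuous predictor columns, to match the prior dimensions above.

import pandas as pd
import torch
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("data.csv")  # placeholder file name

# Placeholder column names -- substitute the real ones from the dataset
categorical_cols = [f"cat_{i}" for i in range(10)]  # 10 categorical predictors
continuous_cols = [f"num_{i}" for i in range(4)]    # 4 continuous predictors

# Encode each categorical column to integer codes with LabelEncoder
for col in categorical_cols:
    df[col] = LabelEncoder().fit_transform(df[col])

# NUTS works on continuous tensors, so all inputs are cast to float32
X_categorical = torch.tensor(df[categorical_cols].to_numpy(), dtype=torch.float32)
X_continuous = torch.tensor(df[continuous_cols].to_numpy(), dtype=torch.float32)
y_train = torch.tensor(df["target"].to_numpy(), dtype=torch.float32)  # assumed binary 0/1 column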
The goal of the model is to predict the response variable, while also serving as a tool whose input parameters can easily be added to or changed.
The model currently predicts all zeros for the binary response variable, which we know is incorrect given the sample. My current hypothesis is that there is a problem with how I have set up the model definition and the NUTS MCMC sampling.
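For reference, the all-zero predictions are obtained from the posterior samples roughly as follows. This is a sketch of one plausible prediction rule (posterior-mean probability thresholded at 0.5); the helper name posterior_predictions is hypothetical and not part of the original code.

import torch

def posterior_predictions(posterior_samples, X_categorical, X_continuous, threshold=0.5):
    """Average the per-draw success probabilities and threshold them."""
    alpha = posterior_samples["alpha"]                 # shape (S,)
    beta_cat = posterior_samples["beta_categorical"]   # shape (S, 10)
    beta_con = posterior_samples["beta_continuous"]    # shape (S, 4)
    # Logits for every posterior draw: shape (S, N)
    logits = alpha[:, None] + beta_cat @ X_categorical.T + beta_con @ X_continuous.T
    probs = torch.sigmoid(logits).mean(dim=0)          # posterior-mean probability, shape (N,)
    return (probs > threshold).float(), probs

y_pred, p_mean = posterior_predictions(posterior_samples, X_categorical, X_continuous)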
My question is: have I set this model up incorrectly, or is there a better way to code a Bayesian logistic regression?
I also tried this approach on the Iris dataset, updating the code to use the input parameters used in a previous paper. However, the code still predicts 0 for the response variable on this dataset. Again, we know this is wrong, because we have the known results for the input parameters used in the model.