Validating a Bayesian logistic regression model on the Iris dataset


We set up a Bayesian logistic regression model in Python using Pyro and scikit-learn. The model is trained and tested on a dataset that contains both categorical and continuous variables. The code first preprocesses the dataset, encoding the categorical variables with LabelEncoder; a rough sketch of that step is shown below, followed by the model and inference code the script then defines.
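The preprocessing sketch below is an assumed illustration rather than the original code: the DataFrame df, the column lists, and the helper name preprocess are placeholders.

import torch
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def preprocess(df, categorical_cols, continuous_cols, target_col):
    # Encode each categorical column to integer codes with LabelEncoder
    encoded = df[categorical_cols].apply(lambda col: LabelEncoder().fit_transform(col))
    X_categorical = torch.tensor(encoded.values, dtype=torch.float32)
    X_continuous = torch.tensor(df[continuous_cols].values, dtype=torch.float32)
    # Assumes the response column is already coded as 0/1
    y = torch.tensor(df[target_col].values, dtype=torch.float32)
    return X_categorical, X_continuous, y

The model and inference code is defined as follows: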

# Imports needed for the snippet to run
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import MCMC, NUTS


def bayesian_logistic_regression(y, X_categorical, X_continuous):
    # Model definition: y and the prior tensors are taken from the enclosing scope
    def logistic_regression(X_categorical, X_continuous):
        num_categories = X_categorical.shape[1]
        num_predictors = num_categories + X_continuous.shape[1]
        alpha = pyro.sample("alpha", dist.Normal(alpha_prior_mean, alpha_prior_scale))
        beta_categorical = pyro.sample("beta_categorical",
                                       dist.Normal(beta_categorical_prior_means,
                                                   beta_categorical_prior_scales))
        beta_continuous = pyro.sample("beta_continuous",
                                      dist.Normal(beta_continuous_prior_means.unsqueeze(1),
                                                  beta_continuous_prior_scales.unsqueeze(1)))

        # beta_categorical = beta_categorical.transpose(0, 1)  # Transpose second and first dimensions
        logits = (alpha
                  + torch.matmul(X_categorical, beta_categorical)
                  + torch.matmul(X_continuous, beta_continuous))
        probs = torch.sigmoid(logits)
        y_obs = pyro.sample("y", dist.Bernoulli(probs=probs), obs=y)  # dist.Multinomial(100, probs)

    # Inference with the NUTS kernel
    nuts_kernel = NUTS(logistic_regression)
    mcmc = MCMC(nuts_kernel, num_samples=50, warmup_steps=10, num_chains=1)
    mcmc.run(X_categorical, X_continuous)

    # Get posterior samples
    posterior_samples = mcmc.get_samples()
    alpha_samples = posterior_samples['alpha']
    beta_categorical_samples = posterior_samples['beta_categorical']
    beta_continuous_samples = posterior_samples['beta_continuous']

    return alpha_samples, beta_categorical_samples, beta_continuous_samples, posterior_samples


# Estimate priors: standard-normal priors on the intercept and all coefficients
alpha_prior_mean = torch.tensor(0.0)
alpha_prior_scale = torch.tensor(1.0)
beta_categorical_prior_means = torch.zeros(10)                      # ten categorical predictors
beta_categorical_prior_scales = torch.ones(10)
beta_continuous_prior_means = torch.zeros(4, dtype=torch.float32)   # four continuous predictors
beta_continuous_prior_scales = torch.ones(4, dtype=torch.float32)

# Run model
alpha_samples, beta_categorical_samples, beta_continuous_samples, posterior_samples = bayesian_logistic_regression(y_train, X_categorical, X_continuous)
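As a quick sanity check (an added suggestion, not part of the original code), one can inspect the posterior means of the sampled coefficients before looking at predictions:

print("alpha posterior mean:", alpha_samples.mean().item())
print("beta_categorical posterior means:", beta_categorical_samples.mean(dim=0))
print("beta_continuous posterior means:", beta_continuous_samples.mean(dim=0))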

The goal of the model is to predict the response variable, while also serving as a tool whose input parameters can easily be added to or changed.

The model's output returns predictions that are all 0 (for a binary response variable), which we know from the sample is incorrect. My current hypothesis is that there is a problem with how I set up the model definition and how I use the NUTS MCMC sampling method.
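For reference, one way to turn the posterior samples into class predictions is to average the predicted probability over the posterior draws and threshold at 0.5. The sketch below is an assumed illustration, not the original prediction code; in particular, the squeeze on the continuous term is an assumption about the tensor shapes.

import torch

def posterior_predict(alpha_samples, beta_categorical_samples, beta_continuous_samples,
                      X_categorical, X_continuous):
    probs = []
    for a, b_cat, b_con in zip(alpha_samples, beta_categorical_samples, beta_continuous_samples):
        # Per-draw logits; b_con is assumed to have shape (n_continuous, 1), hence the squeeze
        logits = a + X_categorical @ b_cat + (X_continuous @ b_con).squeeze(-1)
        probs.append(torch.sigmoid(logits))
    mean_prob = torch.stack(probs).mean(dim=0)   # posterior mean probability per observation
    return (mean_prob > 0.5).float(), mean_prob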

My question is: have I set this model up incorrectly, or is there a better way to code a Bayesian logistic regression?

I have tried this approach on the Iris dataset, updating the code to use the input parameters used in a previous paper. However, my code still predicts 0 for the response variable on this dataset. Again, we know this is wrong, because we have results for the input parameters used in the model.
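For concreteness, a minimal way to set up Iris for a binary logistic regression looks roughly like the sketch below; the binary target (virginica versus the rest) and the train/test split are illustrative assumptions, not necessarily the parameters from the paper.

import numpy as np
import torch
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X = iris.data.astype(np.float32)            # four continuous predictors
y = (iris.target == 2).astype(np.float32)   # assumed binary response: virginica vs. rest

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

X_continuous = torch.from_numpy(X_train)
y_train = torch.from_numpy(y_train)
# Iris has no categorical predictors, so X_categorical and the lengths of
# beta_categorical_prior_means / beta_categorical_prior_scales would need to be
# adjusted accordingly.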

logistic-regression bayesian iris-dataset