在 GPBoost 的 GPModel 中创建 group_rand_coef_data 的适当方法

问题描述 投票:0回答:1

下面是我从 GPBoost 站点复制的代码。 https://github.com/fabsig/GPBoost/blob/master/examples/python-guide/generalized_linear_Gaussian_process_mixed_effects_models.py

我认为在现实生活中运行此行

x
中的混合效果模型时,
group_rand_coef_data
的输入变量
gpb.GPModel(group_data=group_data, group_rand_coef_data=x,
是未知的,不像本教程代码(如下),其中
x
是已知/人造的。

x
未知的现实生活中,我能用什么来
group_rand_coef_data

import gpboost as gpb
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
plt.style.use('ggplot')

def simulate_response_variable(lp, rand_eff, likelihood):
    """Function that simulates response variable for various likelihoods"""
    n = len(rand_eff)
    if likelihood == "gaussian":
        xi = 0.1**0.5 * np.random.normal(size=n) # error term, variance = 0.1
        y = lp + rand_eff + xi
    return y

likelihood = "gaussian"

"""
Grouped random effects
"""
# --------------------Simulate data----------------
# Single-level grouped random effects
n = 1000  # number of samples
m = 200  # number of categories / levels for grouping variable
group = np.arange(n)  # grouping variable
for i in range(m):
    group[int(i * n / m):int((i + 1) * n / m)] = i
np.random.seed(1)
b = 0.25**0.5 * np.random.normal(size=m)  # simulate random effects, variance = 0.25
# Simulate linear regression fixed effects
X = np.column_stack((np.ones(n), np.random.uniform(size=n) - 0.5)) # design matrix / covariate data for fixed effect
beta = np.array([0, 2]) # regression coefficents
lp = X.dot(beta)

# Crossed grouped random effects and random slopes
group_crossed = group[np.random.permutation(n)-1] # grouping variable for crossed random effects
b_crossed = 0.25**0.5 * np.random.normal(size=m)  # simulate crossed random effects
b_random_slope = 0.25**0.5 * np.random.normal(size=m)
x = np.random.uniform(size=n)  # covariate data for random slope
rand_eff = b[group] + b_crossed[group_crossed] + x * b_random_slope[group]
rand_eff = rand_eff - np.mean(rand_eff)
y_crossed_random_slope = simulate_response_variable(lp=lp, rand_eff=rand_eff, likelihood=likelihood)

# --------------------Two crossed random effects and random slopes----------------
# Define and train model
group_data = np.column_stack((group, group_crossed))
gp_model = gpb.GPModel(group_data=group_data, group_rand_coef_data=x,
                       ind_effect_group_rand_coef=[1], likelihood=likelihood)
# 'ind_effect_group_rand_coef=[1]' indicates that the random slope is for the first random effect
gp_model.fit(y=y_crossed_random_slope, X=X, params={"std_dev": True})
gp_model.summary()
# Prediction
pred = gp_model.predict(group_data_pred=group_data, group_rand_coef_data_pred=x, X_pred=X)

# Obtain predicted (="estimated") random effects for the training data
all_training_data_random_effects = gp_model.predict_training_data_random_effects()
first_occurences_1 = [np.where(group==i)[0][0] for i in np.unique(group)]
pred_random_effects = all_training_data_random_effects.iloc[first_occurences_1,0]
pred_random_slopes = all_training_data_random_effects.iloc[first_occurences_1,2]
# Compare true and predicted random effects
plt.scatter(b, pred_random_effects, label="Random effects")
plt.scatter(b_random_slope, pred_random_slopes, label="Random slopes")
plt.legend()
plt.title("Comparison of true and predicted random effects")
plt.show(block=False)

结果

=====================================================
Model summary:
 Log-lik    AIC     BIC
  -823.6 1659.2 1688.64
Nb. observations: 1000
Nb. groups: 200 (Group_1), 200 (Group_2)
-----------------------------------------------------
Covariance parameters (random effects):
                        Param.  Std. dev.
Error_term              0.1044     0.0067
Group_1                 0.1918     0.0263
Group_2                 0.2709     0.0301
Group_1_rand_coef_nb_1  0.2288     0.0527
-----------------------------------------------------
Linear regression coefficients (fixed effects):
             Param.  Std. dev.  z value  P(>|z|)
Covariate_1 -0.0051      0.051  -0.0995   0.9208
Covariate_2  1.9975      0.048  41.6060   0.0000
=====================================================

情节 output plot

python mixed-models
1个回答
0
投票

group_rand_coef_data
仅用于随机系数/斜率模型。这是混合效应模型的一种特殊形式。否则,您可以忽略该参数。我想这也适用于你。如果您确实想使用随机斜率模型,则需要了解协变量数据(无论是模拟数据还是真实数据)。

© www.soinside.com 2019 - 2024. All rights reserved.