Bayesian regression with PyMC3: PatsyError

Question (votes: 1, answers: 1)

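I am trying to fit a Bayesian linear regression with PyMC3. I want to predict age from a set of measurements. I found a great example and would like to apply it to my own data. The code is below.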

import pandas as pd
import numpy as np
import pymc3 as pm
from sklearn.model_selection import train_test_split

data = pd.read_csv('data.csv')
X = data.drop(['User_ID','Gender','Age'], axis = 1)   # the features
Y = data['Age']  
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)
print(X_train.shape)
print(X_test.shape)

Formula = 'Age ~ ' + ' + '.join(['%s' % variable for variable in X_train.columns[0:]])
print(Formula)

with pm.Model() as normal_model:    
   f = pm.glm.families.Normal()    
   pm.GLM.from_formula(Formula, data = X_train, family = f)   
   normal_trace = pm.sample(draws=2000, chains = 2, tune = 500)

When I run it, I get this error:

PatsyError: Error evaluating factor: NameError: name 'Age' is not defined
Age ~ Height + Weight + Duration + Heart_Rate + Body_Temp + Calories
^^^

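If I keep Age in X, the code runs fine, but then Age also appears on the right-hand side of the formula. That is wrong, because Age is the dependent variable and the other columns are the independent variables. Any idea how to fix this? Thanks in advance.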

python bayesian pymc3 pymc
1 Answer
0 votes

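To use pm.GLM.from_formula(), the DataFrame passed as the data argument must contain every variable named in the formula, both the predictors and the response. Patsy looks up each name in that DataFrame, so when Age is missing from X_train it raises the NameError shown above. A simple way to adapt the current code is to re-attach the response variable: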

pm.GLM.from_formula(Formula, data=pd.concat([X_train, y_train], axis=1), family=f)
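
An alternative is to split the full DataFrame once and build the formula from a list of predictor names, so the response never leaves the training frame. The following is only a minimal sketch: the synthetic columns stand in for data.csv, and the names `predictors`, `train`, `test`, `missing`, and `fit` are made up for illustration. The `fit` helper assumes PyMC3 is installed and is not called here.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for data.csv (values are made up for illustration only).
rng = np.random.default_rng(0)
n = 50
data = pd.DataFrame({
    "Age": rng.integers(20, 60, size=n),
    "Height": rng.normal(170, 10, size=n),
    "Weight": rng.normal(70, 12, size=n),
})

# Hypothetical predictor list; the response "Age" is deliberately excluded.
predictors = ["Height", "Weight"]

# Split rows once, keeping the response in the same frame as the predictors.
train = data.sample(frac=0.8, random_state=0)
test = data.drop(train.index)

# The formula's right-hand side contains predictors only.
formula = "Age ~ " + " + ".join(predictors)

# Sanity check: every name used in the formula must be a column of `train`.
missing = [name for name in ["Age"] + predictors if name not in train.columns]


def fit(train_df, formula):
    """Fit the GLM. Assumes PyMC3 is installed; imported lazily on purpose."""
    import pymc3 as pm

    with pm.Model() as model:
        pm.GLM.from_formula(formula, data=train_df,
                            family=pm.glm.families.Normal())
        trace = pm.sample(draws=2000, chains=2, tune=500)
    return model, trace
```

Calling `fit(train, formula)` would then run the same model as in the question. Checking `missing` before sampling should catch this kind of column mismatch early, before Patsy raises.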