How can I set offsets for multiple variables at once in a Statsmodels Logit model?


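I'm trying to fit a logit model with statsmodels.discrete.discrete_model.Logit where the coefficients of some variables are already known and the coefficients of the others need to be estimated. I can get the code to work when offsetting a single variable, but I haven't been able to figure out how to do it for several variables at once.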

This works for a single-variable offset:

import numpy as np
import pandas as pd
import statsmodels.discrete.discrete_model as smdm

df = pd.DataFrame(np.random.randn(8, 4), columns=list('yxza'))
labels = np.random.randint(2, size=8)

known = 0.2

model_train = smdm.Logit(labels, df[['y', 'x', 'a']], offset=known*df['z']).fit()

But this does not work for multiple offsets:

import numpy as np
import pandas as pd
import statsmodels.discrete.discrete_model as smdm

df = pd.DataFrame(np.random.randn(8, 4), columns=list('yxza'))
labels = np.random.randint(2, size=8)

known = [0.2, 0.1]

model_train = smdm.Logit(labels, df[['y', 'x']], offset=known*df[['z', 'a']]).fit()

It raises the following error:

ValueError: Unable to coerce to Series, length must be 2: given 8

I've tried several different ways of specifying the offset, such as offset=[0.2*df['z'], 0.1*df['a']], but I always get an exception.
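As a quick diagnostic, the sketch below compares the shape of the working offset with the shape of the failing one. It uses only NumPy and pandas; the variable names `single_offset` and `scaled` are introduced here for illustration. The likely cause of the exception, stated as an assumption rather than a confirmed trace, is that the offset is added to the linear predictor, which has one value per observation, so a two-column DataFrame cannot be combined with it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(8, 4), columns=list('yxza'))

# Working case: a scalar times one column is a Series with one value per row.
single_offset = 0.2 * df['z']
print(single_offset.shape)  # (8,)

# Failing case: a list of coefficients times two columns scales each column,
# but the result is still a two-column DataFrame, not a single offset column.
known = [0.2, 0.1]
scaled = known * df[['z', 'a']]
print(scaled.shape)  # (8, 2)
```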

python logistic-regression offset statsmodels
1 Answer

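Thanks to @Josef's comment, I was able to get it working. The known terms are summed into a single offset column. The code is as follows: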

import numpy as np
import pandas as pd
import statsmodels.discrete.discrete_model as smdm

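df = pd.DataFrame(np.random.randn(8, 4), columns=list('yxza'))
labels = np.random.randint(2, size=8)

# Combine the known terms into one offset column: 0.2*z + 0.1*a
known = 0.2 * df['z'] + 0.1 * df['a']

# Estimate coefficients only for y and x; the known terms enter through the offset
model_train = smdm.Logit(labels, df[['y', 'x']], offset=known).fit()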
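With more than a couple of known coefficients, writing the sum out by hand gets tedious. One possible generalization, sketched here with only NumPy and pandas, keeps the known coefficients in a Series keyed by column name and builds the offset with a dot product. The names `known_coefs`, `offset`, and `free_cols` are hypothetical and not from the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(8, 4), columns=list('yxza'))

# Hypothetical container: known coefficients keyed by column name.
known_coefs = pd.Series({'z': 0.2, 'a': 0.1})

# The dot product aligns on column names and returns one value per row,
# i.e. 0.2*z + 0.1*a, the same offset as the handwritten sum.
offset = df[known_coefs.index].dot(known_coefs)

# Columns whose coefficients still need to be estimated.
free_cols = [c for c in df.columns if c not in known_coefs.index]

manual = 0.2 * df['z'] + 0.1 * df['a']
print(np.allclose(offset, manual))
print(free_cols)
```

The resulting `offset` Series and `df[free_cols]` can then be passed to Logit in place of `known` and `df[['y', 'x']]` in the code above.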