ValueError: exog is not 1d or 2d when trying to fit a statsmodels OLS model


I am trying to fit a simple OLS model with statsmodels by passing two numpy arrays that have column names. However, when I try to fit the model, I get this error:

ValueError: exog is not 1d or 2d

To make the example reproducible, I used an sklearn dataset and built the arrays from it. My code looks like this:

import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn import datasets ## imports datasets from scikit-learn
data = datasets.load_boston() ## loads Boston dataset from datasets library

df = pd.DataFrame(data.data, columns=data.feature_names)
Y = pd.DataFrame(data.target, columns=["MEDV"])
Y = Y.to_numpy(dtype=[('MEDV', 'float64')])
X = df.to_numpy(dtype=[('CRIM', 'float64'), ('ZN', 'float64'), ('INDUS', 'float64'), ('CHAS', 'float64'), ('NOX', 'float64'),
                   ('RM', 'float64'), ('AGE', 'float64'), ('DIS', 'float64'), ('RAD', 'float64'), ('TAX', 'float64'),
                   ('PTRATIO', 'float64'), ('B', 'float64'), ('LSTAT', 'float64')])


model = sm.OLS(Y, X).fit()

This makes no sense to me: my Y variable is a vertical vector of numbers, so it must be 1D or 2D.

Does anyone know why I am getting this error?
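A quick way to see what the structured dtype actually does is to reproduce the cast on a tiny array. The sketch below uses plain NumPy with hypothetical field names `a` and `b` in place of the Boston columns; it assumes that pandas' `to_numpy(dtype=...)` performs essentially this same NumPy cast.

```python
import numpy as np

# Hypothetical 2x2 float data standing in for the DataFrame's values
plain = np.array([[1.0, 2.0],
                  [3.0, 4.0]])

# One field per column, like the dtype list in the question
struct = np.dtype([("a", "float64"), ("b", "float64")])
cast = plain.astype(struct)

print(cast.shape)        # the 2-D shape is kept
print(cast.dtype.names)  # but every cell is now a record with fields a and b
print(cast[0, 1])        # the single value 2.0 is copied into every field
```

So the result is still 2-D, but it is not a numeric matrix: fields are not mapped to columns, and each cell holds a whole record.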

python numpy statsmodels
1 answer

0 votes

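The simple fix is to call to_numpy() without the structured dtype argument, so that both arrays come out as plain float64 arrays: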

import numpy as np
import pandas as pd

import statsmodels.api as sm
from sklearn import datasets ## imports datasets from scikit-learn
data = datasets.load_boston() ## loads Boston dataset (load_boston was removed in scikit-learn 1.2, so this needs an older version)

df = pd.DataFrame(data.data, 
                  columns=data.feature_names)

Y = pd.DataFrame(data.target, columns=["MEDV"])

# Without a dtype argument, to_numpy() returns ordinary float64 arrays
X = df.to_numpy()
Y = Y.to_numpy()

model = sm.OLS(Y, X).fit()

让我们看看两种方法之间的区别:

Y = pd.DataFrame(data.target, columns=["MEDV"])

(Y.to_numpy(dtype=[('MEDV', 'float64')]))[:10]
array([[(24. ,)],
       [(21.6,)],
       [(34.7,)],
       [(33.4,)],
       [(36.2,)],
       [(28.7,)],
       [(22.9,)],
       [(27.1,)],
       [(16.5,)],
       [(18.9,)]], dtype=[('MEDV', '<f8')])
# A structured array: each element is a one-field record, printed as a tuple

Y.to_numpy()[:10]
array([[24. ],
       [21.6],
       [34.7],
       [33.4],
       [36.2],
       [28.7],
       [22.9],
       [27.1],
       [16.5],
       [18.9]])
# This is an array of floats

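The same is true for X: with the structured dtype, every cell of the (506, 13) array becomes a 13-field record holding copies of that cell's value, instead of being an ordinary 2-D float array.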
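The error mentions dimensions because statsmodels appears to convert a structured exog to floats by viewing each record as a small vector with one entry per field. This is an assumption about statsmodels internals, not something the answer states. The NumPy sketch below mimics such a view with hypothetical data. A 1-D record array (one record per row) turns into a 2-D matrix. The 2-D structured array produced by `to_numpy(dtype=...)` gains a third axis instead, which would fail a "1d or 2d" check.

```python
import numpy as np

struct = np.dtype([("a", "float64"), ("b", "float64")])
n_fields = len(struct.names)
# Subarray dtype: each record is reinterpreted as n_fields float64 values
as_vector = np.dtype((np.float64, (n_fields,)))

# 1-D record array: one record per row (hypothetical values)
rec_1d = np.zeros(3, dtype=struct)
rec_1d["a"] = [1.0, 2.0, 3.0]
rec_1d["b"] = [4.0, 5.0, 6.0]

# 2-D structured array, like df.to_numpy(dtype=[...]) in the question
rec_2d = np.array([[1.0, 4.0],
                   [2.0, 5.0],
                   [3.0, 6.0]]).astype(struct)

print(rec_1d.view(as_vector).shape)  # (3, 2): a usable 2-D design matrix
print(rec_2d.view(as_vector).shape)  # (3, 2, 2): three dimensions
```

With the plain `to_numpy()` call from the fix there is no structured dtype to convert, so X stays a 2-D float array.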

© www.soinside.com 2019 - 2024. All rights reserved.