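I am trying to fit a simple OLS model with statsmodels by passing it two numpy arrays that carry column names. However, when I try to fit the model, I get this error:
ValueError: exog is not 1d or 2d
To make the example reproducible, I used a sklearn dataset and built the arrays from it. My code looks like this: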
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn import datasets  # imports datasets from scikit-learn
# note: load_boston was removed in scikit-learn 1.2, so this needs an older version
data = datasets.load_boston()  # loads Boston dataset from datasets library
df = pd.DataFrame(data.data, columns=data.feature_names)
Y = pd.DataFrame(data.target, columns=["MEDV"])
Y = Y.to_numpy(dtype=[('MEDV', 'float64')])
X = df.to_numpy(dtype=[('CRIM', 'float64'), ('ZN', 'float64'), ('INDUS', 'float64'), ('CHAS', 'float64'), ('NOX', 'float64'),
('RM', 'float64'), ('AGE', 'float64'), ('DIS', 'float64'), ('RAD', 'float64'), ('TAX', 'float64'),
('PTRATIO', 'float64'), ('B', 'float64'), ('LSTAT', 'float64')])
model = sm.OLS(Y, X).fit()
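This makes no sense to me, because my Y variable is a vertical vector of numbers, so it is definitely 1D or 2D.
Does anyone know why I am getting this error?
The simple fix is to call to_numpy() without a dtype argument: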
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn import datasets  # imports datasets from scikit-learn
data = datasets.load_boston()  # loads Boston dataset from datasets library
df = pd.DataFrame(data.data, columns=data.feature_names)
Y = pd.DataFrame(data.target, columns=["MEDV"])
X = df.to_numpy()
Y = Y.to_numpy()
model = sm.OLS(Y, X).fit()
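Let's look at the difference between the to_numpy(dtype=[...]) call from the question and the plain to_numpy() call: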
Y = pd.DataFrame(data.target, columns=["MEDV"])
(Y.to_numpy(dtype=[('MEDV', 'float64')]))[:10]
array([[(24. ,)],
[(21.6,)],
[(34.7,)],
[(33.4,)],
[(36.2,)],
[(28.7,)],
[(22.9,)],
[(27.1,)],
[(16.5,)],
[(18.9,)]], dtype=[('MEDV', '<f8')])
# That is an array of tuples
Y.to_numpy()[:10]
array([[24. ],
[21.6],
[34.7],
[33.4],
[36.2],
[28.7],
[22.9],
[27.1],
[16.5],
[18.9]])
# This is an array of floats
The same applies to X.
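The printed arrays suggest that a list of (name, type) pairs as dtype produces a NumPy structured array, whose elements are records rather than plain floats. The NumPy-only sketch below, with made-up values and the hypothetical field names a and b, shows how the two kinds of array differ and one way to convert a structured array back to a plain float array:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Plain 2D float array: 3 rows, 2 columns, no field names.
plain = np.array([[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]])

# Structured array: 3 records, each with the (hypothetical) fields "a" and "b".
structured = np.array(
    [(1.0, 4.0), (2.0, 5.0), (3.0, 6.0)],
    dtype=[("a", "float64"), ("b", "float64")],
)

print(plain.shape, plain.dtype.names)            # (3, 2) None
print(structured.shape, structured.dtype.names)  # (3,) ('a', 'b')

# Convert the records back into an ordinary (rows, columns) float array.
back = rfn.structured_to_unstructured(structured)
print(back.shape, back.dtype)                    # (3, 2) float64
```

Checking dtype.names before fitting is a quick way to spot a structured array: it is None for a plain numeric array.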