I have limited knowledge of model fitting, and I am having trouble fitting a model to a time-series dataset in Python. Here is a sample of my data:
I was able to fit a multiple linear regression model to my data:
y = a*x1 + b*x2 + c
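However, the R-squared value is below 50%. I would like a more powerful yet still simple model. Prior literature in my field usually fits this kind of data with a combination of an exponential term and a linear term, for example:
y = a*exp(-b*x1) + c*x2 + d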
I have tried a Generalized Linear Model (GLM) from statsmodels, assuming a Poisson distribution, as follows:
import pandas as pd
import statsmodels.api as sm

# Raw string keeps the backslash literal; G:\ is the file location
df = pd.read_csv(r"G:\Sample.csv")

# Independent and dependent variables
# (no intercept column is included here; sm.add_constant(X) would add one)
X = df[['x1', 'x2']]
y = df['y']

# Poisson GLM
model = sm.GLM(y, X, family=sm.families.Poisson())
results = model.fit()
print(results.summary())
However, this is not the model I want, which is:
y = a*exp(-b*x1) + c*x2 + d
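Why not transform x1 to np.exp(-x1) and add a column of 1s as the constant? This fixes b at 1, so the model stays linear in its coefficients.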
import numpy as np

# Design matrix: exp(-x1), x2, and a column of ones for the intercept
X = np.c_[np.exp(-df['x1']), df['x2'], np.ones(len(df))]
y = df['y']
model = sm.GLM(y, X, family=sm.families.Poisson())
results = model.fit()
print(results.summary())
Output:
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                 3716
Model:                            GLM   Df Residuals:                     3713
Model Family:                 Poisson   Df Model:                            2
Link Function:                    Log   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:                -7953.6
Date:                 Mon, 1 Jan 2023   Deviance:                       1908.1
Time:                        00:00:00   Pearson chi2:                 1.79e+03
No. Iterations:                     4   Pseudo R-squ. (CS):           0.003668
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1          3.998e-08   6.25e-08      0.640      0.522   -8.25e-08    1.62e-07
x2            -0.0077      0.002     -3.638      0.000      -0.012      -0.004
const          2.3700      0.117     20.262      0.000       2.141       2.599
==============================================================================
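One caveat about the fit above: the Poisson family uses a log link by default (the summary reports Link Function: Log), so the GLM models log E[y] = a*exp(-x1) + c*x2 + d rather than the additive form itself, and the rate b is fixed at 1 by the transformation. If b should be estimated and the additive form y = a*exp(-b*x1) + c*x2 + d fitted directly, one option is nonlinear least squares with `scipy.optimize.curve_fit`. The following is a minimal sketch under stated assumptions: the helper name `exp_plus_linear`, the synthetic data, and the starting guess `p0` are all made up for illustration and would need to be adapted to the real dataset.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_plus_linear(X, a, b, c, d):
    """Target model y = a*exp(-b*x1) + c*x2 + d; X holds x1 and x2 as rows."""
    x1, x2 = X
    return a * np.exp(-b * x1) + c * x2 + d

# Synthetic stand-in data (hypothetical parameters, not the poster's dataset).
rng = np.random.default_rng(1)
x1 = rng.uniform(0, 5, 300)
x2 = rng.uniform(0, 10, 300)
y = exp_plus_linear((x1, x2), 4.0, 0.8, -0.3, 2.0) + rng.normal(0, 0.05, 300)

# p0 is a rough starting guess; nonlinear fits can be sensitive to it.
popt, pcov = curve_fit(exp_plus_linear, (x1, x2), y, p0=[1.0, 1.0, 0.0, 0.0])
a_hat, b_hat, c_hat, d_hat = popt
print(popt)

# R-squared computed by hand, for comparison with the linear model.
resid = y - exp_plus_linear((x1, x2), *popt)
r2 = 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)
print(r2)
```

With the real data, `x1`, `x2`, and `y` would come from `df['x1'].to_numpy()` and so on. Because the problem is nonlinear in b, a starting guess informed by the literature values is likely to help convergence.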