My goal: after fitting a linear regression with statsmodels, extract the formula (not just the coefficients).
Context: I have a pandas DataFrame,
    df
         x    y        z
    0  0.0  2.0   54.200
    1  0.0  2.2   70.160
    2  0.0  2.4   89.000
    3  0.0  2.6  110.960
I am running a linear regression with statsmodels.api (2 variables, polynomial degree 3), and I am happy with the fit.
                                OLS Regression Results
    ==============================================================================
    Dep. Variable:                      z   R-squared:                       1.000
    Model:                            OLS   Adj. R-squared:                  1.000
    Method:                 Least Squares   F-statistic:                 2.193e+29
    Date:                Sun, 31 May 2020   Prob (F-statistic):               0.00
    Time:                        22:04:49   Log-Likelihood:                 9444.6
    No. Observations:                 400   AIC:                        -1.887e+04
    Df Residuals:                     390   BIC:                        -1.883e+04
    Df Model:                           9
    Covariance Type:            nonrobust
    ==============================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
    ------------------------------------------------------------------------------
    const          0.2000   3.33e-11   6.01e+09      0.000       0.200       0.200
    x1             2.0000   1.16e-11   1.72e+11      0.000       2.000       2.000
    x2             1.0000   2.63e-11    3.8e+10      0.000       1.000       1.000
    x3             4.0000   3.85e-12   1.04e+12      0.000       4.000       4.000
    x4            12.0000   4.36e-12   2.75e+12      0.000      12.000      12.000
    x5             3.0000   6.81e-12   4.41e+11      0.000       3.000       3.000
    x6             6.0000   5.74e-13   1.05e+13      0.000       6.000       6.000
    x7            13.0000   4.99e-13    2.6e+13      0.000      13.000      13.000
    x8            14.0000   4.99e-13   2.81e+13      0.000      14.000      14.000
    x9             5.0000   5.74e-13   8.71e+12      0.000       5.000       5.000
    ==============================================================================
    Omnibus:                       25.163   Durbin-Watson:                   0.038
    Prob(Omnibus):                  0.000   Jarque-Bera (JB):               28.834
    Skew:                          -0.655   Prob(JB):                     5.48e-07
    Kurtosis:                       2.872   Cond. No.                     6.66e+03
    ==============================================================================
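I need to implement this outside Python (in MS Excel), so I need to know the formula.
I know it is a degree-3 polynomial, but I would like to know which coefficient belongs to which term of the equation. Something like:
For example: is the x7 coefficient the coefficient of x³, y², x²y, ...?
Note:
This is a simplified version of my problem. In reality I have 3 variables at degree 3, so 20 coefficients. Here is a simple code example that reproduces my case: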
    # %% Question: extract the formula from linear regression coefficients
    # Imports
    import numpy as np  # version: '1.18.1'
    import pandas as pd  # version: '1.0.0'
    import statsmodels.api as sm  # version: '0.10.1'
    from sklearn.preprocessing import PolynomialFeatures
    from itertools import product

    # %% Creating the dummy data
    def function_for_df(row):
        x = row['x']
        y = row['y']
        return unknow_function(x, y)

    def unknow_function(x, y):
        """
        This generates the data; of course, in reality I do not know the formula.
        """
        r = 0.2 + \
            6*x**3 + 4*x**2 + 2*x + \
            5*y**3 + 3*y**2 + 1*y + \
            12*x*y + 13*x**2*y + 14*x*y**2
        return r

    # input data
    x_input = np.arange(0, 4, 0.2)
    y_input = np.arange(2, 6, 0.2)
    # create a simple dataframe with dummy data
    df = pd.DataFrame(list(product(x_input, y_input)), columns=['x', 'y'])
    df['z'] = df.apply(function_for_df, axis=1)

    # In reality I start from here!
    # %% creating the model
    X = df[['x', 'y']].astype(float)
    Y = df['z'].astype(float)
    polynomial_features_final = PolynomialFeatures(degree=3)
    X3 = polynomial_features_final.fit_transform(X)
    model = sm.OLS(Y, X3).fit()
    predictions = model.predict(X3)
    print_model = model.summary()
    print(print_model)

    # %% using the model to make predictions, no problem
    def model_predict(x_sample, y_samples):
        df_input = pd.DataFrame({
            "x": x_sample,
            "y": y_samples
        }, index=[0])
        # the transformer is already fitted above, so transform() is enough here
        X_input = polynomial_features_final.transform(df_input)
        prediction = model.predict(X_input)
        return prediction

    print("prediction for x=2, y=3.2 :", model_predict(2, 3.2))

    # How to extract the formula for the "model"?
    # Thanks
Side note:
A description like the one given by patsy's ModelDesc would be nice:

    from patsy import ModelDesc
    ModelDesc.from_formula("y ~ x")
    # or even better:
    desc = ModelDesc.from_formula("y ~ (a + b + c + d) ** 2")
    desc.describe()

But I cannot bridge the gap between my model and patsy.ModelDesc. Thank you for your help.
As Josef said in the comments, I had to look at sklearn's PolynomialFeatures.
Then I found this answer:
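The linked answer is not reproduced here, so what follows is only a minimal sketch of the general idea, under a few assumptions. A fitted PolynomialFeatures exposes a `powers_` attribute: row i holds the exponent of each input variable in output column i, which is the same column order the regression coefficients follow. To keep the sketch light, `np.linalg.lstsq` stands in for `sm.OLS(Y, X3).fit().params`, and `term_name` is a hypothetical helper written for this example, not part of either library.

```python
import numpy as np
from itertools import product
from sklearn.preprocessing import PolynomialFeatures

# Same synthetic data as in the question, built directly as arrays.
xy = np.array(list(product(np.arange(0, 4, 0.2), np.arange(2, 6, 0.2))))
x, y = xy[:, 0], xy[:, 1]
z = (0.2 + 6*x**3 + 4*x**2 + 2*x + 5*y**3 + 3*y**2 + 1*y
     + 12*x*y + 13*x**2*y + 14*x*y**2)

poly = PolynomialFeatures(degree=3)
X3 = poly.fit_transform(xy)

# Assumption: plain least squares as a stand-in for sm.OLS(z, X3).fit().params
coefs, *_ = np.linalg.lstsq(X3, z, rcond=None)


def term_name(powers, names=("x", "y")):
    """Hypothetical helper: turn one row of powers_ into a term such as 'x^2*y'."""
    parts = []
    for name, p in zip(names, powers):
        if p == 1:
            parts.append(name)
        elif p > 1:
            parts.append(f"{name}^{p}")
    # A row of all zeros is the constant (bias) column.
    return "*".join(parts) if parts else "1"


# One term per design-matrix column, in the same order as the coefficients.
terms = [term_name(row) for row in poly.powers_]
for i, (c, t) in enumerate(zip(coefs, terms)):
    print(f"column {i}: {c:.4f} * {t}")

# The '^' and '*' operators are also what Excel uses, so this string is a
# reasonable starting point for a spreadsheet formula.
formula = "z = " + " + ".join(f"{c:.4g}*{t}" for c, t in zip(coefs, terms))
print(formula)
```

With this ordering, column 7 is x^2*y, which agrees with the summary above where x7 = 13 matches the 13*x**2*y term of the generating function. Recent scikit-learn releases also provide `get_feature_names_out`, which returns comparable names directly; `powers_` is used here because it does not depend on that newer method.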