Python time-series linear regression model results don't match Excel's linear regression?


I am using seaborn and sklearn to build a linear regression model for a time-series dataset. For a simple linear model y = mx + b, both approaches (seaborn and sklearn) return the same slope and intercept, and the slope matches the Excel result. The intercept, however, does not: either method in Python gives -35874.5873, which is very different from the -1404.3 that Excel reports.

Is something set up incorrectly in my Python code, or are the two tools calculating the model differently?

Here is the data from Excel.

Date        Bicarbonate
1/1/2002    446
4/1/2002    450
7/1/2002    454
10/1/2002   483
1/1/2003    457
4/1/2003    465
7/1/2003    465
10/1/2003   474
1/1/2004    495

The Python script is as follows:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats
from sklearn import linear_model

df = pd.read_excel(r'TestData.xls')
print(df)

Bicarbonate = df['Bicarbonate']
Date = df['Date']
DateO = df['Date'].apply(lambda x: x.toordinal())
df['DateO'] = DateO
print(DateO)

# Plotting the regression model with seaborn
ax1 = sns.regplot(x='DateO', y='Bicarbonate', data=df, color='magenta', label='Linear Model', ci=None, scatter=True)


# calculate slope and intercept of regression equation.
slope, intercept, r, p, se = scipy.stats.linregress(x=ax1.get_lines()[0].get_xdata(),
                                                       y=ax1.get_lines()[0].get_ydata())

print(slope)
print(intercept)
print(p)

# Linear Regression with sklearn.
x = df['DateO'].values.reshape(-1, 1)
y = df['Bicarbonate'].values.reshape(-1,1)
model = linear_model.LinearRegression().fit(x,y)
print('intercept:', model.intercept_)
print('slope:', model.coef_)
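
If you would rather not create TestData.xls, the same nine rows can be built directly in pandas (a quick sketch; the column names match the sheet), and the read_excel line can be replaced with:

import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['1/1/2002', '4/1/2002', '7/1/2002', '10/1/2002',
                            '1/1/2003', '4/1/2003', '7/1/2003', '10/1/2003',
                            '1/1/2004'], format='%m/%d/%Y'),
    'Bicarbonate': [446, 450, 454, 483, 457, 465, 465, 474, 495],
})
print(df)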
1 Answer

Both fits are computing the same line; the difference is where x = 0 falls. datetime.toordinal() counts days from January 1 of year 1, while Excel stores dates as serial numbers counted from the start of 1900. Day 0 on the toordinal() scale therefore lies almost 1,900 years before day 0 on Excel's scale, so the intercept (the fitted value at day 0) comes out very different even though the slope per day is identical.
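A quick way to see the two numbering schemes side by side, using only the standard library (1899-12-30 acts as Excel's day zero once the serial-1 = 1900-01-01 convention and the phantom 29 February 1900 are accounted for):

from datetime import date

d = date(2002, 1, 1)

python_ordinal = d.toordinal()                # days since 0001-01-01 -> 730851
excel_serial = (d - date(1899, 12, 30)).days  # Excel's date serial   -> 37257

print(python_ordinal, excel_serial, python_ordinal - excel_serial)  # constant offset of 693594 days

The demo below makes the same point on a longer daily series (PFE open prices pulled with yfinance, an arbitrary choice) by extrapolating the fitted line back to x = 0: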
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import scipy.stats
from sklearn import linear_model
import yfinance as yf  # for downloading a sample time series

# download sample data
df = pd.concat((yf.download(ticker, start='1970-02-01', end='2023-09-06').assign(tkr=ticker) for ticker in ['pfe']), ignore_index=False).reset_index()

Bicarbonate = df['Open']
Date = df['Date']
DateO = df['Date'].apply(lambda x: x.toordinal())
df['DateO'] = DateO

# Plotting the regression model with seaborn
fig, ax1 = plt.subplots(figsize=(12, 8))
sns.regplot(x='DateO', y='Open', data=df, color='magenta', label='Linear Model', ci=None, scatter=True, ax=ax1)

# calculate slope and intercept of regression equation.
slope, intercept, r, p, se = scipy.stats.linregress(x=ax1.get_lines()[0].get_xdata(), y=ax1.get_lines()[0].get_ydata())

print('scipy intercept:', intercept)
print('scipy slope:', slope)

# Linear Regression with sklearn.
x = df['DateO'].values.reshape(-1, 1)
y = df['Open'].values.reshape(-1,1)
model = linear_model.LinearRegression().fit(x,y)
print('sklearn intercept:', model.intercept_)
print('sklearn slope:', model.coef_)

# x values from 0 out to the right edge of the current plot,
# so the fitted line can be drawn all the way back to the intercept
x_test = np.arange(0, ax1.get_xlim()[1]).reshape(-1, 1)

# predicted y values
y_pred = model.predict(x_test)

# plot
ax1.plot(x_test, y_pred)

fig.suptitle(f'At x = 0 the linear model crosses y at {round(intercept)}')

ax1.margins(0)
plt.show()

Output:

scipy intercept: -1757.1923682739996
scipy slope: 0.002432274440853285
sklearn intercept: [-1757.19236827]
sklearn slope: [[0.00243227]]
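
Applying the same idea to the original bicarbonate data: fit once against toordinal() values and once against Excel-style serial numbers, and only the intercept moves. A minimal sketch (using the nine rows from the question; the exact Excel figure depends on how the dates were entered in the sheet):

from datetime import date
from scipy import stats

dates = [date(2002, 1, 1), date(2002, 4, 1), date(2002, 7, 1), date(2002, 10, 1),
         date(2003, 1, 1), date(2003, 4, 1), date(2003, 7, 1), date(2003, 10, 1),
         date(2004, 1, 1)]
bicarbonate = [446, 450, 454, 483, 457, 465, 465, 474, 495]

EXCEL_DAY_ZERO = date(1899, 12, 30)  # Excel serial 1 = 1900-01-01, plus its phantom leap day

ordinals = [d.toordinal() for d in dates]                    # days since year 1
excel_serials = [(d - EXCEL_DAY_ZERO).days for d in dates]   # days as Excel stores them

fit_ordinal = stats.linregress(ordinals, bicarbonate)
fit_excel = stats.linregress(excel_serials, bicarbonate)

print('slope, either x:         ', fit_ordinal.slope, fit_excel.slope)  # identical
print('intercept vs toordinal():', fit_ordinal.intercept)               # what the question's script reports
print('intercept vs Excel days: ', fit_excel.intercept)                 # what Excel's trendline reports

Neither intercept is wrong; each is the fitted value at its own day zero. To make Python and Excel agree, fit both against the same day numbering, or subtract a reference date (for example the first observation) so the intercept has a direct physical meaning.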
