Fitting a non-linear univariate regression to time series data

Question

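I recently started doing machine learning with Python. Below is a dataset I picked as an example, along with the code I have worked on so far. I chose [2000...2015] as the training data and [2016, 2017] as the test data.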

Dataset  
      Years        Values
    0    2000      23.0
    1    2001      27.5
    2    2002      46.0
    3    2003      56.0
    4    2004      64.8
    5    2005      71.2
    6    2006      80.2
    7    2007      98.0
    8    2008     113.0
    9    2009     155.8
    10   2010     414.0
    11   2011    2297.8
    12   2012    3628.4
    13   2013   16187.8
    14   2014   25197.8
    15   2015   42987.8
    16   2016   77555.5
    17   2017  130631.9

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

# Build the DataFrame directly from named columns
df = pd.DataFrame({
    'Years': list(range(2000, 2018)),
    'Values': [23.0,27.5,46.0,56.0,64.8,71.2,80.2,98.0,113.0,155.8,414.0,2297.8,3628.4,16187.8,25197.8,42987.8,77555.5,130631.9],
})

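The code above creates the DataFrame. Another important thing to keep in mind is that my Years column is a time series and not just a continuous value. I have not made any changes to accommodate this.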
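One possible way to make the time axis explicit is sketched below. It assumes each year can be represented by a timestamp at January 1 of that year, which the question does not state, and the names ts and elapsed are only illustrative. The series gets a datetime index, and a plain "years elapsed" feature is derived from it for use as a regression input.

```python
import pandas as pd

years = list(range(2000, 2018))
values = [23.0, 27.5, 46.0, 56.0, 64.8, 71.2, 80.2, 98.0, 113.0, 155.8,
          414.0, 2297.8, 3628.4, 16187.8, 25197.8, 42987.8, 77555.5, 130631.9]

# Assumption: each observation is stamped at January 1 of its year.
index = pd.to_datetime([str(y) for y in years], format="%Y")
ts = pd.Series(values, index=index, name="Values")

# Numeric regression feature: whole years elapsed since the first observation.
elapsed = ts.index.year - ts.index.year[0]

print(ts.head(3))
print(list(elapsed[:3]))
```

Regressing on elapsed rather than on the raw calendar year leaves the shape of the fit unchanged but keeps the inputs small, which tends to help numerically for higher-degree polynomials.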

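I would like to fit a non-linear model, which might help, and print a plot like the one I produced for the linear model example. Here is what I tried with a linear model. Also, in my own example, I do not seem to account for the fact that my Years column is a time series rather than a continuous variable.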

Once we have the model, I would like to use it to predict values for at least the next few years.

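from sklearn.linear_model import LinearRegression

# Feature matrix (Years) and target vector (Values)
X = df.iloc[:, :-1].values
y = df.iloc[:, 1].values
# shuffle=False keeps chronological order; with 18 rows and test_size=0.1,
# the last two years (2016, 2017) are held out as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.1, random_state = 0, shuffle = False)
lm = LinearRegression()
lm.fit(X_train, y_train)
y_pred = lm.predict(X_test)
# Plot the training points and the fitted line
plt.scatter(X_train, y_train, color = 'red')
plt.plot(X_train, lm.predict(X_train), color = 'blue')
plt.title('Years vs Values (training set)')
plt.xlabel('Years')
plt.ylabel('Values')
plt.show()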
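As one candidate for the non-linear model asked about here, the sketch below fits a straight line to log(Values), which amounts to an exponential curve in the original scale. The exponential form is an assumption based on how quickly the values grow, not something the question establishes. The sketch uses the same chronological split (2000 to 2015 for training, 2016 and 2017 held out) and measures time as years since 2000.

```python
import numpy as np

years = np.arange(2000, 2018)
values = np.array([23.0, 27.5, 46.0, 56.0, 64.8, 71.2, 80.2, 98.0, 113.0, 155.8,
                   414.0, 2297.8, 3628.4, 16187.8, 25197.8, 42987.8, 77555.5, 130631.9])

# Time measured as years since the first observation.
t = years - years[0]
t_train, t_test = t[:16], t[16:]
y_train, y_test = values[:16], values[16:]

# Fit log(y) = intercept + slope * t.
# np.polyfit returns coefficients from the highest degree down.
slope, intercept = np.polyfit(t_train, np.log(y_train), 1)

# Transform back to the original scale for the held-out years.
y_pred = np.exp(intercept + slope * t_test)

for year, actual, predicted in zip(years[16:], y_test, y_pred):
    print(f"{year}: actual={actual:.1f}, predicted={predicted:.1f}")
```

Comparing the printed predictions with the actual 2016 and 2017 values gives a quick sense of whether an exponential trend is a reasonable description before using it to forecast further ahead.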

Answer 1

Try this. You can print the predicted values as well. It forecasts five years ahead.

import numpy.polynomial.polynomial as poly
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

df = pd.DataFrame([[i for i in range(2000,2018)],
[23.0,27.5,46.0,56.0,64.8,71.2,80.2,98.0,113.0,155.8,414.0,2297.8,3628.4,16187.8,25197.8,42987.8,77555.5,130631.9]])
df = df.T
df.columns = ['Year', 'Values']
df['Year'] = df['Year'].astype(int)
# Values stay as floats; casting them to int would truncate e.g. 27.5 to 27
no_of_predictions = 5

X = np.array(df.Year, dtype = float)
y = np.array(df.Values, dtype = float)
# Years to forecast: the five years after the last observation (2018-2022)
Z = np.arange(X[-1] + 1, X[-1] + 1 + no_of_predictions)
# Degree-4 polynomial; coefficients are ordered from lowest to highest degree
coefs = poly.polyfit(X, y, 4)
# Evaluate the fit over the observed range plus the forecast horizon
X_new = np.linspace(X[0], X[-1]+no_of_predictions, num=len(X)+no_of_predictions)
ffit = poly.polyval(X_new, coefs)
pred = poly.polyval(Z, coefs)
# One row per forecast year
predictions = pd.DataFrame({'Year': Z.astype(int), 'Predicted': pred})
print(predictions)
plt.plot(X, y, 'ro', label="Original data")
plt.plot(X_new, ffit, label = "Fitted data")
plt.legend(loc='upper left')
plt.show()
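A variation on the approach above, offered as a sketch and not as a correction: poly.polyfit works on the raw years, where powers as large as 2017^4 (about 1.65e13) can leave the least-squares problem poorly conditioned. numpy.polynomial.Polynomial.fit rescales the inputs to the window [-1, 1] before fitting, and the returned object applies the same mapping when it is called. The variable names future_years and forecast are illustrative.

```python
import numpy as np
from numpy.polynomial import Polynomial

X = np.arange(2000, 2018, dtype=float)
y = np.array([23.0, 27.5, 46.0, 56.0, 64.8, 71.2, 80.2, 98.0, 113.0, 155.8,
              414.0, 2297.8, 3628.4, 16187.8, 25197.8, 42987.8, 77555.5, 130631.9])

# Polynomial.fit maps [X.min(), X.max()] onto [-1, 1] before solving,
# and stores that mapping in p.domain / p.window.
p = Polynomial.fit(X, y, deg=4)

# Calling p(...) applies the stored mapping, so raw years can be passed in.
future_years = np.arange(2018, 2023, dtype=float)
forecast = p(future_years)

for year, value in zip(future_years.astype(int), forecast):
    print(f"{year}: {value:.1f}")
```

Both approaches fit the same family of degree-4 polynomials, so any difference in the forecasts would come from numerical precision and not from the model itself.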

Answer 2

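Edit: my answer is wrong; I am used to classifiers rather than regressors. I am not deleting it because I am afraid of being banned from posting more answers. Do not use this answer.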

Try this:

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

from sklearn.ensemble import RandomForestClassifier

df = pd.DataFrame([[i for i in range(2000,2018)], 
[23.0,27.5,46.0,56.0,64.8,71.2,80.2,98.0,113.0,155.8,414.0,2297.8,3628.4,16187.8,25197.8,42987.8,77555.5,130631.9]])


df = df.T
df.columns = ['Year', 'Values']
df['Year'] = df['Year'].astype(int)
df['Values'] = df['Values'].astype(int)

That is your DataFrame.

X = df[['Year']]
y = df[['Values']]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.1, random_state = 0, shuffle = False)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

clf = RandomForestClassifier(n_estimators=10)
clf.fit(X_train, y_train)


y_pred = clf.predict(X_test)

plt.scatter(X_train, y_train, color = 'red')
plt.plot(X_train, clf.predict(X_train), color = 'blue')
plt.title('Years vs Values (training set)')
plt.xlabel('Years')

plt.xticks(rotation=90)
plt.ylabel('Values')
plt.show()
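For context on why this answer was withdrawn: a classifier treats each distinct value as a separate class label, which does not suit a continuous target. The regression counterpart in scikit-learn is RandomForestRegressor, sketched below on the same chronological split. Note that this is not a recommendation for this dataset. Each tree predicts an average of training targets, so a forest cannot predict above the largest value it saw in training, which makes it a poor fit for a series that keeps growing.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

years = np.arange(2000, 2018)
values = np.array([23.0, 27.5, 46.0, 56.0, 64.8, 71.2, 80.2, 98.0, 113.0, 155.8,
                   414.0, 2297.8, 3628.4, 16187.8, 25197.8, 42987.8, 77555.5, 130631.9])

# scikit-learn expects a 2-D feature matrix.
X = years.reshape(-1, 1)
X_train, y_train = X[:16], values[:16]   # 2000-2015
X_test, y_test = X[16:], values[16:]     # 2016-2017

reg = RandomForestRegressor(n_estimators=100, random_state=0)
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)

print("largest training value:", y_train.max())
print("predictions for 2016, 2017:", y_pred)
print("actual values for 2016, 2017:", y_test)
```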

Answer 3

Meanwhile, I also tried this:

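import numpy as np
import matplotlib.pyplot as plt
import numpy.polynomial.polynomial as poly

# df is the DataFrame built in the question (columns 'Years' and 'Values')
X = np.array(df.Years, dtype = float)
y = np.array(df.Values, dtype = float)
# Degree-4 polynomial; coefficients are ordered from lowest to highest degree
coefs = poly.polyfit(X, y, 4)
# Evaluate the fitted polynomial on a grid spanning the observed years
X_new = np.linspace(X[0], X[-1], num=17)
ffit = poly.polyval(X_new, coefs)
plt.plot(X, y, 'ro', label="Original data")
plt.plot(X_new, ffit, label = "Fitted data")
plt.legend(loc='upper left')
plt.show()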

It does fit almost perfectly. But now I am not sure how to use this to predict values for the next five years.
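Before relying on such a fit for forecasting, it may be worth checking how it behaves on years it has not seen. The sketch below is an illustrative check, not part of the original attempt: it refits the same degree-4 polynomial on 2000 to 2015 only, then compares its predictions for 2016 and 2017 with the known values. Years are shifted to start at zero, an added assumption made to keep the powers small.

```python
import numpy as np
import numpy.polynomial.polynomial as poly

years = np.arange(2000, 2018)
values = np.array([23.0, 27.5, 46.0, 56.0, 64.8, 71.2, 80.2, 98.0, 113.0, 155.8,
                   414.0, 2297.8, 3628.4, 16187.8, 25197.8, 42987.8, 77555.5, 130631.9])

# Shift years to 0..17 so that high powers stay small.
t = years - years[0]

# Fit on the first 16 years only; the last two are held out.
coefs = poly.polyfit(t[:16], values[:16], 4)

# In-sample error on the training years.
fitted = poly.polyval(t[:16], coefs)
rmse_train = np.sqrt(np.mean((fitted - values[:16]) ** 2))

# Out-of-sample predictions for the held-out years.
held_out = poly.polyval(t[16:], coefs)

print(f"training RMSE: {rmse_train:.1f}")
for year, actual, predicted in zip(years[16:], values[16:], held_out):
    print(f"{year}: actual={actual:.1f}, predicted={predicted:.1f}")
```

A close fit on the training years says little about extrapolation, so a large gap between the held-out predictions and the actual values would be a reason to treat any five-year forecast from this polynomial with caution.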
