Machine learning with alternating time

Question (votes: 0, answers: 1)

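I have a polynomial regression script that correctly predicts values along the X and Y axes. In my example I use CPU consumption. Below is a sample of the data set: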

[Image: sample of the data set]

Complete data set

The time column is the collection time, for example:

1 = 1 minute
2 = 2 minutes

and so on...

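The consume column is the CPU usage during that minute. In summary, this data set shows the behavior of a host over 30 minutes, with each value corresponding to one minute in ascending order (1 min, 2 min, 3 min, ...).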

The result is:

[Image: resulting plot]

Using this algorithm:

# -*- coding: utf-8 -*-

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('data.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fitting Polynomial Regression to the dataset
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree=4)
X_poly = poly_reg.fit_transform(X)
pol_reg = LinearRegression()
pol_reg.fit(X_poly, y)

# Visualizing the Polynomial Regression results
def viz_polynomial():
    plt.scatter(X, y, color='red')
    # transform (not fit_transform): poly_reg was already fitted above
    plt.plot(X, pol_reg.predict(poly_reg.transform(X)), color='blue')
    plt.title('Polynomial Regression for CPU')
    plt.xlabel('Time range')
    plt.ylabel('Consume')
    plt.show()
viz_polynomial()

# Predict consume at time = 20
print(pol_reg.predict(poly_reg.transform([[20]])))

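What goes wrong?

If we duplicate this data set so that the 30-minute range appears twice, the algorithm no longer understands the data set and its results are not as good. For example, with this data set:

[Image] -> maximum time = 30
[Image] -> maximum time = 30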

Complete data set

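Note: with 60 values, each group of 30 values represents a 30-minute range, as if the two groups had been collected on different dates.

The result shown is this:

[Image: resulting plot for the duplicated data set]

Objective: I want the blue line representing the polynomial regression to look like the one in the first result image. The one we see above shows a loop, with the points connected to each other as if the algorithm had failed.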

Research source

python machine-learning regression sklearn-pandas polynomials
1 answer
0 votes

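The problem is that in the second case you are plotting with X = 1, 2, ..., 30, 1, 2, ..., 30. The plot function connects consecutive points, so the line runs back from 30 to 1 in the middle of the data. If you just draw a scatter plot with pyplot, you will see a nice regression curve. Alternatively, you can sort the points with argsort before plotting. Here is the code, with the scatter in green and the argsort line in black.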

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression

# Importing the dataset
dataset = pd.read_csv('data.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fitting Polynomial Regression to the dataset
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree=4)
X_poly = poly_reg.fit_transform(X)
pol_reg = LinearRegression()
pol_reg.fit(X_poly, y)

# Visualizing the Polynomial Regression results
def viz_polynomial():
    plt.scatter(X, y, color='red')
    # Order the points by time so the line is drawn left to right
    indices = np.argsort(X[:, 0])
    # transform (not fit_transform): poly_reg was already fitted above
    y_pred = pol_reg.predict(poly_reg.transform(X))
    plt.scatter(X, y_pred, color='green')
    plt.plot(X[indices], y_pred[indices], color='black')
    plt.title('Polynomial Regression for CPU')
    plt.xlabel('Time range')
    plt.ylabel('Consume')
    plt.show()
viz_polynomial()

# Predict consume at time = 20
print(pol_reg.predict(poly_reg.transform([[20]])))
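If you want to check the ordering issue without data.csv, here is a minimal sketch using only NumPy. The consume values are synthetic (a made-up sine curve plus noise), and the variable names are just illustrative; only the 1..30, 1..30 layout of the time column is taken from the question.

```python
import numpy as np

# Hypothetical stand-in for the duplicated data set: two 30-minute
# collections stacked one after the other (consume values are synthetic).
time = np.tile(np.arange(1, 31), 2)  # 1..30 followed by 1..30
rng = np.random.default_rng(0)
consume = 50 + 10 * np.sin(time / 5) + rng.normal(0, 1, time.size)

# In file order, time drops from 30 back to 1 halfway through.
# A line plot draws that drop as a stroke back across the figure.
jumps_back = np.any(np.diff(time) < 0)

# Sorting by time removes the backward jump. A stable sort keeps the
# two samples that share a minute in their original file order.
order = np.argsort(time, kind="stable")
time_sorted = time[order]
consume_sorted = consume[order]
is_monotonic = np.all(np.diff(time_sorted) >= 0)

print(jumps_back, is_monotonic)
```

Passing time_sorted and the matching reordered predictions to plt.plot is what the black line in the code above does.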

Here is the output image for the larger data set.

[Image: output plot for the larger data set]
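Another option, not part of the code above, is to avoid plotting at the data points at all and instead evaluate the fitted model on an evenly spaced, already sorted grid. The sketch below assumes synthetic data in place of data.csv and uses scikit-learn's make_pipeline to chain PolynomialFeatures and LinearRegression; the names grid and curve are only illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for the duplicated data set (not the real data.csv).
time = np.tile(np.arange(1, 31), 2).reshape(-1, 1)  # shape (60, 1)
rng = np.random.default_rng(0)
consume = 50 + 10 * np.sin(time[:, 0] / 5) + rng.normal(0, 1, time.shape[0])

# Same model as in the question: degree-4 polynomial features
# followed by ordinary least squares.
model = make_pipeline(PolynomialFeatures(degree=4), LinearRegression())
model.fit(time, consume)

# Evaluate the curve on a sorted grid instead of on the raw X values,
# so the line is drawn left to right no matter how the rows are ordered.
grid = np.linspace(time.min(), time.max(), 200).reshape(-1, 1)
curve = model.predict(grid)

# To draw it: plt.plot(grid[:, 0], curve, color='blue')
```

The grid also gives a smoother line than 30 distinct X values would, at the cost of no longer showing exactly which inputs the model was fitted on.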
