I'm trying to do time series forecasting on a set of classes and datetimes, but for some reason my plot looks like this. My full code is below:
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
data = pd.read_csv('gdrive/My Drive/Colab_Notebooks/classproject/classdata.csv', parse_dates=['time_date'], index_col='time_date')
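# Calendar date of each row, kept as a column for later plotting
data['date'] = data.index.date
class_id = data['class_id']
time_date = data.index.to_series()
# m1: True where class_id differs from the previous row
m1 = class_id.ne(class_id.shift())
# m2: True where the calendar date differs from the previous row
m2 = time_date.dt.date.ne(time_date.dt.date.shift())
# Running count within each consecutive run of the same class on the same date
data['count'] = data.groupby((m1 | m2).cumsum()).cumcount().add(1).values
# Keep only the rows whose date has more than one row
out = data[data.groupby(data.index.date).transform('size').gt(1)]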
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import seaborn as sns
sns.set()
plt.ylabel('Amount of classes')
plt.xlabel('Date')
plt.xticks(rotation=45)
out.index = pd.to_datetime(out['date'], format='%Y-%m-%d')
plt.plot(out.index, out['count'])
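The blog I took this time series code from gets this kind of result instead,
so I'm not sure whether I should keep going XD
My input data looks like this:
timestamp / class_id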
2021-09-27 06:00:00 / A
2021-09-27 03:00:00 / A
2021-09-27 01:00:00 / A
2021-09-27 08:29:00 / C
2021-05-23 08:08:49 / B
2021-05-23 03:21:49 / B
2021-05-23 01:22:11 / C
After processing and adding the count and date columns:
count / timestamp / class_id / date
1 / 2021-09-27 06:00:00 / A / 2021-09-27
2 / 2021-09-27 03:00:00 / A / 2021-09-27
3 / 2021-09-27 01:00:00 / A / 2021-09-27
1 / 2021-09-27 08:29:00 / C / 2021-09-27
1 / 2021-05-23 08:08:49 / B / 2021-05-23
2 / 2021-05-23 03:21:49 / B / 2021-05-23
1 / 2021-05-23 01:22:11 / C / 2021-05-23
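For reference, the count column above can be reproduced from the seven sample rows with a small self-contained sketch. It assumes the timestamp is kept as an ordinary column named time_date instead of the index, and it uses dt.normalize() for the per-day comparison instead of dt.date. The name run_id is introduced here only for illustration.

```python
import pandas as pd

# The seven sample rows from the post, in the same order.
data = pd.DataFrame(
    {
        "time_date": pd.to_datetime(
            [
                "2021-09-27 06:00:00",
                "2021-09-27 03:00:00",
                "2021-09-27 01:00:00",
                "2021-09-27 08:29:00",
                "2021-05-23 08:08:49",
                "2021-05-23 03:21:49",
                "2021-05-23 01:22:11",
            ]
        ),
        "class_id": ["A", "A", "A", "C", "B", "B", "C"],
    }
)

# Midnight of each row's day (assumption: used here in place of .dt.date).
day = data["time_date"].dt.normalize()

# A new run starts whenever the class or the day differs from the previous row.
m1 = data["class_id"].ne(data["class_id"].shift())
m2 = day.ne(day.shift())
run_id = (m1 | m2).cumsum()

# Position of each row inside its run, starting at 1.
data["count"] = data.groupby(run_id).cumcount().add(1)
data["date"] = day

print(data[["count", "time_date", "class_id", "date"]])
```

Note that the count restarts for every consecutive run of the same class on the same day, so it depends on the row order of the input.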
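You are plotting all class_id values at once, so a single line is drawn through the rows of every class. Try plotting per class with something like out.groupby('class_id').plot() and see whether each class's plot makes sense and looks the way you expect.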
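As a rough sketch of that suggestion, the snippet below draws one subplot per class_id. The small out frame is only a stand-in built from the sample rows in the question. Sorting each group by timestamp and using stacked subplots are assumptions made for this illustration, not something the question specifies.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in for `out`, built from the sample rows in the question.
out = pd.DataFrame(
    {
        "class_id": ["A", "A", "A", "C", "B", "B", "C"],
        "count": [1, 2, 3, 1, 1, 2, 1],
    },
    index=pd.to_datetime(
        [
            "2021-09-27 06:00:00",
            "2021-09-27 03:00:00",
            "2021-09-27 01:00:00",
            "2021-09-27 08:29:00",
            "2021-05-23 08:08:49",
            "2021-05-23 03:21:49",
            "2021-05-23 01:22:11",
        ]
    ),
)

# One frame per class, sorted by time so each line runs left to right.
groups = {cid: grp.sort_index() for cid, grp in out.groupby("class_id")}

# One subplot per class, sharing the date axis.
fig, axes = plt.subplots(
    len(groups), 1, sharex=True, figsize=(8, 2 * len(groups)), squeeze=False
)
for ax, (cid, grp) in zip(axes[:, 0], groups.items()):
    ax.plot(grp.index, grp["count"], marker="o")
    ax.set_title(f"class_id = {cid}")
    ax.set_ylabel("count")
axes[-1, 0].set_xlabel("Date")
fig.autofmt_xdate(rotation=45)
```

If each panel looks reasonable on its own, the odd shape of the original plot most likely comes from mixing the classes in one line.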