Training an LSTM model for time-series forecasting


I am trying to predict water level (gage height) with an LSTM model using the "USGS River Data for Flood Prediction" dataset available on Kaggle (https://www.kaggle.com/datasets/rishavclemson/usgs-river-data-for-flood-prediction). I preprocessed the data by resampling it to an hourly frequency, scaling it between 0 and 1, and creating time-series samples with a lookback of 5.

I also built a multi-layer LSTM model with what I believe are good hyperparameters and trained it on the training data. However, I get a negative Nash-Sutcliffe efficiency (NSE) coefficient, which means my model performs worse than simply predicting the mean of the observed data.

I implemented a custom Nash-Sutcliffe metric using the Keras backend and printed the true and predicted values during training to verify that the inputs are correct. Still, I cannot figure out why my model performs so poorly or how to improve it.
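For reference, the NSE can be checked outside of Keras with plain NumPy. A score of 0 corresponds to always predicting the mean of the observations, and any prediction with a larger residual sum of squares than the mean predictor goes negative (toy data below, just to illustrate the sign behaviour):

```python
import numpy as np

def nse(y_true, y_pred):
    # Nash-Sutcliffe efficiency: 1 - SSE(pred) / SSE(mean predictor)
    return 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - np.mean(y_true)) ** 2)

y_true = np.array([1.0, 2.0, 3.0, 4.0])

print(nse(y_true, np.full_like(y_true, y_true.mean())))  # 0.0 — the mean predictor
print(nse(y_true, y_true[::-1]))                         # -3.0 — worse than the mean
```

A negative NSE during training therefore does not indicate a broken metric; it indicates predictions that are farther from the observations than the observation mean is.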

Here is the code for my LSTM model:

# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
import tensorflow as tf

# Load the data and preprocess it
data = pd.read_csv('02172035.tsv',parse_dates=True,sep='\t')
data.set_index('datetime', inplace=True)
data.index = pd.to_datetime(data.index)
data = data.resample('15T').sum()
data = data.bfill()
data = data.resample('1H').agg({
    'Precipitation': 'sum',
    'Discharge': 'first',
    'Gage Height': 'first'
})
data = data[['Precipitation', 'Discharge', 'Gage Height']].values

# Split dataset into training and testing sets
train_size = int(len(data) * 0.8)
train_data = data[0:train_size, :]
test_data = data[train_size:len(data), :]

# Scale data between 0 and 1
scaler = MinMaxScaler(feature_range=(0, 1))
train_data = scaler.fit_transform(train_data)
test_data = scaler.transform(test_data)

# Create time series data function
def create_timeseries_data(data, lookback):
    x = []
    y = []
    for i in range(len(data)-lookback-1):
        x.append(data[i:(i+lookback), :])
        y.append(data[i+lookback, 2])
    return np.array(x), np.array(y)

# Create training and testing time series data with lookback of 5
lookback = 5
train_x, train_y = create_timeseries_data(train_data, lookback)
test_x, test_y = create_timeseries_data(test_data, lookback)
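As a sanity check on the windowing, the function above turns an `(N, 3)` array into inputs of shape `(N - lookback - 1, lookback, 3)` with targets taken from column 2 (Gage Height). A minimal sketch with toy data (the array values are made up purely for illustration):

```python
import numpy as np

def create_timeseries_data(data, lookback):
    # same windowing as in the question
    x, y = [], []
    for i in range(len(data) - lookback - 1):
        x.append(data[i:(i + lookback), :])
        y.append(data[i + lookback, 2])
    return np.array(x), np.array(y)

toy = np.arange(30, dtype=float).reshape(10, 3)  # 10 timesteps, 3 features
x, y = create_timeseries_data(toy, lookback=2)

print(x.shape, y.shape)  # (7, 2, 3) (7,)
print(y[0])              # toy[2, 2] → 8.0
```

Note that `range(len(data) - lookback - 1)` drops one otherwise-usable window at the end; that is harmless but worth knowing when comparing sample counts.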

# Custom Nash-Sutcliffe metric implementation
import keras.backend as K

def nse_metric(y_true, y_pred):
    numerator = K.sum(K.square(y_true - y_pred))
    denominator = K.sum(K.square(y_true - K.mean(y_true)))
    return 1 - numerator / denominator

# Build the LSTM model with optimal hyperparameters
regressor = tf.keras.models.Sequential()
regressor.add(LSTM(units=580,return_sequences=True, input_shape=(train_x.shape[1], train_x.shape[2])))
regressor.add(LSTM(units=420, return_sequences=True))
regressor.add(LSTM(units=360, return_sequences=True))
regressor.add(LSTM(units=280, return_sequences=True))
regressor.add(LSTM(units=128))  # final recurrent layer, return_sequences=False (units assumed; my snippet was cut off here)
regressor.add(Dense(units=1))

# Compile with the custom NSE metric and train
regressor.compile(optimizer='adam', loss='mean_squared_error', metrics=[nse_metric])
regressor.fit(train_x, train_y, epochs=50, batch_size=64,
              validation_data=(test_x, test_y))

I have run this code many times, but the Nash-Sutcliffe coefficient I get is always negative. I want to train the model so that the coefficient exceeds 0.8 (80%).

python deep-learning time-series lstm kaggle