Why does the LSTM autoencoder use "relu" as its activation function?

Question

I was reading a blog post in which the author uses "relu" instead of "tanh". Why? https://towardsdatascience.com/step-by-step-understanding-lstm-autoencoder-layers-ffab055b6352

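from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, RepeatVector, TimeDistributed

# timesteps (sequence length) and n_features (features per step) are assumed to be defined earlier
lstm_autoencoder = Sequential()

# Encoder
lstm_autoencoder.add(LSTM(timesteps, activation='relu', input_shape=(timesteps, n_features),
                          return_sequences=True))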
lstm_autoencoder.add(LSTM(16, activation='relu', return_sequences=True))
lstm_autoencoder.add(LSTM(1, activation='relu'))
lstm_autoencoder.add(RepeatVector(timesteps))

# Decoder
lstm_autoencoder.add(LSTM(timesteps, activation='relu', return_sequences=True))
lstm_autoencoder.add(LSTM(16, activation='relu', return_sequences=True))
lstm_autoencoder.add(TimeDistributed(Dense(n_features)))

Answer 1

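First, ReLU still has its own problems. Specifically, it suffers from the exploding gradient problem because it is unbounded in the positive domain, so the problem persists in deeper LSTM networks. Most LSTM networks become very deep once unrolled over time, so they have a fair chance of running into exploding gradients. RNNs are also prone to exploding gradients because they reuse the same weight matrix at every time step. Techniques such as gradient clipping help reduce this problem in RNNs, but ReLU by itself does not solve the exploding gradient problem.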
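The effect described above can be sketched with a toy scalar recurrence, h_t = act(w * h_{t-1}), that reuses one weight at every step. This is only an illustration, not an LSTM: the weight 1.5, the 50 steps, and the helper names `run_recurrence` and `clip_by_norm` are assumptions made for the example.

```python
import math

def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # Derivative of ReLU: 1 in the positive domain, 0 otherwise.
    return 1.0 if x > 0 else 0.0

def tanh_grad(x):
    # Derivative of tanh.
    return 1.0 - math.tanh(x) ** 2

def run_recurrence(activation, derivative, w=1.5, h0=1.0, steps=50):
    """Iterate h <- activation(w * h) and track d h_T / d h_0 via the chain rule."""
    h, grad = h0, 1.0
    for _ in range(steps):
        pre = w * h
        grad *= w * derivative(pre)  # one chain-rule factor per time step
        h = activation(pre)
    return h, grad

def clip_by_norm(grads, max_norm):
    """Rescale a gradient vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]

relu_h, relu_g = run_recurrence(relu, relu_grad)       # gradient grows like 1.5 ** 50
tanh_h, tanh_g = run_recurrence(math.tanh, tanh_grad)  # state stays bounded, gradient shrinks
clipped = clip_by_norm([300.0, 400.0], max_norm=5.0)   # L2 norm 500 rescaled to about 5

print(relu_g, tanh_g, clipped)
```

With these assumed values the ReLU gradient explodes while the tanh gradient vanishes, which is why neither activation is a fix on its own. In Keras, the built-in optimizers accept `clipnorm` and `clipvalue` arguments that apply this kind of clipping during training.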

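Now, to your question about using ReLU to reduce the vanishing gradient problem: ReLU does not solve vanishing gradients completely. Methods such as batch normalization can help reduce the problem further. As far as I know, ReLU and tanh on their own should not make much difference for this particular gate. Neither of them solves the vanishing and exploding gradient problems in LSTM networks. For more information on how LSTMs reduce vanishing and exploding gradients, please refer to this post.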
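To make concrete where the activation sits, here is a minimal scalar sketch of one LSTM step. In the Keras `LSTM` layer, the `activation` argument is applied to the candidate value and to the cell state on output, while the gates use `recurrent_activation` (sigmoid by default). The shared weights `w`, `u`, `b` and the function name `lstm_step` are hypothetical simplifications; a real layer has separate weight matrices per gate.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

def lstm_step(x, h, c, activation, w=1.0, u=1.0, b=0.0):
    """One scalar LSTM step; every gate shares the hypothetical weights w, u, b."""
    z = w * x + u * h + b
    i = sigmoid(z)                 # input gate
    f = sigmoid(z)                 # forget gate
    o = sigmoid(z)                 # output gate
    g = activation(z)              # candidate value: tanh by default, relu in the question
    c_new = f * c + i * g          # new cell state
    h_new = o * activation(c_new)  # new hidden state
    return h_new, c_new

h_tanh, c_tanh = lstm_step(10.0, 0.0, 0.0, math.tanh)
h_relu, c_relu = lstm_step(10.0, 0.0, 0.0, relu)

print(h_tanh, h_relu)
```

With tanh the hidden state stays inside (-1, 1), whereas with relu it scales with the input. That is one reason unscaled inputs combined with `activation='relu'` can produce very large activations.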
