Setting the hidden state for each mini-batch with different hidden sizes and multiple LSTM layers in Keras

Question

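I created an LSTM using Keras with TensorFlow as the backend. Before a mini-batch with num_steps of 96 is fed to training, the hidden state of the LSTM is set to the true values of the previous time step.

First, the parameters and data: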

batch_size = 10
num_steps = 96
num_input = num_output = 2
hidden_size = 8
X_train = np.array(X_train).reshape(-1, num_steps, num_input)
Y_train = np.array(Y_train).reshape(-1, num_steps, num_output)
X_test = np.array(X_test).reshape(-1, num_steps, num_input)
Y_test = np.array(Y_test).reshape(-1, num_steps, num_output)

The Keras model consists of two LSTM layers and one layer that trims the output down to num_output, which is 2:

model = Sequential()
model.add(LSTM(hidden_size, batch_input_shape=(batch_size, num_steps, num_input),
               return_sequences=True, stateful=True))
model.add(LSTM(hidden_size, return_sequences=True))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(num_output, activation='softmax')))

model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])

The generator and the training (hidden_states[x] has shape (2,)):

def gen_data():
        x = np.zeros((batch_size, num_steps, num_input))
        y = np.zeros((batch_size, num_steps, num_output))
        while True:
            for i in range(batch_size):
                model.layers[0].states[0] = K.variable(value=hidden_states[gen_data.current_idx]) # hidden_states[x] has shape (2,)
                x[i, :, :] = X_train[gen_data.current_idx]
                y[i, :, :] = Y_train[gen_data.current_idx]
                gen_data.current_idx += 1
            yield x, y
gen_data.current_idx = 0


for epoch in range(100):
    model.fit_generator(gen_data(), len(X_train)//batch_size, 1,
                        validation_data=None, max_queue_size=1, shuffle=False)
    gen_data.current_idx = 0

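This code does not give me an error, but I have two questions about it:

1) Inside the generator I set the hidden state of the LSTM, model.layers[0].states[0], to a variable built from hidden_states[gen_data.current_idx], which has shape (2,). Why is this possible for an LSTM with a hidden size greater than 2?

2) The values in hidden_states[gen_data.current_idx] could also be the output of the Keras model. Does it make sense to set the hidden state this way for a two-layer LSTM?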

Answer 1

States in LSTM

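An LSTM is made up of gates that compute the cell state and the hidden state.

In the accompanying figure, the top arrow on the right of the LSTM is the cell state (c_t) and the bottom arrow is the hidden state (h_t). The cell state is the result of the gate operations, and the size of each state is the same as the hidden_size of the LSTM. Each unrolled step (with its corresponding input X) has its own cell state. For an LSTM, the state of the cell consists of two values: hidden_state (h_t) of size (batch_size x hidden_size) and cell_state (c_t) of size (batch_size x hidden_size).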

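import numpy as np
from keras.layers import Input, LSTM
from keras.models import Model

batch_size = 2
num_steps = 5
num_input = num_output = 1
hidden_size = 8

inputs = Input(batch_shape=(batch_size, num_steps, num_input))
# return_state=True additionally returns the final hidden and cell states
lstm, state_h, state_c = LSTM(hidden_size, return_state=True, return_sequences=True)(inputs)
model = Model(inputs=inputs, outputs=[state_h, state_c])

# Two arrays, h_t and c_t, each of shape (batch_size, hidden_size)
print(model.predict(np.zeros((batch_size, num_steps, num_input))))
# State sizes of the LSTM cell: one entry for h_t and one for c_t
print(model.layers[1].cell.state_size)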

Note: a GRU or a simple RNN has no cell state, only a hidden state, so the state of the cell is just h_t of size (batch_size, hidden_size).
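To make the shapes concrete without depending on a particular Keras version, here is a minimal NumPy sketch of a single LSTM step. It is not the Keras implementation: the function lstm_step, the weight names W, U, b and the gate ordering are assumptions chosen for illustration. The point it shows is that h_t and c_t both have shape (batch_size, hidden_size), independent of num_input.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step (illustrative sketch, not Keras code).

    W: (num_input, 4 * hidden_size), U: (hidden_size, 4 * hidden_size),
    b: (4 * hidden_size,). Gate order i, f, g, o is an assumption.
    """
    z = x_t @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4, axis=1)
    c_t = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)  # new cell state
    h_t = sigmoid(o) * np.tanh(c_t)                      # new hidden state
    return h_t, c_t

batch_size, num_input, hidden_size = 2, 1, 8
rng = np.random.default_rng(0)
W = rng.normal(size=(num_input, 4 * hidden_size))
U = rng.normal(size=(hidden_size, 4 * hidden_size))
b = np.zeros(4 * hidden_size)

# Both states start at zero and have the same shape
h = np.zeros((batch_size, hidden_size))
c = np.zeros((batch_size, hidden_size))
x_t = rng.normal(size=(batch_size, num_input))

h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)  # (2, 8) (2, 8)
```

A state of shape (2,), as in the question, therefore cannot be a valid state for a layer with hidden_size = 8 regardless of the input width.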

References:

Keras implementation of LSTM

Keras Docs:

The number of state tensors is 1 (for RNN and GRU) or 2 (for LSTM).

Illustrated Guide to LSTM and GRU

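Feeding states

In your example, layers[0] refers to the first LSTM and layers[1] refers to the second LSTM. If your intention is to initialize the cell state (c_t) of the n-th batch from the cell state of batch (n-1), i.e. the previous batch, there are two options:

1) Do it in the generator as you do now, but use states[1] if you want c_t and states[0] for h_t. Likewise, use layers[0] for the first LSTM and layers[1] for the second LSTM. Use the set_value method rather than assignment, though; see the edit below.
2) Use Keras stateful=True: with stateful set to true, the LSTM states are not reset after each batch. So if you have a batch of 5 data samples (each of some sequence length), you get cell states for each of the 5 samples. With stateful set to true, these states are used to initialize the cell states of the next batch.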
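The effect of carrying state across batches can be sketched with a toy recurrent layer in NumPy. ToyRNN below is a hypothetical class written only for this illustration (a plain tanh RNN, not an LSTM and not Keras code); its stateful flag mimics the behavior described in the second option. With the state carried over, running two half-length batches back to back gives the same final state as running the full sequence once, whereas resetting between batches does not.

```python
import numpy as np

class ToyRNN:
    """Hypothetical tanh RNN used only to illustrate state carry-over."""

    def __init__(self, num_input, hidden_size, batch_size, stateful):
        rng = np.random.default_rng(0)  # fixed seed: every instance shares weights
        self.W = rng.normal(scale=0.5, size=(num_input, hidden_size))
        self.U = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
        self.stateful = stateful
        self.h = np.zeros((batch_size, hidden_size))

    def run(self, x):
        """Process x of shape (batch_size, num_steps, num_input); return final h_t."""
        if not self.stateful:
            self.h = np.zeros_like(self.h)  # state is reset before every batch
        for t in range(x.shape[1]):
            self.h = np.tanh(x[:, t, :] @ self.W + self.h @ self.U)
        return self.h

x = np.random.default_rng(1).normal(size=(2, 6, 1))

# Reference: all 6 time steps processed in one go
reference = ToyRNN(1, 8, 2, stateful=False).run(x)

# Stateful: the second batch starts from the state left by the first
carry = ToyRNN(1, 8, 2, stateful=True)
carry.run(x[:, :3])
carried = carry.run(x[:, 3:])

# Not stateful: the second batch starts from zeros again
reset = ToyRNN(1, 8, 2, stateful=False)
reset.run(x[:, :3])
restarted = reset.run(x[:, 3:])

same_with_carry = np.allclose(carried, reference)
same_with_reset = np.allclose(restarted, reference)
print(same_with_carry, same_with_reset)
```

Note that row i of one batch continues row i of the previous batch, which is why the order of samples matters when states are carried over.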

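Edit:

The set_value method should be used to set the value of a tensor variable. The code model.layers[0].states[0] = K.variable(value=hidden_states[gen_data.current_idx]) runs without error because what it actually does is change states[0], which pointed to a variable of size (batch_size x hidden_size), so that it points to a variable of size (batch_size x 2). It does not change the value of the tensor variable; it makes the reference point to a new tensor variable with different dimensions.

Test code: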

print(model.layers[0].states[0], hex(id(model.layers[0].states[0])))
model.layers[0].states[0] = K.variable(np.random.randn(10, 2))
print(model.layers[0].states[0], hex(id(model.layers[0].states[0])))

Output

<tf.Variable 'lstm_18/Variable:0' shape=(10, 8) dtype=float32_ref> 0x7f8812e6ee10
<tf.Variable 'Variable_2:0' shape=(10, 2) dtype=float32_ref> 0x7f881269afd0

As you can see, these are two different variables. The correct way is:

print(model.layers[0].states[0], hex(id(model.layers[0].states[0])))
K.set_value(model.layers[0].states[0], np.random.randn(10, 8))
print(model.layers[0].states[0], hex(id(model.layers[0].states[0])))

Output

<tf.Variable 'lstm_20/Variable:0' shape=(10, 8) dtype=float32_ref> 0x7f881138eb70
<tf.Variable 'lstm_20/Variable:0' shape=(10, 8) dtype=float32_ref> 0x7f881138eb70

Once your code is fixed to use set_value, then

K.set_value(model.layers[0].states[0], np.random.randn(10,2))

will throw an error because the size of the tensor and the size of the value you are setting do not match.
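The difference between rebinding a state and updating it in place can also be reproduced without Keras. The sketch below uses plain NumPy arrays; ToyLayer and set_state_in_place are hypothetical names invented for this illustration, and the shape check only imitates the kind of mismatch error described above.

```python
import numpy as np

class ToyLayer:
    """Hypothetical stand-in for a stateful layer; not part of Keras."""

    def __init__(self, batch_size, hidden_size):
        self.states = [np.zeros((batch_size, hidden_size)),  # h_t
                       np.zeros((batch_size, hidden_size))]  # c_t

def set_state_in_place(layer, index, value):
    """Overwrite the existing buffer and reject values of the wrong shape."""
    value = np.asarray(value)
    expected = layer.states[index].shape
    if value.shape != expected:
        raise ValueError(f"expected shape {expected}, got {value.shape}")
    layer.states[index][...] = value

layer = ToyLayer(batch_size=10, hidden_size=8)
original = layer.states[0]

# Rebinding: silently swaps in a different object with a different shape
layer.states[0] = np.zeros((10, 2))
rebound_is_same_object = layer.states[0] is original

# In-place update: the same object is kept and its shape is enforced
layer.states[0] = original
set_state_in_place(layer, 0, np.ones((10, 8)))
in_place_is_same_object = layer.states[0] is original

# A (10, 2) value is rejected instead of being accepted silently
try:
    set_state_in_place(layer, 0, np.ones((10, 2)))
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(rebound_is_same_object, in_place_is_same_object, mismatch_rejected)
# False True True
```

Anything that captured a reference to the original state before the rebinding keeps using the old object, which is why the assignment in the generator appears to succeed without raising an error.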
