Bi-LSTM attention model in Keras

Question

I am trying to build an attention model using a Bi-LSTM on top of word embeddings. I came across How to add an attention mechanism in keras?, https://github.com/philipperemy/keras-attention-mechanism/blob/master/attention_lstm.py and https://github.com/keras-team/keras/issues/4962.

However, I am confused about how to implement Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification. So:

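# Imports assume the Keras 2.x API
from keras import backend as K
from keras.layers import (Activation, Bidirectional, Dense, Embedding, Flatten,
                          Input, LSTM, Lambda, Permute, RepeatVector, multiply)

max_length = 100  # padded sequence length; must match the Embedding input_length

_input = Input(shape=[max_length], dtype='int32')

# get the embedding layer
embedded = Embedding(
        input_dim=30000,
        output_dim=300,
        input_length=max_length,
        trainable=False,
        mask_zero=False
    )(_input)

# Bidirectional concatenates both directions by default: output is (max_length, 40)
activations = Bidirectional(LSTM(20, return_sequences=True))(embedded)

# compute importance for each step
attention = Dense(1, activation='tanh')(activations)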

Here I am confused about which equation in the paper this corresponds to.

attention = Flatten()(attention)
attention = Activation('softmax')(attention)

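What does RepeatVector do?

attention = RepeatVector(40)(attention)  # 40 = 2 * 20, the Bi-LSTM output size
attention = Permute([2, 1])(attention)

# merge(..., mode='mul') was removed in Keras 2; multiply is its replacement
sent_representation = multiply([activations, attention])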

Now, again, I am not sure why this line is here.

sent_representation = Lambda(lambda xin: K.sum(xin, axis=-2), output_shape=(40,))(sent_representation)

Since I have two classes, my final softmax is:

probabilities = Dense(2, activation='softmax')(sent_representation)

Answer 1

attention = Flatten()(attention)

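This turns your attention weight tensor into a vector. Dense(1) outputs one score per time step with shape (max_length, 1), and Flatten reshapes it to a vector of size max_length.

attention = Activation('softmax')(attention)

This constrains every attention weight to lie between 0 and 1, with all weights summing to 1.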
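To make the Flatten and softmax steps concrete, here is a minimal NumPy sketch rather than Keras code. The score values are made up for illustration, and `softmax` is a small helper defined here, not a Keras function.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over the last axis."""
    shifted = scores - np.max(scores, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Hypothetical tanh scores for a batch of 1 sentence with 4 time steps,
# shaped (batch, max_length, 1) like the output of Dense(1, activation='tanh').
scores = np.array([[[0.9], [-0.2], [0.1], [0.5]]])

flat = scores.reshape(scores.shape[0], -1)  # Flatten: (1, 4, 1) -> (1, 4)
weights = softmax(flat)                     # one weight per time step

print(flat.shape)
print(weights.sum(axis=-1))
```

The time step with the largest score receives the largest weight, and each row of `weights` sums to 1.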

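attention = RepeatVector(40)(attention)
attention = Permute([2, 1])(attention)

sent_representation = multiply([activations, attention])

RepeatVector repeats the attention weight vector (of size max_len) once per hidden feature, and Permute swaps the axes so that the result has the same shape as activations and the two can be multiplied element-wise. The repeat count must equal the last dimension of activations. Bidirectional(LSTM(20)) concatenates the forward and backward outputs by default, so activations has shape max_len * 40, not max_len * 20, and the repeat count has to be 40. Also note that merge(..., mode='mul') is the Keras 1 API; in Keras 2 the equivalent is multiply.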
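The shape bookkeeping can be checked with a small NumPy sketch. This is an illustration under assumed sizes (batch of 2, max_len of 5, 40 features) with random data. `np.repeat` and `np.transpose` stand in for RepeatVector and Permute here and are not the Keras layers themselves.

```python
import numpy as np

batch, max_len, features = 2, 5, 40  # features = 2 * 20 LSTM units (concat)

rng = np.random.default_rng(0)
activations = rng.normal(size=(batch, max_len, features))
attention = rng.random(size=(batch, max_len))
attention /= attention.sum(axis=1, keepdims=True)  # rows sum to 1, like softmax output

# RepeatVector(features): (batch, max_len) -> (batch, features, max_len)
repeated = np.repeat(attention[:, np.newaxis, :], features, axis=1)
# Permute([2, 1]): swap the two non-batch axes -> (batch, max_len, features)
permuted = np.transpose(repeated, (0, 2, 1))

# Element-wise product: every feature of time step t is scaled by weight t.
weighted = activations * permuted

# Broadcasting gives the same result without building the repeated tensor.
weighted_broadcast = activations * attention[:, :, np.newaxis]

print(repeated.shape, permuted.shape, weighted.shape)
```

If the repeat count were 20 instead of 40, `permuted` would have shape (2, 5, 20) and the element-wise product with `activations` would fail.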

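sent_representation = Lambda(lambda xin: K.sum(xin, axis=-2), output_shape=(40,))(sent_representation)

This Lambda layer sums the weighted hidden state vectors over the time axis (axis=-2), which yields a single vector per sentence that is fed to the final layer. The output_shape must be the feature size of activations (40 here); units is not defined anywhere in the code.

Hope this helps!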
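As a rough check of what the sum over axis=-2 computes, the following NumPy sketch uses assumed sizes and random data. It compares the sum of the weighted states with the explicit weighted sum of hidden states, sum over t of alpha_t * h_t.

```python
import numpy as np

batch, max_len, features = 2, 5, 40  # assumed sizes for illustration

rng = np.random.default_rng(1)
activations = rng.normal(size=(batch, max_len, features))
attention = rng.random(size=(batch, max_len))
attention /= attention.sum(axis=1, keepdims=True)  # rows sum to 1

weighted = activations * attention[:, :, np.newaxis]  # (batch, max_len, features)

# Equivalent of K.sum(xin, axis=-2): collapse the time axis,
# leaving one vector per sentence.
sent_representation = weighted.sum(axis=-2)  # (batch, features)

# The same quantity written directly as sum_t alpha_t * h_t.
as_einsum = np.einsum('bt,btf->bf', attention, activations)

print(sent_representation.shape)
```

Both forms give a (batch, features) array, which is the fixed-size sentence vector passed to the final Dense(2, activation='softmax') layer.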
