Attention layer changes the batch size at inference

Problem description · Votes: 0 · Answers: 1

I trained a sequence-to-sequence model with an encoder-decoder architecture. I am trying to generate an output sequence given an input context, and I want to do this for a whole batch of input context vectors at once. I have a Self Attention layer before the final output in the decoder, and it seems to be changing the batch shape, or not picking up the shape correctly, and it raises an error. If I infer on a single sample or with batch size 1 it works fine, but in production that takes far too long across the thousands of input context vectors to be inferred. So I need your help to debug the error and to implement a better, computationally optimized way of generating the output sequences.

Here is my implementation:

### Imports (restored; the original snippet assumes these plus a global hsize)
import tensorflow as tf
from tensorflow.keras.layers import (Input, GRU, Bidirectional, Dropout,
                                     Attention, Concatenate, Dense)
from tensorflow.keras.models import Model

hsize = 256  ### hidden size; matches the (64, 12, 256) context shape printed below

### Define Inference Encoder
def define_inference_encoder(input_shape):
  encoder_input = Input(shape=input_shape, name='en_input_layer')

  ### First Bidirectional GRU Layer
  bidi_gru1 = Bidirectional(GRU(160, return_sequences=True), name='en_bidirect_gru1')
  gru1_out = bidi_gru1(encoder_input)

  gru1_out = Dropout(0.46598303573163413, name='bidirect_gru1_dropout')(gru1_out)

  ### Second GRU Layer
  # hp_units_2 = hp.Int('enc_lstm2', min_value=32, max_value=800, step=32)
  gru2 = GRU(hsize, return_sequences=True, return_state=True, name='en_gru2_layer')
  gru2_out, gru2_states = gru2(gru1_out)

  encoder_model = Model(inputs=encoder_input, outputs=[gru2_out, gru2_states])
  return encoder_model

### Define Inference Decoder
def define_inf_decoder(context_vec, input_shape):
  ### context_vec is a concrete tensor closed over by this model, not an Input
  decoder_input = Input(shape=input_shape)
  decoder_state_input = Input(shape=(hsize,))

  de_gru1 = GRU(hsize, return_sequences=True, return_state=True, name='de_gru1_layer')
  de_gru1_out, de_state_out = de_gru1(decoder_input, initial_state=decoder_state_input)

  ### Keras Attention: first tensor is the query, second is the value (the encoder context)
  attn_layer = Attention(use_scale=True, name='attn_layer')
  attn_out = attn_layer([de_gru1_out, context_vec])

  attn_added = Concatenate(name='attn_source_concat_layer')([de_gru1_out, attn_out])

  attn_dense_layer = Dense(736, name='tanh_dense_layer', activation='tanh')

  h_hat = attn_dense_layer(attn_added)

  ### Output Layer
  preds = Dense(1, name='output_layer')(h_hat)

  decoder_model = Model(inputs=[decoder_input, decoder_state_input], outputs=[preds, de_state_out])
  return decoder_model

### Copy trained weights into the inference model, matched by layer name
def set_weights(untrained_model, trained_model):
  trained_layers = [l.name for l in trained_model.layers]
  print(f"No. of trained layers: {len(trained_layers)}")

  for l in untrained_model.layers:
    if l.name in trained_layers:
      trained_wts = trained_model.get_layer(l.name).get_weights()
      if len(trained_wts)>0:
        untrained_model.get_layer(l.name).set_weights(trained_wts)
        print(f"Layer {l.name} weight set")
        
  return untrained_model

Generating the output sequences:

inference_encoder = define_inference_encoder((12, 1))
inference_encoder = set_weights(inference_encoder, tuned_model)

for (ex_context, ex_target_in), ex_target_out in test_ds.take(1):
  print(ex_context.shape, ex_target_in.shape) ### (64, 12, 1) (64, 3, 1)

seq_len = 12  ### context window length, per the shapes above
test_context, test_states = inference_encoder.predict(tf.reshape(ex_context, shape=(-1,seq_len, 1)))
print(test_context.shape, test_states.shape) ### (64, 12, 256) (64, 256)

inf_decoder = define_inf_decoder(test_context, (1,1))
inf_decoder = set_weights(inf_decoder, tuned_model)

### Seed the decoder with the last value of each context
dec_inp = tf.reshape(ex_context[:,-1], shape=(-1,1,1))
dec_inp.shape ### (64,1,1)

test_inf_decoder_out = inf_decoder.predict([dec_inp, test_states])

Error:

ValueError: Exception encountered when calling layer 'attn_layer' (type Attention).

Dimensions must be equal, but are 32 and 64 for '{{node model_7/attn_layer/MatMul}} = BatchMatMulV2[T=DT_FLOAT, adj_x=false, adj_y=true](model_7/de_gru1_layer/PartitionedCall:1, model_7/15181)' with input shapes: [32,1,256], [64,12,256].

Call arguments received by layer 'attn_layer' (type Attention):
  • inputs=['tf.Tensor(shape=(32, 1, 256), dtype=float32)', 'tf.Tensor(shape=(64, 12, 256), dtype=float32)']
  • mask=None
  • training=False
  • return_attention_scores=False
  • use_causal_mask=False

What I don't understand is how attn_layer ends up with a batch size of 32 when the input I pass has a batch size of 64. It works fine when I use a batch size of 1. What am I doing wrong?

tensorflow keras deep-learning nlp sequence
1 Answer
0 votes

Could you print the shapes of the inputs in both configurations, with batch size 1 and with a batch size greater than 1? Sometimes the problem is that you need to change the order of the dimensions.
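For concreteness, here is a minimal sketch of that check, using the variable names from the question (the expected shapes are the ones printed above). Note that Keras Model.predict splits its inputs into batches of 32 by default, which matches the 32 in the error message:

### Minimal shape check on the decoder's inputs (names from the question's code)
print("dec_inp:", dec_inp.shape)            ### expect (64, 1, 1)
print("test_states:", test_states.shape)    ### expect (64, 256)
print("test_context:", test_context.shape)  ### expect (64, 12, 256)

### predict() defaults to batch_size=32; forcing one 64-row batch makes the
### query shapes line up with the 64-row context baked into the decoder:
out = inf_decoder.predict([dec_inp, test_states], batch_size=64)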

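If the error disappears with batch_size=64, the likely cause is that context_vec is baked into the decoder graph as a constant 64-row tensor when define_inf_decoder is called, so predict() splits dec_inp and test_states into chunks of 32 while the stored context keeps batch size 64. A more robust fix, sketched here under the question's layer names (so set_weights still matches) and with an illustrative function name define_inf_decoder_v2, is to make the context a third model input:

### Sketch: decoder that takes the encoder context as an explicit Input,
### so predict() batches it together with the other inputs
def define_inf_decoder_v2(context_shape, input_shape):
  decoder_input = Input(shape=input_shape)
  decoder_state_input = Input(shape=(hsize,))
  context_input = Input(shape=context_shape)  ### e.g. (12, 256)

  de_gru1 = GRU(hsize, return_sequences=True, return_state=True, name='de_gru1_layer')
  de_gru1_out, de_state_out = de_gru1(decoder_input, initial_state=decoder_state_input)

  attn_out = Attention(use_scale=True, name='attn_layer')([de_gru1_out, context_input])
  attn_added = Concatenate(name='attn_source_concat_layer')([de_gru1_out, attn_out])
  h_hat = Dense(736, name='tanh_dense_layer', activation='tanh')(attn_added)
  preds = Dense(1, name='output_layer')(h_hat)

  return Model(inputs=[decoder_input, decoder_state_input, context_input],
               outputs=[preds, de_state_out])

Generation then stays vectorized over the whole batch; three steps here because the targets above have shape (64, 3, 1):

inf_decoder = define_inf_decoder_v2((12, 256), (1, 1))
inf_decoder = set_weights(inf_decoder, tuned_model)

outputs = []
step_inp, step_state = dec_inp, test_states
for _ in range(3):
  step_pred, step_state = inf_decoder.predict([step_inp, step_state, test_context], verbose=0)
  outputs.append(step_pred)
  step_inp = step_pred  ### feed each prediction back in; shape (batch, 1, 1)
preds_seq = tf.concat(outputs, axis=1)  ### (64, 3, 1)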