Variable_scope runtime error when creating a Keras custom layer with a TensorFlow Hub model and TensorFlow 2.0 as the backend

Question · Votes: 1 · Answers: 1

I am trying to use the pretrained tf-hub ELMo model by integrating it into a Keras layer.

Keras layer:

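import tensorflow as tf
import tensorflow_hub as hub
# Assumed source of the bare trainable_variables() call used below.
from tensorflow.compat.v1 import trainable_variables


class ElmoEmbeddingLayer(tf.keras.layers.Layer):

    def __init__(self, **kwargs):
        super(ElmoEmbeddingLayer, self).__init__(**kwargs)
        self.dimensions = 1024
        self.trainable = True
        self.elmo = None

    def build(self, input_shape):
        url = 'https://tfhub.dev/google/elmo/2'

        # hub.Module is the TF1-style Hub API; this is the call that raises below.
        self.elmo = hub.Module(url)
        # Register the module's variables with the layer so Keras can train them.
        self._trainable_weights += trainable_variables(
            scope="^{}_module/.*".format(self.name))
        super(ElmoEmbeddingLayer, self).build(input_shape)

    def call(self, x, mask=None):
        # x is a batch of raw strings; "elmo" selects the ELMo embedding output.
        result = self.elmo(
            x,
            signature="default",
            as_dict=True)["elmo"]
        return result

    def compute_output_shape(self, input_shape):
        return input_shape[0], self.dimensions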

When I run the code, I get the following error:

Traceback (most recent call last):
  File "D:/Google Drive/Licenta/Gemini/Emotion Analysis/nn/trainer/model.py", line 170, in <module>
    validation_steps=validation_dataset.size())
  File "D:/Google Drive/Licenta/Gemini/Emotion Analysis/nn/trainer/model.py", line 79, in train_gpu
    model = build_model(self.config, self.embeddings, self.sequence_len, self.out_classes, summary=True)
  File "D:\Google Drive\Licenta\Gemini\Emotion Analysis\nn\architectures\models.py", line 8, in build_model
    return my_model(embeddings, config, sequence_length, out_classes, summary)
  File "D:\Google Drive\Licenta\Gemini\Emotion Analysis\nn\architectures\models.py", line 66, in my_model
    inputs, embedding = resolve_inputs(embeddings, sequence_length, model_config, input_type)
  File "D:\Google Drive\Licenta\Gemini\Emotion Analysis\nn\architectures\models.py", line 19, in resolve_inputs
    return elmo_input(model_conf)
  File "D:\Google Drive\Licenta\Gemini\Emotion Analysis\nn\architectures\models.py", line 58, in elmo_input
    embedding = ElmoEmbeddingLayer()(input_text)
  File "D:\Apps\Anaconda\envs\tf2.0\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 616, in __call__
    self._maybe_build(inputs)
  File "D:\Apps\Anaconda\envs\tf2.0\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 1966, in _maybe_build
    self.build(input_shapes)
  File "D:\Google Drive\Licenta\Gemini\Emotion Analysis\nn\architectures\custom_layers.py", line 21, in build
    self.elmo = hub.Module(url)
  File "D:\Apps\Anaconda\envs\tf2.0\lib\site-packages\tensorflow_hub\module.py", line 156, in __init__
    abs_state_scope = _try_get_state_scope(name, mark_name_scope_used=False)
  File "D:\Apps\Anaconda\envs\tf2.0\lib\site-packages\tensorflow_hub\module.py", line 389, in _try_get_state_scope
    "name_scope was already taken." % abs_state_scope)
RuntimeError: variable_scope module/ was unused but the corresponding name_scope was already taken.

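This seems to be caused by eager execution. If I disable eager execution, I have to wrap the model.fit call in a TensorFlow session and initialize the variables with sess.run(global_variables_initializer()) to avoid the following error: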

Traceback (most recent call last):
  File "D:/Google Drive/Licenta/Gemini/Emotion Analysis/nn/trainer/model.py", line 168, in <module>
    validation_steps=validation_dataset.size().eval(session=Session()))
  File "D:/Google Drive/Licenta/Gemini/Emotion Analysis/nn/trainer/model.py", line 90, in train_gpu
    class_weight=weighted)
  File "D:\Apps\Anaconda\envs\tf2.0\lib\site-packages\tensorflow\python\keras\engine\training.py", line 643, in fit
    use_multiprocessing=use_multiprocessing)
  File "D:\Apps\Anaconda\envs\tf2.0\lib\site-packages\tensorflow\python\keras\engine\training_arrays.py", line 664, in fit
    steps_name='steps_per_epoch')
  File "D:\Apps\Anaconda\envs\tf2.0\lib\site-packages\tensorflow\python\keras\engine\training_arrays.py", line 294, in model_iteration
    batch_outs = f(actual_inputs)
  File "D:\Apps\Anaconda\envs\tf2.0\lib\site-packages\tensorflow\python\keras\backend.py", line 3353, in __call__
    run_metadata=self.run_metadata)
  File "D:\Apps\Anaconda\envs\tf2.0\lib\site-packages\tensorflow\python\client\session.py", line 1458, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.FailedPreconditionError: 2 root error(s) found.
  (0) Failed precondition: Error while reading resource variable module/bilm/RNN_0/RNN/MultiRNNCell/Cell1/rnn/lstm_cell/bias from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/module/bilm/RNN_0/RNN/MultiRNNCell/Cell1/rnn/lstm_cell/bias/class tensorflow::Var does not exist.
     [[{{node elmo_embedding_layer/module_apply_default/bilm/RNN_0/RNN/MultiRNNCell/Cell1/rnn/lstm_cell/bias/Read/ReadVariableOp}}]]
  (1) Failed precondition: Error while reading resource variable module/bilm/RNN_0/RNN/MultiRNNCell/Cell1/rnn/lstm_cell/bias from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/module/bilm/RNN_0/RNN/MultiRNNCell/Cell1/rnn/lstm_cell/bias/class tensorflow::Var does not exist.
     [[{{node elmo_embedding_layer/module_apply_default/bilm/RNN_0/RNN/MultiRNNCell/Cell1/rnn/lstm_cell/bias/Read/ReadVariableOp}}]]
     [[metrics/f1_micro/Identity/_223]]
0 successful operations.
0 derived errors ignored.

My workaround:

with Session() as sess:
    sess.run(global_variables_initializer())
    history = model.fit(self.train_data.repeat(),
                        epochs=self.config['epochs'],
                        validation_data=self.validation_data.repeat(),
                        steps_per_epoch=steps_per_epoch,
                        validation_steps=validation_steps,
                        callbacks=self.__callbacks(monitor_metric),
                        class_weight=weighted)

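My main question is whether there is another way to use the ELMo tf-hub module inside a custom Keras layer and still train my model. My second question is whether my current workaround hurts training performance or causes GPU OOM errors (I got an OOM error after a few epochs with a larger batch size, and from what I found this is related to unclosed sessions or memory leaks).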

keras scope keras-layer tensorflow2.0
1 Answer
0 votes

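If you wrap the model in a Session() block, you also have to wrap all other code that uses the model in a Session() block. That takes a lot of time and effort. I handle it differently: first, create the ELMo module and register a session with Keras: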

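   import tensorflow as tf
   import tensorflow_hub as hub

   # hub.Module is a TF1-style API, so eager execution must be off under TF 2.x.
   tf.compat.v1.disable_eager_execution()
   elmo_model = hub.Module("https://tfhub.dev/google/elmo/3", trainable=True,
                           name='elmo_module')
   # TF 2.x only exposes Session and the initializers under tf.compat.v1.
   sess = tf.compat.v1.Session()
   sess.run(tf.compat.v1.global_variables_initializer())
   sess.run(tf.compat.v1.tables_initializer())
   tf.compat.v1.keras.backend.set_session(sess)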

Then, instead of creating the ELMo module directly inside ElmoEmbeddingLayer:

  self.elmo = hub.Module(url)
  self._trainable_weights += trainable_variables(
      scope="^{}_module/.*".format(self.name))

you can do the following, which I think should work:

  self.elmo = elmo_model
  self._trainable_weights += trainable_variables(
      scope="^elmo_module/.*")
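
The scope pattern has to match the name the module was created with, which is why it changes along with name='elmo_module'. Below is a minimal pure-Python sketch of that matching. It assumes trainable_variables is tf.compat.v1.trainable_variables and that its scope argument is applied to variable names with re.match; filter_by_scope is a hypothetical stand-in for that filtering, not a TensorFlow function. The first variable name is copied from the FailedPreconditionError in the question. The second is an assumed name for the same variable under the 'elmo_module' scope.

```python
import re


def filter_by_scope(variable_names, scope):
    """Hypothetical helper: keep names the scope regex matches from the start."""
    pattern = re.compile(scope)
    return [name for name in variable_names if pattern.match(name)]


# Name from the error in the question: an unnamed hub.Module lives under "module/".
default_scope_vars = [
    "module/bilm/RNN_0/RNN/MultiRNNCell/Cell1/rnn/lstm_cell/bias:0",
]
# Assumed name of the same variable when the module is built with name='elmo_module'.
named_scope_vars = [
    "elmo_module/bilm/RNN_0/RNN/MultiRNNCell/Cell1/rnn/lstm_cell/bias:0",
]

# Default Keras layer name, as seen in the node names of the traceback.
layer_name = "elmo_embedding_layer"
question_pattern = "^{}_module/.*".format(layer_name)
answer_pattern = "^elmo_module/.*"

# The question's pattern looks for "elmo_embedding_layer_module/", which no variable has.
matched_by_question = filter_by_scope(default_scope_vars, question_pattern)
# The answer's pattern lines up with the explicit module name.
matched_by_answer = filter_by_scope(named_scope_vars, answer_pattern)

print(matched_by_question)
print(matched_by_answer)
```

If the filtering works as assumed, the pattern in the question would select no variables even if the module were created successfully, so the layer would report no trainable ELMo weights. Naming the module explicitly and matching on that name avoids the mismatch.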