Keras 的clear_session() 无法在 Google colab 中工作

问题描述 投票:0回答:2

我在 Google colab 中多次运行 keras 模型。由于张量流的性质,程序每次运行时都会创建一个新模型,这会导致某些运行后内存耗尽。我发现 keras 的

clear_session()
应该可以帮助解决这个问题,但它似乎不起作用。我在下面为 Google colab 创建了一个 MWE。

import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras import backend as K

X = np.zeros([10, 10000])
y = np.zeros([10, 10000])

########
m = Sequential([Dense(10000, input_shape=(10000,)), Dense(10000), Dense(10000), Dense(10000)])
m.compile(loss='mse')
m.summary()

m.fit(X,y)
K.clear_session()

运行下面的部分

########
三次后,出现以下错误:

---------------------------------------------------------------------------

ResourceExhaustedError                    Traceback (most recent call last)

<ipython-input-3-7ae5ab890fc2> in <module>
      3 m.summary()
      4 
----> 5 m.fit(X,y)
      6 K.clear_session()

1 frames

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     53     ctx.ensure_initialized()
     54     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 55                                         inputs, attrs, num_outputs)
     56   except core._NotOkStatusException as e:
     57     if name is not None:

ResourceExhaustedError: Graph execution error:

Detected at node 'RMSprop/RMSprop/update_2/mul_2' defined at (most recent call last):
    File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
      "__main__", mod_spec)
    File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
      exec(code, run_globals)
    File "/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py", line 16, in <module>
      app.launch_new_instance()
    File "/usr/local/lib/python3.7/dist-packages/traitlets/config/application.py", line 846, in launch_instance
      app.start()
    File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelapp.py", line 612, in start
      self.io_loop.start()
    File "/usr/local/lib/python3.7/dist-packages/tornado/platform/asyncio.py", line 132, in start
      self.asyncio_loop.run_forever()
    File "/usr/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
      self._run_once()
    File "/usr/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
      handle._run()
    File "/usr/lib/python3.7/asyncio/events.py", line 88, in _run
      self._context.run(self._callback, *self._args)
    File "/usr/local/lib/python3.7/dist-packages/tornado/ioloop.py", line 758, in _run_callback
      ret = callback()
    File "/usr/local/lib/python3.7/dist-packages/tornado/stack_context.py", line 300, in null_wrapper
      return fn(*args, **kwargs)
    File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 1233, in inner
      self.run()
    File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 1147, in run
      yielded = self.gen.send(value)
    File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 381, in dispatch_queue
      yield self.process_one()
    File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 346, in wrapper
      runner = Runner(result, future, yielded)
    File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 1080, in __init__
      self.run()
    File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 1147, in run
      yielded = self.gen.send(value)
    File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 365, in process_one
      yield gen.maybe_future(dispatch(*args))
    File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 326, in wrapper
      yielded = next(result)
    File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 268, in dispatch_shell
      yield gen.maybe_future(handler(stream, idents, msg))
    File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 326, in wrapper
      yielded = next(result)
    File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 545, in execute_request
      user_expressions, allow_stdin,
    File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 326, in wrapper
      yielded = next(result)
    File "/usr/local/lib/python3.7/dist-packages/ipykernel/ipkernel.py", line 306, in do_execute
      res = shell.run_cell(code, store_history=store_history, silent=silent)
    File "/usr/local/lib/python3.7/dist-packages/ipykernel/zmqshell.py", line 536, in run_cell
      return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
    File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 2855, in run_cell
      raw_cell, store_history, silent, shell_futures)
    File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 2881, in _run_cell
      return runner(coro)
    File "/usr/local/lib/python3.7/dist-packages/IPython/core/async_helpers.py", line 68, in _pseudo_sync_runner
      coro.send(None)
    File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 3058, in run_cell_async
      interactivity=interactivity, compiler=compiler, result=result)
    File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 3249, in run_ast_nodes
      if (await self.run_code(code, result,  async_=asy)):
    File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 3326, in run_code
      exec(code_obj, self.user_global_ns, self.user_ns)
    File "<ipython-input-3-7ae5ab890fc2>", line 5, in <module>
      m.fit(X,y)
    File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1409, in fit
      tmp_logs = self.train_function(iterator)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1051, in train_function
      return step_function(self, iterator)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1040, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1030, in run_step
      outputs = model.train_step(data)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 893, in train_step
      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "/usr/local/lib/python3.7/dist-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 539, in minimize
      return self.apply_gradients(grads_and_vars, name=name)
    File "/usr/local/lib/python3.7/dist-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 682, in apply_gradients
      name=name)
    File "/usr/local/lib/python3.7/dist-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 724, in _distributed_apply
      var, apply_grad_to_update_var, args=(grad,), group=False)
    File "/usr/local/lib/python3.7/dist-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 706, in apply_grad_to_update_var
      update_op = self._resource_apply_dense(grad, var, **apply_kwargs)
    File "/usr/local/lib/python3.7/dist-packages/keras/optimizers/optimizer_v2/rmsprop.py", line 216, in _resource_apply_dense
      var_t = var - coefficients["lr_t"] * grad / (
Node: 'RMSprop/RMSprop/update_2/mul_2'
failed to allocate memory
     [[{{node RMSprop/RMSprop/update_2/mul_2}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
 [Op:__inference_train_function_2465]

我想在同一模型上使用略有不同的数据,因此我多次运行类似的部分。我可以在错误发生后简单地重新启动笔记本电脑,但加载数据需要一些时间,那么是否有一个选项可以真正清除旧模型?感谢您的帮助。

python tensorflow keras jupyter-notebook google-colaboratory
2个回答
0
投票

重新启动运行时并重试,因为我尝试复制上述代码并且工作正常。

您可以检查下面提到的相同代码的输出:

import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras import backend as K

X = np.zeros([10, 10000])
y = np.zeros([10, 10000])

########
m = Sequential([Dense(10000, input_shape=(10000,)), Dense(10000), Dense(10000), Dense(10000)])
m.compile(loss='mse')
m.summary()

m.fit(X,y, epochs=2)
K.clear_session()

输出:

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 10000)             100010000 
                                                                 
 dense_1 (Dense)             (None, 10000)             100010000 
                                                                 
 dense_2 (Dense)             (None, 10000)             100010000 
                                                                 
 dense_3 (Dense)             (None, 10000)             100010000 
                                                                 
=================================================================
Total params: 400,040,000
Trainable params: 400,040,000
Non-trainable params: 0
_________________________________________________________________
Epoch 1/2
1/1 [==============================] - 10s 10s/step - loss: 0.0000e+00
Epoch 2/2
1/1 [==============================] - 6s 6s/step - loss: 0.0000e+00

0
投票

您必须在循环结束时使用垃圾收集器来清除内存:

K.clear_session()

后使用
import gc

for _ in ():
    ...
    model = MyOwnModel()
    model.compile(...)
    model.fit(...)
    
    K.clear_session()
    gc.collect()

如果不起作用,请在 Colab 中的该单元之前使用此代码更改 TensorFlow 的版本:
对我来说,更改 TensorFlow 版本后确实可以工作

!pip install tensorflow==2.10.0
© www.soinside.com 2019 - 2024. All rights reserved.