How do I limit GPU memory usage in TFLearn?


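I am building a self-driving car using TFLearn with AlexNet. I have already trained the network, but when I try to run GTA and the network at the same time I get the error CUBLAS_STATUS_ALLOC_FAILED, which I think means that I am running out of GPU memory.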

Here is my AlexNet file:

import tflearn
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.estimator import regression
from tflearn.layers.normalization import local_response_normalization


def alexnet(width, height, lr):
    network = input_data(shape=[None, width, height, 1], name='input')
    network = conv_2d(network, 96, 11, strides=4, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = conv_2d(network, 256, 5, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 256, 3, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = fully_connected(network, 4096, activation='tanh')
    network = dropout(network, 0.5)
    network = fully_connected(network, 4096, activation='tanh')
    network = dropout(network, 0.5)
    network = fully_connected(network, 3, activation='softmax')
    network = regression(network, optimizer='momentum',
                         loss='categorical_crossentropy',
                         learning_rate=lr, name='targets')

    model = tflearn.DNN(network, checkpoint_path='model_data/model_alexnet',
                        max_checkpoints=1, tensorboard_verbose=2, tensorboard_dir='log')

    return model

I tried adding this:

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config)
session.run(tf.global_variables_initializer())

and then passing session=session to the tflearn.DNN function like this:

    model = tflearn.DNN(network, checkpoint_path='model_data/model_alexnet',
                        max_checkpoints=1, tensorboard_verbose=2, tensorboard_dir='log',
                        session=session)

But that does not work either; I get an error saying that some variables are not initialized.

Specifically, when I try to use the model from that file, for example:

import numpy as np
from alexnet import alexnet

WIDTH = 80
HEIGHT = 60
LR = 1e-3
EPOCHS = 8
MODEL_NAME = 'pygta5-car-{}-{}-{}-epochs.model'. \
    format(LR, 'alexnet', EPOCHS)

model = alexnet(WIDTH, HEIGHT, LR)

train_data = np.load('training_data.npy')

train = train_data[:-100]
test = train_data[-100:]

train_x = np.array([i[0] for i in train]).reshape([-1, WIDTH, HEIGHT, 1])  # take only the images
train_y = np.array([i[1] for i in train])  # take only the labels

test_x = np.array([i[0] for i in test]).reshape([-1, WIDTH, HEIGHT, 1])  # take only the images
test_y = np.array([i[1] for i in test])  # take only the labels

model.fit({'input': train_x}, {'targets': train_y},
          n_epoch=EPOCHS, validation_set=({'input': test_x}, {'targets': test_y}),
          snapshot_step=500, run_id=MODEL_NAME, show_metric=True)


model.save('models/model.tfl')

I get this error when model.fit() runs:

"C:\Program Files\Python36\python.exe" C:/Users/Elia/PycharmProjects/SelfDrivingGrandTheftAutoV/v2/train_model.py
WARNING:tensorflow:From C:\Program Files\Python36\lib\site-packages\tflearn\initializations.py:119: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
2018-01-09 23:49:30.486827: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2018-01-09 23:49:30.947896: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Found device 0 with properties: 
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.8475
pciBusID: 0000:23:00.0
totalMemory: 6.00GiB freeMemory: 4.97GiB
2018-01-09 23:49:30.948297: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:23:00.0, compute capability: 6.1)
2018-01-09 23:49:32.382017: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:23:00.0, compute capability: 6.1)
---------------------------------
Run id: pygta5-car-0.001-alexnet-8-epochs.model
Log directory: log/
---------------------------------
Training samples: 7775
Validation samples: 100
--
2018-01-09 23:49:34.924216: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\framework\op_kernel.cc:1192] Failed precondition: Attempting to use uninitialized value Crossentropy/Mean/moving_avg
     [[Node: Crossentropy/Mean/moving_avg/read = Identity[T=DT_FLOAT, _class=["loc:@Crossentropy/Mean/moving_avg"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](Crossentropy/Mean/moving_avg)]]
[... the same "Attempting to use uninitialized value Crossentropy/Mean/moving_avg" warning is repeated 15 more times ...]
Traceback (most recent call last):
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1323, in _do_call
    return fn(*args)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1302, in _run_fn
    status, run_metadata)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value Crossentropy/Mean/moving_avg
     [[Node: Crossentropy/Mean/moving_avg/read = Identity[T=DT_FLOAT, _class=["loc:@Crossentropy/Mean/moving_avg"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](Crossentropy/Mean/moving_avg)]]
     [[Node: Conv2D_1/W/read/_179 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_748_Conv2D_1/W/read", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/Elia/PycharmProjects/SelfDrivingGrandTheftAutoV/v2/train_model.py", line 26, in <module>
    snapshot_step=500, run_id=MODEL_NAME, show_metric=True)
  File "C:\Program Files\Python36\lib\site-packages\tflearn\models\dnn.py", line 216, in fit
    callbacks=callbacks)
  File "C:\Program Files\Python36\lib\site-packages\tflearn\helpers\trainer.py", line 339, in fit
    show_metric)
  File "C:\Program Files\Python36\lib\site-packages\tflearn\helpers\trainer.py", line 818, in _train
    feed_batch)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\client\session.py", line 889, in run
    run_metadata_ptr)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1317, in _do_run
    options, run_metadata)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value Crossentropy/Mean/moving_avg
     [[Node: Crossentropy/Mean/moving_avg/read = Identity[T=DT_FLOAT, _class=["loc:@Crossentropy/Mean/moving_avg"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](Crossentropy/Mean/moving_avg)]]
     [[Node: Conv2D_1/W/read/_179 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_748_Conv2D_1/W/read", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'Crossentropy/Mean/moving_avg/read', defined at:
  File "C:/Users/Elia/PycharmProjects/SelfDrivingGrandTheftAutoV/v2/train_model.py", line 11, in <module>
    model = alexnet(WIDTH, HEIGHT, LR)
  File "C:\Users\Elia\PycharmProjects\SelfDrivingGrandTheftAutoV\v2\alexnet.py", line 37, in alexnet
    max_checkpoints=1, tensorboard_verbose=2, tensorboard_dir='log', session=session)
  File "C:\Program Files\Python36\lib\site-packages\tflearn\models\dnn.py", line 65, in __init__
    best_val_accuracy=best_val_accuracy)
  File "C:\Program Files\Python36\lib\site-packages\tflearn\helpers\trainer.py", line 131, in __init__
    clip_gradients)
  File "C:\Program Files\Python36\lib\site-packages\tflearn\helpers\trainer.py", line 693, in initialize_training_ops
    ema_num_updates=self.training_steps)
  File "C:\Program Files\Python36\lib\site-packages\tflearn\summaries.py", line 239, in add_loss_summaries
    loss_averages_op = loss_averages.apply([loss] + other_losses)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\training\moving_averages.py", line 401, in apply
    colocate_with_primary=(var.op.type in ["Variable", "VariableV2"]))
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\training\slot_creator.py", line 174, in create_zeros_slot
    colocate_with_primary=colocate_with_primary)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\training\slot_creator.py", line 151, in create_slot_with_initializer
    dtype)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\training\slot_creator.py", line 67, in _create_slot_var
    validate_shape=validate_shape)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1203, in get_variable
    constraint=constraint)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1092, in get_variable
    constraint=constraint)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 425, in get_variable
    constraint=constraint)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 394, in _true_getter
    use_resource=use_resource, constraint=constraint)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 805, in _get_single_variable
    constraint=constraint)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\variables.py", line 213, in __init__
    constraint=constraint)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\variables.py", line 356, in _init_from_args
    self._snapshot = array_ops.identity(self._variable, name="read")
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\array_ops.py", line 125, in identity
    return gen_array_ops.identity(input, name=name)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 2070, in identity
    "Identity", input=input, name=name)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 2956, in create_op
    op_def=op_def)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

FailedPreconditionError (see above for traceback): Attempting to use uninitialized value Crossentropy/Mean/moving_avg
     [[Node: Crossentropy/Mean/moving_avg/read = Identity[T=DT_FLOAT, _class=["loc:@Crossentropy/Mean/moving_avg"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](Crossentropy/Mean/moving_avg)]]
     [[Node: Conv2D_1/W/read/_179 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_748_Conv2D_1/W/read", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]


Process finished with exit code 1

Is there a way to fix this, or a better way to limit GPU usage in TFLearn?

python tensorflow machine-learning gpu tflearn
1 answer
2 votes

I found this question when I ran into the same problem. I guess it is no longer relevant for you, but it may be for others.

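This problem occurs when you try to load the model into video RAM and the load fails because there is not enough memory for both GTA 5 and the model.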

I am new to TFLearn, so I cannot explain why your solution does not work.
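One possible explanation, inferred from the traceback in the question rather than tested: the variable Crossentropy/Mean/moving_avg is created inside tflearn.DNN(...), while the question runs tf.global_variables_initializer() before that call, so the initializer cannot cover it. The sketch below assumes TensorFlow 1.x and assumes that TFLearn leaves initialization to the caller when it is handed an existing session. The helper name build_model_with_session is hypothetical.

```python
def build_model_with_session(network, gpu_fraction=0.4):
    """Hypothetical helper: wrap `network` in a DNN that uses a capped session.

    Assumes TensorFlow 1.x (tf.ConfigProto / tf.Session) and TFLearn.
    """
    # Imported lazily so that importing this module does not touch the GPU.
    import tensorflow as tf
    import tflearn

    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = gpu_fraction
    session = tf.Session(config=config)

    # DNN adds its own variables (for example the loss moving average),
    # so build it first ...
    model = tflearn.DNN(network, checkpoint_path='model_data/model_alexnet',
                        max_checkpoints=1, tensorboard_verbose=2,
                        tensorboard_dir='log', session=session)

    # ... and only then initialize every variable that now exists in the graph.
    session.run(tf.global_variables_initializer())
    return model
```

In the question's alexnet(), this would stand in for the final tflearn.DNN(...) call, with `network` being the result of regression(...).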

To limit GPU memory usage, you can add the following line before model = tflearn.DNN(...) in alexnet.

tflearn.init_graph(num_cores=4, gpu_memory_fraction=0.5)
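To make the placement concrete, here is a minimal sketch of where that call could sit inside the question's alexnet(). The function name alexnet_limited and the gpu_memory_fraction parameter are illustrative assumptions, and the convolutional layers are left out for brevity; they would stay exactly as in the question.

```python
def alexnet_limited(width, height, lr, gpu_memory_fraction=0.5):
    """Hypothetical variant of the question's alexnet() with a GPU memory cap."""
    # Imported lazily so that importing this module does not touch the GPU.
    import tflearn
    from tflearn.layers.core import input_data, fully_connected
    from tflearn.layers.estimator import regression

    # Set the cap before tflearn.DNN(...) is constructed further down.
    tflearn.init_graph(num_cores=4, gpu_memory_fraction=gpu_memory_fraction)

    network = input_data(shape=[None, width, height, 1], name='input')
    # The conv_2d / max_pool_2d / dropout layers from the question go here.
    network = fully_connected(network, 3, activation='softmax')
    network = regression(network, optimizer='momentum',
                         loss='categorical_crossentropy',
                         learning_rate=lr, name='targets')

    return tflearn.DNN(network, checkpoint_path='model_data/model_alexnet',
                       max_checkpoints=1, tensorboard_verbose=2,
                       tensorboard_dir='log')
```

The training script would then call alexnet_limited(WIDTH, HEIGHT, LR) and need no other changes.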

TFLearn Documentation

I do not think num_cores=4 is actually required, but I have not tested it without it.

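You also need to monitor VRAM usage without alexnet running to see how much the game needs on its own, because the line above only works if the game uses less than 50% (you can change the value).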
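The arithmetic behind that check can be sketched as a small helper. This is only an illustration: the function name suggest_gpu_memory_fraction and the headroom_gib safety margin are assumptions, and the game's VRAM usage has to be measured separately while the model is not running.

```python
import math


def suggest_gpu_memory_fraction(total_gib, game_gib, headroom_gib=0.0):
    """Return the largest fraction (rounded down to 2 decimals) that fits
    next to the game, or raise ValueError if nothing is left.

    total_gib    -- total VRAM of the card
    game_gib     -- VRAM the game uses on its own (measured beforehand)
    headroom_gib -- hypothetical safety margin to leave unused
    """
    if total_gib <= 0:
        raise ValueError("total_gib must be positive")
    available = total_gib - game_gib - headroom_gib
    if available <= 0:
        raise ValueError("the game leaves no VRAM for the model")
    # Round down so the suggested fraction never exceeds what is free.
    return math.floor(available / total_gib * 100) / 100


# Example: a 6 GiB card where the game alone takes 3 GiB.
fraction = suggest_gpu_memory_fraction(total_gib=6.0, game_gib=3.0)
print(fraction)
```

The result could then be passed as gpu_memory_fraction to tflearn.init_graph; if the function raises, the game's settings need to be lowered first.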

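I am trying something similar to what you are doing with Forza Horizon 3 (which is well optimized for PC), and by lowering the settings I was able to reduce its usage from 60% to 40%.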

I got it working with an 8 GB 2080, so it should work with a 6 GB 1060.
