TensorFlow: how to fix ResourceExhaustedError?

Problem description

I am trying to recreate these: Hugging Face: Question answering task and Hugging Face: Question answering NLP course.

I am running into this ResourceExhaustedError at the model.fit() step.

---------------------------------------------------------------------------
ResourceExhaustedError                    Traceback (most recent call last)
Cell In[14], line 1
----> 1 model.fit(x=tf_train_set, batch_size=16, validation_data=tf_validation_set, epochs=3, callbacks=[callback])
ResourceExhaustedError: Graph execution error:

Detected at node 'tf_distil_bert_for_question_answering/distilbert/transformer/layer_._4/attention/dropout_14/dropout/random_uniform/RandomUniform' defined at (most recent call last):

*a long list of files is shown here*

Node: 'tf_distil_bert_for_question_answering/distilbert/transformer/layer_._4/attention/dropout_14/dropout/random_uniform/RandomUniform'
OOM when allocating tensor with shape[16,12,384,384] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{node tf_distil_bert_for_question_answering/distilbert/transformer/layer_._4/attention/dropout_14/dropout/random_uniform/RandomUniform}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
 [Op:__inference_train_function_9297]

I have already tried lowering the batch_size.

model.fit(x=tf_train_set, batch_size=16, validation_data=tf_validation_set, epochs=3, callbacks=[callback])
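Note that tf_train_set is an already-batched tf.data.Dataset, so its batch size is fixed when the dataset is built; changing batch_size in model.fit() does not re-batch it. A minimal sketch of lowering it at dataset creation, assuming the datasets were built with model.prepare_tf_dataset as in the Hugging Face guide (tokenized_squad and data_collator are that guide's names, not mine):

from transformers import DefaultDataCollator

data_collator = DefaultDataCollator(return_tensors="tf")

# Rebuild the datasets with a smaller batch size; this is where the batch size
# actually takes effect, not in model.fit().
tf_train_set = model.prepare_tf_dataset(
    tokenized_squad["train"],
    shuffle=True,
    batch_size=4,   # down from 16
    collate_fn=data_collator,
)

tf_validation_set = model.prepare_tf_dataset(
    tokenized_squad["validation"],
    shuffle=False,
    batch_size=4,
    collate_fn=data_collator,
)

model.fit(x=tf_train_set, validation_data=tf_validation_set, epochs=3, callbacks=[callback])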

I have also tried limiting the GPU memory growth (see Limiting GPU memory growth).
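For reference, this is a minimal sketch of the memory-growth setting from the TensorFlow GPU guide. It only changes how memory is allocated (on demand instead of all up front); it does not add memory, so an OOM from a too-large batch or sequence length can still occur:

import tensorflow as tf

# Ask TensorFlow to allocate GPU memory as needed instead of reserving it all at startup.
gpus = tf.config.list_physical_devices("GPU")
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)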

Here are the Colab notebooks: Colab: Question answering task and Colab: Question answering NLP course.

Tags: python, tensorflow, memory, gpu, training-data
1 Answer

This means the GPU does not have enough memory for your batch size or your input size. Try reducing the batch size or the size of the input sequences.
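The tensor that fails to allocate has shape [16, 12, 384, 384], i.e. batch size 16, 12 attention heads, and sequence length 384, so besides a smaller batch you can shorten the tokenized sequences. A hedged sketch, assuming the preprocessing follows the Hugging Face question-answering guide (questions, contexts, and tokenizer are the guide's names):

# Re-tokenize with a shorter max_length; the attention matrices grow with the
# square of the sequence length, so 256 instead of 384 cuts them to under half.
inputs = tokenizer(
    questions,
    contexts,
    max_length=256,            # down from 384
    truncation="only_second",  # truncate the context, not the question
    padding="max_length",
)

Mixed precision, via tf.keras.mixed_precision.set_global_policy("mixed_float16"), can also roughly halve activation memory on GPUs that support it.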
