Training a BERT-based model causes an OutOfMemory error. How do I resolve this?

Problem description · Votes: 1 · Answers: 3

My setup has an NVIDIA P100 GPU. I am using the Google BERT model for question answering. I am using the SQuAD question-answering dataset, which gives me questions together with the paragraphs the answers should be drawn from, and my research indicates this architecture should be fine, but I keep getting OutOfMemory errors during training:

ResourceExhaustedError: OOM when allocating tensor with shape [786432,1604] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [[{{node dense_3/kernel/Initializer/random_uniform/RandomUniform}}]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
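For context, the hint at the end refers to TensorFlow 1.x RunOptions. I have not wired this into my Keras code, but presumably it means something along these lines when running a session directly:

import tensorflow.compat.v1 as tf

# Ask TF1 to report the list of allocated tensors when an OOM happens.
run_opts = tf.RunOptions(report_tensor_allocations_upon_oom=True)

# `fetches` and `feed_dict` stand in for whatever the session actually runs.
# sess.run(fetches, feed_dict=feed_dict, options=run_opts)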

Below, please find a complete program that uses someone else's implementation of Google's BERT algorithm inside my own model. Please let me know how I can fix my error. Thanks!

import json
import numpy as np
import pandas as pd
import os
assert os.path.isfile("train-v1.1.json"),"Non-existent file"
from tensorflow.python.client import device_lib
import tensorflow.compat.v1 as tf
#import keras
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
import re
regex = re.compile(r'\W+')
#Reading the files.
def readFile(filename):
  with open(filename) as file:
    fields = []
    JSON = json.loads(file.read())
    articles = []
    for article in JSON["data"]:
      articleTitle = article["title"]
      article_body = []
      for paragraph in article["paragraphs"]:
        paragraphContext = paragraph["context"]
        article_body.append(paragraphContext)
        for qas in paragraph["qas"]:
          question = qas["question"]
          answer = qas["answers"][0]
          fields.append({"question":question,"answer_text":answer["text"],"answer_start":answer["answer_start"],"paragraph_context":paragraphContext,"article_title":articleTitle})
      article_body = "\n".join(article_body)
      article = {"title":articleTitle,"body":article_body}
      articles.append(article)
  fields = pd.DataFrame(fields)
  fields["question"] = fields["question"].str.replace(regex," ")
  assert not (fields["question"].str.contains("catalanswhat").any())
  fields["paragraph_context"] = fields["paragraph_context"].str.replace(regex," ")
  fields["answer_text"] = fields["answer_text"].str.replace(regex," ")
  assert not (fields["paragraph_context"].str.contains("catalanswhat").any())
  fields["article_title"] = fields["article_title"].str.replace("_"," ")
  assert not (fields["article_title"].str.contains("catalanswhat").any())
  return fields,JSON["data"]
trainingData,training_JSON = readFile("train-v1.1.json")
print("JSON dataset read.")
#Text preprocessing
## Converting text to skipgrams
print("Tokenizing sentences.")
strings = trainingData.drop("answer_start",axis=1)
strings = strings.values.flatten()
textTokenizer = Tokenizer()
textTokenizer.fit_on_texts(strings)
questionsTokenized_train = pad_sequences(textTokenizer.texts_to_sequences(trainingData["question"]))
print(questionsTokenized_train.shape)
contextTokenized_train = pad_sequences(textTokenizer.texts_to_sequences(trainingData["paragraph_context"]))
print("Sentences tokenized.")

answer_start_train_one_hot = pd.get_dummies(trainingData["answer_start"])

# @title Keras-BERT Environment
import os
pretrained_path = 'uncased_L-12_H-768_A-12'
config_path = os.path.join(pretrained_path, 'bert_config.json')
checkpoint_path = os.path.join(pretrained_path, 'bert_model.ckpt')
vocab_path = os.path.join(pretrained_path, 'vocab.txt')
# Use TF_Keras
os.environ["TF_KERAS"] = "1"

# @title Load Basic Model
import codecs
from keras_bert import load_trained_model_from_checkpoint
token_dict = {}
with codecs.open(vocab_path, 'r', 'utf8') as reader:
    for line in reader:
        token = line.strip()
        token_dict[token] = len(token_dict)

model = load_trained_model_from_checkpoint(config_path, checkpoint_path)

#@title Model Summary
model.summary()

#@title Create tokenization stuff.
from keras_bert import Tokenizer

tokenizer = Tokenizer(token_dict)
def tokenize(text,max_len):
  tokenizer.tokenize(text)
  return tokenizer.encode(first=text,max_len=max_len)
def tokenize_array(texts,max_len=512):
  indices = np.zeros((texts.shape[0],max_len))
  segments = np.zeros((texts.shape[0],max_len))
  for i in range(texts.shape[0]):
    tokens = tokenize(texts[i],max_len)
    indices[i] = tokens[0]
    segments[i] = tokens[1]
  #print(indices.shape)
  #print(segments.shape)
  # Stack token indices first and segment ids second, matching how the model
  # slices this tensor later (x[:, 0] -> indices, x[:, 1] -> segments).
  return np.stack([indices,segments],axis=1)

#@ Tokenize inputs.
def X_Y(dataset,answer_start_one_hot,batch_size=10):
    questions = dataset["question"]
    contexts = dataset["paragraph_context"]
    questions_tokenized = tokenize_array(questions.values)
    contexts_tokenized = tokenize_array(contexts.values)
    X = np.stack([questions_tokenized,contexts_tokenized],axis=1)
    Y = answer_start_one_hot
    return X,Y
def X_Y_generator(dataset,answer_start_one_hot,batch_size=10):
    while True:
        try:
            batch_indices = np.random.choice(np.arange(0,dataset.shape[0]),size=batch_size)
            dataset_batch = dataset.iloc[batch_indices]
            X,Y = X_Y(dataset_batch,answer_start_one_hot.iloc[batch_indices])
            yield (X,Y)
        except Exception as e:
            print("Unhandled exception in X_Y_generator: ",e)
            raise

# Layers, Model and ModelCheckpoint come from tf.keras (TF_KERAS=1 is set above).
from tensorflow.keras.layers import Input, Lambda, Flatten, Concatenate, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import ModelCheckpoint

model.trainable = True

answers_network_checkpoint = ModelCheckpoint('answers_network-best.h5', verbose=1, monitor='val_loss', save_best_only=True, mode='auto')

input_layer = Input(shape=(2,2,512,))
print("input layer: ",input_layer.shape)
questions_input_layer = Lambda(lambda x: x[:,0])(input_layer)
context_input_layer = Lambda(lambda x: x[:,1])(input_layer)
print("questions input layer: ",questions_input_layer.shape)
print("context input layer: ",context_input_layer.shape)
questions_indices_layer = Lambda(lambda x: tf.cast(x[:,0],tf.float64))(questions_input_layer)
print("questions indices layer: ",questions_indices_layer.shape)
questions_segments_layer = Lambda(lambda x: tf.cast(x[:,1],tf.float64))(questions_input_layer)
print("questions segments layer: ",questions_segments_layer.shape)
context_indices_layer = Lambda(lambda x: tf.cast(x[:,0],tf.float64))(context_input_layer)
context_segments_layer = Lambda(lambda x: tf.cast(x[:,1],tf.float64))(context_input_layer)
questions_bert_layer = model([questions_indices_layer,questions_segments_layer])
print("Questions bert layer loaded.")
context_bert_layer = model([context_indices_layer,context_segments_layer])
print("Context bert layer loaded.")
questions_flattened = Flatten()(questions_bert_layer)
context_flattened = Flatten()(context_bert_layer)
combined = Concatenate()([questions_flattened,context_flattened])
#bert_dense_questions = Dense(256,activation="sigmoid")(questions_flattened)
#bert_dense_context = Dense(256,activation="sigmoid")(context_flattened)
answers_network_output = Dense(1604,activation="softmax")(combined)
#answers_network = Model(inputs=[input_layer],outputs=[questions_bert_layer,context_bert_layer])
answers_network = Model(inputs=[input_layer],outputs=[answers_network_output])
answers_network.summary()

answers_network.compile("adam","categorical_crossentropy",metrics=["accuracy"])

answers_network.fit_generator(
    X_Y_generator(
        trainingData,
        answer_start_train_one_hot,
        batch_size=10),
    steps_per_epoch=100,
    epochs=100,
    callbacks=[answers_network_checkpoint])

My vocabulary is about 83,000 words. Any model with "good" accuracy/F1 score is preferred, but I am also on a non-extendable 5-day deadline.

Edit:

Unfortunately, there is one thing I did not mention: I am actually using CyberZHG's keras-bert module both for the preprocessing and for the actual BERT model, so some optimizations may actually break the code. For example, I tried setting the default float type to float16, but that caused compatibility errors.
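For what it's worth, the float16 attempt was roughly the following (a sketch of what I mean, not code from the notebook):

from tensorflow.keras import backend as K

# Make float16 the default dtype for newly created layers and weights.
# With keras-bert's pretrained float32 checkpoint this produced dtype-mismatch
# (compatibility) errors when the weights were loaded.
K.set_floatx('float16')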

Edit #2:

As requested, here is the code for my complete program:

Jupyter notebook

python tensorflow keras
3 Answers
2 votes

Check out the Out-of-memory issues section on its GitHub page.
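The usual suggestions in that vein are along these lines (sketched here, not quoted from the README; MAX_LEN and BATCH_SIZE are placeholder names for the values passed to tokenize_array and the generator):

import tensorflow.compat.v1 as tf

# 1) Shrink the sequence length and the batch size: BERT's memory use grows
#    quickly with both.
MAX_LEN = 128     # instead of 512
BATCH_SIZE = 4    # instead of 10

# 2) Let TensorFlow allocate GPU memory on demand instead of reserving it all
#    up front, which at least gives a clearer picture of real usage.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
tf.keras.backend.set_session(tf.Session(config=config))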


1 vote

With the limited information I have, I suspect the problem lies in the last layer of your model. But I see a lot of red flags in your solution. They are ordered from what I consider mild to severe.
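One way to act on the last-layer suspicion, as a sketch rather than a complete fix, is to stop flattening the full 512 x 768 BERT outputs before the classifier and keep only the first ([CLS]) token of each sequence:

from tensorflow.keras.layers import Lambda, Concatenate, Dense

# Keep only the [CLS] (first token) vector of each BERT output instead of
# flattening all 512 x 768 values per sequence.
questions_cls = Lambda(lambda x: x[:, 0, :])(questions_bert_layer)  # (batch, 768)
context_cls = Lambda(lambda x: x[:, 0, :])(context_bert_layer)      # (batch, 768)
combined = Concatenate()([questions_cls, context_cls])              # (batch, 1536)

# The classifier kernel is now 1536 x 1604 (~2.5M weights) instead of
# 786432 x 1604 (~1.3 billion weights).
answers_network_output = Dense(1604, activation="softmax")(combined)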


0 votes

Your problem is when you create this Dense() layer:
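The tensor shape in the error, [786432, 1604], matches the kernel of that layer: the two flattened 512 x 768 BERT outputs are concatenated into 2 x 512 x 768 = 786,432 features, each connected to 1,604 outputs (presumably one per distinct answer_start value from get_dummies). A rough back-of-the-envelope check:

# Approximate size of the Dense(1604) kernel built on the flattened BERT outputs.
in_features = 2 * 512 * 768           # 786,432 features after Flatten + Concatenate
out_features = 1604                   # number of answer_start classes
params = in_features * out_features   # ~1.26 billion weights

bytes_per_float32 = 4
kernel_gb = params * bytes_per_float32 / 1024**3
print(f"~{kernel_gb:.1f} GB for the kernel alone")        # ~4.7 GB

# Adam keeps two extra slots (m and v) per weight, so training needs roughly
# three times that before counting any activations, which already exhausts a P100.
print(f"~{3 * kernel_gb:.1f} GB including Adam slots")    # ~14 GB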
