How do I use BertForSequenceClassification with a maximum token length of 1700?


I want to do authorship classification on the Reuters 50 50 dataset, where the longest texts run to 1600+ tokens and there are 50 classes/authors in total.

With max_length=1700 and batch_size=1 I get RuntimeError: CUDA out of memory. The error can be avoided by setting max_length=512, but that has the unwanted side effect of truncating the texts.

Tokenizing and encoding:

from keras.preprocessing.sequence import pad_sequences
from transformers import BertTokenizer

# Not shown in the question; assumed to be the tokenizer matching the
# 'bert-base-uncased' model configured below.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

MAX_LEN = 1700

def get_encodings(texts):
    token_ids = []
    for text in texts:
        # max_length caps each encoded sequence at MAX_LEN tokens;
        # attention masks are built later from the padded ids.
        token_id = tokenizer.encode(text, add_special_tokens=True, max_length=MAX_LEN)
        token_ids.append(token_id)
    return token_ids

def pad_encodings(encodings):
    return pad_sequences(encodings, maxlen=MAX_LEN, dtype="long", 
                          value=0, truncating="post", padding="post")

def get_attention_masks(padded_encodings):
    attention_masks = []
    for encoding in padded_encodings:
        attention_mask = [int(token_id > 0) for token_id in encoding]
        attention_masks.append(attention_mask)
    return attention_masks


train_encodings = get_encodings(train_df.text.values)
train_encodings = pad_encodings(train_encodings)
train_attention_masks = get_attention_masks(train_encodings)

test_encodings = get_encodings(test_df.text.values)
test_encodings = pad_encodings(test_encodings)
test_attention_masks = get_attention_masks(test_encodings)
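For reference, newer versions of the transformers tokenizer can produce padded token ids and attention masks in a single call, which would replace the pad_sequences and get_attention_masks steps above. A minimal sketch, assuming a transformers version whose tokenizer accepts padding/truncation arguments:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

encoded = tokenizer(
    list(train_df.text.values),   # raw texts
    padding='max_length',         # pad every sample up to max_length
    truncation=True,              # cut everything beyond max_length
    max_length=MAX_LEN,
    return_tensors='pt',          # return PyTorch tensors directly
)
input_ids = encoded['input_ids']            # shape: (num_samples, MAX_LEN)
attention_masks = encoded['attention_mask'] # 1 for real tokens, 0 for padding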

Packing into a Dataset and DataLoader:

import torch
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler

X_train = torch.tensor(train_encodings)
y_train = torch.tensor(train_df.author_id.values)
train_masks = torch.tensor(train_attention_masks)

X_test = torch.tensor(test_encodings)
y_test = torch.tensor(test_df.author_id.values)
test_masks = torch.tensor(test_attention_masks)

batch_size = 1

# Create the DataLoader for our training set.
train_data = TensorDataset(X_train, train_masks, y_train)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

validation_data = TensorDataset(X_test, test_masks, y_test)
validation_sampler = SequentialSampler(validation_data)
validation_dataloader = DataLoader(validation_data, sampler=validation_sampler, batch_size=batch_size)
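A quick sanity check (a hypothetical snippet, not in the original code) to confirm that each batch comes out with the expected shape of (batch_size, MAX_LEN):

# Pull a single batch and verify the tensor shapes before training.
input_ids, attention_mask, labels = next(iter(train_dataloader))
print(input_ids.shape)       # expected: torch.Size([1, 1700])
print(attention_mask.shape)  # expected: torch.Size([1, 1700])
print(labels.shape)          # expected: torch.Size([1])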

Model setup:

if torch.cuda.is_available():    
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

from transformers import BertConfig, BertForSequenceClassification, AdamW

config = BertConfig.from_pretrained(
    'bert-base-uncased',
    num_labels = 50,
    output_attentions = False,
    output_hidden_states = False,
    max_position_embeddings=MAX_LEN
)

# Building the model from a config (rather than from_pretrained) gives
# randomly initialised weights; no pretrained BERT weights are loaded.
model = BertForSequenceClassification(config)

model.to(device)


optimizer = AdamW(model.parameters(),
                  lr = 2e-5, 
                  eps = 1e-8 
                )
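One detail worth checking here (an illustrative snippet, not part of the original code): because the model is built from the config rather than loaded with from_pretrained, the 1700-entry position-embedding table is created with randomly initialised weights.

# Inspect the freshly built model.
print(model.config.max_position_embeddings)                    # 1700
print(model.bert.embeddings.position_embeddings.weight.shape)  # torch.Size([1700, 768]) for bert-base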

Training:

for epoch_i in range(0, epochs):

    model.train()

    for step, batch in enumerate(train_dataloader):

        b_texts = batch[0].to(device)
        b_attention_masks = batch[1].to(device)
        b_authors = batch[2].to(device)

        model.zero_grad()        

        outputs = model(b_texts, 
                        token_type_ids=None, 
                        attention_mask=b_attention_masks, 
                        labels=b_authors)  # <------- ERROR HERE

The error:

RuntimeError: CUDA out of memory. Tried to allocate 6.00 GiB (GPU 0; 7.93 GiB total capacity; 1.96 GiB already allocated; 5.43 GiB free; 536.50 KiB cached)
huggingface-transformers

1 Answer

Unless you are training on a TPU, your chances of having enough GPU RAM with any currently available GPU are extremely low. For some BERT models, the model alone takes well over 10 GB of RAM, and doubling the sequence length beyond 512 tokens requires even more. For reference, a Titan RTX with 24 GB of GPU RAM (currently the most available in a single GPU) can barely fit 24 samples of 512 tokens at a time.
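To see how much GPU memory is actually available before training, you can query it from PyTorch directly (a small sketch using standard torch.cuda calls):

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    total_gib = props.total_memory / 1024 ** 3                  # total GPU RAM in GiB
    allocated_gib = torch.cuda.memory_allocated(0) / 1024 ** 3  # memory currently held by tensors
    print(f"{props.name}: {total_gib:.1f} GiB total, {allocated_gib:.1f} GiB allocated")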

Fortunately, most networks still perform quite well when the samples are truncated, though of course this is task-specific. Also keep in mind that, unless you are training from scratch, all of the pretrained models were trained with the 512-token limit. As far as I know, the only model currently supporting longer sequences is Bart, which allows up to 1024 tokens.
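In practice that means loading the pretrained weights and truncating each document to the 512-token limit. A minimal sketch of that setup (the model name and num_labels mirror the question; the example text is illustrative):

from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Load pretrained weights instead of a randomly initialised config,
# keeping the 512-token limit the checkpoint was trained with.
model = BertForSequenceClassification.from_pretrained(
    'bert-base-uncased',
    num_labels=50,
)

token_ids = tokenizer.encode(
    "some long document ...",
    add_special_tokens=True,
    max_length=512,
    truncation=True,   # explicitly truncate instead of failing on long inputs
)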
