NLP Transformers: best way to get a fixed-size sentence embedding vector?


I'm loading a language model from torch hub (CamemBERT, a French model based on RoBERTa) and using it to embed some sentences:

import torch
camembert = torch.hub.load('pytorch/fairseq', 'camembert.v0')
camembert.eval()  # disable dropout (or leave in train mode to finetune)


def embed(sentence):
    tokens = camembert.encode(sentence)
    # Extract all layers' features (layer 0 is the embedding layer)
    all_layers = camembert.extract_features(tokens, return_all_hiddens=True)
    embeddings = all_layers[0]
    return embeddings

# Here we see that the shape of the embedding depends on the number of tokens in the sentence

u = embed("Bonjour, ça va ?")
u.shape # torch.Size([1, 7, 768])
v = embed("Salut, comment vas-tu ?")
v.shape # torch.Size([1, 9, 768])

Now imagine that I want to compute the cosine distance between the vectors (tensors, in our case) u and v:

cos = torch.nn.CosineSimilarity(dim=0)
cos(u, v)  # will throw an error since the shape of `u` differs from the shape of `v`

My question is: what is the best method to always get the same embedding shape for a sentence, regardless of its number of tokens?

I was thinking of computing the mean on axis=1, since axis=0 and axis=2 always have the same size:


cos = torch.nn.CosineSimilarity(dim=1) #dim becomes 1 now

u = u.mean(axis=1)
v = v.mean(axis=1)

cos(u, v).detach().numpy().item() # works now and gives 0.7269

However, I'm afraid that taking the mean damages the embedding!
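For example, another option I could try (just a sketch with a hypothetical helper name, and whether it preserves more information than the mean is exactly what I'm unsure about) is to keep only the representation of the first token (<s>) from the last layer, which is also independent of the sentence length:

def embed_first_token(sentence):  # hypothetical helper, not part of the code above
    tokens = camembert.encode(sentence)
    all_layers = camembert.extract_features(tokens, return_all_hiddens=True)
    last_layer = all_layers[-1]   # last hidden layer, shape [1, n_tokens, 768]
    return last_layer[:, 0, :]    # keep only the <s> token, shape [1, 768]

u = embed_first_token("Bonjour, ça va ?")
v = embed_first_token("Salut, comment vas-tu ?")
cos = torch.nn.CosineSimilarity(dim=1)
cos(u, v).detach().numpy().item()  # a single similarity score, whatever the sentence lengths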

machine-learning deep-learning nlp pytorch word-embedding
1 Answer
0 votes

I'm not an expert, but why not use the last layer instead? Is there a reason you want to keep all the layers?

With the last layer, the size is a constant [1, 10, 768], which should let you do some computations. I haven't tried using it to cluster sentences yet.
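As a rough sketch of what I mean (assuming fairseq's extract_features without return_all_hiddens returns only the last layer; note that its token dimension still varies with sentence length, so this sketch still averages over it as in your question):

def embed_last_layer(sentence):  # hypothetical helper, just to illustrate the idea
    tokens = camembert.encode(sentence)
    # without return_all_hiddens, extract_features returns only the last layer
    last_layer = camembert.extract_features(tokens)  # shape [1, n_tokens, 768]
    return last_layer.mean(dim=1)                    # shape [1, 768]

u = embed_last_layer("Bonjour, ça va ?")
v = embed_last_layer("Salut, comment vas-tu ?")
cos = torch.nn.CosineSimilarity(dim=1)
cos(u, v).detach().numpy().item()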

Let me know if this helped!
