Encoder-decoder with Huggingface models

Question

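I want to build an encoder-decoder model with the following structure:

Bert-base-uncased to encode the input (https://huggingface.co/google-bert/bert-base-uncased)
A linear layer connecting the two models, taking BERT's CLS token as input
OPT-125M to decode, taking the linear layer's output as input (https://huggingface.co/facebook/opt-125m)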

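I want to do this to essentially implement the idea I read about in the In-context Autoencoder (ICAE) paper and test it myself (https://arxiv.org/abs/2307.06945)

I want to do this in PyTorch with the Huggingface library, because it minimizes the programming effort and because I don't know where to get the original implementations of OPT-125M or BERT, or how to implement them by hand. Huggingface's optimizations also matter a lot for trying this out on an ordinary desktop PC.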

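My problem is that the OPT-125M model takes tokenizer output as its input, and I can't find a way around that.

Does anyone know a way to feed the linear layer's output directly into OPT-125M without tokenizing it, or an equally performant way to implement this outside of Huggingface?

Here is the skeleton code I have written so far, which raises an error because of the OPT input: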

from transformers import BertTokenizer, BertModel, AutoModelForCausalLM
import torch
from torch import nn

class Encoder(nn.Module):
    def __init__(self):
        super(Encoder, self).__init__()
        self.tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
        self.model = BertModel.from_pretrained('bert-base-uncased')

    def forward(self, input_text):
        inputs = self.tokenizer(input_text, return_tensors="pt", padding=True, truncation=True, max_length=512)
        outputs = self.model(**inputs)
        return outputs.last_hidden_state[:, 0, :]  # CLS token embeddings

class LinearTransformation(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(LinearTransformation, self).__init__()
        self.linear = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        return self.linear(x)

class Decoder(nn.Module):
    def __init__(self):
        super(Decoder, self).__init__()
        self.model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

    def forward(self, x):
        # Assuming x is prepared correctly for the OPT model
        output = self.model(input_ids=x)
        return output

class BertOptPipeline(nn.Module):
    def __init__(self):
        super(BertOptPipeline, self).__init__()
        self.encoder = Encoder()
        self.linear_transformation = LinearTransformation(768, 512)
        self.decoder = Decoder()

    def forward(self, input_text):
        encoded = self.encoder(input_text)
        transformed = self.linear_transformation(encoded)
        print(transformed.shape)
        # Further processing may be needed here to match the decoder's input requirements
        decoded = self.decoder(transformed)
        return decoded

pipeline = BertOptPipeline()
input_text = "thank you for your help"
output = pipeline(input_text)

Thanks for your help!

Answer 1

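First, to clarify: a conventional encoder-decoder is encoder + cross-attention + decoder (for example Huggingface's EncoderDecoderModel). As far as I understand, the ICAE paper you mention is basically a LLaVA-like design (correct me if I'm wrong), in which you concatenate the encoder's output (projected image features for LLaVA, compressed language features for ICAE) with the LLM's normal prompt. There is no cross-attention.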

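Second, a BERT model is not a good choice of encoder here, because its maximum input length is fixed and it returns an output of the same length as its input, whereas your goal is to compress a LONG context, i.e. to summarize the context at the language level or at the token level.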
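To make the length point concrete, here is a minimal sketch. It assumes a tiny, randomly initialized BertModel built from a BertConfig (the sizes are made up so that nothing has to be downloaded); bert-base-uncased behaves the same way with hidden size 768. It only illustrates that the encoder returns one hidden vector per input token, so nothing is compressed unless you pick or pool vectors yourself.

```python
import torch
from transformers import BertConfig, BertModel

# Assumption: a tiny random-weight BERT, used only to inspect output shapes.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    max_position_embeddings=512,  # same position limit as bert-base-uncased
)
bert = BertModel(config).eval()

shapes = {}
for seq_len in (8, 128, 512):
    input_ids = torch.randint(0, config.vocab_size, (1, seq_len))
    with torch.no_grad():
        hidden = bert(input_ids=input_ids).last_hidden_state
    # One hidden vector per input token: (batch, seq_len, hidden_size)
    shapes[seq_len] = tuple(hidden.shape)
    print(seq_len, shapes[seq_len])
```

Taking only position 0 (the CLS vector), as in the question, squeezes the whole input into a single vector, and inputs longer than max_position_embeddings cannot be encoded in one pass at all.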

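So if you just want to test the idea quickly, I suggest starting from ICAE's codebase, so that you can quickly get reliable results and benchmark against them. If you want to set up your own code, you need to go into OPT's source code and write the corresponding code: 1. encode/compress the context with an encoder of your choice, 2. optionally project the encoder's output with a projection method of your choice, and 3. insert the projected output into OPT's normal token sequence. The Huggingface pipelines are meant for high-level use only and won't meet your needs.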
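The three steps above can be sketched roughly as follows. One possible way to try step 3 without editing OPT's source is the inputs_embeds argument that Huggingface's OPT forward accepts in place of input_ids. Everything named here (the BertToOpt class, the tiny random-weight configs, the single CLS vector used as a one-slot prefix) is an assumption made for illustration, not the ICAE authors' method. Note that the projection has to output the decoder's embedding width, which is 768 for OPT-125M rather than the 512 used in the question.

```python
import torch
from torch import nn
from transformers import BertConfig, BertModel, OPTConfig, OPTForCausalLM


class BertToOpt(nn.Module):
    """Hypothetical wrapper: BERT CLS vector -> linear projection -> OPT prefix."""

    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        # Project to the width of OPT's input embeddings (768 for opt-125m).
        self.proj = nn.Linear(encoder.config.hidden_size,
                              decoder.config.word_embed_proj_dim)

    def forward(self, enc_input_ids, enc_attention_mask, prompt_ids):
        # Step 1: encode the context and keep the CLS vector.
        enc_out = self.encoder(input_ids=enc_input_ids,
                               attention_mask=enc_attention_mask)
        cls = enc_out.last_hidden_state[:, 0, :]            # (batch, enc_dim)
        # Step 2: project it into the decoder's embedding space.
        prefix = self.proj(cls).unsqueeze(1)                # (batch, 1, dec_dim)
        # Step 3: embed the normal prompt tokens and prepend the prefix.
        prompt_embeds = self.decoder.get_input_embeddings()(prompt_ids)
        inputs_embeds = torch.cat([prefix, prompt_embeds], dim=1)
        attention_mask = torch.ones(inputs_embeds.shape[:2], dtype=torch.long,
                                    device=inputs_embeds.device)
        # Pass embeddings directly, bypassing OPT's token lookup.
        out = self.decoder(inputs_embeds=inputs_embeds,
                           attention_mask=attention_mask)
        return out.logits                                   # (batch, 1 + T, vocab)


# Assumption: tiny random-weight models so the sketch runs without downloads.
# For the real setup, swap in BertModel.from_pretrained("bert-base-uncased")
# and OPTForCausalLM.from_pretrained("facebook/opt-125m").
torch.manual_seed(0)
encoder = BertModel(BertConfig(vocab_size=100, hidden_size=32,
                               num_hidden_layers=1, num_attention_heads=2,
                               intermediate_size=64, max_position_embeddings=64))
decoder = OPTForCausalLM(OPTConfig(vocab_size=100, hidden_size=32,
                                   num_hidden_layers=1, ffn_dim=64,
                                   num_attention_heads=2,
                                   max_position_embeddings=64))
model = BertToOpt(encoder, decoder).eval()

enc_ids = torch.randint(3, 100, (2, 10))    # stand-in for tokenized context
enc_mask = torch.ones_like(enc_ids)
prompt_ids = torch.randint(3, 100, (2, 5))  # stand-in for tokenized prompt
with torch.no_grad():
    logits = model(enc_ids, enc_mask, prompt_ids)
print(logits.shape)
```

With pretrained weights the projection layer starts out untrained, so the prefix carries no useful signal until the linear layer (and optionally the encoder) is trained, for example with a reconstruction loss on the decoder's logits for the prompt positions.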
