How to create an embedding model in LangChain

Question · Votes: 0 · Answers: 1

I want to pass the hidden states of Llama-2 as the embedding model to FAISS.from_documents(<filepath>, <embedding_model>). At the moment, I have the Llama-2 model and can get the embeddings for a string.

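import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM

# auth_token (Hugging Face access token) and bnb_config (quantization config)
# are defined elsewhere and are not shown in the question.
# model_id is not defined in the question either; the checkpoint used below is assumed.
model_id = "meta-llama/Llama-2-7b-chat-hf"

# Ask the model to return the hidden states of every layer
model_config = transformers.AutoConfig.from_pretrained(
    model_id,
    output_hidden_states=True,
    use_auth_token=auth_token,
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")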

# Input data to test the code
input_text = "Hello World!"


encoded_input = tokenizer(input_text, return_tensors='pt')
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf",
                                            trust_remote_code=True,
                                            config=model_config,
                                            quantization_config=bnb_config,
                                            device_map='auto',
                                            use_auth_token=auth_token
                                            )


outputs = model(**encoded_input)
hidden_states = outputs.hidden_states


print(len(hidden_states))  # 33 for Llama-2: 1 (embeddings) + 32 (layers)
print(hidden_states[0].shape)  # Shape of the embeddings
print(hidden_states[2])

Printed output:

33
torch.Size([1, 4, 4096])
tensor([[[ 0.0373, -0.5762, -0.0180,  ...,  0.0962, -0.1099,  0.3767],
         [ 0.0676,  0.0400, -0.0033,  ...,  0.0655,  0.0278, -0.0079],
         [-0.0160,  0.0157,  0.0478,  ..., -0.0224, -0.0341,  0.0093],
         [ 0.0229, -0.0104,  0.0217,  ..., -0.0080, -0.0012, -0.0342]]],
       dtype=torch.float16, grad_fn=<ToCopyBackward0>)

Now I want to build embeddings for my documents with Llama-2:

from langchain.vectorstores import FAISS

# <clean> is the file-path
FAISS.from_documents(clean, model)
AttributeError: 'LlamaForCausalLM' object has no attribute 'embed_documents'

How can I fix this, and how can I use the Llama-2 hidden states as embeddings?

python word-embedding data-retrieval large-language-model llama
1 Answer
0 votes

I had a similar problem. You can find the structure of LangChain's fake embeddings class here: https://api.python.langchain.com/en/latest/_modules/langchain/embeddings/fake.html#FakeEmbeddings.embed_documents
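The AttributeError in the question suggests that FAISS expects an object with an embed_documents method (and, for queries, embed_query), which is the interface the linked FakeEmbeddings class implements. Below is a minimal sketch of such an adapter, not a definitive implementation. The names HiddenStateEmbeddings and llama_token_vectors are hypothetical, averaging the token vectors of one layer is only one possible pooling choice, and the import path of the Embeddings base class is an assumption that depends on the installed LangChain version (newer releases expose it as langchain_core.embeddings.Embeddings).

```python
from typing import Callable, List, Sequence

try:
    # Assumed import path for older LangChain releases.
    from langchain.embeddings.base import Embeddings
except ImportError:
    # Fallback so the sketch still runs without LangChain installed.
    class Embeddings:  # type: ignore[no-redef]
        pass

# A callable mapping a text to per-token vectors (tokens x hidden_size).
TokenVectorFn = Callable[[str], Sequence[Sequence[float]]]


class HiddenStateEmbeddings(Embeddings):
    """Hypothetical adapter exposing embed_documents / embed_query."""

    def __init__(self, token_vectors: TokenVectorFn) -> None:
        self.token_vectors = token_vectors

    def _pool(self, text: str) -> List[float]:
        # Mean-pool the token vectors into a single fixed-size vector.
        rows = self.token_vectors(text)
        if not rows:
            raise ValueError("no token vectors returned for text")
        dim = len(rows[0])
        return [sum(float(r[i]) for r in rows) / len(rows) for i in range(dim)]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self._pool(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        return self._pool(text)


def llama_token_vectors(model, tokenizer, layer: int = -1) -> TokenVectorFn:
    """Build a TokenVectorFn from a model loaded with output_hidden_states=True."""
    import torch  # imported lazily so the adapter has no hard dependency

    def fn(text: str):
        encoded = tokenizer(text, return_tensors="pt").to(model.device)
        with torch.no_grad():  # inference only, no gradients needed
            outputs = model(**encoded)
        # hidden_states[layer] has shape (1, tokens, hidden_size)
        return outputs.hidden_states[layer][0].float().cpu().tolist()

    return fn
```

With the model and tokenizer from the question, this could presumably be used as FAISS.from_documents(docs, HiddenStateEmbeddings(llama_token_vectors(model, tokenizer))), where docs is a list of LangChain Document objects rather than a file path.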
