Is there any way to retrieve the embeddings stored in a langchain VectorStore?

Question

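I use Langchain to load documents, split them into chunks, embed those chunks, and then store the embedding vectors in a langchain VectorStore database. My use case requires running an algorithm over the embedding vectors, and I have been trying to find a way to get them out, without success.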

Ideally, I would like to be able to do something like this:

from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import SomeVectorStore
from langchain_openai import OpenAIEmbeddings

loader = TextLoader("../document.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
embeddings = OpenAIEmbeddings()
db = SomeVectorStore.from_documents(docs, embeddings)

# get all the embeddings and their corresponding chunks from the db
embeddings_and_their_chunks = db.some_way_to_get_all_embeddings()

Answer 1

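The exact way to retrieve embeddings from a VectorStore depends on the specific VectorStore implementation you are using. Many vector stores expose some way to iterate over the stored vectors, but the method differs from store to store. Assuming, hypothetically, that SomeVectorStore has a method items() that returns an iterator over (key, value) pairs, where key is the chunk and value is the corresponding embedding, you could do: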

# get all the embeddings and their corresponding chunks from the db
embeddings_and_their_chunks = list(db.items())
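To make the assumed items() interface concrete, here is a minimal runnable sketch. ToyVectorStore, its add() and items() methods, and fake_embed (a stand-in for a real model such as OpenAIEmbeddings) are all hypothetical and not part of LangChain. Once the (chunk, embedding) pairs are in a plain list, any algorithm can run over the vectors; here, a cosine-similarity search for the chunk closest to the first one.

```python
import math
from typing import Dict, Iterator, List, Tuple


def fake_embed(text: str) -> List[float]:
    """Hypothetical stand-in for a real embedding model: a tiny 3-d vector."""
    return [float(len(text)), float(text.count("a")), float(text.count(" "))]


class ToyVectorStore:
    """Hypothetical store keeping chunk -> embedding pairs (not a LangChain class)."""

    def __init__(self) -> None:
        self._data: Dict[str, List[float]] = {}

    def add(self, chunk: str) -> None:
        self._data[chunk] = fake_embed(chunk)

    def items(self) -> Iterator[Tuple[str, List[float]]]:
        # Yields (key, value) pairs: key is the chunk, value is its embedding.
        return iter(self._data.items())


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


db = ToyVectorStore()
for chunk in ["a cat sat", "a banana", "dogs bark loudly"]:
    db.add(chunk)

# get all the embeddings and their corresponding chunks from the db
embeddings_and_their_chunks = list(db.items())

# Example algorithm over the raw vectors: the chunk most similar to the first one.
query_chunk, query_vec = embeddings_and_their_chunks[0]
best_chunk, _ = max(
    embeddings_and_their_chunks[1:],
    key=lambda pair: cosine(query_vec, pair[1]),
)
print(query_chunk, "->", best_chunk)
```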

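If SomeVectorStore does not provide such a method, you will need to consult the VectorStore's documentation or source code to find out how to retrieve the stored vectors.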

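If there is no built-in way to retrieve all vectors, you may need to keep track of the keys under which you stored each chunk's vector, and then use those keys to retrieve the vectors later. For example: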

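# when storing the vectors
# (store() and get() are hypothetical methods; LangChain Document objects have
#  no .embedding attribute, so compute the vectors with the embedding model)
vectors = embeddings.embed_documents([doc.page_content for doc in docs])
keys = []
for doc, vector in zip(docs, vectors):
    key = db.store(vector)
    keys.append((key, doc.page_content))

# later, to retrieve the vectors together with their chunks
embeddings_and_their_chunks = [(chunk, db.get(key)) for key, chunk in keys]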
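The key-tracking pattern can be sketched as a self-contained example. KeyedVectorStore with its store() and get() methods, and fake_embed_documents (standing in for an embedding model's batch call), are hypothetical illustrations, not a real LangChain API.

```python
import uuid
from typing import Dict, List, Tuple


def fake_embed_documents(texts: List[str]) -> List[List[float]]:
    """Hypothetical stand-in for an embedding model's batch embedding call."""
    return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in texts]


class KeyedVectorStore:
    """Hypothetical store: store() returns a new key, get() looks a vector up by key."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def store(self, vector: List[float]) -> str:
        key = str(uuid.uuid4())
        self._vectors[key] = vector
        return key

    def get(self, key: str) -> List[float]:
        return self._vectors[key]


chunks = ["first chunk", "second chunk", "third chunk"]
db = KeyedVectorStore()

# when storing the vectors, remember which key belongs to which chunk
keys: List[Tuple[str, str]] = []
for chunk, vector in zip(chunks, fake_embed_documents(chunks)):
    keys.append((db.store(vector), chunk))

# later, to retrieve the vectors together with their chunks
embeddings_and_their_chunks = [(chunk, db.get(key)) for key, chunk in keys]
```

In LangChain itself, VectorStore.add_texts() and add_documents() return the list of IDs assigned to the stored items, which is one natural source of such keys; whether a raw vector can then be fetched by ID still depends on the backing store.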

Again, the exact details depend on the particular VectorStore you are using.
