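I'm learning LangChain and vector databases.
Following the original documentation, I can read some documents, update the database, and then run a query.
I want to access the same index and query it again, but without reloading the embeddings and adding the vectors to the database again.
How can I create the same docsearch object without creating new vectors?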
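import os
import pinecone
from langchain.document_loaders import UnstructuredWordDocumentLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Pinecone
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

# Load source Word doc
loader = UnstructuredWordDocumentLoader("C:/Users/ELECTROPC/utilities/openai/data_test.docx", mode="elements")
data = loader.load()
# Text splitting
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(data)
# Upsert vectors to Pinecone Index
# (PINECONE_API_KEY and PINECONE_API_ENV are assumed to be defined earlier)
pinecone.init(
    api_key=PINECONE_API_KEY,  # find at app.pinecone.io
    environment=PINECONE_API_ENV
)
index_name = "mlqai"
embeddings = OpenAIEmbeddings(openai_api_key=os.environ['OPENAI_API_KEY'])
# from_texts embeds every chunk and upserts it each time this line runs
docsearch = Pinecone.from_texts([t.page_content for t in texts], embeddings, index_name=index_name)
# Query
llm = OpenAI(temperature=0, openai_api_key=os.environ['OPENAI_API_KEY'])
chain = load_qa_chain(llm, chain_type="stuff")
query = "que sabes de los patinetes?"
docs = docsearch.similarity_search(query)
answer = chain.run(input_documents=docs, question=query)
print(answer)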
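You need to connect to the existing index. To do that, you must know the name of the index and the embeddings that were used to create it. Pinecone still has to be initialized with pinecone.init(...) as in the question; after that, from_existing_index only connects to the index and does not embed or upsert any documents.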
index_name = "mlqai"
embeddings = OpenAIEmbeddings(openai_api_key=os.environ['OPENAI_API_KEY'])
docsearch = Pinecone.from_existing_index(index_name, embeddings)
See the documentation.
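If the same script has to handle both the first ingestion and later query-only runs, the two paths can be wrapped in one small helper. This is only a sketch: `get_docsearch` and its `store_cls` parameter are hypothetical names, not part of LangChain. It assumes nothing beyond the two class methods already used in this thread, `from_texts` and `from_existing_index`.

```python
def get_docsearch(store_cls, index_name, embeddings, texts=None):
    """Return a vector store bound to `index_name`.

    Hypothetical helper, not a LangChain API. `store_cls` is assumed to
    expose `from_existing_index` and `from_texts`, as the Pinecone
    vector store class used in the question does.
    """
    if texts is None:
        # Reuse path: connect to the index, write no new vectors.
        return store_cls.from_existing_index(index_name, embeddings)
    # Ingestion path: embed the texts and upsert them into the index.
    return store_cls.from_texts(texts, embeddings, index_name=index_name)


# Usage with the objects from the question (call pinecone.init(...) first):
# from langchain.vectorstores import Pinecone
# docsearch = get_docsearch(Pinecone, "mlqai", embeddings)
# docs = docsearch.similarity_search(query)
```

Passing the class in as an argument keeps the helper free of any import, so it can be tried with a stub before pointing it at a real index.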