I am trying to build a langchain model. I installed OpenAI and embedded some data from URLs into a FAISS object. But I cannot pickle the object: I get an error saying it contains a '_thread.RLock'. I later learned this is caused by the call FAISS.from_documents() — the index it builds cannot be pickled. I have not been able to work around this.
# -*- coding: utf-8 -*-
"""Langchain_LLM.ipynb
Automatically generated by Colaboratory.
Original file is located at
https://colab.research.google.com/drive/1DWToK3XFOM0v5bl7-LwT0GBfKyYVulnb
"""
!pip install python-magic langchain unstructured streamlit openai tiktoken faiss-gpu
import os
import streamlit as st
import pickle
import time
from langchain import OpenAI
from langchain.chains import RetrievalQAWithSourcesChain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import UnstructuredURLLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
os.environ['OPENAI_API_KEY'] = "sk-..."  # use your own key; never publish a real API key
llm = OpenAI(temperature = 0.9, max_tokens=500)
loader = UnstructuredURLLoader(
    urls=[
        "https://www.moneycontrol.com/news/business/banks/hdfc-bank-re-appoints-sanmoy-chakrabarti-as-chief-risk-officer-11259771.html",
        "https://www.moneycontrol.com/news/business/markets/market-corrects-post-rbi-ups-inflation-forecast-icrr-bet-on-these-top-10-rate-sensitive-stocks-ideas-11142611.html"
    ]
)
data = loader.load()
len(data)
data[0].metadata
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # size of each chunk created
    chunk_overlap=200,  # overlap between chunks, to maintain context across boundaries
)
docs = text_splitter.split_documents(data)
len(docs)
docs[2]
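The chunk_overlap above keeps the tail of one chunk at the head of the next, so context survives chunk boundaries. A plain-Python sketch of the idea (a simplification — langchain's actual splitter also respects separators like newlines):

```python
def naive_chunks(text, size, overlap):
    # Slide a window of `size` characters forward by `size - overlap`
    # each step, so consecutive chunks share `overlap` characters.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

print(naive_chunks("abcdefghij", size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```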
# Create the embeddings of the chunks using OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
# Pass the documents and embeddings in order to create a FAISS vector index
vectorindex_openai = FAISS.from_documents(docs, embeddings)
# Store the created vector index locally
file_path = "vector_index.pkl"
with open(file_path, "wb") as f:
    pickle.dump(vectorindex_openai, f)
The error is:
TypeError Traceback (most recent call last)
<ipython-input-74-15688820a1ef> in <cell line: 3>()
2 file_path="vector_index.pkl"
3 with open(file_path, "wb") as f:
----> 4 pickle.dump(vectorindex_openai, f)
TypeError: cannot pickle '_thread.RLock' object
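The root cause is generic: pickle refuses any object graph that contains a live lock, and the FAISS wrapper holds one (via its embedding client). A minimal reproduction using only the standard library:

```python
import pickle
import threading

class Holder:
    # Any object that directly or transitively carries a lock,
    # as the FAISS vector store does, is unpicklable.
    def __init__(self):
        self.lock = threading.RLock()

try:
    pickle.dumps(Holder())
except TypeError as e:
    print(e)  # e.g. "cannot pickle '_thread.RLock' object"
```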
I am trying to create a vector_index.pkl file.
I had the same problem and solved it with the code below, instead of using pickle.
vectorindex_openai = FAISS.from_documents(docs, embeddings)
vectorindex_openai.save_local("faiss_store")
Running this code produces a folder named "faiss_store" containing two files, "index.faiss" and "index.pkl". If you want to use the stored data later, you can load it with
FAISS.load_local("faiss_store", OpenAIEmbeddings())
The only way I found to solve this was to downgrade the langchain version.
Change … to …
That did it for me! After that, there was no problem pickling the embeddings file.
import os
import pickle
from dotenv import load_dotenv, find_dotenv
import openai
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.document_loaders import TextLoader
def generate_embeded_pickle(model="text-embedding-3-large"):
    embeddings_model = OpenAIEmbeddings(model=model)
    # load_split is the poster's own loader/splitter helper (not shown here)
    docs = load_split(file_path="src/data/test.pdf")
    embeddings_file = 'src/pickle_store/documentToParse.pkl'
    if not os.path.exists(embeddings_file):
        embeddings = FAISS.from_documents(docs, embeddings_model)
        with open(embeddings_file, "wb") as file:
            pickle.dump(embeddings, file)
        print("Pickle file saved successfully at:", embeddings_file)
    else:
        print("Pickle file already exists!")
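Reading the store back later is just the reverse of pickle.dump. A small sketch — load_embedded_pickle is a hypothetical helper name, and the path assumes the layout above:

```python
import pickle

def load_embedded_pickle(path="src/pickle_store/documentToParse.pkl"):
    # Read back whatever object pickle.dump wrote earlier,
    # here the FAISS vector store produced by generate_embeded_pickle.
    with open(path, "rb") as f:
        return pickle.load(f)
```

Note that on langchain versions where pickling fails with the RLock error, save_local/load_local (as in the other answers) is the safer route.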
Try this:
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(docs, embeddings)
vectorstore.save_local("vectorstore")
x = FAISS.load_local("vectorstore", OpenAIEmbeddings(), allow_dangerous_deserialization=True)
retriever = x.as_retriever()