Cannot pickle '_thread.RLock' object when serializing a FAISS object

Votes: 0 · Answers: 3

I am trying to build a langchain model. I installed OpenAI and embedded some data from URLs into a FAISS object. But I cannot pickle the object; I get an error saying it contains a '_thread.RLock'. I later learned this is caused by the call FAISS.from_documents(): there is a problem with the index it builds. I have not been able to solve it.

# -*- coding: utf-8 -*-
"""Langchain_LLM.ipynb

Automatically generated by Colaboratory.

Original file is located at
    https://colab.research.google.com/drive/1DWToK3XFOM0v5bl7-LwT0GBfKyYVulnb
"""

!pip install python-magic langchain unstructured streamlit openai tiktoken faiss-gpu

import os
import streamlit as st
import pickle
import time
from langchain import OpenAI
from langchain.chains import RetrievalQAWithSourcesChain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import UnstructuredURLLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

os.environ['OPENAI_API_KEY'] = "YOUR_OPENAI_API_KEY"

llm = OpenAI(temperature = 0.9, max_tokens=500)

loader = UnstructuredURLLoader(
    urls = [
        "https://www.moneycontrol.com/news/business/banks/hdfc-bank-re-appoints-sanmoy-chakrabarti-as-chief-risk-officer-11259771.html",
        "https://www.moneycontrol.com/news/business/markets/market-corrects-post-rbi-ups-inflation-forecast-icrr-bet-on-these-top-10-rate-sensitive-stocks-ideas-11142611.html"
    ]
)
data = loader.load()
len(data)

data[0].metadata

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1000,  # size of each chunk created
    chunk_overlap = 200,  # overlap between chunks, to maintain context
)
docs = text_splitter.split_documents(data)
len(docs)

docs[2]

# Create the embeddings of the chunks using OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

# Pass the documents and embeddings in order to create the FAISS vector index
vectorindex_openai = FAISS.from_documents(docs, embeddings)

# Store the vector index locally
file_path="vector_index.pkl"
with open(file_path, "wb") as f:
    pickle.dump(vectorindex_openai, f)

The error is:

TypeError                                 Traceback (most recent call last)
<ipython-input-74-15688820a1ef> in <cell line: 3>()
      2 file_path="vector_index.pkl"
      3 with open(file_path, "wb") as f:
----> 4     pickle.dump(vectorindex_openai, f)

TypeError: cannot pickle '_thread.RLock' object

I am trying to create a vector_index.pkl file.
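
As far as I can tell, pickle simply refuses to serialize lock objects, and the vector store returned by FAISS.from_documents() holds a reference to the embeddings client, which is the usual carrier of such a lock. The failure reproduces in isolation:

import pickle
import threading

pickle.dumps(threading.RLock())
# TypeError: cannot pickle '_thread.RLock' object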

Tags: python, pickle, langchain, large-language-model, faiss
3 Answers
3 votes

I had the same problem and solved it with the code below, using save_local instead of pickle:

vectorindex_openai = FAISS.from_documents(docs, embeddings)

vectorindex_openai.save_local("faiss_store")

Run the code and you get a folder named "faiss_store"; inside it are two files, "index.faiss" and "index.pkl". If you want to use the stored data later, you can load it with:

FAISS.load_local("faiss_store", OpenAIEmbeddings())
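
save_local sidesteps the pickle limitation because it writes only the raw FAISS index and the document store to disk; the OpenAIEmbeddings client is never serialized, which is why load_local takes a fresh embeddings object. Below is a minimal sketch of the full round trip wired into the RetrievalQAWithSourcesChain the question imports, assuming the docs, embeddings, and llm defined there (the question string is just an example, and newer langchain releases also require allow_dangerous_deserialization=True on load_local):

from langchain.chains import RetrievalQAWithSourcesChain

vectorindex_openai = FAISS.from_documents(docs, embeddings)
vectorindex_openai.save_local("faiss_store")  # writes index.faiss and index.pkl

# Later, possibly in a fresh process:
store = FAISS.load_local("faiss_store", OpenAIEmbeddings())
chain = RetrievalQAWithSourcesChain.from_llm(llm=llm, retriever=store.as_retriever())
result = chain({"question": "What is the RBI inflation forecast?"}, return_only_outputs=True)
print(result["answer"], result["sources"])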

0 votes

The only way I found to fix this was to downgrade your langchain version:

  • langchain==0.0.350
  • openai==0.27.6

The import syntax changes as well:

from langchain_openai import OpenAIEmbeddings

becomes

from langchain_community.embeddings import OpenAIEmbeddings

That did it for me! After that, pickling the embeddings file worked without a problem.

import os
import pickle
from dotenv import load_dotenv, find_dotenv
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OpenAIEmbeddings

load_dotenv(find_dotenv())  # pick up OPENAI_API_KEY from a .env file


def load_split(file_path):
    # Helper assumed by this answer; a plausible implementation that
    # loads a PDF and splits it into overlapping chunks.
    loader = PyPDFLoader(file_path)
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    return splitter.split_documents(loader.load())


def generate_embeded_pickle(model="text-embedding-3-large"):
    embeddings_model = OpenAIEmbeddings(model=model)
    docs = load_split(file_path="src/data/test.pdf")
    embeddings_file = "src/pickle_store/documentToParse.pkl"
    if not os.path.exists(embeddings_file):
        embeddings = FAISS.from_documents(docs, embeddings_model)
        with open(embeddings_file, "wb") as file:
            pickle.dump(embeddings, file)
        print("Pickle file saved successfully at:", embeddings_file)
    else:
        print("Pickle file already exists!")

0 votes

Try this:

embeddings = OpenAIEmbeddings()

vectorstore = FAISS.from_documents(docs, embeddings)

vectorstore.save_local("vectorstore")

x = FAISS.load_local("vectorstore", OpenAIEmbeddings(), allow_dangerous_deserialization=True)

retriever = x.as_retriever()
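
From there the retriever can be queried directly; a minimal sketch (the query string is just an example):

docs = retriever.get_relevant_documents("HDFC Bank chief risk officer")  # example query
for d in docs:
    print(d.metadata.get("source"), d.page_content[:100])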
